This document discusses managing open source communities and projects. It outlines open source licensing (GPL, LGPL, Apache, Artistic and BSD licenses) and draws lessons from the history of bioinformatics software such as Staden, GCG, SRS, BLAST and EMBOSS. It notes that open source communities involve not just developers but also users, installers, documentation writers and support staff. Contributions include new code, bug fixes, documentation, training materials and feature requests. Projects need coordination, communication through mailing lists and meetings, and quality assurance through testing. Small sanctions (in small communities) and treats such as acknowledging contributions and involving users help encourage participation and "herd the cats" of an open source community.
2. Herding Cats
• Managing an open source community is like herding cats.
• “'Cat, come with me.' 'Nenni!' said the Cat. 'I am the Cat who walks by himself, and all places are alike to me. I will not come.' But all the same, he followed” (Rudyard Kipling, Just So Stories)
3. A brief history of open source software
– EMACS
– Linux
– GPL
– Apache
4. Open source licensing
• Source code provided
• Users free to inspect, modify and redistribute
• Restrictions may be applied
• Freedoms may be guaranteed
• Several licenses may be combined
– If they are compatible
5. GNU General Public License
• Originally written for the EMACS editor and the GNU project
• Based on copyleft
– Copyright holder usually restricts rights
– In the GPL, the copyright holder requires all further distributions to ensure free access
– No further restrictions may be imposed
– “Free as in speech, not as in beer”
6. Lesser General Public License
• The full GNU General Public License makes GPL code difficult to combine with code under other licenses, as the whole linked binary is covered by the GPL.
• The Lesser (Library) GPL only preserves the interface and requires the LGPL library's source code to be made available.
• Applications using an LGPL library can be under any license
• GPL code can only be used by non-GPL software through unlinked interfaces (APIs)
7. Other Open Source Licenses
• Apache 2.0 allows modified code to use another license (including proprietary). The “indemnity clause” can look scary but is safe
• The Perl Artistic License has issues with redistributed code
• The original BSD license imposed a “restriction” requiring acknowledgement of the original authors in advertising materials; this clause was removed in the “modified BSD” versions
8. A brief history of bioinformatics
• 1980 Staden package
– in support of Fred Sanger
• 1982 EMBL/GenBank
– Free sequence databases, later also SwissProt
• 1984 Genetics Computer Group
– Free (initially) sequence analysis package
• 1990 Sequence Retrieval System
• 1990 BLAST
• 1997 EMBOSS
9. Copyright and Ownership
• The Staden package was developed from 1987 to 2003 by Rodger Staden at the MRC-funded Laboratory of Molecular Biology
• To get a copy of the software, users mailed a cheque for £100 to the Medical Research Council
• In 2003, renewal of funding was rejected
10. Copyright and Ownership
• The software was still owned by the funders
• The authors had no right to apply for alternative funding
• … nor did anyone else
• Two years later it was formally re-released as open source, but by then the developers had left.
11. Multiple licensing
• The HMMER package provides standard Hidden Markov Model applications for multiple alignments of protein sequences
• HMMER 2 had a dual licensing model
– GNU General Public License
– Commercial license
• Only the GPL version can include third-party contributions; the commercial license cannot.
12. From academia to commercial
• The Sequence Retrieval System was developed by Thure Etzold as a PhD project, then at EMBL Heidelberg and the European Bioinformatics Institute.
• LION Bioscience in Cambridge started up to maintain and develop SRS commercially
• LION merged with competitors (e.g. NetGenics)
13. From academia to commercial
• NetGenics software was withdrawn
– Customers had to purchase an SRS license instead
• LION merged with BioWisdom
• BioWisdom merged with Instem
• Lesson: commercial software can be high quality and well supported, but can disappear at any time.
• Open source software avoids this risk, because the source code remains available
14. Branching
• BLAST was developed at NCBI as a successor to FASTA
• Development split into NCBI BLAST and WU-BLAST (Washington University), which provided new features
• WU-BLAST in turn became the commercial AB-BLAST
15. Competition
• BLAST and the NCBI Toolkit were an early example of open source bioinformatics
• Most software at the time was commercial
• In 1990 the commercial providers wrote to Congress asking for withdrawal of funding for NCBI software because it competed with US industry.
• They failed.
16. Competition
• The GCG package was developed by the Genetics Computer Group at the University of Wisconsin
• One of the most cited papers in biology
– If you change more than 25% of the code, you can remove the GCG copyright
• Changed to an annual source code license model
• Extensions (EGCG) distributed as source code by EMBL Heidelberg and then by Sanger
17. Competition
• Social scientists have reported in detail on GCG as an example of the development of bioinformatics.
• IntelliGenetics Inc objected to GCG’s competition as unfair
• Wisconsin spun off GCG Inc
• Software license fee doubled
• Usage continued
• EGCG grew to 50% of the size of the GCG code base
18. Competition
• GCG Inc looked for a new owner
• Source code deemed to be their major asset
• Source code distribution was withdrawn
• Increased fee for source code
• Very restrictive terms of distribution
• EGCG was abandoned with 150 applications
• EMBOSS written from scratch to replace both
– GPL/LGPL licensing
– Created by the former EGCG community
19. Competition
• So, to summarise
– 1984 GCG started as open source
– 1990 Became GCG Inc
– 1997 Acquired by Oxford Molecular
– 2000 EMBOSS 1.0 released as open source
Harvey, M. and McMeekin, A. (2007) “Public or Private Economies of Knowledge? Turbulence in the Biological Sciences”
21. Managing an Open Source Community
• The developers are only the beginning
– Users
– Installers
– Technical authors
– Helpdesk and support
– Communication
– Quality assurance
– Competitors
22. Contributions by developers
• New source code, new functionality
• Maintaining source code
– Bug fixes, coding standards
• Interfaces
– APIs, third party integration
• Competitors (including open source)
– New features and functionality
– Integration and active collaboration
23. Contributions (continued)
• Branches
– Someone needs to merge branches
• Original developers should agree to help
• Often merged by others wanting to use new features
– Ideally, merge with a single core
– Useful to merge any set of branches
– Combine with test suite(s)
25. Contributions (continued)
• Documentation
– Users are good at writing/updating manuals
• Training
– Shared examples with public data and common standards
• Support
• Feature requests
26. Hosting solutions
• Git: GitHub etc.
• SourceForge
• Open Bioinformatics Foundation
• Locally hosted solutions:
– CVS or Subversion
• Wiki
– Documentation: developers, users, installers
27. Coordination
• Projects need a coordinator
– Linux: Linus Torvalds
– Emacs: Richard Stallman
– GCG: John Devereux
– EMBOSS: Peter Rice
28. Coordination
• Maintaining a standard code base
– github.com/transmart
• Tracking branches and modified copies elsewhere
• Selecting best solutions from available branches
• Merging conflicting changes
• Continuous testing
29. Communication
• Community meetings (London, Amsterdam, Paris, …) for developers and users
• Regular technical developer meetings / TCs
• Mailing lists
– Provide a useful archive
• Trackers (JIRA, Pivotal, …)
– Defining tasks/issues and resolving them
• Wiki
– Community documentation
30. Efficiency
• Quality assurance
– More tests are always helpful
• Automated documentation
– Creating screenshots from test outputs
• Create tests for documented examples
• Automated update when results change
• Ensure documented functionality still functions
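As a minimal sketch of creating tests for documented examples, assuming a Python project: Python's standard doctest module runs the examples written in a docstring and reports any whose output no longer matches, so the manual and the test suite cannot silently drift apart. The function reverse_complement is hypothetical and used only for illustration.

```python
"""Hypothetical helper module whose documented examples double as tests."""
import doctest


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    These examples are both user documentation and regression tests:

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("aattc")
    'GAATT'
    """
    # Hypothetical illustration: raises KeyError for characters other than A, C, G, T, N
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    # Re-run every documented example; a failure means the documentation is out of date
    results = doctest.testmod()
    print(f"{results.attempted} examples checked, {results.failed} failed")
```

Running the module after each change (for example as a step in continuous testing) confirms that documented functionality still functions.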
31. Cat incentives
• In a small community, sanctions work
– Financial penalties for breaking the code
– Small fines for bugs
– Fines are put back into the team, e.g. funding Xmas drinks
32. Cat treats
• Acknowledge contributions
• Benefit from sharing code in other branches
• Developers need to support one another
– Put out any flame wars
• Involve the user community
– Encourage non-developers to contribute
• Keep everything public
– Support the community
– Attract new cats