This document discusses key dimensions for sustaining a community-driven tool-building effort, based on experience developing modeling tools. It covers onboarding users and contributors, governance models, community health analysis using graph techniques, and optimization strategies. The document advocates an entrepreneurial path for tool development: release prototypes as open source software, then improve them for real use cases in order to build a community and offer commercial services.
4. Our mission
We are interested in the broad area of systems and software engineering, especially promoting the rigorous use of software models and engineering principles in all software engineering tasks while keeping an eye on the most unpredictable element in any project: the people involved in it.
22. Researcher vs Practitioners
Researchers:
• Happy with whatever you throw at them
• “Infinite” time (PhD students)
• Used to low quality tools
Practitioners:
• Need documentation, support, nice UIs…
• They are “spoiled”
• They don’t get why it’s so complicated for us to build good tools
THEY WILL NOT PAY/HELP
23. Force researchers to use your tool
• If the tool/lang/repo is a community effort you can force people to use it
• Teaching is a good starting point (if you provide the teaching material)
• Asking authors to provide the models as replicability package or artefact evaluation
32. Unpaid / Junior programmers
• Should not be assigned any critical task
• Need a lot of supervision
• May end up generating useless code
• Disappear shortly after (IP issues!)
• Great for them, not so much for “you”
E.g. in Xatkit interns do not contribute anything to the main repo. They work on “lab repos”
33. 1 organization – N repos
• Research repos (“labs”) can eventually be part of the main tool (after rework)
• They can also stay and be used as labs for the people taking the risk
• Feedback from the tool can generate labs to come up with the right solution
34. Onboarding contributors from industry
Unpopular opinion (1): You cannot get industrial users without a mature tool (docs, UI, support,…)
Unpopular opinion (2): You cannot get a mature tool without an industrial contributor
35. Research-Industry collaboration models
• Direct Transfer contracts
• Industrial PhDs
• Large Consortium projects (e.g. EU ECSEL)
• Co-production models
• Industrial Research Labs
• …
They all work (sometimes)
39. Commercial open source business model
Release prototype as OSS → Improve it to be used in a “kind of” real environment → Kickstart a community → Create commercial services / extensions
Maybe some of you want to try this path?
40. Example: Xatkit.com (OSS chatbot dev platform)
Evolution of the language: External DSL (tree based) → External DSL (state machine) → Internal DSL → Two DSLs
Feedback along the way:
• First paper
• Real chatbots are not trees
• And replies are not just text
• Users don’t want to learn a new tool
+ much other useful feedback, e.g. target platforms, monitoring, trolls, bot generators
42. Is this model really viable?
• Evaluation of researchers should be based more on impact than on “bean counting”
– All universities signed the DORA declaration but…
• Researchers need to be taught business, marketing and financial skills
– And must want to invest time in acquiring these skills
• Not easy! (but none of the other collab models is that easy either)
53. A DSL for governance rules
Enabling the Definition and Enforcement of Governance Rules in Open Source Systems. Javier Luis Cánovas Izquierdo, Jordi Cabot:
ICSE (2) 2015: 505-514
54. Project myProject {
    Roles: Committers
    Deadlines:
        myDeadline : 7 days
    Rules:
        myMajorityRule :
            Majority {
                applied to Task
                when TaskReview
                people Committers
                range Present
                minVotes 3
                deadline myDeadline
            }
}
Verbalization: All the proposals for new development tasks will be accepted or rejected in 7 days by the committers of the project.
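To make the rule on this slide more concrete, here is a minimal Python sketch of how a rule like myMajorityRule might be evaluated. It is not the authors' implementation: the class and function names are hypothetical, and it assumes that "range Present" means only committers who actually voted are counted, that the decision is taken once the deadline has passed, and that a task with fewer than minVotes votes is rejected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MajorityRule:
    """Hypothetical in-memory form of the myMajorityRule declaration."""
    min_votes: int = 3        # minVotes 3
    deadline_days: int = 7    # myDeadline : 7 days


def decide(rule, opened_at, now, votes):
    """Return 'accepted', 'rejected' or 'pending' for a task under review.

    votes maps a committer name to True (in favour) or False (against).
    Assumption: 'range Present' means only committers who voted are counted.
    """
    if now < opened_at + timedelta(days=rule.deadline_days):
        return "pending"      # the voting window is still open
    if len(votes) < rule.min_votes:
        return "rejected"     # assumption: too few votes means rejection
    in_favour = sum(1 for vote in votes.values() if vote)
    return "accepted" if in_favour * 2 > len(votes) else "rejected"


# Usage: a task opened on 1 January and checked after the 7-day deadline.
opened = datetime(2024, 1, 1)
result = decide(MajorityRule(), opened, datetime(2024, 1, 9),
                {"ana": True, "bo": True, "cy": False})
```

Keeping the rule as data (rather than hard-coding it in a script) is what would let a tool read it from a project-level governance file and apply it to every task in the tracker.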
60. Understanding Community = Graph Analysis
• Many types of graphs (e.g. Bipartite graphs)
• Many types of properties
– Micro-view (local properties)
– Macro-view (global properties)
– Meso-level (emerging properties)
• Analysis at different levels and on different dimensions (e.g. non-code contributors!)
65. Bus Factor
“Number of key developers who would need to be incapacitated (hit by a bus) to send the project into such disarray that it would not be able to proceed”
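There are several ways to turn this definition into a number. The sketch below is one simple greedy heuristic, not necessarily the one used for the results in this talk: it assumes we already know, for each file, which developers are familiar with it, and counts how many of the most knowledgeable developers must leave before more than half of the files have nobody left. The function name, the input shape and the 0.5 threshold are all assumptions made for illustration.

```python
def bus_factor(file_authors, threshold=0.5):
    """Greedy estimate of the bus factor.

    file_authors maps a file path to the set of developers who know it.
    Developers are removed one by one (most files first) until more than
    `threshold` of the files have no remaining knowledgeable developer.
    """
    remaining = {path: set(devs) for path, devs in file_authors.items()}
    total = len(remaining)
    removed = 0
    while total and sum(1 for devs in remaining.values() if not devs) <= threshold * total:
        # Count how many files each remaining developer knows.
        counts = {}
        for devs in remaining.values():
            for dev in devs:
                counts[dev] = counts.get(dev, 0) + 1
        if not counts:
            break
        # Remove the developer covering the most files (ties: alphabetical).
        top = max(sorted(counts), key=lambda dev: counts[dev])
        for devs in remaining.values():
            devs.discard(top)
        removed += 1
    return removed


# Usage with made-up data: "ana" is the only one who knows two of four files.
files = {
    "core.py": {"ana"},
    "api.py": {"ana"},
    "ui.py": {"ana", "bo"},
    "docs.md": {"bo", "cy"},
}
factor = bus_factor(files)
```

A low value means knowledge is concentrated in very few people, which is the risk the metric is meant to expose.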
68. Nestedness -> Occasional contributors focus on the most frequently modified files. You still need to “force” people to work on the rest (typically: backend or legacy or “not cool” parts of the project)
Online division of labour: emergent structures in Open Source Software. Palazzi, Cabot, Cánovas, Solé-Ribalta & Borge-Holthoefer. Scientific Reports volume 9, Article number: 13890 (2019)
We have a soft side, and this has also helped us in trying to understand the users of our tools
I’m one of those who only use feature models when writing a survey paper
Modeling tools we have developed
And as soon as we started building those tools, we started wondering about these two questions
Nobody means:
Not the university
Even less the evaluation agencies. Have you seen a “tool impact factor” section in any evaluation form? A good tool is equivalent to how many journal papers? <- the question nobody is answering
The legend is because we’re forced to abandon many of them
Let’s now get into what I’ve learned in the process and what I can recommend.
But let’s take this presentation as an open discussion. Let’s all imagine we’re in the middle of a real coffee break and feel free to participate at any time
This is not a recipe of actions but a collection of discussion points that I hope you’ll find interesting
In 2016 I gave a talk on the sustainability of Papyrus, trying to prevent Papyrus from dying. Unfortunately, it is more in a zombie state right now, mostly due to the lack of industrial support but also because they were not listening to the users
ATL was THE model transformation language. Now only researchers still use it
The opposite is also true: if nobody cares about the language, then nobody cares about the tool...
The UML situation is a combination of a complex language with a complex tool
UML is not going to disappear, too big to fail, but clearly it’s not going in the right direction
The three aspects are interrelated and just a way to try to decompose the problem.
They apply to any other OSS artefact, be it a tool, a repository or a language spec
By the way this is a great opportunity for interdisciplinary research!!!!
But we cannot do it alone, we need powerful friends from political science, social science and ecology / complex systems.
Complex systems study how parts of a system give rise to the collective behaviors of the system, and how the system interacts with its environment
I’ll focus mostly on the onboarding one
People don’t choose a tool based on the quality of its code alone. They also choose it based on the quality of the community (e.g. to get support)
Having industrial users is of course very interesting for all of us, but they are challenging ones. So this is a decision you’ll need to make (and “pay the price”). Who are you targeting??
They are very demanding but will hardly ever help you (we often get feature requests from people who have interesting domain names in their email addresses, but we have hardly ever got anything useful from them)
The topic is complex. It’s even the core work of one of the latest Nobel Prize winners
Do not be naïve. You must be proactive.
Yes, you also need to cover the basics
If a project doesn’t make a good first impression, newcomers may wait a long time before giving it a second chance. The importance of a good first impression!
Up-for-grabs and good-first-bugs are curated tasks specifically for new contributors
This is also a discussion
State that we are not proposing to stop all these models but to add a new one
There is a cost in integrating something in the main tool!!! Be sure you need it
WordPress uses this model -> they create plugins to experiment with new features they want to incorporate. Then they decide whether the plugin stays as a plugin or gets merged into the core
We do the same at Xatkit
Governance (to be discussed later) is key to deciding each arrow
You could get some people from innovation departments but not the real users
State that we are not proposing to stop all these models but to add a new one
We still get feedback but it’s an indirect one
All these models share a common problem: they need to find the right company to work with
1) Release the prototype as OSS.
2) Improve it to make it usable in real environments.
3) Aim to get free users to kickstart a community.
4) Try to get paying users by creating a commercial extension or services on top of the open-source core.
Learning speeds up in steps 3 and 4
In this journey you will evaluate product-market fit, talk to users, test the product under realistic conditions, …
Of course, then new problems pop up (intellectual property?)
Huge gap between our first paper and the current version of Xatkit thanks to the real feedback
Also in your case, whatever you think is a good language could improve a lot if you manage to attract users (beyond your core community)
You can stop once you reach the plateau of diminishing returns (unless you actually want to go all the way to the end and create the company)
It’s not for everybody
The typical reaction when I say this
And our second proposal is to have a closer look at the democracy models and see which ones could work best for open source. We have over 500 variants of democracy
Of course, once you choose one, you’ll need tool support to implement your democratic model
Democracy doesn’t mean there are no clear responsibilities. Or that you cannot operate in an effective way
So what are we proposing?
2 things. First, to address transparency -> add to each project a governance.md file making the governance rules of the project explicit so that people know what to expect
If so, the management of the project could even be automated / assisted. In fact, we have an Eclipse plugin that, via a tool called Mylyn, connects to several issue and bug trackers to extract this information, applies the rules and updates the issues.
Software analysis is the tool we are going to use to understand what makes a project succeed. The key is to take the project itself as our object of study, to learn what works and what doesn’t <- New research field of software mining thanks to GitHub and its more than 30 M projects
I’ll show you some examples.
A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets U and V (that is, U and V are each independent sets) such that every edge connects a vertex in U to one in V. The vertex sets U and V are usually called the parts of the graph
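As a small illustration of how a bipartite graph shows up in community analysis, the Python sketch below takes contributors as one part and files as the other, and projects the graph onto the contributors: two people become linked when they have touched the same file. The function name and the sample data are made up for the example; this is just one common way to derive a collaboration graph, under the assumption that co-editing a file signals collaboration.

```python
from itertools import combinations


def project_on_contributors(edges):
    """Project a contributor-file bipartite graph onto its contributors.

    edges is an iterable of (contributor, file) pairs.
    Returns {frozenset({a, b}): number_of_files_both_touched}.
    """
    by_file = {}
    for contributor, path in edges:
        by_file.setdefault(path, set()).add(contributor)
    weights = {}
    for contributors in by_file.values():
        # Every pair of contributors of the same file gets one more shared file.
        for a, b in combinations(sorted(contributors), 2):
            key = frozenset((a, b))
            weights[key] = weights.get(key, 0) + 1
    return weights


# Usage with made-up commit data.
edges = [
    ("ana", "core.py"), ("bo", "core.py"),
    ("ana", "ui.py"), ("bo", "ui.py"), ("cy", "ui.py"),
]
collaboration = project_on_contributors(edges)
```

The same projection can be done on the other part (files linked when they share contributors), or with issues and labels instead of files to capture non-code contributions.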
Still, looking at raw community data is a mess. A good community analysis is not trivial to do.
Comment the meaning of the size of
The importance not only of the code but also of the discussion around the project, for example which labels are used the most
And who is in charge of commenting on / closing them (a kind of “bus factor”, but for the interaction with users).
I will now show you three metrics that can be calculated on top of these graphs of data.
1- Bus factor
Helps to assess the employee turnover risk
Identify the key developers
Measure the concentration of information
Positive evolution of WordPress vs Papyrus bus factor
Number of shortest paths that pass through a node. The more paths, the higher the betweenness centrality. It tells us that if we lose nodes with high betweenness, we fragment (or “delay”) the community
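For readers who want to compute this metric themselves, here is a minimal pure-Python sketch of betweenness centrality for an unweighted, undirected graph, using Brandes' algorithm. It is only an illustration and not the tooling behind the results in this talk; the function name and the adjacency-dictionary input format are assumptions. When several shortest paths connect the same pair, each contributes a fraction, which is the standard definition.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality (Brandes' algorithm), unweighted and undirected.

    adj maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        sigma = {v: 0 for v in adj}         # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                        # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: value / 2 for v, value in score.items()}


# Usage: a hub connecting three otherwise unrelated contributors.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
centrality = betweenness(star)
```

In the star example every path between two peripheral contributors goes through the hub, so losing the hub disconnects everyone else.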
Related to clustering / subcommunities / modular classes algorithms
https://wiki.cs.umd.edu/cmsc734_09/index.php?title=Music_Artist_Collaborations_from_MusicBrainz
Only core people tackle the files nobody wants. Drive people to the files that nobody wants to modify