Presented at the Google diversity workshop.
Studying gender diversity in software development teams/communities requires understanding gender of individual developers. In this talk I will provide an overview of different ways of asking developers about their gender as well as inferring gender information from the ways they present themselves and artefacts they create. We conclude by discussing limitations of the inference techniques and surveying concerns related to their application.
Identifying Developers’ Gender: State of the Art
1. Alexander Serebrenik
Eindhoven U of Technology
The Netherlands
@aserebrenik
This talk should be positioned in the broader context of studying gender diversity and its impact on software development process as well as on the outcome of the
software.
2. !2
Careful: gender is a complex social construct. Whatever one-word summary we plan to elicit from developers by whatever means, it will be necessarily an
oversimplification of gender identity and gender-related experiences of the individual.
Another word of warning is related to me continuously learning about gender and how to talk about gender. I am trying to pay attention to the words I choose but I might
make mistakes.
3. !3
There are two ways of inferring gender of an individual: either by asking them or by using some kind of algorithm based on the artefacts representing the individual
(name, profile picture) or created by them.
4. 4
Whatever technique we use, we should keep in mind that gender is privacy sensitive and should be treated as such. Open source contributors might be hiding their
gender on purpose, e.g., many women developers prefer not to disclose their gender due to safety concerns. Some open source projects do not necessarily want us to
know the genders of their members (but some do!) and companies might be sensitive to this topic as well.
5. !5
So let us start with discussing how we can identify gender when talking to people, i.e., conducting interviews or surveys.
6. 2004
!6
A highly influential guide on questionnaire design (which I do recommend), published in 2004, recommends two check boxes. Note the difference between sex and gender.
7. 0.7-0.9%
!7
However, recent surveys of Stack Overflow and GitHub indicate that 0.7-0.9% of software developers do not identify as either men or women. This might not appear
much but
9. !9
Slightly better.
This approach has been criticised for othering: it creates a sense of unease in participants and leads to people feeling disrespected. They might not complete surveys,
and they might let fellow trans people know the research team doesn’t “get it”.
10. Bauer GR. Making sure everyone counts: considerations for inclusion, identification, and analysis of transgender and transsexual participants in health
surveys. In: Coen S, Banister E, editors. What a difference sex and gender make. Vancouver: Institute of Gender and Health, Canadian Institutes of Health
Research; 2012. pp. 59–67.
!10
A better approach was advocated by Greta Bauer in 2012. Unfortunately, my favourite survey platform, Google Forms, does not seem to support combining
an open answer with a multiple choice question unless the open answer is the “other” option.
11. Bauer GR. Making sure everyone counts: considerations for inclusion, identification, and analysis of transgender and transsexual participants in health
surveys. In: Coen S, Banister E, editors. What a difference sex and gender make. Vancouver: Institute of Gender and Health, Canadian Institutes of Health
Research; 2012. pp. 59–67.
Bauer GR, Braimoh J, Scheim AI, Dharma C (2017) Transgender-inclusive measures of sex/gender for population surveys: Mixed-methods evaluation and
recommendations. PLoS ONE 12(5): e0178043.
[Chart: responses (Male / Female / Other, axis 0 to 100) given by two groups of participants: transfeminine (assigned male at birth, identify as women/non-binary) and transmasculine (assigned female at birth, identify as men/non-binary)]
!11
However, while this question was clear and easily answered by cisgender participants, it did not clearly identify birth-assigned sex or gender identity. In the interviews this
item was cognitively taxing for trans interview participants, who tried to figure out exactly what the researchers were asking, and reached different conclusions.
12. The GenIUSS Group. Best practices for asking questions to identify transgender and other gender minority respondents on population-based surveys. Herman
JL, editor. Los Angeles (CA): The Williams Institute; 2014 p. 1–68. !12
The GenIUSS group has proposed the schema on the slide. Some people would be offended by this phrasing, as it implies that a trans female is not necessarily female.
13. Do you consider yourself to be transgender?
() Yes
() No
() Questioning
Do you consider yourself to be gender non-conforming, gender diverse, gender variant, or gender expansive?
() Yes
() No
() Questioning
Are you intersex?
() Yes
() No
() I don't know
Where do you identify on the gender spectrum (check all that apply)?
[] Woman
[] Demi-girl
[] Man
[] Demi-boy
[] Non-binary
[] Demi-non-binary
[] Genderqueer
[] Genderflux
[] Genderfluid
[] Demi-fluid
[] Demi-gender
[] Bigender
[] Trigender
[] Two-Spirit
[] Multigender/polygender
[] Pangender/omnigender
[] Maxigender
[] Aporagender
[] Intergender
[] Maverique
[] Gender confusion/Gender f*ck
[] Gender indifferent
[] Graygender
[] Agender/genderless
[] Demi-agender
[] Genderless
[] Gender neutral
[] Neutrois
[] Androgynous
[] Androgyne
[] Prefer not to answer
[] Self Identify: _________________
Open
demographics
https://drnikki.github.io/open-demographics/questions/gender.html
!13
Good news: it separates the question about being trans* from the question about being gender non-conforming. Even more good news: woman/man instead of female/male; the former puts more
stress on identity as opposed to biology. More good news: the question explicitly refers to gender identity and avoids the confusion reported for the survey instrument of
Bauer. And even more good news: “check all that apply”, i.e., someone can be both woman and non-binary. Bad news: there are too many options and we want to keep
surveys (and particularly the demographic parts of surveys) short! Moreover, some of these notions might be experienced as confusing or taxing.
Maverique (pronounced mav-reek) is a specific nonbinary gender identity "characterized by autonomy and inner conviction regarding a sense of self that is entirely
independent of male/masculinity, female/femininity or anything which derives from the two while still being neither without gender nor of a neutral gender." Maverique is
not close to a female or male gender, and is not like a mix of them; the identity goes beyond the entire scope of the gender binary or any identities within and outside of
it. Aporagender (from Greek apo, apor "separate" + "gender") is a nonbinary gender identity and umbrella term for "a gender separate from male, female, and anything in
between (unlike Androgyne) while still having a very strong and specific gendered feeling" (that is, not an absence of gender or agender). Neutrois is a non-binary gender
identity which is often associated with a "neutral" or "null" gender.
14. https://www.morgan-klaus.com/sigchi-gender-guidelines
!14
A much better solution, according to the HCI Guidelines for Gender Equity and Inclusivity, is to ask an open-ended question. This might be difficult for us as researchers to
process (code), but most software engineering surveys are relatively small: a couple of hundred responses.
15. https://www.morgan-klaus.com/sigchi-gender-guidelines
Where do you identify on the gender spectrum (check all that apply)?
[] Woman
[] Man
[] Non-binary
[] Prefer not to disclose
[] Self Identify: _________________
!15
If you really want to run a huge survey and manual coding of answers is not an option, then the same HCI Guidelines for Gender Equity and Inclusivity recommend the
following phrasing. Here too, notice that *multiple* options are possible, and that the respondents have the means not to disclose their gender or to provide their own response.
16. 10-20%
!16
However, whatever survey techniques we use and however we ask the questions, two open problems remain: the scale of the data and the lack of response. This is a problem if
we want to perform a large-scale data analysis to tease out minor effects using traditional statistical techniques, since these techniques need a lot of data to
ensure the power of the statistical tests. Our recent study, which Bogdan is going to talk about tomorrow, involved ~60K individuals. At response rates of 10-20%, getting this number of answers would require
surveying ~300K-600K developers; if everyone spams 600K developers, the respondents will be even more fed up with us and will not answer our questions…
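The arithmetic behind the ~300K-600K estimate can be sketched as follows. This is only a back-of-the-envelope calculation; the function name `invitations_needed` is hypothetical, and the response rate is passed as a whole-number percentage to keep the computation in exact integer arithmetic.

```python
def invitations_needed(target_responses: int, response_rate_percent: int) -> int:
    """Number of people to invite in order to expect `target_responses` answers.

    `response_rate_percent` is a whole-number percentage (e.g. 10 for 10%).
    Ceiling division is done on integers to avoid floating-point surprises.
    """
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response_rate_percent must be between 1 and 100")
    return -(-target_responses * 100 // response_rate_percent)


if __name__ == "__main__":
    target = 60_000  # size of the study mentioned in the talk
    for rate in (10, 20):
        needed = invitations_needed(target, rate)
        print(f"{rate}% response rate: invite {needed:,} developers")
```

The lower the response rate, the faster the number of invitations grows, which is why surveys alone do not scale to this kind of analysis.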
18. Self-presentation
+
Artefacts created
Gender
!18
All these tools are based on the main assumption, namely, that gender can be inferred from the way developers present themselves (username, name, avatar) or artefacts
they produce (code, comments, etc.)
19. 19
Basically, many of the gender detection techniques look at names. Many popular names are traditionally associated with a specific gender.
21. Andrea
21
However, the data we analyse comes from a mix of different countries, and certain names are more commonly associated with men in some countries and with women in
other countries. For example, Andrea is typically a man's name in Italy (IT) and a woman's name in Germany (DE).
22. genderComputer
!22
This is why, for example, the tool that Bogdan developed in the past considers the location of the developer as the key to interpreting the gender associated with a
particular name.
I am using my own profile not because I am a paradigmatic developer but because I do not have the permission of other GH/SO contributors to use their profiles.
23. genderComputer
!23
And of course he has also used heuristics to recognise the location based on zip codes, state abbreviations, top level domains and names of large cities.
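To make the idea of location-aware name interpretation concrete, here is a minimal sketch. It is not genderComputer's actual code or API: the two lookup tables are tiny, hand-made toys, and the function names `country_from_email` and `associated_gender` are hypothetical. A real tool relies on large per-country name lists and on several location heuristics, not just the top-level domain.

```python
from typing import Optional

# Toy table (assumption, for illustration only): which gender a first name
# is typically associated with in a given country.
NAME_TABLE = {
    ("andrea", "IT"): "man",
    ("andrea", "DE"): "woman",
}

# Toy mapping from top-level domains to country codes (assumption).
TLD_TO_COUNTRY = {"it": "IT", "de": "DE"}


def country_from_email(email: str) -> Optional[str]:
    """Guess a country from the top-level domain of an email address."""
    tld = email.rsplit(".", 1)[-1].lower()
    return TLD_TO_COUNTRY.get(tld)


def associated_gender(first_name: str, country: Optional[str]) -> Optional[str]:
    """Return the gender typically associated with the name in that country.

    Returns None when the country is unknown or the pair is not in the table,
    i.e. the sketch prefers "no answer" over a guess.
    """
    if country is None:
        return None
    return NAME_TABLE.get((first_name.strip().lower(), country))


if __name__ == "__main__":
    for email in ("andrea@example.it", "andrea@example.de", "andrea@example.com"):
        print(email, associated_gender("Andrea", country_from_email(email)))
```

Returning None for an unknown location is a deliberate choice in this sketch: the same name can point in opposite directions, so a guess without location would be no better than a coin toss.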
24. Josh Terrell et al.
genderComputer
Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429
Different data sets require different techniques
24
In 2016 we evaluated several gender detection mechanisms on SO data. The ground truth was obtained by combining information from several surveys conducted
earlier. We considered 5 basic techniques and added GitHub data to check whether additional information helps.
25. 25
However, location as inferred by genderComputer is not enough on its own. Many of us do not live in the countries where we were born. This person’s name is Andrea
and they live in London. What do you think about the gender of this individual based on their name?
26. 26
NamSor takes surnames into account and hence can help with resolving the gender of individuals who no longer live in their countries of origin. Unfortunately, NamSor is a
commercial tool using the freemium model. Moreover, NamSor works reasonably well only for “real” names as opposed to display names.
27. Automatically generated: 11%
No spaces: 37%
Three or more spaces: 1%
Two spaces: 5%
One space: 46%
!27
But, of course, viability of these approaches would depend on what share of GH developers or SO contributors provide information that can be interpreted as meaningful
first and last names. The grey segment indicates percentage of the SO contributors with automatically generated usernames such as user12345. For these contributors
no inference technique can be successful; both the red and the blue segments can be analysed by techniques such as genderComputer and NamSor; the red ones only
by genderComputer
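The breakdown on the slide can be reproduced with a simple classifier of display names. The following is a rough sketch under stated assumptions: the function name `display_name_category` is hypothetical, and the pattern for automatically generated names (`user` followed by digits, as in user12345) is my guess at what "automatically generated" covers.

```python
import re

# Assumption: automatically generated names look like "user12345".
AUTOGENERATED = re.compile(r"user\d+")


def display_name_category(display_name: str) -> str:
    """Assign a display name to one of the categories shown on the slide."""
    name = display_name.strip()
    if AUTOGENERATED.fullmatch(name):
        return "automatically generated"
    # Count gaps between whitespace-separated parts; runs of spaces count once.
    spaces = max(len(name.split()) - 1, 0)
    if spaces == 0:
        return "no spaces"
    if spaces == 1:
        return "one space"
    if spaces == 2:
        return "two spaces"
    return "three or more spaces"


if __name__ == "__main__":
    for name in ("user12345", "aserebrenik", "Alexander Serebrenik"):
        print(f"{name!r}: {display_name_category(name)}")
```

Only names with at least one space have a chance of being split into something resembling a first name and a surname, which is what surname-aware tools need.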
28. Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference
services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156
7,076 names: 3,811 male, 1,968 female, 1,297 unknown
!28
errorCoded: % of individuals coded wrongly or not coded as opposed to the total number of predictions. errorCodedWithoutNA: % of individuals coded wrongly as
opposed to the total number of predictions. errorGenderBias: tendency to predict women as men (neg) or men as women (pos).
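The three measures can be computed from pairs of true and predicted labels. The sketch below is my reading of the definitions above, not code from the paper: in particular, I assume that errorCodedWithoutNA and errorGenderBias are normalised by the number of names that received a male or female prediction, and the function name `error_metrics` is hypothetical. The paper itself remains the authoritative source for the exact formulas.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def error_metrics(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute the three error measures from (true, predicted) label pairs.

    True labels are "m" or "f"; predictions are "m", "f" or "u" (unknown).
    Raises ZeroDivisionError if there are no pairs or no m/f predictions.
    """
    c = Counter(pairs)
    mm, mf, mu = c[("m", "m")], c[("m", "f")], c[("m", "u")]
    ff, fm, fu = c[("f", "f")], c[("f", "m")], c[("f", "u")]
    total = mm + mf + mu + ff + fm + fu
    classified = mm + mf + ff + fm  # names that got an m/f prediction
    return {
        # wrongly coded or not coded at all
        "errorCoded": (mf + fm + mu + fu) / total,
        # wrongly coded, ignoring unknowns (assumed denominator)
        "errorCodedWithoutNA": (mf + fm) / classified,
        # positive: men predicted as women dominate; negative: the opposite
        "errorGenderBias": (mf - fm) / classified,
    }


if __name__ == "__main__":
    sample = (
        [("m", "m")] * 4 + [("m", "f")] * 3 + [("m", "u")]
        + [("f", "m")] + [("f", "u")]
    )
    print(error_metrics(sample))
```

Keeping the unknowns out of the last two measures separates two different failure modes: a tool that refuses to answer and a tool that answers wrongly.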
29. Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference
services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156
Tends to overpredict women as men
Tends to overpredict men as women
Tends to overreport unknowns
7,076 names: 3,811 male, 1,968 female, 1,297 unknown
!29
By these three measures (errorCoded, errorCodedWithoutNA, errorGenderBias) NamSor seems to do quite well.
Data: different collections of authors of scientific publications (Web of Science, PubMed, etc.).
30. Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference
services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156
!30
Closer inspection reveals a different story, however. The confidence of NamSor drops as we move to Asian names, and particularly Eastern and South-Eastern Asian names.
Half of the East-Asian names have a confidence score of 0!
31. !31
And this is indeed deeply problematic when trying to apply automatic gender inference techniques to software developers
32. With special thanks to Huilian Sophie Qiu (CMU)
Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, Bogdan Vasilescu. Going Farther Together: The Impact of Social Capital
on Sustained Participation in Open Source. 41st International Conference on Software Engineering (ICSE 2019), 2019, pp. 688-699
!32
We also extracted features from the name itself, including the last character (e.g., in Spanish, names ending in ‘a’ tend to be female), the last two characters (e.g., in
Japan, names ending in ‘ko’ tend to be female), and tri-grams and 4-grams to capture romanized Chinese, Japanese, and Korean names.
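The name-based features mentioned above are easy to extract. The following is a minimal sketch, not the code used in the study: the function name `name_features` and the shape of the returned dictionary are my own choices, and lower-casing the name first is an assumption.

```python
from typing import Dict, List, Union


def name_features(first_name: str) -> Dict[str, Union[str, List[str]]]:
    """Extract character-level features from a (romanized) first name."""
    name = first_name.strip().lower()

    def ngrams(n: int) -> List[str]:
        # All contiguous substrings of length n; empty if the name is shorter.
        return [name[i:i + n] for i in range(len(name) - n + 1)]

    return {
        "last_char": name[-1:],        # e.g. 'a' in many Spanish female names
        "last_two_chars": name[-2:],   # e.g. 'ko' in many Japanese female names
        "trigrams": ngrams(3),
        "fourgrams": ngrams(4),
    }


if __name__ == "__main__":
    print(name_features("Yoko"))
    print(name_features("Maria"))
```

Features of this kind would then be fed to a classifier together with other signals; the n-grams are there because a single final character carries little information for romanized Chinese, Japanese, and Korean names.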
33. With special thanks to Huilian Sophie Qiu (CMU)
Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, Bogdan Vasilescu. Going Farther Together: The Impact of Social Capital
on Sustained Participation in Open Source. 41st International Conference on Software Engineering (ICSE 2019), 2019, pp. 688-699
!33
We have shared our results with NamSor and they plan on improving their accuracy when it comes to CJK names.
34. https://www.facelytics.io/en/
!34
Another way developers present themselves on social platforms is through profile pictures, which can be analysed using face recognition techniques; here we see that Facelytics has correctly identified my gender.
Age-wise it is a bit off, since I am 43.
35. !35
But things do not always go that smoothly. This is Daniela Petruzalek, a transgender software developer.
36. ~30%
autogenerated profile images
!36
However, not everybody has a meaningful profile picture. For instance, ca. 30% of the Stack Overflow users only have a default profile picture automatically generated
from the MD5 hash of the user’s email address.
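To see why such pictures tell us nothing about the person, it helps to look at how the hash is obtained. The sketch below computes an MD5 hash of an email address; trimming and lower-casing the address first follows the convention documented by Gravatar, and whether Stack Overflow's default images use exactly this normalisation is an assumption on my part. The function name `email_md5` is hypothetical.

```python
import hashlib


def email_md5(email: str) -> str:
    """Return the hex MD5 digest of a trimmed, lower-cased email address."""
    normalised = email.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same address always yields the same 32-character digest, so the
    # generated picture depends on the address only, not on the person.
    print(email_md5("someone@example.com"))
    print(email_md5("  Someone@Example.com "))
```

Since the generated image is a deterministic function of the address, no face recognition technique can extract gender information from it.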
37. Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429
                     Age not indicated   15-25   26-31   ≥32
Reputation 1-199            150            50      50     50
Reputation 200-999          150            50      50     50
Reputation ≥1000            150            50      50     50
!37
Moreover, not all profile images represent faces (some are logos or cat pictures). This is why we have carefully selected 900 non-generated profile images and classified
them manually. The reputation classes correspond to the different privileges associated with these classes; the age intervals follow the general distribution of ages on SO.
38. Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429
53% (479/900)
!38
39. !39
Let us move to the discussion of artefacts created by software developers
40. Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from
Texts. Gender Equality Workshop ICSE 2019
!40
When it comes to gender recognition based on the artefacts created most of the approaches consider blog posts and Twitter data.
41. Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from
Texts. Gender Equality Workshop ICSE 2019
!41
For example, the work of Company & Wanner was designed in the first place for authorship attribution; similar authorship attribution techniques have been designed
for source code. Is this the way to go? Do people of different genders code differently?
43. Accuracy
Krüger and Hermann. Text. 62%-93%
Qiu et al. Names. 60%-84%
!43
The ranges cover different datasets and different kinds of names.
The accuracy of our techniques is not perfect. It can be even lower for some subcommunities, e.g., for Chinese names, where some of the gender-specific information is
lost during romanization.
44. Bogdan Vasilescu, Vladimir Filkov, Alexander Serebrenik:
Perceptions of Diversity on GitHub: A User Survey. CHASE@ICSE 2015: 50-56
“I have used a fake GitHub handle (my normal GitHub
handle is my first name, which is a distinctly female name)
so that people would assume I was male”
Reliability
!44
45. Krüger and Hermann. Text. 100%
Keyes. Face. 92.9-96.7%
Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from
Texts. Gender Equality Workshop ICSE 2019
Os Keyes. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. CSCW 2018
Santamaría and Mihaljević. Names. 20%
Gender binary
!45
Most automatic techniques we discuss assume gender binary. These are percentages of papers reviewed in two meta-studies. Keyes: the first number corresponds to
the % in papers that introduce automatic gender recognition and the second one - to papers that use automatic gender recognition. The situation with names is a bit
better since the tools tend to be probabilistic and at least recognise their own lack of confidence.
46. !46
We have discussed two large groups of techniques for identifying the contributors’ gender: asking questions and applying algorithmic tools. None of the techniques is perfect, and
the choice of technique should of course depend on the research questions. However, it might be equally important to discuss the limitations and problems of these
techniques (and not only the advantages that made us choose them).
47. !47
I would like to conclude this talk with the following calls for action.
48. !48
The two points I would like to make are (1) more focus on gender beyond the binary, which would require rethinking how to approach underrepresented communities, and
(2) gender in combination with other diversity attributes (age, culture, …). These narratives are missing.
Call for action: in the same way as we have adapted NamSor to include East-Asian names we need to be aware of cultural differences and take them into account when
analysing developers’ communities.
49. First steps
!49
50. Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender
Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
With special thanks to Denae Ford (NCSU)
!50
First steps
51. Control of Identity Disclosure:
The desire to be seen as presented
Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender
Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
“Stack Overflow has constrained
expressions of identity. It’s up to
you what content you want to fill in.
GitHub for a while it was required
you expose your email address to
the rest of the world.”
Petruzalek: The obvious drawback of not being
passable is that you become an instant
target. So passability is not only an identity
goal, its also a mean of self-preservation
!51
52. Economically Stable Work:
Distance technical merits from identity
Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender
Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
!52
A community with 58,481 members hunting for bounties and earning rewards.
30 percent of respondents to the 2015 U.S. Transgender Survey reported being mistreated in the workplace, denied a promotion, or fired because of their gender
expression or gender identity. Transgender Americans experience higher levels of unemployment (15% vs 5%), poverty (29% vs 12%) and homelessness (12% vs 0.2%)
than their non-transgender peers. http://www.engagetu.com/2018/04/12/economics-and-the-transgender-community/
53. Economically Stable Work:
Distance technical merits from identity
Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender
Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
You cannot tell from my technical profiles that
I’m transgender. I don’t make a big deal that in
professional context. It’s just not relevant
Ross: “Technology has totally leveled the playing
field for someone like me. I can get on the internet
and watch tutorials. I have the drive to spend five
hours a day to teach myself a skill.”
!53
54. Control of Identity Disclosure:
The desire to be seen as presented
Economically Stable Work:
Distance technical merits from identity
Autonomy to Disengage or Reengage
Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender
Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
!54
And it turns out that these advantages stem from the fact that software developers can work remotely. They can learn remotely, and they can work remotely on such
platforms as BountySource. In fact, software development companies have mentioned figures such as 60% of their workers being remote.
55. Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender
Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
We believe that remote work offers
a mechanism of control for identity
disclosure and empowerment of
software developers from any
marginalized communities.
!55
And it turns out that these advantages stem from the fact that software developers can work remotely. In fact, we believe that remote work offers a mechanism of control
for identity disclosure and empowerment of software developers from any marginalized communities.
56. With special thanks to Margaret Burnett (Oregon State University)
!56
And here I would like to compare the discussion of remote work with the wonderful example of Margaret Burnett that shows how technological solutions supporting one
community can help many different ones. This is a picture from Amsterdam.
57. !57
58. !58
Summary: to achieve gender equality, diversity and inclusion goals we need to understand the experiences of people of different genders.
59. !59
Understanding their experiences requires identification of those genders; identification, manual or automatic, of an individual’s gender is a problematic and sensitive
subject. All existing solutions have their limitations.
60. @aserebrenik !60
This being said, understanding the experiences of people of different genders is essential and, as the last study conjectures, it can be beneficial not only to
one marginalised community but to many of them.