Alexander Serebrenik

Eindhoven U of Technology

The Netherlands 

@aserebrenik
Identifying Developers’ Gender: State of the Art
1
This talk should be positioned in the broader context of studying gender diversity and its impact on software development process as well as on the outcome of the
software.
!2
Careful: gender is a complex social construct. Whatever one-word summary we plan to elicit from developers by whatever means, it will be necessarily an
oversimplification of gender identity and gender-related experiences of the individual.

Another word of warning is related to me continuously learning about gender and how to talk about gender. I am trying to pay attention to the words I choose but I might
make mistakes.
!3
There are two ways of inferring gender of an individual: either by asking them or by using some kind of algorithm based on the artefacts representing the individual
(name, profile picture) or created by them.
4
Whatever technique we use, we should keep in mind that gender is privacy sensitive and should be treated as such. Open source contributors might be hiding their
gender on purpose, e.g., many women developers prefer not to disclose their gender due to safety concerns. Some open source projects do not necessarily want us to
know the genders of their members (but some do!) and companies might be sensitive to this topic as well.
!5
So let us start with discussing how we can identify gender when talking to people, i.e., conducting interviews or surveys.
2004
!6
A highly influential guide on questionnaire design (which I do recommend), published in 2004, recommends two check boxes. Keep in mind the distinction between sex and gender.
0.7-0.9%
!7
However, recent surveys of Stack Overflow and GitHub indicate that 0.7-0.9% of software developers do not identify as either men or women. This might not appear to be much, but it is twice as much as in the US population in general!
!9
Slightly better. 

This approach has been criticised for othering and for creating a sense of unease in participants. It can lead to people feeling disrespected: they might not complete surveys, and they might let fellow trans people know the research team doesn’t “get it”.
Bauer GR. Making sure everyone counts: considerations for inclusion, identification, and analysis of transgender and transsexual participants in health
surveys. In: Coen S, Banister E, editors. What a difference sex and gender make. Vancouver: Institute of Gender and Health, Canadian Institutes of Health
Research; 2012. pp. 59–67.
!10
A better approach was advocated by Greta Bauer in 2012. Unfortunately, my favourite survey platform, Google Forms, does not seem to support combining an open answer with a multiple choice question unless it is the “other” option.
Bauer GR, Braimoh J, Scheim AI, Dharma C (2017) Transgender-inclusive measures of sex/gender for population surveys: Mixed-methods evaluation and
recommendations. PLoS ONE 12(5): e0178043. 
[Bar chart, axis 0-100: answers Male / Female / Other given by two groups of participants, transfeminine (assigned male at birth, identify as women/non-binary) and transmasculine (assigned female at birth, identify as men/non-binary)]
!11
However, while this question was clear and easily answered by cisgender participants, it did not clearly identify birth-assigned sex or gender identity. In the interviews this
item was cognitively taxing for trans interview participants, who tried to figure out exactly what the researchers were asking, and reached different conclusions.
The GenIUSS Group. Best practices for asking questions to identify transgender and other gender minority respondents on population-based surveys. Herman
JL, editor. Los Angeles (CA): The Williams Institute; 2014 p. 1–68. !12
The GenIUSS group has proposed the schema on the slide. Some people would be offended by this phrasing, as it implies that trans women are not necessarily female.
Do you consider yourself to be transgender?

() Yes

() No

() Questioning

Do you consider yourself to be gender non-conforming, gender diverse, gender variant, or gender expansive?

() Yes

() No

() Questioning

Are you intersex?

() Yes

() No

() I don't know

Where do you identify on the gender spectrum (check all that apply)?

[] Woman

[] Demi-girl

[] Man

[] Demi-boy

[] Non-binary

[] Demi-non-binary

[] Genderqueer

[] Genderflux

[] Genderfluid

[] Demi-fluid

[] Demi-gender

[] Bigender

[] Trigender

[] Two-Spirit

[] Multigender/polygender

[] Pangender/omnigender

[] Maxigender

[] Aporagender

[] Intergender

[] Maverique

[] Gender confusion/Gender f*ck

[] Gender indifferent

[] Graygender

[] Agender/genderless

[] Demi-agender

[] Genderless

[] Gender neutral

[] Neutrois

[] Androgynous

[] Androgyne

[] Prefer not to answer

[] Self Identify: _________________

Open Demographics: https://drnikki.github.io/open-demographics/questions/gender.html
!13
Good news: it separates the question about being trans* from the one about being gender non-conforming. Even more good news: woman/man instead of female/male; the former puts more stress on identity as opposed to biology. More good news: the question explicitly refers to gender identity and avoids the confusion reported for the survey instrument of Bauer. And even more good news: “check all that apply”, i.e., someone can be both woman and non-binary. Bad news: there are too many options and we want to keep surveys (and particularly the demographic parts of surveys) short! Moreover, some of these notions might be experienced as confusing or taxing.

Maverique (pronounced mav-reek) is a specific nonbinary gender identity "characterized by autonomy and inner conviction regarding a sense of self that is entirely
independent of male/masculinity, female/femininity or anything which derives from the two while still being neither without gender nor of a neutral gender." Maverique is
not close to a female or male gender, and is not like a mix of them; the identity goes beyond the entire scope of the gender binary or any identities within and outside of
it. Aporagender (from Greek apo, apor "separate" + "gender") is a nonbinary gender identity and umbrella term for "a gender separate from male, female, and anything in
between (unlike Androgyne) while still having a very strong and specific gendered feeling" (that is, not an absence of gender or agender). Neutrois is a non-binary gender
identity which is often associated with a "neutral" or "null" gender.
https://www.morgan-klaus.com/sigchi-gender-guidelines
A much better solution according to the HCI Guidelines for Gender Equity and Inclusivity is to ask an open-ended question. This might be difficult for us as researchers to process (code), but most software engineering surveys are relatively small, with a couple of hundred responses.
https://www.morgan-klaus.com/sigchi-gender-guidelines
Where do you identify on the gender spectrum (check all that apply)?

[] Woman

[] Man

[] Non-binary

[] Prefer not to disclose

[] Self Identify: _________________
!15
If you really want to run a huge survey and manual coding of answers is not an option, then the same HCI Guidelines for Gender Equity and Inclusivity recommend the following phrasing. Here too, notice that *multiple* options are possible, and that respondents have the means not to disclose their gender or to provide their own response.
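Because respondents may tick several options, answers to such a question are more naturally stored as sets than as single labels. The following is a minimal Python sketch of tallying such answers; the option names follow the slide, while the `(ticked_options, self_description)` layout and the function name `tally` are assumptions made purely for illustration.

```python
from collections import Counter

# Options from the recommended question; the free-text field is kept separately.
OPTIONS = {"Woman", "Man", "Non-binary", "Prefer not to disclose"}

def tally(responses):
    """Count how often each option was ticked.

    `responses` is a list of (ticked_options, self_description) pairs;
    this layout is a hypothetical one chosen for the example.
    """
    counts = Counter()
    for ticked, self_description in responses:
        unknown = set(ticked) - OPTIONS
        if unknown:
            raise ValueError(f"unexpected options: {sorted(unknown)}")
        counts.update(set(ticked))          # one respondent may add to several counts
        if self_description and self_description.strip():
            counts["Self Identify"] += 1    # free text still needs manual reading
    return counts

example = [
    ({"Woman"}, ""),
    ({"Woman", "Non-binary"}, ""),
    (set(), "genderfluid"),
    ({"Prefer not to disclose"}, ""),
]
print(tally(example))
```

Note that the counts can add up to more than the number of respondents, which is the intended consequence of "check all that apply".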
10-20%
!16
However, whatever survey techniques we use and however we ask the questions, two open problems remain: the scale of the data and the lack of response. This is a problem if we want to perform a large-scale data analysis to tease out minor effects using traditional statistical techniques, since these techniques need a lot of data to ensure the power of statistical tests. Our recent study that Bogdan is going to talk about tomorrow involved ~60K individuals. With response rates of 10-20%, getting this number would require surveying ~300K-600K developers; if everyone spams 600K respondents, the respondents will be even more fed up with us and will not answer our questions…
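The arithmetic behind these numbers can be checked with a few lines of Python. This is only a sketch: it assumes the 10-20% on the slide is the expected response rate, and the function name `invitations_needed` is made up for the example.

```python
def invitations_needed(target_responses, response_rate_percent):
    """Number of people to invite so that, on average, `target_responses` reply."""
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response_rate_percent must be in (0, 100]")
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-target_responses * 100 // response_rate_percent)

target = 60_000  # size of the study mentioned in the notes
for rate in (10, 20):
    print(f"{rate}% response rate -> {invitations_needed(target, rate):,} invitations")
```

With a 10% response rate the sketch yields 600,000 invitations and with 20% it yields 300,000, matching the ~300K-600K range quoted above.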
!17
Enter automatic gender detection mechanisms
Self-presentation
+
Artefacts created
Gender
!18
All these tools are based on the main assumption, namely, that gender can be inferred from the way developers present themselves (username, name, avatar) or artefacts
they produce (code, comments, etc.)
19
Basically, many of the gender detection techniques look at names. Many popular names are traditionally associated with a specific gender.
https://previews.123rf.com/images/pavalena/pavalena1111/pavalena111100046/11314908-map-kingdom-of-belgium.jpg
!20
This practice is well established and in some countries it is even recorded in laws and administrative procedures. This is the case for Belgium, where by law the first name
should not be confusing. Most local administrations interpret it as “no girls’ names for boys, no boys’ names for girls”.
Andrea
21
However, the data we analyse comes from a mix of different countries, and certain names are more commonly associated with men in some countries and with women in other countries. For example, Andrea is typically a man's name in Italy and a woman's name in Germany.
genderComputer
!22
This is why, for example, the tool that Bogdan has developed in the past considers the location of the developer as the key to interpreting the gender associated with a particular name.

I am using my profile not because I am a paradigmatic developer but because I do not have permission of other GH/SO contributors to use their profiles.
genderComputer
!23
And of course he has also used heuristics to recognise the location based on zip codes, state abbreviations, top level domains and names of large cities.
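The idea of interpreting a name in the light of a guessed location can be sketched as follows. This is not genderComputer's actual code or API: the tiny lookup tables, the function names, and the two heuristics shown (city names and top-level domains) are hypothetical stand-ins chosen to illustrate the principle.

```python
import re

# Toy data for illustration only; a real tool relies on large per-country name lists.
NAME_TABLE = {
    ("andrea", "IT"): "man",
    ("andrea", "DE"): "woman",
}

# Hypothetical location hints: names of large cities and top-level domains.
CITY_TO_COUNTRY = {"rome": "IT", "milan": "IT", "berlin": "DE", "munich": "DE"}
TLD_TO_COUNTRY = {"it": "IT", "de": "DE"}

def guess_country(location="", email=""):
    """Guess a country code from a free-text location or an email TLD, else None."""
    for token in re.findall(r"[a-z]+", location.lower()):
        if token in CITY_TO_COUNTRY:
            return CITY_TO_COUNTRY[token]
    tld = email.rsplit(".", 1)[-1].lower() if "." in email else ""
    return TLD_TO_COUNTRY.get(tld)

def guess_gender(display_name, location="", email=""):
    """Look up the first name for the guessed country; None means 'cannot tell'."""
    parts = display_name.split()
    if not parts:
        return None
    country = guess_country(location, email)
    return NAME_TABLE.get((parts[0].lower(), country))

print(guess_gender("Andrea Rossi", location="Milan, Italy"))
print(guess_gender("Andrea Schmidt", email="a.schmidt@example.de"))
print(guess_gender("Andrea", location="London"))
```

Returning `None` when the country cannot be resolved keeps the sketch honest about what it does not know, instead of forcing a guess.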
Josh Terrell et al.
genderComputer
Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429
Different data sets require different techniques
24
In 2016 we evaluated several gender detection mechanisms on SO data. The ground truth was obtained by combining information from several surveys conducted earlier. We considered 5 basic techniques and added GH data to check whether additional information helps.
25
However, location as inferred by genderComputer is not enough on its own. Many of us do not live in the countries where we were born. This person’s name is Andrea and they live in London. What do you think about the gender of this individual based on their name?
26
NamSor takes surnames into account and hence can help with resolving the gender of individuals who no longer live in their countries of origin. Unfortunately, NamSor is a commercial tool using the freemium model. Moreover, NamSor works reasonably well only for “real” names as opposed to display names.
Automatically generated: 11%
No spaces: 37%
Three or more spaces: 1%
Two spaces: 5%
One space: 46%
!27
But, of course, the viability of these approaches depends on what share of GH developers or SO contributors provide information that can be interpreted as meaningful first and last names. The grey segment indicates the percentage of SO contributors with automatically generated usernames such as user12345. For these contributors no inference technique can be successful; both the red and the blue segments can be analysed by techniques such as genderComputer and NamSor; the red ones only by genderComputer.
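A breakdown like the one on the slide can be computed from a list of display names. The sketch below is an assumption about how such a classification might be done, not the procedure used in the study: the `user` + digits pattern for generated names is inferred from the `user12345` example, and the category labels mirror the chart.

```python
import re
from collections import Counter

# Assumption: automatically generated names look like "user12345".
AUTO_GENERATED = re.compile(r"^user\d+$")

def name_category(display_name):
    """Assign a display name to one of the chart's five categories."""
    name = display_name.strip()
    if AUTO_GENERATED.match(name):
        return "automatically generated"
    spaces = name.count(" ")
    if spaces == 0:
        return "no spaces"
    if spaces == 1:
        return "one space"
    if spaces == 2:
        return "two spaces"
    return "three or more spaces"

def category_shares(display_names):
    """Return the fraction of names falling into each category."""
    counts = Counter(name_category(n) for n in display_names)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

print(category_shares(["user12345", "jdoe", "Jane Doe", "John Doe"]))
```

Names with one space are the most promising candidates for first-name plus surname techniques, whereas names without spaces leave only the single token to work with.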
Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference
services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156
7,076 names: 3,811 male, 1,968 female, 1,297 unknown
!28
errorCoded: % of individuals coded wrongly or not coded at all, relative to the total number of predictions. errorCodedWithoutNA: % of individuals coded wrongly, leaving aside those who were not coded. errorGenderBias: tendency to predict women as men (negative) or men as women (positive).
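One way to read these three measures is as ratios over a confusion matrix with an extra "unknown" column. The Python sketch below encodes that reading; the exact denominators (all individuals for errorCoded, only gender-assigned predictions for the other two) are an interpretation of the definitions above and should be checked against the cited paper, and the example counts are invented.

```python
def error_metrics(m_m, m_f, m_u, f_m, f_f, f_u):
    """Compute the three error measures from confusion-matrix counts.

    x_y is the number of people with true gender x who were predicted as y,
    where m = male, f = female, u = unknown (no prediction).
    """
    total = m_m + m_f + m_u + f_m + f_f + f_u
    classified = m_m + m_f + f_m + f_f   # predictions that assigned a gender
    return {
        # wrong or missing predictions, over everyone
        "errorCoded": (m_f + f_m + m_u + f_u) / total,
        # wrong predictions only, ignoring the unknowns
        "errorCodedWithoutNA": (m_f + f_m) / classified,
        # positive: men predicted as women dominate; negative: the reverse
        "errorGenderBias": (m_f - f_m) / classified,
    }

# Invented counts, for illustration only.
print(error_metrics(m_m=40, m_f=5, m_u=5, f_m=15, f_f=20, f_u=15))
```

In this invented example errorGenderBias is negative, which under the sign convention above means that women are predicted as men more often than the other way round.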
Tends to overpredict women as men
Tends to overpredict men as women
Tends to overreport unknowns
!29
On these metrics, NamSor seems to do quite well.

Data: different collections of authors of scientific publications (Web of Science, PubMed, etc.)
Closer inspection reveals a different story, however. The confidence of NamSor drops as we move to Asian names, particularly East and South-East Asian names. Half of the East Asian names have a confidence score of 0!
!31
And this is indeed deeply problematic when trying to apply automatic gender inference techniques to software developers
With special thanks to Huilian Sophie Qiu (CMU)
Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, Bogdan Vasilescu. Going Farther Together: The Impact of Social Capital on Sustained Participation in Open Source. 41st International Conference on Software Engineering (ICSE 2019), 2019, pp. 688-699
We also extracted features from the name itself, including the last character (e.g., in Spanish, names ending in ‘a’ tend to be female), the last two characters (e.g., in
Japan, names ending in ‘ko’ tend to be female), and tri-grams and 4-grams to capture romanized Chinese, Japanese, and Korean names.
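The character-level features mentioned here are simple to extract. The following is a minimal sketch, assuming the features are computed on the lower-cased romanized name without padding; how the original study tokenised names is not stated in these notes, and the function name is hypothetical.

```python
def name_features(name, ngram_sizes=(3, 4)):
    """Extract last character, last two characters, and character n-grams."""
    lowered = name.strip().lower()
    features = {
        "last_char": lowered[-1:],        # e.g. 'a' in many Spanish names
        "last_two_chars": lowered[-2:],   # e.g. 'ko' in many Japanese names
    }
    for n in ngram_sizes:
        # All contiguous substrings of length n; empty if the name is shorter.
        features[f"{n}-grams"] = [lowered[i:i + n] for i in range(len(lowered) - n + 1)]
    return features

print(name_features("Yoko"))
```

Such features would then be fed to a classifier; the classifier itself is outside the scope of this sketch.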
We have shared our results with NamSor and they plan on improving their accuracy when it comes to CJK names.
https://www.facelytics.io/en/
!34
Developers also present themselves on social platforms through profile pictures, which can be analysed using face recognition techniques; here we see that Facelytics has correctly identified my gender. Age-wise it is a bit off, since I am 43.
!35
But things do not always go that smoothly. Consider Daniela Petruzalek, a transgender software developer.
~30%
autogenerated profile images
!36
However, not everybody has a meaningful profile picture. For instance, ca. 30% of the Stack Overflow users only have a default profile picture automatically generated based on the MD5 hash of the user’s email address.
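The hash itself is easy to reproduce. Below is a minimal sketch, assuming the Gravatar-style convention of trimming and lower-casing the address before hashing; whether Stack Overflow applies exactly this normalisation is an assumption, and the function name is illustrative.

```python
import hashlib

def email_hash(email):
    """MD5 hex digest of the trimmed, lower-cased email address."""
    normalised = email.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

print(email_hash("  Jane.Doe@Example.org "))
```

Because the default picture is derived only from this hash, it carries no information about the person behind the account, which is why such profiles are out of reach for image-based techniques.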
                      Age not indicated   15-25   26-31   ≥32
Reputation 1-199      150                 50      50      50
Reputation 200-999    150                 50      50      50
Reputation ≥1000      150                 50      50      50
!37
Moreover, not all profile images represent faces (some are logos or cat pictures). This is why we carefully selected 900 non-generated profile images and classified them manually. The reputation classes correspond to the different privileges associated with these classes; the age intervals follow the general distribution of ages on SO.
53% (479/900)
!38
!39
Let us move to the discussion of artefacts created by software developers
Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from
Texts. Gender Equality Workshop ICSE 2019
!40
When it comes to gender recognition based on the artefacts created, most of the approaches consider blog posts and Twitter data.
!41
For example, the work of Company & Wanner was designed in the first place for authorship attribution; similar authorship attribution techniques have been designed for source code. Is this the way to go? Do people of different genders code differently?
However…
!42
Accuracy
Krüger and Hermann. Text. 62%-93% (for different datasets)
Qiu et al. Names. 60%-84% (for different kinds of names)
The accuracy of our techniques is not perfect. It can be even lower for some subcommunities, e.g., for Chinese names, when some of the gender-specific information is
lost during the romanization.
Bogdan Vasilescu, Vladimir Filkov, Alexander Serebrenik:
Perceptions of Diversity on GitHub: A User Survey. CHASE@ICSE 2015: 50-56
“I have used a fake GitHub handle (my normal GitHub
handle is my first name, which is a distinctly female name)
so that people would assume I was male”
Reliability
!44
Krüger and Hermann. Text. 100%
Keyes. Face. 92.9-96.7%
Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from
Texts. Gender Equality Workshop ICSE 2019
Os Keyes. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. CSCW 2018
Santamaría and Mihaljević. Names. 20%
Gender binary
!45
Most automatic techniques we discuss assume gender binary. These are percentages of papers reviewed in two meta-studies. Keyes: the first number corresponds to
the % in papers that introduce automatic gender recognition and the second one - to papers that use automatic gender recognition. The situation with names is a bit
better since the tools tend to be probabilistic and at least recognise their own lack of confidence.
!46
We have discussed two large groups of techniques for identifying the contributors’ gender: asking questions and applying algorithmic tools. None of the techniques is perfect, and the choice of technique should of course depend on the RQs. However, it might be equally important to discuss the limitations and problems of these techniques (and not only the advantages that made us choose them).
!47
I would like to conclude this talk with the following calls for action.
!48
The two points I would like to make are (1) more focus on gender beyond the binary, which would require rethinking how to approach underrepresented communities, and
(2) gender in combination with other diversity attributes (age, culture, …). These narratives are missing.

Call for action: in the same way as we have adapted NamSor to include East-Asian names we need to be aware of cultural differences and take them into account when
analysing developers’ communities.
First steps
Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers. Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12
With special thanks to Denae Ford (NCSU)
!50
First steps
Control of Identity Disclosure:
The desire to be seen as presented
“Stack Overflow has constrained
expressions of identity. It’s up to
you what content you want to fill in.
GitHub for a while it was required
you expose your email address to
the rest of the world.”
Petruzalek: The obvious drawback of not being
passable is that you become an instant
target. So passability is not only an identity
goal, its also a mean of self-preservation
!51
Economically Stable Work:
Distance technical merits from identity
!52
A community with 58,481 members hunting for bounties and earning rewards.

30 percent of respondents to the 2015 U.S. Transgender Survey reported being mistreated in the workplace, denied a promotion, or fired because of their gender
expression or gender identity. Transgender Americans experience higher levels of unemployment (15% vs 5%), poverty (29% vs 12%) and homelessness (12% vs 0.2%)
than their non-transgender peers. http://www.engagetu.com/2018/04/12/economics-and-the-transgender-community/
Economically Stable Work:
Distance technical merits from identity
You cannot tell from my technical profiles that
I’m transgender. I don’t make a big deal that in
professional context. It’s just not relevant
Ross: “Technology has totally leveled the playing
field for someone like me. I can get on the internet
and watch tutorials. I have the drive to spend five
hours a day to teach myself a skill.”
!53
Control of Identity Disclosure:
The desire to be seen as presented
Economically Stable Work:
Distance technical merits from identity
Autonomy to Disengage or Reengage
!54
And it turns out that these advantages stem from the fact that software developers can work remotely. They can learn remotely, and they can work remotely on such platforms as BountySource. In fact, figures such as 60% remote workers have been mentioned by software development companies.
We believe that remote work offers
a mechanism of control for identity
disclosure and empowerment of
software developers from any
marginalized communities.
!55
And it turns out that these advantages stem from the fact that software developers can work remotely. In fact, we believe that remote work offers a mechanism of control for identity disclosure and empowerment of software developers from any marginalized communities.
With special thanks to Margaret Burnett (Oregon State University)
!56
And here I would like to compare the discussion of remote work with the wonderful example of Margaret Burnett that shows how technological solutions supporting one
community can help many different ones. This is a picture from Amsterdam.
!57
!58
Summary: to achieve gender equality, diversity and inclusion goals we need to understand the experiences of people of different genders.
!59
Understanding their experiences requires identification of those genders; identification, manual or automatic, of an individual’s gender is a problematic and sensitive
subject. All existing solutions have their limitations.
@aserebrenik !60
This being said, understanding the experiences of people of different genders is essential and, as the last study conjectures, it can be beneficial not only to one marginalised community but to many of them.

What Is an AI Engineer? An Empirical Analysis of Job Ads in The Netherlands
 
Gender and Community Smells
Gender and Community SmellsGender and Community Smells
Gender and Community Smells
 
Bias in MSR Research
Bias in MSR ResearchBias in MSR Research
Bias in MSR Research
 
From team organisation to software quality
From team organisation to software qualityFrom team organisation to software quality
From team organisation to software quality
 
Women in Dutch Computer Science: Best Practices for Recruitment, Onboarding a...
Women in Dutch Computer Science: Best Practices for Recruitment, Onboarding a...Women in Dutch Computer Science: Best Practices for Recruitment, Onboarding a...
Women in Dutch Computer Science: Best Practices for Recruitment, Onboarding a...
 
My research story (presentation at ICSE 2021 New Faculty Symposium)
My research story (presentation at ICSE 2021 New Faculty Symposium)My research story (presentation at ICSE 2021 New Faculty Symposium)
My research story (presentation at ICSE 2021 New Faculty Symposium)
 
Opinion Mining for Software Engineering
Opinion Mining for Software EngineeringOpinion Mining for Software Engineering
Opinion Mining for Software Engineering
 
Removing Self Admitted Technical Debt
Removing Self Admitted Technical DebtRemoving Self Admitted Technical Debt
Removing Self Admitted Technical Debt
 
Gender Diversity and Inclusion and Software Engineering
Gender Diversity and Inclusion and Software EngineeringGender Diversity and Inclusion and Software Engineering
Gender Diversity and Inclusion and Software Engineering
 

Último

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
Silpa
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Silpa
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
Silpa
 

Último (20)

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 

Identifying Developers’ Gender: State of the Art

  • 1. Alexander Serebrenik Eindhoven U of Technology The Netherlands @aserebrenik Identifying Developers’ Gender: State of the Art 1 This talk should be positioned in the broader context of studying gender diversity and its impact on software development process as well as on the outcome of the software.
  • 2. !2 Careful: gender is a complex social construct. Whatever one-word summary we plan to elicit from developers by whatever means, it will be necessarily an oversimplification of gender identity and gender-related experiences of the individual. Another word of warning is related to me continuously learning about gender and how to talk about gender. I am trying to pay attention to the words I choose but I might make mistakes.
  • 3. !3 There are two ways of inferring gender of an individual: either by asking them or by using some kind of algorithm based on the artefacts representing the individual (name, profile picture) or created by them.
  • 4. 4 Whatever technique we use, we should keep in mind that gender is privacy sensitive and should be treated as such. Open source contributors might be hiding their gender on purpose, e.g., many women-developers prefer not to disclose their gender due to safety concerns. Some open source projects do not necessarily want us to know the genders of their members (but some do!) and companies might be sensitive to this topic as well.
  • 5. !5 So let us start with discussing how we can identify gender when talking to people, i.e., conducting interviews or surveys.
  • 6. 2004 !6 A highly influential guide on questionnaire design (which I do recommend) published in 2004 recommends the two check boxes. Sex vs gender
  • 7. 0.7-0.9% !7 However, recent surveys of Stack Overflow and GitHub indicate that 0.7-0.9% of software developers do not identify as either men or women. This might not appear much but
  • 8. 0.7-0.9% x 2 !8 it is twice as much as in the US population in general!
  • 9. !9 Slightly better. This approach has been criticised for othering, creating a sense of unease in participants, and will lead to people feeling disrespected: they might not complete surveys and they might let fellow trans people know the research team doesn’t “get it”.
  • 10. Bauer GR. Making sure everyone counts: considerations for inclusion, identification, and analysis of transgender and transsexual participants in health surveys. In: Coen S, Banister E, editors. What a difference sex and gender make. Vancouver: Institute of Gender and Health, Canadian Institutes of Health Research; 2012. pp. 59–67. !10 A better approach has been advocated by Greta Bauer in 2012. Unfortunately, my favourite survey platform, Google forms, does not seem to support this combination of an open answer in a multiple choice question unless it is the “other”.
  • 11. Bauer GR. Making sure everyone counts: considerations for inclusion, identification, and analysis of transgender and transsexual participants in health surveys. In: Coen S, Banister E, editors. What a difference sex and gender make. Vancouver: Institute of Gender and Health, Canadian Institutes of Health Research; 2012. pp. 59–67. Bauer GR, Braimoh J, Scheim AI, Dharma C (2017) Transgender-inclusive measures of sex/gender for population surveys: Mixed-methods evaluation and recommendations. PLoS ONE 12(5): e0178043.  0 25 50 75 100 Male Female Other Transfeminine (assigned male at birth, identify as women/non-binary) Transmasculine (assigned female at birth, identify as men/non-binary) !11 However, while this question was clear and easily answered by cisgender participants, it did not clearly identify birth-assigned sex or gender identity. In the interviews this item was cognitively taxing for trans interview participants, who tried to figure out exactly what the researchers were asking, and reached different conclusions.
  • 12. The GenIUSS Group. Best practices for asking questions to identify transgender and other gender minority respondents on population-based surveys. Herman JL, editor. Los Angeles (CA): The Williams Institute; 2014 p. 1–68. !12 The GenIUSS group has proposed the schema on the slide. Some people would be offended by this phrasing, as it implies that trans women are not necessarily female.
  • 13. Do you consider yourself to be transgender? () Yes () No () Questioning Do you consider yourself to be gender non-conforming, gender diverse, gender variant, or gender expansive? () Yes () No () Questioning Are you intersex? () Yes () No () I don't know Where do you identify on the gender spectrum (check all that apply)? [] Woman [] Demi-girl [] Man [] Demi-boy [] Non-binary [] Demi-non-binary [] Genderqueer [] Genderflux [] Genderfluid [] Demi-fluid [] Demi-gender [] Bigender [] Trigender [] Two-Spirit [] Multigender/polygender [] Pangender/omnigender [] Maxigender [] Aporagender [] Intergender [] Maverique [] Gender confusion/Gender f*ck [] Gender indifferent [] Graygender [] Agender/genderless [] Demi-agender [] Genderless [] Gender neutral [] Neutrois [] Androgynous [] Androgyne [] Prefer not to answer [] Self Identify: _________________ Open demographics https://drnikki.github.io/open-demographics/questions/gender.html !13 Good news: it separates a question about trans* and about gender non-conforming. Even more good news: woman/man instead of female/male; the former puts more stress on identity as opposed to biology. More good news: the question explicitly refers to gender identity and avoids confusion reported for the survey instrument of Bauer. And even more news: “check all that apply”, i.e., someone can be both woman and non-binary. Bad news: there are too many options and we want to keep surveys (and particularly demographic parts of the surveys) short! Even more, some of these notions might be experienced as confusing or taxing. Maverique (pronounced mav-reek) is a specific nonbinary gender identity "characterized by autonomy and inner conviction regarding a sense of self that is entirely independent of male/masculinity, female/femininity or anything which derives from the two while still being neither without gender nor of a neutral gender." 
Maverique is not close to a female or male gender, and is not like a mix of them; the identity goes beyond the entire scope of the gender binary or any identities within and outside of it. Aporagender (from Greek apo, apor "separate" + "gender") is a nonbinary gender identity and umbrella term for "a gender separate from male, female, and anything in between (unlike Androgyne) while still having a very strong and specific gendered feeling" (that is, not an absence of gender or agender). Neutrois is a non-binary gender identity which is often associated with a "neutral" or "null" gender.
  • 14. https://www.morgan-klaus.com/sigchi-gender-guidelines !14 A much better solution, according to the HCI Guidelines for Gender Equity and Inclusivity, is to ask an open-ended question. The answers might be difficult for us as researchers to process (code), but most software engineering surveys are relatively small, with a couple of hundred responses.
  • 15. https://www.morgan-klaus.com/sigchi-gender-guidelines Where do you identify on the gender spectrum (check all that apply)? [] Woman [] Man [] Non-binary [] Prefer not to disclose [] Self Identify: _________________ !15 If you really want to run a huge survey and manual coding of answers is not an option, then the same HCI Guidelines for Gender Equity and Inclusivity recommend the following phrasing. Also here, notice that *multiple* options are possible, the respondents have means not to disclose their gender or to provide their own response.
  • 16. 10-20% !16 However, whatever survey techniques we use and however we ask the questions, two open problems remain: scale of the data and lack of response. This is a problem if we want to perform a large-scale data analysis to tease out minor effects using traditional statistical techniques, since these techniques need a lot of data to ensure the power of the statistical tests. Our recent study that Bogdan is going to talk about tomorrow involved ~60K individuals. To get this number at a response rate of 10-20% we would need to survey ~300K-600K developers; if everyone spams 600K respondents, the respondents will be even more fed up with us and will not answer our questions…
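The arithmetic behind the ~300K-600K figure can be checked with a few lines of Python. This is only a sketch: it assumes that the 10-20% on the slide is the expected survey response rate, and the function name `invitations_needed` is made up for this illustration.

```python
def invitations_needed(target_responses: int, response_rate_percent: int) -> int:
    """Smallest number of invitations expected to yield `target_responses`.

    Assumes every invited developer answers independently with the given
    response rate (expressed as a whole percentage).
    """
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be in (0, 100]")
    # Ceiling division done in integer arithmetic to avoid float rounding.
    return -(-target_responses * 100 // response_rate_percent)


# ~60K individuals, as in the study mentioned on the slide.
optimistic = invitations_needed(60_000, 20)   # 20% response rate
pessimistic = invitations_needed(60_000, 10)  # 10% response rate
print(optimistic, pessimistic)
```

The two results bracket the range quoted in the notes: the lower the response rate, the more developers have to be contacted.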
  • 17. !17 Enter automatic gender detection mechanisms
  • 18. Self-presentation + Artefacts created Gender !18 All these tools are based on the main assumption, namely, that gender can be inferred from the way developers present themselves (username, name, avatar) or artefacts they produce (code, comments, etc.)
  • 19. 19 Basically many of the gender detection techniques look at the names. Many popular names are traditionally associated with a specific gender
  • 20. https://previews.123rf.com/images/pavalena/pavalena1111/pavalena111100046/11314908-map-kingdom-of-belgium.jpg !20 This practice is well established and in some countries it is even recorded in laws and administrative procedures. This is the case for Belgium, where by law the first name should not be confusing. Most local administrations interpret it as “no girls’ names for boys, no boys’ names for girls”.
  • 21. Andrea 21 However, the data we analyse comes from a mix of different countries, and certain names are more commonly associated with men in some countries and with women in other countries. Andrea: IT vs DE.
  • 22. gender Computer !22 This is why, for example, the tool that Bogdan developed in the past considers the location of the developer as the key to interpreting the gender associated with a particular name. I am using my profile not because I am a paradigmatic developer but because I do not have permission from other GH/SO contributors to use their profiles.
  • 23. gender Computer !23 And of course he has also used heuristics to recognise the location based on zip codes, state abbreviations, top level domains and names of large cities.
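The idea of interpreting a name through the developer's location can be illustrated with a toy sketch. This is not genderComputer's actual code: the lookup tables below are tiny, hypothetical stand-ins, and the function names are invented for this example. Only the Andrea (IT vs DE) case comes from the slides.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup tables (illustration only).
CITY_TO_COUNTRY = {"eindhoven": "NL", "rome": "IT", "berlin": "DE", "london": "GB"}
TLD_TO_COUNTRY = {"nl": "NL", "it": "IT", "de": "DE", "uk": "GB"}
# Gender most commonly associated with a first name, per country.
NAME_BY_COUNTRY = {("andrea", "IT"): "man", ("andrea", "DE"): "woman"}


def guess_country(location: str, email: str = "") -> Optional[str]:
    """Guess a country from a free-text location, falling back to the e-mail TLD."""
    for token in location.lower().replace(",", " ").split():
        if token in CITY_TO_COUNTRY:
            return CITY_TO_COUNTRY[token]
    tld = email.rsplit(".", 1)[-1].lower() if "." in email else ""
    return TLD_TO_COUNTRY.get(tld)


def guess_gender(first_name: str, location: str, email: str = "") -> Optional[str]:
    """Return the gender associated with the name in the guessed country, if known."""
    country = guess_country(location, email)
    if country is None:
        return None
    return NAME_BY_COUNTRY.get((first_name.lower(), country))


print(guess_gender("Andrea", "Rome, Italy"))  # name table for IT is used
print(guess_gender("Andrea", "Berlin"))       # name table for DE is used
print(guess_gender("Andrea", "London"))       # no entry: the sketch stays silent
```

Returning `None` rather than a guess when the table has no entry keeps the lack of information visible to whoever analyses the data.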
  • 24. Josh Terrell et al. gender Computer Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429 Different data sets require different techniques 24 In 2016 we evaluated several gender detection mechanisms on SO data. The ground truth was obtained by combining information from several surveys conducted earlier. We considered 5 basic techniques and added GH to check whether additional information helps.
  • 25. 25 However, location as inferred by genderComputer on its own is not enough. Many of us do not live in countries where we have been born. This person’s name is Andrea and they live in London. What do you think about the gender of this individual based on their name?
  • 26. 26 NamSor takes surnames into account and hence can help with resolving the gender of individuals who no longer live in their countries of origin. Unfortunately, NamSor is a commercial tool using the freemium model. Moreover, NamSor works reasonably well only for “real” names as opposed to display names.
  • 27. Automatically generated 11% No spaces 37% Three or more spaces 1% Two spaces 5% One space 46% !27 But, of course, viability of these approaches would depend on what share of GH developers or SO contributors provide information that can be interpreted as meaningful first and last names. The grey segment indicates percentage of the SO contributors with automatically generated usernames such as user12345. For these contributors no inference technique can be successful; both the red and the blue segments can be analysed by techniques such as genderComputer and NamSor; the red ones only by genderComputer
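The categories on this chart can be reproduced with a simple classifier over display names. This is a sketch only: the pattern for automatically generated names follows the `user12345` example from the notes, and the function name `name_category` is an assumption made for illustration.

```python
import re

# Pattern assumed from the "user12345" example in the notes.
AUTOGENERATED = re.compile(r"user\d+")


def name_category(display_name: str) -> str:
    """Assign a display name to one of the chart's categories."""
    name = display_name.strip()
    if AUTOGENERATED.fullmatch(name):
        return "automatically generated"
    # Count gaps between whitespace-separated parts (runs of spaces count once).
    spaces = max(len(name.split()) - 1, 0)
    if spaces == 0:
        return "no spaces"
    if spaces == 1:
        return "one space"
    if spaces == 2:
        return "two spaces"
    return "three or more spaces"


for example in ["user12345", "aserebrenik", "Alexander Serebrenik", "Jan van Dijk"]:
    print(example, "->", name_category(example))
```

Names with exactly one space are the most likely candidates for a first name plus surname, which is what surname-aware tools need.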
  • 28. Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156 7,076 names 3,811 male, 1,968 female, 1,297 unknown !28 errorCoded: % of individuals coded wrongly or not coded, relative to the total number of predictions. errorCodedWithoutNA: % of individuals coded wrongly, relative to the total number of predictions. errorGenderBias: tendency to predict women as men (neg) or men as women (pos).
  • 29. Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156 Tends to overpredict women as men Tends to overpredict men as women Tends to overreport unknowns 7,076 names 3,811 male, 1,968 female, 1,297 unknown !29 errorCoded: % of individuals coded wrongly or not coded, relative to the total number of predictions. errorCodedWithoutNA: % of individuals coded wrongly, relative to the total number of predictions. errorGenderBias: tendency to predict women as men (neg) or men as women (pos). NamSor seems to do quite well. Data: different collections of authors of scientific publications (Web of Science, PubMed, etc.)
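The three error measures can be written down from a confusion matrix. The sketch below follows the verbal definitions in the notes. The exact denominators are an assumption: errorCoded is taken over all predictions, while the other two are taken over the predictions that were not "unknown". The argument names are invented for this illustration.

```python
def benchmark_errors(*, men_as_men: int, men_as_women: int, men_as_unknown: int,
                     women_as_women: int, women_as_men: int,
                     women_as_unknown: int) -> dict:
    """Compute the three error measures from confusion-matrix counts."""
    # Predictions that received a gender label (assumed denominator "without NA").
    classified = men_as_men + men_as_women + women_as_women + women_as_men
    total = classified + men_as_unknown + women_as_unknown
    wrong = men_as_women + women_as_men
    return {
        # Wrongly coded or not coded at all, over all predictions.
        "errorCoded": (wrong + men_as_unknown + women_as_unknown) / total,
        # Wrongly coded, ignoring the "unknown" predictions.
        "errorCodedWithoutNA": wrong / classified,
        # Negative: women predicted as men dominate; positive: the opposite.
        "errorGenderBias": (men_as_women - women_as_men) / classified,
    }


# Made-up counts, purely to exercise the function.
print(benchmark_errors(men_as_men=80, men_as_women=5, men_as_unknown=15,
                       women_as_women=40, women_as_men=10, women_as_unknown=10))
```

A tool can score well on errorCodedWithoutNA simply by answering "unknown" often, which is why the measures are best read together.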
  • 30. Lucia Santamaría and Helena Mihaljević (2018), Comparison and benchmark of name-to-gender inference services. PeerJ Comput. Sci. 4:e156; DOI 10.7717/peerj-cs.156 !30 Closer inspection reveals a different story, however. The confidence of NamSor drops as we move to Asian names, particularly Eastern and South-Eastern Asian names. Half of the East-Asian names have a confidence score of 0!
  • 31. !31 And this is indeed deeply problematic when trying to apply automatic gender inference techniques to software developers
  • 32. With special thanks to Huilian Sophie Qiu (CMU) Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, Bogdan Vasilescu. Going Farther Together: The Impact of Social Capital on Sustained Participation in Open Source 41st International Conference on Software Engineering (ICSE 2019), 2019, pp. 688-699!32 We also extracted features from the name itself, including the last character (e.g., in Spanish, names ending in ‘a’ tend to be female), the last two characters (e.g., in Japan, names ending in ‘ko’ tend to be female), and tri-grams and 4-grams to capture romanized Chinese, Japanese, and Korean names.
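The name-based features listed on this slide are easy to extract. This is a minimal sketch, not the code used in the paper. The function and key names are assumptions, and the classifier that would consume these features is left out.

```python
def char_ngrams(text: str, n: int) -> list:
    """All contiguous character n-grams of `text` (empty if text is shorter than n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def name_features(first_name: str) -> dict:
    """Character-level features of a (romanized) first name."""
    name = first_name.strip().lower()
    return {
        "last_1": name[-1:],               # e.g. a final 'a' in Spanish names
        "last_2": name[-2:],               # e.g. a final 'ko' in Japanese names
        "trigrams": char_ngrams(name, 3),
        "fourgrams": char_ngrams(name, 4),
    }


print(name_features("Mariko"))
```

Such features only carry whatever signal survives romanization, which is one reason accuracy differs between subcommunities.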
  • 33. With special thanks to Huilian Sophie Qiu (CMU) Huilian Sophie Qiu, Alexander Nolte, Anita Brown, Alexander Serebrenik, Bogdan Vasilescu. Going Farther Together: The Impact of Social Capital on Sustained Participation in Open Source 41st International Conference on Software Engineering (ICSE 2019), 2019, pp. 688-699!33 We have shared our results with NamSor and they plan on improving their accuracy when it comes to CJK names.
  • 34. https://www.facelytics.io/en/ !34 Another way developers present themselves on social platforms is through profile pictures, which can be analysed using face recognition techniques; here we see that Facelytics has correctly identified my gender. Age-wise it is a bit off, since I am 43.
  • 35. !35 But things do not always go that smoothly. Daniela Petruzalek, a transgender software developer.
  • 36. ~30% autogenerated profile images !36 However, not everybody has a meaningful profile picture. For instance, ca. 30% of the Stack Overflow users only have a default profile picture automatically generated based on the MD5 hash of the users’ mail
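How a default picture can be derived from the MD5 hash of an e-mail address is sketched below. The sketch assumes a Gravatar-style scheme (hash of the trimmed, lower-cased address, with `d=identicon` requesting a generated image). Whether Stack Overflow uses exactly this URL form is an assumption, and the function name is invented.

```python
import hashlib


def default_avatar_url(email: str) -> str:
    """Build a Gravatar-style URL for a generated (identicon) avatar."""
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    # d=identicon asks for a geometric pattern derived from the hash.
    return f"https://www.gravatar.com/avatar/{digest}?d=identicon"


print(default_avatar_url("Someone@Example.com "))
```

Because the picture depends only on the hash, it says nothing about the person behind the account, so no image-based technique can use it.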
  • 37. Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429 Age not indicated 15-25 26-31 ≥32 Reputation 1-199 150 50 50 50 Reputation 200-999 150 50 50 50 Reputation ≥1000 150 50 50 50 !37 Moreover not all profile images represent faces (rather than logos or cat pictures). This is why we have carefully selected 900 non-generated profile images and classified them manually. Reputation classes are related to different privileges associated with these classes; age intervals to the general distribution of the ages on SO
  • 38. Bin Lin, Alexander Serebrenik: Recognizing gender of stack overflow users. MSR 2016: 425-429 53% (479/900) !38
  • 39. !39 Let us move to the discussion of artefacts created by software developers
  • 40. Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from Texts. Gender Equality Workshop ICSE 2019 !40 When it comes to gender recognition based on the artefacts created most of the approaches consider blog posts and Twitter data.
  • 41. Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from Texts. Gender Equality Workshop ICSE 2019 !41 For example, the work of Company & Wanner was designed in the first place for authorship attribution; similar authorship attribution techniques have been designed for source code. Is this the way to go? Do people of different genders code differently?
  • 43. Accuracy: Krüger and Hermann. Text. 62%-93% (for different datasets). Qiu et al. Names. 60%-84% (for different kinds of names). !43 The accuracy of our techniques is not perfect. It can be even lower for some subcommunities, e.g., for Chinese names, when some of the gender-specific information is lost during the romanization.
  • 44. Bogdan Vasilescu, Vladimir Filkov, Alexander Serebrenik: Perceptions of Diversity on GitHub: A User Survey. CHASE@ICSE 2015: 50-56 “I have used a fake GitHub handle (my normal GitHub handle is my first name, which is a distinctly female name) so that people would assume I was male” Reliability !44
  • 45. Krüger and Hermann. Text. 100% Keyes. Face. 92.9-96.7% Stefan Krüger, Ben Hermann. Can an Online Service Predict Gender? - On the State-of-the-Art in Gender Identification from Texts. Gender Equality Workshop ICSE 2019 Os Keyes. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. CSCW 2018 Santamaría and Mihaljević. Names. 20% Gender binary !45 Most automatic techniques we discuss assume gender binary. These are percentages of papers reviewed in two meta-studies. Keyes: the first number corresponds to the % in papers that introduce automatic gender recognition and the second one - to papers that use automatic gender recognition. The situation with names is a bit better since the tools tend to be probabilistic and at least recognise their own lack of confidence.
  • 46. !46 We have discussed two large groups of techniques for identifying the contributors’ gender: asking questions and applying algorithmic tools. None of the techniques is perfect, and the choice of technique should of course depend on the research questions. However, it might be equally important to discuss the limitations and problems of these techniques (and not only the advantages that made us choose them).
  • 47. !47 I would like to conclude this talk by the following calls for action
  • 48. !48 The two points I would like to make are (1) more focus on gender beyond the binary, which would require rethinking how to approach underrepresented communities, and (2) gender in combination with other diversity attributes (age, culture, …). These narratives are missing. Call for action: in the same way as we have adapted NamSor to include East-Asian names we need to be aware of cultural differences and take them into account when analysing developers’ communities.
  • 49. First steps !49
  • 50. Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12 With special thanks to Denae Ford (NCSU) !50 First steps
  • 51. Control of Identity Disclosure: The desire to be seen as presented Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12 “Stack Overflow has constrained expressions of identity. It’s up to you what content you want to fill in. GitHub for a while it was required you expose your email address to the rest of the world.” Petruzalek: The obvious drawback of not being passable is that you become an instant target. So passability is not only an identity goal, its also a mean of self-preservation !51
  • 52. Economically Stable Work: Distance technical merits from identity Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12 !52 A community with 58,481 members hunting for bounties and earning rewards. 30 percent of respondents to the 2015 U.S. Transgender Survey reported being mistreated in the workplace, denied a promotion, or fired because of their gender expression or gender identity. Transgender Americans experience higher levels of unemployment (15% vs 5%), poverty (29% vs 12%) and homelessness (12% vs 0.2%) than their non-transgender peers. http://www.engagetu.com/2018/04/12/economics-and-the-transgender-community/
  • 53. Economically Stable Work: Distance technical merits from identity Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12 You cannot tell from my technical profiles that I’m transgender. I don’t make a big deal that in professional context. It’s just not relevant Ross: “Technology has totally leveled the playing field for someone like me. I can get on the internet and watch tutorials. I have the drive to spend five hours a day to teach myself a skill.” !53
  • 54. Control of Identity Disclosure: The desire to be seen as presented Economically Stable Work: Distance technical merits from identity Autonomy to Disengage or Reengage Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12 !54 And it turns out that these advantages stem from the fact that software developers can work remotely. They can learn remotely, they can work remotely on such platforms as BountySource. In fact, such numbers as 60% of remote workers have been mentioned by software development companies.
  • 55. Denae Ford, Reed Milewicz, Alexander Serebrenik. How Remote Work Can Foster a More Inclusive Environment for Transgender Developers Workshop on Gender Equality in Software Engineering, 2019, pp. 9-12 We believe that remote work offers a mechanism of control for identity disclosure and empowerment of software developers from any marginalized communities. !55 And it turns out that these advantages stem from the fact that software developers can work remotely. In fact, we believe that remote work offers a mechanism of control for identity disclosure and empowerment of software developers from any marginalized communities.
  • 56. With special thanks to Margaret Burnett (Oregon State University) !56 And here I would like to compare the discussion of remote work with the wonderful example of Margaret Burnett that shows how technological solutions supporting one community can help many different ones. This is a picture from Amsterdam.
  • 57. !57 With special thanks to Margaret Burnett (Oregon State University)
  • 58. !58 Summary: to achieve gender equality, diversity and inclusion goals we need to understand the experiences of people of different genders.
  • 59. !59 Understanding their experiences requires identification of those genders; identification, manual or automatic, of an individual’s gender is a problematic and sensitive subject. All existing solutions have their limitations.
  • 60. @aserebrenik !60 This being said, understanding the experiences of people of different genders is essential and, as the last study conjectures, it can be beneficial not only to one marginalised community but to many of them.