Briefing to eCommerce negotiators on algorithmic decision making and associated issues of algorithmic bias. The presentation uses examples to highlight the types and causes of algorithmic decision making bias and to summarize the current state of regulatory responses.
Algorithmic Bias Considerations
1. Bias in Algorithmic Systems
ANSGAR KOENE,
HORIZON DIGITAL ECONOMY RESEARCH INSTITUTE, UNIVERSITY OF NOTTINGHAM
CHAIR OF IEEE P7003 STANDARD FOR ALGORITHMIC BIAS CONSIDERATIONS
23RD JULY 2018
2. Algorithmic Bias Considerations
All non-trivial* decisions are biased
We seek to minimize bias that is:
Unintended
Unjustified
Unacceptable
as defined by the context where the system is used.
*Non-trivial means the decision has more than one possible outcome and the
choice is not uniformly random.
3. Causes of algorithmic bias
Insufficient understanding of the context of use. – who will be affected?
Failure to rigorously identify the decision/optimization criteria.
Failure to have explicit justifications for the chosen criteria.
Failure to check if the justifications are acceptable in the context where the
system is used.
System not performing as intended. – implementation errors; unreliable input
data.
4. ‘Coded Gaze’
Failure to consider user groups
Face recognition algorithms built and tested using easily accessible examples – US university
students.
Sampling bias toward WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations
Joy Buolamwini - Algorithmic Justice League - https://www.ajlunited.org/
5. Arkansas algorithmic Medicaid
assessment instrument
When the instrument was introduced in 2016, many people with cerebral palsy had their care hours dramatically
reduced; they sued the state, resulting in a court case.
Investigation in court revealed:
The algorithm relies on scores for about 60 questions covering descriptions, symptoms and
ailments. A small number of variables could matter enormously: scoring a
three instead of a four on a handful of items meant a cut of dozens of care hours a month.
One variable was foot problems. When an assessor visited a certain person, they wrote that
the person didn’t have any foot problems — because they were an amputee and didn’t
have feet.
The third-party software vendor implementing the system mistakenly used a version of the
algorithm that didn’t account for diabetes issues.
Cerebral palsy wasn’t properly coded in the algorithm, causing incorrect calculations for
hundreds of people, mostly lowering their care hours.
https://www.theverge.com/2018/3/21/17144260/healthcare-medicaid-algorithm-arkansas-cerebral-palsy
6. COMPAS Recidivism risk prediction
not biased, yet still biased
COMPAS recidivism prediction tool
Estimates likelihood of criminals re-offending in future
◦ Producers of COMPAS made sure the tool produced similar accuracy rates for
white and black offenders
ProPublica investigation reported systematic racial bias in the
recidivism predictions
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
7. Accuracy rates vs.
False Positive and False Negative rates
When base rates differ between groups, no classifier can simultaneously equalize rates of:
◦ Accuracy/Overall Misclassification
◦ False Negatives
◦ False Positives
[Figure 1 and Figure 2: confusion-matrix comparisons of accuracy, false positive and false negative rates for white and black defendants]
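The tension between these error rates can be shown with a small worked example. The confusion-matrix counts below are invented for illustration (they are not the actual COMPAS figures): two groups with different re-offence base rates produce identical overall accuracy while their false positive and false negative rates diverge.

```python
# Illustrative sketch with hypothetical numbers, not real COMPAS data.
# Group A has a 60% base rate of re-offending, Group B has 30%.

def rates(tp, fp, fn, tn):
    """Compute accuracy, false positive rate and false negative rate
    from confusion-matrix counts (positive = predicted to re-offend)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),  # errors among non-reoffenders
        "false_negative_rate": fn / (fn + tp),  # errors among reoffenders
    }

group_a = rates(tp=48, fp=16, fn=12, tn=24)  # 100 people, base rate 60%
group_b = rates(tp=18, fp=16, fn=12, tn=54)  # 100 people, base rate 30%

print(group_a)  # accuracy 0.72, but FPR 0.40
print(group_b)  # accuracy 0.72, but FPR ~0.23
```

Both groups score 72% accuracy, yet innocent members of Group A are wrongly flagged at almost twice the rate of Group B, which is exactly the accuracy-versus-error-rate disagreement between the COMPAS developers and ProPublica.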
8. Search engine manipulation effect could impact elections –
distort competition
Experiments that manipulated the search
rankings for information about political
candidates for 4556 undecided voters.
i. biased search rankings can shift the voting
preferences of undecided voters by 20% or
more
ii. the shift can be much higher in some
demographic groups
iii. such rankings can be masked so that people
show no awareness of the manipulation.
R. Epstein & R.E. Robertson “The search engine manipulation effect (SEME) and its possible impact on the outcome
of elections”, PNAS, 112, E4512-21, 2015
9. Algorithmic price setting by sellers on amazon
algorithmic tuning of prices becomes visible when something goes wrong
Based on reverse engineering from observed
price changes:
profnath algorithm:
1. Find the highest-priced copy of the book.
2. Set price to 0.9983 times that price.
bordeebook algorithm:
1. Find the highest-priced copy from other sellers.
2. Set price to 1.270589 times that price.
However, ratios like “0.9983” and “1.270589” suggest the
prices were being set by algorithms, not by humans.
http://www.michaeleisen.org/blog/?p=358
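The feedback loop between the two rules can be sketched as a short simulation. The multipliers are the ones reverse-engineered in the blog post; the starting prices and number of rounds are illustrative. Each round the prices grow by a combined factor of roughly 0.9983 × 1.270589 ≈ 1.268, about 27%, which is how the book's price could climb into the millions of dollars before anyone noticed.

```python
# Sketch of the two interacting repricing rules (multipliers from the
# source; starting prices and round count are hypothetical).
PROFNATH_MULT = 0.9983      # price just under the highest competitor
BORDEEBOOK_MULT = 1.270589  # price well above the highest competitor

profnath, bordeebook = 35.54, 45.00  # illustrative starting prices

for day in range(10):
    profnath = bordeebook * PROFNATH_MULT    # track just under the rival
    bordeebook = profnath * BORDEEBOOK_MULT  # mark up over the rival
    print(f"day {day}: profnath=${profnath:,.2f} bordeebook=${bordeebook:,.2f}")
```

Neither rule is unreasonable on its own (undercut the market, or price high and rely on reputation); the runaway prices only emerge from their interaction, which is why the tuning became visible "when something went wrong".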
11. IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems
Initiative launched at end of 2016
https://standards.ieee.org/develop/indconn/ec/autonomous_systems.html
12. IEEE P70xx Standards Projects
typically 3-4 years development
earliest completions expected in next 6-12 months
IEEE P7000: Model Process for Addressing Ethical Concerns During System Design
IEEE P7001: Transparency of Autonomous Systems
IEEE P7002: Data Privacy Process
IEEE P7003: Algorithmic Bias Considerations
IEEE P7004: Child and Student Data Governance
IEEE P7005: Employer Data Governance
IEEE P7006: Personal Data AI Agent
IEEE P7007: Ontological Standard for Ethically Driven Robotics and Automation Systems
IEEE P7008: Ethically Driven Nudging for Robotic, Intelligent and Autonomous Systems
IEEE P7009: Fail-Safe Design of Autonomous and Semi-Autonomous Systems
IEEE P7010: Wellbeing Metrics Standard for Ethical AI and Autonomous Systems
IEEE P7011: Process of Identifying and Rating the Trustworthiness of News Sources
IEEE P7012: Standard for Machine Readable Personal Privacy Terms
13. Reason for industry participation in IEEE P7003
Standard for Algorithmic Bias Considerations
An IEEE P7003 working group participant from a corporation based in an ASEAN
country reported an example of why the company is interested in working on the IEEE
P7000 series standards:
The company produces image recognition and video behaviour analysis systems.
One such system was implemented in an ASEAN country to monitor the behaviour
of public transport drivers and report on driver quality.
The system was flagging up predominantly drivers from a specific ethnic group
as problem cases.
They need guidance on how to make sure, and how to communicate, that the
system is not inadvertently basing its decisions on ethnicity.
14. Related AI standards activities
ISO/IEC JTC 1/SC 42 Artificial Intelligence – started spring 2018
◦ SG 1 Computational approaches and characteristics of AI systems
◦ SG 2 Trustworthiness
◦ SG 3 Use cases and applications
◦ WG 1 Foundational standards
Jan 2018 China published “Artificial Intelligence Standardization
White Paper.”
15. ACM Principles on Algorithmic Transparency
and Accountability
Awareness
Access and Redress
Accountability
Explanation
Data Provenance
Auditability: Models, algorithms, data, and decisions
should be recorded so that they can be audited in cases
where harm is suspected
Validation and Testing
https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf
16. FAT/ML: Principles for Accountable Algorithms
and a Social Impact Statement for Algorithms
Responsibility: Make externally visible avenues of redress available for adverse effects, and designate an
internal role responsible for the timely remedy of such issues.
Explainability: Ensure algorithmic decisions and data driving those decisions can be
explained to end-users/stakeholders in non-technical terms.
Accuracy: Identify, log, and articulate sources of error and uncertainty so that
expected/worst case implications can inform mitigation procedures.
Auditability: Enable third parties to probe, understand, and review
algorithm behavior through disclosure of information that enables
monitoring, checking, or criticism, including detailed
documentation, technically suitable APIs, and permissive terms of
use.
Fairness: Ensure algorithmic decisions do not create discriminatory or unjust impacts
when comparing across different demographics.
https://www.fatml.org/resources/principles-for-accountable-algorithms
18. New York City - Task Force to Examine
Automated Decision Systems
On May 16, 2018 the Mayor of New York created the Automated Decision Systems Task Force,
which will explore how New York City uses algorithms.
The Task Force, the product of a law passed by City Council in December 2017,
aims to produce a report in December 2019 recommending procedures for
reviewing and assessing City algorithmic tools to ensure equity and opportunity.
As part of the task force, the AI Now Institute is developing a proposal for
Algorithmic Impact Assessment [ https://ainowinstitute.org/aiareport2018.pdf ]
including a requirement for Meaningful access: Allow researchers and auditors
to review systems once they are deployed.
https://www1.nyc.gov/office-of-the-mayor/news/251-18/mayor-de-blasio-first-in-nation-task-force-examine-automated-decision-systems-used-by
19. EU initiatives to understand the impact of
AI/algorithmic decision making systems
The European Commission (EC) and European Parliament (EP) are urgently working to
identify the issues
In June 2018 the EC assembled its High-Level Expert Group on Artificial Intelligence to
support the development and implementation of a European strategy on AI – including
challenges of:
◦ Ethics
◦ Law
◦ Societal impact
◦ Socio-economics
The EP has:
- called for reports on policy options for algorithmic transparency and accountability to
be presented in October 2018
- instructed the EC to launch a project on Algorithmic Awareness Building and
development of policy tools for algorithmic decision making.
https://ec.europa.eu/digital-single-market/en/high-level-expert-group-artificial-intelligence
21. Biography
• Dr. Koene is Senior Research Fellow at the Horizon Digital Economy Research
Institute of the University of Nottingham, where he conducts research on
societal impact of Digital Technology.
• Chairs the IEEE P7003™ Standard for Algorithmic Bias Considerations working
group, and leads the policy impact activities of the Horizon institute.
• He is co-investigator on the UnBias project to develop regulation-, design- and
education-recommendations for minimizing unintended, unjustified and
inappropriate bias in algorithmic systems.
• Over 15 years of experience researching and publishing on topics ranging from
Robotics, AI and Computational Neuroscience to Human Behaviour studies and
Tech. Policy recommendations.
• He received his M.Eng. and Ph.D. in Electrical Engineering & Neuroscience,
respectively, from Delft University of Technology and Utrecht University.
• Trustee of 5Rights, a UK based charity enabling children and young people
to access the digital world creatively, knowledgeably and fearlessly.
Dr. Ansgar Koene
ansgar.koene@nottingham.ac.uk
https://www.linkedin.com/in/akoene/
https://www.nottingham.ac.uk/computerscience/people/ansgar.koene
http://unbias.wp.horizon.ac.uk/
22. Institute of Electrical and Electronics
Engineers (IEEE)
IEEE is the world's largest association of technical professionals with more than 423,000 members in
over 160 countries around the world
IEEE Standards Association (IEEE-SA) is responsible for standards such as the 802.11 Wi-Fi standard,
which all Wi-Fi devices use to communicate with each other
In addition to developing its own P7000 series standards, the IEEE Global Initiative on Ethics of
Autonomous and Intelligent Systems:
Is an organizational member of the European Commission’s High-Level Expert Group on AI
Has been consulted for input to the OECD’s development of principles on AI
Is collaborating with the Association for Computing Machinery (ACM) on briefing the US government
and Congress on AI
Provided evidence to the UK’s Select Committee on AI
The P7000 series standards are being referenced by ISO as part of the ISO SC42 AI Standard
development
24. Reproducing human/historical bias
Machine Learning requires large, reliable, ‘ground truth’ data.
Training on historical databases reproduces historical bias.
Bias in datasets can be reduced by targeted sampling, re-weighting and other
methods.
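Re-weighting, one of the mitigation methods named above, can be sketched in a few lines. The toy dataset below is invented for illustration; the rule itself (weight each group/label combination by P(group)·P(label) / P(group, label), so that group membership and the positive label become statistically independent in the weighted data) follows the common "reweighing" preprocessing approach.

```python
# Minimal re-weighting sketch on a hypothetical historical hiring dataset:
# group "m" mostly has the positive label, group "f" mostly the negative one.
from collections import Counter

data = [("m", 1)] * 40 + [("m", 0)] * 10 + [("f", 1)] * 10 + [("f", 0)] * 40

n = len(data)
group_counts = Counter(g for g, _ in data)   # P(group)
label_counts = Counter(y for _, y in data)   # P(label)
pair_counts = Counter(data)                  # P(group, label)

# weight = P(group) * P(label) / P(group, label)
weights = {
    (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (pair_counts[(g, y)] / n)
    for (g, y) in pair_counts
}
print(weights)
# Under-represented pairs (e.g. women with the positive label) get weights
# above 1; over-represented pairs get weights below 1, so the weighted
# positive rate is equal across the two groups.
```

A learner trained with these sample weights no longer sees group membership as predictive of the label, at the cost of deviating from the (biased) historical distribution.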
Editor's Notes
All non-trivial decisions are biased. For example, a good results from a search engine should be biased to match the interests of the user as expressed by the search-term, and possibly refined based on personalization data.
When we say we want ‘no Bias’ we mean we want to minimize unintended, unjustified and unacceptable bias, as defined by the context within which the algorithmic system is being used.
In the absence of malicious intent, bias in algorithmic systems is generally caused by:
Insufficient understanding of the context that the system is part of. This includes lack of understanding who will be affected by the algorithmic decision outcomes, resulting in a failure to test how the system performs for specific groups, who are often minorities. Diversity in the development team can partially help to address this.
Failure to rigorously map decision criteria. When people think of algorithmic decisions as being more ‘objectively trustworthy’ than human decisions, more often than not they are referring to the idea that algorithmic systems follow a clearly defined set of criteria with no ‘hidden agenda’. The complexity of system development challenges, however, can easily introduce ‘hidden decision criteria’ introduced as a quick fix during debugging or embedded within Machine Learning training data.
Failure to explicitly define and examine the justifications for the decision criteria. Given the context within which the system is used, are these justifications acceptable? For example, in a given context is it OK to treat high correlation as evidence of causation?
- While working with face recognition systems at MIT, Joy Buolamwini noticed that the algorithms failed to reliably detect black faces.
- Test subjects and data sets for developing these systems are mostly members of the research lab or university students, the majority of whom are white.
- Simple performance metrics, such as “overall percent correct detection”, that do not consider differences between user groups therefore produced high performance scores despite the systems failing miserably for groups that were a small minority in the test set.
- In Psychology research this common sampling bias is referred to as WEIRD, which stands for Western, Educated, from Industrialized, Rich, Democratic countries.
The scholar Danielle Keats Citron cites the example of Colorado, where coders placed more than 900 incorrect rules into its public benefits system in the mid-2000s, resulting in problems like pregnant women being denied Medicaid. Similar issues in California, Citron writes in a paper, led to “overpayments, underpayments, and improper terminations of public benefits,” as foster children were incorrectly denied Medicaid. Citron writes about the need for “technological due process” — the importance of both understanding what’s happening in automated systems and being given meaningful ways to challenge them.
- Probably one of the most discussed cases of algorithmic bias is the apparent racial discrimination by the COMPAS recidivism prediction tool, that was reported on by ProPublica in 2016.
- COMPAS is a tool used across the US by judges and parole officers to predict recidivism rates based on responses to a long questionnaire.
- There are many issues that can be discussed regarding the use of this tool. Here we will only look at the central question: does COMPAS produce racially biased predictions? Highlighting the fact that there is no one-true answer to this question.
When evaluating if COMPAS is racially biased there are a number of different error rates that can be considered.
As shown in figure 1, one can compare the Overall Accuracy, or Misclassification Rates for white and black defendants.
Alternatively one could look at the False Negative or False Positive rates, i.e. the rate with which classification errors correspond to predictions of defendants being less or more likely to reoffend.
The developers of COMPAS focused on Accuracy rate, whereas the ProPublica investigation looked at False Positive and False Negative rates.
Figure 2, red box, shows that COMPAS (as well as human judges) has similar Accuracy rates for Black and White defendants, suggesting that it is not biased.
Figure 2, green box, shows that when COMPAS makes errors, black defendants get judged more harshly than they should and white defendants more leniently.
Mathematically it can be shown that when the base rates differ between two groups, such as white and black offenders, no classifier can simultaneously avoid bias in Accuracy, False Negative and False Positive rates.
Institute of Electrical and Electronics Engineers (IEEE) is the world's largest association of technical professionals with more than 423,000 members in over 160 countries around the world
IEEE Standards Association (IEEE-SA) is responsible for standards such as the wifi 802.11 standard which is what all wifi devices use so that they can communicate with each other
FAT/ML (Fairness, Accountability and Transparency in Machine Learning) is a group of leading academics working on fairness, accountability and transparency in machine learning, with annual conferences highlighting leading work in this area.
Ansgar Koene is a Senior Research Fellow at Horizon Digital Economy Research institute, University of Nottingham and chairs the IEEE P7003 Standard for Algorithm Bias Considerations working group. As part of his work at Horizon Ansgar is the lead researcher in charge of Policy Impact; leads the stakeholder engagement activities of the EPSRC (UK research council) funded UnBias project to develop regulation-, design- and education-recommendations for minimizing unintended, unjustified and inappropriate bias in algorithmic systems; and frequently contributes evidence to UK parliamentary inquiries related to ICT and digital technologies.
Ansgar has a multi-disciplinary research background, having previously worked and published on topics ranging from bio-inspired Robotics, AI and Computational Neuroscience to experimental Human Behaviour/Perception studies.
Each method has its strengths and weaknesses. The following examples provide a brief overview of how these methods work, and how they can go wrong
Machine Learning based systems require large, reliable, ‘ground truth’ data sets to teach them.
Often the easiest solution is to use existing ‘historical’ data.
Want to teach a system who to target for high-paying job ads? Start with a database of people with high-paying jobs, and look for defining features. Due to historical bias the result will favour white males.
Want to use AI to judge a beauty contest? Train it with images of actors and models, and watch it reproduce fashion industry bias for light skin tones.
To avoid reproducing human bias in Machine Learning systems it is necessary to carefully check the training data and possibly adjust sampling rates for various population groups, compensate by adjusting the weight given to errors for different groups, or using other methods.