This document discusses bias in machine learning and how to intentionally design systems to reflect organizational values. It defines bias as a systematic influence on decisions that produces results inconsistent with reality. Bias can come from data selection and latent biases in training data. While bias may result in suboptimal answers, bias towards organizational values is not necessarily bad. The document provides examples of testing AI systems to ensure they reflect values like equal opportunity, customer satisfaction, and environmental stewardship. Testers should understand organizational values, how to operationalize them, and test that recommendations match values as reflected in proxy training data.
Testing AI Bias Against Organizational Values
1. Not Fair! Testing AI Bias and Organizational Values
Peter Varhol and Gerie Owen
2. About me
• International speaker and writer
• Graduate degrees in Math, CS, Psychology
• Technology communicator
• AWS certified
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
3. Gerie Owen
• Quality Engineering Architect
• Testing Strategist & Evangelist
• Test Manager
• Subject expert on testing for TechTarget’s SearchSoftwareQuality.com
• International and Domestic
Conference Presenter
Gerie.owen@gerieowen.com
4. What You Will Learn
• Why bias is often an outcome of machine learning results.
• How bias that reflects organizational values can be a desirable result.
• How to test bias against organizational values.
5. Agenda
• What is bias in AI?
• How does it happen?
• Is bias ever good?
• Building in bias intentionally
• Bias in data
• Summary
6. Bug vs. Bias
• A bug is an identifiable and measurable error in process or result
• Usually fixed with a code change
• A bias is a systematic influence on decisions that produces results inconsistent with reality
• Bias can’t be fixed with a code change
7. How Does This Happen?
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” can usually work
• As long as we can quantify “close enough”
• We don’t know quite why the software responds as it does
• We can’t easily trace code paths
• We choose the data
• The software “learns” from past actions
8. How Can We Tell If It’s Biased?
• We look very carefully at the training data
• We set strict success criteria based on the system requirements
• We run many tests
• Most change parameters only slightly
• Some use radical inputs
• Compare results to success criteria
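The testing loop above can be sketched as a small perturbation harness. A minimal sketch, assuming a stand-in `model` function and a hypothetical tolerance as the success criterion:

```python
import random

def model(x):
    # Hypothetical stand-in for the trained system under test.
    return 2.0 * x + 1.0

def perturbation_test(model, baseline_input, n_runs=100, step=0.01, tolerance=0.5):
    """Run many tests that change the input only slightly, and
    collect any result that drifts outside the success criterion."""
    baseline = model(baseline_input)
    failures = []
    for _ in range(n_runs):
        x = baseline_input + random.uniform(-step, step)
        y = model(x)
        if abs(y - baseline) > tolerance:
            failures.append((x, y))
    return failures

# A "radical" input probes behavior far outside the training range.
radical_result = model(1e6)
```

An empty failure list means small input changes stayed within the criterion; radical inputs still need separate judgment against the requirements.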
9. Amazon Can’t Rid Its AI of Bias
• Amazon created an AI to crawl the web to find job candidates
• Training data was all resumes submitted for the last ten years
• In IT, the overwhelming majority were male
• The AI “learned” that males were superior for IT jobs
• Amazon couldn’t fix that training bias
10. Many Systems Use Objective Data
• Electric wind sensor
• Determines wind speed and direction
• Based on the cooling of filaments
• Designed a three-layer neural network
• Then used the known data to train it
• Cooling in degrees of all four filaments
• Wind speed, direction
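A network like the wind-sensor example can be sketched as a forward pass: four filament cooling readings in, estimated speed and direction out. The layer sizes and weights here are hypothetical and untrained, so the outputs are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 filament cooling readings -> 8 hidden units -> (speed, direction)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def predict(cooling_degrees):
    """Three-layer network forward pass: cooling of the four
    filaments in, estimated wind speed and direction out."""
    h = np.tanh(cooling_degrees @ W1 + b1)  # hidden layer
    return h @ W2 + b2                      # linear output layer

sample = np.array([1.2, 0.8, 1.5, 0.9])    # hypothetical readings
speed, direction = predict(sample)
```

Training would fit W1, b1, W2, b2 to the known cooling/wind pairs; the bias question is whether those pairs cover all the conditions the sensor will see.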
11. Can This Possibly Be Biased?
• Well, yes
• The training data could have been recorded in single temperature/sunlight/humidity conditions
• Which could affect results under those conditions
• It’s a possible bias that doesn’t hurt anyone
• Or does it?
• Does anyone remember a certain O-ring?
12. Where Do Biases Come From?
• Data selection
• We choose training data that represents only one segment of the domain
• We limit our training data to certain times or seasons
• We overrepresent one population
• Or
• The problem domain has subtly changed
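Data-selection bias can be checked directly by comparing each group’s share of the training data against its share of the real problem domain. A minimal sketch, with hypothetical numbers echoing the Amazon example:

```python
from collections import Counter

def representation_gap(training_labels, population_shares):
    """For each group, return its share of the training data
    minus its share of the real problem domain."""
    n = len(training_labels)
    counts = Counter(training_labels)
    return {group: counts.get(group, 0) / n - share
            for group, share in population_shares.items()}

# Hypothetical: resumes are 80% male, but the domain is 50/50.
gap = representation_gap(["m"] * 80 + ["f"] * 20,
                         {"m": 0.5, "f": 0.5})
```

A large positive gap for one group is exactly the overrepresentation the slide warns about, and it is visible before any training happens.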
13. Where Do Biases Come From?
• Latent bias
• Concepts become incorrectly correlated
• Correlation does not mean causation
• But it is high enough to believe
• We could be promoting stereotypes
• This describes Amazon’s problem
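Latent bias shows up as a correlation that the model mistakes for a signal. A minimal sketch using the phi coefficient for two binary variables, with hypothetical skewed hiring data:

```python
def phi_coefficient(xs, ys):
    """Correlation between two binary variables (phi coefficient).
    A high value is correlation, not causation."""
    n = len(xs)
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = n - n11 - n10 - n01
    denom = ((n11 + n10) * (n01 + n00)
             * (n11 + n01) * (n10 + n00)) ** 0.5
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical historical data: male-and-hired dominates the sample,
# so "male" and "hired" look correlated with no causal link.
male  = [1] * 70 + [1] * 10 + [0] * 10 + [0] * 10
hired = [1] * 70 + [0] * 10 + [1] * 10 + [0] * 10
phi = phi_coefficient(male, hired)
```

A model trained on this data will treat the correlation as predictive, which is how a stereotype gets "learned."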
14. Where Do Biases Come From?
• Interaction bias
• We may focus on keywords that users apply incorrectly
• User incorporates slang or unusual words
• “That’s bad, man”
• The story of Microsoft Tay
• It wasn’t bad, it was trained that way
15. Why Does Bias Matter?
• Wrong answers
• Often with no recourse
• Subtle discrimination (legal or illegal)
• And no one knows it
• Suboptimal results
• We’re not getting it right often enough
16. It’s Not Just AI
• All software has biases
• It’s written by people
• People make decisions on how to design and implement
• Bias is inevitable
• But can we find it and correct it?
• Do we have to?
17. Like This One
• A London doctor can’t get into her fitness center locker room
• The fitness center uses a “smart card” to access and record services
• While acknowledging the problem
• The fitness center couldn’t fix it
• But the software development team could
• They had hard-coded “doctor” to be synonymous with “male”
• It was meant as a convenient shortcut
18. About That Data
• We use data from the problem domain
• What’s that?
• In some cases, scientific measurements are accurate
• But we can choose the wrong measures
• Or not fully represent the problem domain
• But data can also be subjective
• We train with photos of one race over another
• We train with our own values of beauty
19. Is Bias Always Bad?
• Bias can result in suboptimal answers
• Answers that reflect the bias rather than rational thought
• But is that always a problem?
• It depends on how we measure our answers
• We may not want the most profitable answer
• Instead we want to reflect organizational values
• What are those values?
20. Examples of Organizational Values
• Committed to goals of equal hiring, pay, and promotion
• Will not deny credit based on location, race, or other irrelevant factors
• Will keep the environment cleaner than we left it
• Net carbon neutral
• No pollutants into atmosphere
• We will delight our customers
21. Examples of Organizational Values
• These values don’t maximize profit at the expense of everything else
• They represent what we might stand for
• They are extremely difficult to train AI for
• Values tend to be nebulous
• Organizations don’t always practice them
• We don’t know how to measure them
• So we don’t know what data to use
• Are we achieving the desired results?
• How can we test this?
22. How Do We Design Systems With These Goals in Mind?
• We need data
• But we don’t directly measure the goal
• Is there proxy data?
• Training the system
• Data must reflect goals
• That means we must know or suspect the data is measuring the bias we want
23. Examples of Useful Data
• Customer satisfaction
• Survey data
• Complaints/resolution times
• Maintain a clean environment
• Emissions from operations/employee commute
• Recycling volume
• Equal opportunity
• Salary comparisons, hiring statistics
24. Sample Scenario
• “We delight our customers”
• AI apps make decisions on customer complaints
• Goal is to satisfy as many as possible
• Make it right if possible
• Train with
• Customer satisfaction survey results
• Objective assessment of customer interaction results
25. Testing the Bias
• Define hypotheses
• Map vague to operational definitions
• Establish test scenarios
• Specify the exact results expected
• With means and standard deviations
• Test using training data
• Measure the results in terms of definitions
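The "exact results expected, with means and standard deviations" step can be sketched as a simple acceptance check. The operational definition here ("delighted" = satisfaction score of 4.2 ± 0.3) is a hypothetical example:

```python
import statistics

def matches_expectation(results, expected_mean, expected_sd, k=2.0):
    """Check whether test results fall within k standard deviations
    of the mean specified for the scenario."""
    observed = statistics.mean(results)
    return abs(observed - expected_mean) <= k * expected_sd

# Hypothetical operational definition: "we delight our customers"
# means a mean satisfaction score of 4.2 with sd 0.3.
scores = [4.1, 4.3, 4.0, 4.4, 4.2]
ok = matches_expectation(scores, expected_mean=4.2, expected_sd=0.3)
```

One such check per test scenario turns a vague value into a pass/fail result that can be tracked across runs.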
26. Testing the Bias
• Compare test results to the data
• That data measures your organizational values
• Is there a consistent match?
• A consistent match means that the AI is accurately reflecting organizational values
• Does it meet the goals set forth at the beginning of the project?
• Are ML recommendations reflecting values?
• If not, it’s time to go back to the drawing board
• Better operational definitions
• New data
27. Finally
• Test using real life data
• Put the application into production
• Confirm results in practice
• At first, side by side with human decision-makers
• Validate the recommendations with people
• Compare recommendations with results
• Yes/no: does the software reflect values?
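The side-by-side validation step reduces to an agreement rate between the software’s recommendations and the human decision-makers’. A minimal sketch with hypothetical complaint decisions:

```python
def agreement_rate(ai_decisions, human_decisions):
    """Fraction of cases where the AI recommendation matches the
    side-by-side human decision."""
    matches = sum(a == h for a, h in zip(ai_decisions, human_decisions))
    return matches / len(ai_decisions)

# Hypothetical side-by-side run during early production.
ai    = ["refund", "replace", "refund", "deny", "refund"]
human = ["refund", "replace", "refund", "refund", "refund"]
rate = agreement_rate(ai, human)
```

Disagreements (here, the "deny" case) are the interesting ones: each is a concrete yes/no judgment on whether the software reflected the values.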
28. Back to Bias
• Bias isn’t necessarily bad in ML/AI
• But we need to understand it
• And make sure it reflects our goals
• Testers need to understand organizational values
• And how they represent bias
• And how to incorporate that bias into ML/AI apps
29. Summary
• Machine learning/AI apps can be designed to reflect organizational values
• That may not result in the best decision from a strict business standpoint
• Know your organizational values
• And be committed to maintaining them
• Test to the data that represents the values
• As well as the written values themselves
• Draw conclusions about the decisions being made
30. Thank You
• Peter Varhol
peter@petervarhol.com
• Gerie Owen
gerie@gerieowen.com