In this talk, we will walk through the steps of how to build an algorithm to predict property prices from a dataset of property listings, focusing predominantly on finding the right features to include in building the model.
3. What you’ll learn today
1. How to build a predictive model
2. Where in the building process bias can be introduced
3. What the real-world ramifications are
4. What all these “buzzwords” mean
● Data science produces insights
● Machine learning produces predictions
● Artificial intelligence produces actions
16. ◂ The remaining variables can proxy for race
◂ If race is a useful predictor, then you have a hole in the data
◂ Indirect discrimination
Removing ‘race’ from the dataset doesn’t remove the problem
17. Now we know the risks of training data... What do we do now?
21. Examples of data cleaning (sketched in code below)
1. Remove duplicates
2. Remove empty columns
3. Remove irrelevant variables
4. Fill empty rows with averages, or mark them as 0
5. Remove rows that are blank for the features most important to you
6. Standardize units
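A minimal pandas sketch of these cleaning steps, assuming a hypothetical `listings` DataFrame; every file and column name here is an assumption for illustration, not the talk’s actual dataset:

```python
import pandas as pd

# Hypothetical listings data; column names are assumptions for illustration.
listings = pd.read_csv("listings.csv")

# 1. Remove duplicates
listings = listings.drop_duplicates()

# 2. Remove empty columns
listings = listings.dropna(axis=1, how="all")

# 3. Remove irrelevant variables (hypothetical column names)
listings = listings.drop(columns=["listing_id", "agent_notes"], errors="ignore")

# 4. Fill empty rows with averages, or mark them as 0
listings["sqft"] = listings["sqft"].fillna(listings["sqft"].mean())
listings["garage_spaces"] = listings["garage_spaces"].fillna(0)

# 5. Remove rows that are blank for the most important features
listings = listings.dropna(subset=["price", "zip_code"])

# 6. Standardize units, e.g. convert lot size from acres to square feet
listings["lot_sqft"] = listings["lot_acres"] * 43_560
```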
27. Adding additional variables by zip code (join sketched below)
◂ Yelp count of stars
◂ Yelp average of stars
◂ Average household income
◂ Per capita income
◂ High income households (% > $200k/yr)
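As a hedged sketch, these zip-code-level features might be joined onto the listings like this, continuing from the cleaning sketch above (file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical zip-code-level tables; names and columns are assumptions.
yelp_by_zip = pd.read_csv("yelp_by_zip.csv")      # zip_code, star_count, star_avg
income_by_zip = pd.read_csv("income_by_zip.csv")  # zip_code, avg_household_income,
                                                  #   per_capita_income, pct_over_200k

# Left-join so every listing keeps its row even when a zip code has no Yelp data.
listings = listings.merge(yelp_by_zip, on="zip_code", how="left")
listings = listings.merge(income_by_zip, on="zip_code", how="left")
```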
28. Yelp data seems pretty democratic; that can’t cultivate bias, right?
47. Experiment with hyperparameter tuning (sketched below)
◂ Increase or decrease number of trees
◂ 10-fold cross validation
◂ Look at depth
◂ Random seed
◂ Where to split the data
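A minimal scikit-learn sketch of this tuning loop, assuming a feature matrix `X` and target `y` built from the cleaned listings; the talk mentions XGBoost, but a random forest is used here as a stand-in:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Where to split the data, with a fixed random seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

param_grid = {
    "n_estimators": [100, 300, 500],  # increase or decrease number of trees
    "max_depth": [None, 10, 20],      # look at depth
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=10,                            # 10-fold cross-validation
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)
print(search.best_params_)
model = search.best_estimator_
```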
49. “Algorithms will do more justice to the people who are easiest to understand at the expense of those who aren’t.”
- Michael Veale, PhD in Responsible ML at UCL
Hello!
My name is Eva. I am SO humbled and honored to be here today to talk to you about this topic that is so important to me. Two years ago I completed an MSc in Business Analytics and Management Science, where I learned all about algorithms, including their benefits and risks. Now I’m a PMM at Sentry in SF.
Please tweet me with your questions or comments.
A great example to understand the difference is in autonomous cars stopping at a stop sign.
Data Science - understanding false negatives; insights like whether time of day matters for the car to stop
Machine Learning - gathering a dataset to predict which images contain stop signs
AI - takes the action to apply the brakes at the stop sign
Get data; clean, prepare, and manipulate (feature extraction); train; test; deploy and improve.
Used as a framework; a skeleton is sketched below.
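A hedged end-to-end skeleton of that framework; all names are illustrative, not the actual model from the talk:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Get data
listings = pd.read_csv("listings.csv")

# Clean, prepare, and manipulate (feature extraction)
listings = listings.drop_duplicates().dropna(subset=["price"])
X = pd.get_dummies(listings.drop(columns=["price"]))  # encode categoricals
y = listings["price"]

# Train
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Test: the held-out portion is only used for evaluation
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# Deploy and improve: monitor errors in production and retrain as data shifts
```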
What are examples of this?
The bias is still there, even though the variable is removed.
How do we move forward with building our model?
Get data; clean, prepare, and manipulate (feature extraction); train; test; deploy and improve.
Many datasets are collected by a particular entity to answer specific types of questions and accomplish a particular goal.
To minimize bias at the data-cleaning stage, get context on how the raw data was collected and how certain variables should be interpreted.
The research on Yelp data posted on Eater shows that Mexican cuisine in the US had the highest number of people talking about “authenticity,” followed by Chinese, Thai, Japanese, and Indian.
How would using this data of Yelp stars by zip code to predict housing prices affect our model?
This is another dataset with human-informed, opt-in decisions. When using both Amazon Express and Yelp, you already have omitted-variable bias, since the service is voluntary and opt-in.
Racial discrimination that comes through location-based opt-in data
So back to our model
Get data; clean, prepare, and manipulate (feature extraction); train; test; deploy and improve.
The “test” data is used for evaluation.
We can’t and shouldn’t blindly look at prediction power; we also need to understand the variables that are driving it.
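One hedged way to inspect which variables are driving the predictions, reusing the fitted model and feature matrix from the sketches above:

```python
import pandas as pd

# Rank features by importance. A proxy for race or income showing up near
# the top is a red flag to investigate, not a win for predictive power.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```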
This will build on itself in a way that the people who created it wouldn’t necessarily have wanted. It will compound and continue a vicious cycle.
Rich people’s houses would have a higher value BECAUSE they are rich.
It’s a self-fulfilling prophecy - it’s reinforcing the vicious cycles of inequality in society, and disadvantages that already exist.
We looked at this a bit in the last section, but we’re going to try to improve our model even more
Get data; clean, prepare, and manipulate (feature extraction); train; test; deploy and improve.
Also notice that this gives rise to new variables for XGBoost.
Algorithms don’t do well with outlier data, as we learned when building our model.
The bell curve is shifted slightly to the left: the model is slightly undervaluing the property prices. This would be favorable to buyers but not favorable to sellers.
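A small sketch of how that left-shifted error distribution could be checked, reusing the fitted model and test split from the sketches above:

```python
import numpy as np

# Residuals: predicted minus actual. A negative mean or median means the
# model systematically undervalues properties (favoring buyers over sellers).
residuals = model.predict(X_test) - y_test
print("mean residual:", np.mean(residuals))
print("median residual:", np.median(residuals))
```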
Be mindful of whom this hurts.
Get data; clean, prepare, and manipulate (feature extraction); train; test; deploy and improve.
This phase is when we bring the bias to life. What’s the problem with using predictive algorithms?
In 2013, Google mispredicted the peak of the flu by 140%.
If multiple companies used that same tool, women would have a hard time getting hired anywhere.
Personal story
We can’t blindly trust the algorithm.
We can check how the decisions we make affect the models we build, which in turn can affect real people’s lives in the world.
Obviously, awareness. We’ve touched on this; that’s the whole point of my talking to you about this.
This is for now.
Right now, it’s a fact: one small, homogeneous group of people makes decisions that affect everybody. The people who create this technology generally look like each other, come from similar upbringings, and look and talk the same way.
When we have more people of color training image-recognition models, we’re less likely to have self-driving cars that can’t recognize people of color. When we have more women writing software and training data models, we’re less likely to have hiring algorithms that discriminate against women.
When you bring your diverse perspective to the conversation, you change the conversation.