Testing AI and Bias Questionnaire Checklist
1. Is there bias in your AI-based software product? Free Answer: Yes!
2. Do you understand the bias in your data?
a. Do you know what data you don’t have that might be different?
b. Do you know the biases in the data that you do have?
c. Why do you think you have ‘enough’ data to represent all interesting input?
3. Do you know how you are splitting your data into training, validation, and test sets?
a. Do you know why you sampled how you did?
b. If you used a simple ‘randI()’-style random sample, can you justify why that split doesn’t introduce bias?
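As a concrete illustration of 3a/3b: a stratified split preserves each label's proportion in both sets, where a plain random draw can under-represent rare groups entirely. A minimal stdlib-Python sketch (the helper name and toy data are illustrative, not from the checklist):

```python
import random
from collections import Counter

def stratified_split(rows, label_fn, test_frac=0.2, seed=0):
    """Split rows into train/test while preserving each label's proportion.
    (Hypothetical helper for illustration -- not from the checklist.)"""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(label_fn(row), []).append(row)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = max(1, int(len(group) * test_frac))  # at least one per group
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# 95 "majority" rows and 5 "minority" rows: stratification guarantees the
# minority group appears in the test set; a plain random 20% draw might not.
rows = [("majority", i) for i in range(95)] + [("minority", i) for i in range(5)]
train, test = stratified_split(rows, label_fn=lambda r: r[0])
print(Counter(r[0] for r in test))  # minority count is >= 1 by construction
```

In practice a library equivalent such as the `stratify` argument of scikit-learn's `train_test_split` does the same job.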
4. Do you understand the bias in your human-labels or training data?
a. Why are you selecting those humans with specific demographics/context?
b. Do those demographics represent the users of your product?
c. What data are you missing? Why is it ok to exclude it?
5. Feature/Training Bias?
a. Is there bias in the features you picked to train the AI?
b. Are there features you might be missing that could eliminate bias in the training
features?
c. Is there bias in the normalization of data used to train the AI?
d. Is the computation of the features correct, and not introducing accidental bias?
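One subtle way 5c goes wrong is in how normalization statistics are computed: they should be fit on training data only, and serving-time values falling outside the training range is then normal, not a bug; fitting on the full dataset instead quietly leaks test information into training. A minimal min-max sketch (hypothetical helper, illustrative values):

```python
def fit_scaler(values):
    """Compute min/max on TRAINING data only; reusing test data here
    leaks information and quietly biases evaluation."""
    lo, hi = min(values), max(values)
    return lambda x: (x - lo) / (hi - lo) if hi > lo else 0.0

train_vals = [10.0, 20.0, 30.0]
test_vals = [40.0]  # out-of-range at serving time is expected, not an error

scale = fit_scaler(train_vals)          # correct: train-only statistics
print([scale(v) for v in train_vals])   # [0.0, 0.5, 1.0]
print(scale(test_vals[0]))              # 1.5 -- exceeds 1.0, which is fine
```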
6. How do you measure Relevance/Success?
a. Is the measurement biased somehow?
b. Does the measurement of success correlate to end-user success?
c. Is the measurement of relevance/success including the end user experience of
application of the AI (versus just the AI measures of success such as F1 scores)?
7. Is the model reasonably stable over time (or other dimensions)?
a. Why do you think the AI will be accurate in the future, especially despite probable
changes in the input data/world?
b. How do you measure this?
c. For your application, do you know what measures of stability should be checked
and how you ensure in the training/test data that the AI is stable?
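For 7b, one simple and widely used stability measure is the Population Stability Index (PSI), which compares a feature's training-time distribution against its serving-time distribution. A stdlib-Python sketch (the equal-width binning and the 0.1/0.25 thresholds are a common industry convention, not something the checklist prescribes):

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between two samples of one feature.
    Rule of thumb (industry convention): < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant drift -- consider retraining."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0
    def frac(sample):
        counts = [0] * bins
        for v in sample:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [max(c / len(sample), 1e-6) for c in counts]  # avoid log(0)
    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train_feature = [1, 2, 2, 3, 3, 3, 4, 4]
same_dist     = [1, 2, 2, 3, 3, 3, 4, 4]
shifted       = [3, 4, 4, 4, 4, 4, 4, 4]
print(psi(train_feature, same_dist))  # 0.0 -- identical distribution
print(psi(train_feature, shifted))    # well above 0.25 -- drifted
```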
8. Do you understand the bias from the acquisition of the training data set?
a. Do the data labelers represent the users of the AI-based system?
b. How do you ensure that the labelers represent the users, or interpreters of the
output, of the AI-based systems?
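For 8b, one lightweight check is to compare the demographic distribution of your labelers against that of your users, for example with total variation distance. A sketch (the demographic buckets and proportions are made-up illustrative numbers):

```python
def total_variation(p, q):
    """Half the L1 distance between two categorical distributions:
    0 = identical, 1 = completely disjoint. A standard measure, used
    here as one simple way to compare labeler vs. user demographics."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

users    = {"18-25": 0.40, "26-40": 0.40, "41+": 0.20}
labelers = {"18-25": 0.10, "26-40": 0.70, "41+": 0.20}
print(total_variation(users, labelers))  # ~0.3 -- labelers skew older
```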
9. Hyperparameters
a. Why are the AI algorithms you selected appropriate for your solution domain?
b. What bias might there be due solely to algorithms chosen for solution?
10. Testing for outliers
a. What is the testing plan to see how AI impacts not just the average user, but the
minority users?
b. How do you ensure the AI isn’t ‘bad’ for minority users, even though it is great for
the average users?
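For 10a/10b, aggregate accuracy can look excellent while a minority group fails entirely; reporting the metric per group makes this visible. A stdlib-Python sketch (the record format is hypothetical):

```python
def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual). Returns overall and
    per-group accuracy, so a model that is 'great on average' but bad for
    a minority group shows up in the report."""
    overall, by_group = [0, 0], {}
    for group, pred, actual in records:
        hit = int(pred == actual)
        overall[0] += hit
        overall[1] += 1
        g = by_group.setdefault(group, [0, 0])
        g[0] += hit
        g[1] += 1
    report = {g: c / n for g, (c, n) in by_group.items()}
    report["overall"] = overall[0] / overall[1]
    return report

# 90 majority predictions, all correct; 10 minority predictions, all wrong:
records = [("majority", 1, 1)] * 90 + [("minority", 1, 0)] * 10
print(accuracy_by_group(records))
# overall accuracy is 0.9, yet minority accuracy is 0.0
```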
11. Initial weights
a. How do you ensure the experiments you create don’t represent biases from your
team? E.g. a recommendation system for women’s products probably shouldn’t
be trained by men.
12. Ensuring against success-bias
a. How do you ensure that your end-to-end measures of goodness don’t simply
promote models that score ‘better’ on the success metric, without independent
checks of overall goodness?