Product Management Event at #ProductCon NY on how to create AI models for fun and for profit by Jason Nichols, Director of Artificial Intelligence at Walmart Intelligent Research Lab.
8. Different Models Answer Different Kinds of Questions
Regressors answer "How many?"
Classifiers answer "What's that?"
Detectors answer "Where are these in this?"
Dimensionality Reducers answer "What makes these things different?"
Clustering Algorithms answer "How can I make groups of these?"
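The contrast between the first two questions can be made concrete with a toy sketch (all data invented): the same nearest-neighbour idea acts as a classifier or a regressor depending on what we store for each training point.

```python
import numpy as np

# Toy sketch: one mechanism, two questions, depending on the stored target.
X_train = np.array([1.0, 2.0, 8.0, 9.0])                  # one feature
y_class = np.array(["small", "small", "large", "large"])  # labels  -> classifier
y_value = np.array([10.0, 20.0, 80.0, 90.0])              # numbers -> regressor

def nearest(x):
    """Index of the training point closest to x."""
    return int(np.argmin(np.abs(X_train - x)))

def classify(x):   # "What's that?"
    return y_class[nearest(x)]

def regress(x):    # "How many?"
    return y_value[nearest(x)]

print(classify(1.5))  # -> small
print(regress(8.7))   # -> 90.0
```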
9. Step 1: What Kind of Question?
Define your problem as one of the five questions
One question per model is best
Business logic and heuristics are just simple models
Chains of models become more brittle the longer they get
10. Different Models Have Different Ways of Learning
• Some are supervised and need labelled data
• Some are unsupervised and determine their own labels
• Some take feedback from their environment and learn via reinforcement
• Models can also have online and/or offline learning modes
• Siamese Networks use both – training is offline, but enrollment is online.
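The offline/online split in the last bullet can be sketched as a nearest-centroid lookup. This is a minimal sketch, not the talk's implementation: the offline-trained embedding network is stubbed out as the identity, and online "enrollment" just stores one centroid per class, so new classes can be added at runtime without retraining.

```python
import numpy as np

def embed(x):
    # Stand-in for an offline-trained embedding network (here: identity).
    return np.asarray(x, dtype=float)

class Enroller:
    def __init__(self):
        self.centroids = {}  # class name -> centroid in embedding space

    def enroll(self, name, examples):
        """Online learning: register a new class without retraining embed()."""
        self.centroids[name] = np.mean([embed(e) for e in examples], axis=0)

    def predict(self, x):
        """Return the class whose centroid is nearest to the query."""
        q = embed(x)
        return min(self.centroids,
                   key=lambda n: np.linalg.norm(q - self.centroids[n]))

db = Enroller()
db.enroll("cat", [[0.0, 1.0], [0.2, 0.8]])
db.enroll("dog", [[1.0, 0.0], [0.8, 0.2]])
print(db.predict([0.1, 0.9]))  # -> cat
```

All class names and vectors are invented for illustration; in a real Siamese-network system `embed` would be the trained twin network.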
11. Step 2: How Will Your Machine Learn?
Can humans provide feedback?
Do you have downtime to train?
How will you build CI/CD?
Can you make use of a Knowledge Base?
How will you know if the model is meeting the business needs?
12. Step 3: Which Model to Use?
Given the question and model type from Step 1
And the training method(s) from Step 2
How will you implement this service?
13. AI Lifecycle
Stages: Training & Optimization → Model Zoo → Inference → Annotation → Aggregation
Your System Checklist:
▪ Feature selection
▪ Model architecture
▪ Initial Training & Transfer Learning
▪ Model Persistence
▪ Evaluation & CI/CD
▪ Inference
▪ Logging & Sampling
▪ Annotation
▪ Cross Validation
▪ Source Rating
▪ Aggregation
▪ Normalization & Sanitization
▪ Training
▪ Knowledge Base Update
14. Ground Truth
Annotators and Algorithms are just different Agents
Never unduly privilege one over the other
Humans are lazy, sloppy, and imprecise
And worse, they build algorithms
X = Θ + W
• X is an observation (known)
• Θ is ground truth (unknown)
• W is a noise term (unknown)
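The observation model X = Θ + W suggests a standard remedy: collect repeated noisy annotations of the same item and average them, so the noise term cancels toward zero. A minimal sketch with invented numbers (Θ = 5.0, Gaussian annotator noise):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 5.0                         # Θ: ground truth (unknown in practice)
noise = rng.normal(0.0, 1.0, 100)   # W: per-annotation noise
observations = theta + noise        # X: what we actually record

# Averaging many observations of the same item drives W toward zero,
# leaving an estimate of Θ.
estimate = observations.mean()
print(round(estimate, 2))
```

With 100 annotations and unit noise, the standard error of the mean is about 0.1, so the estimate lands close to the true value.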
15. Sources of Error in Annotation
FATIGUE
COORDINATION
MISALIGNED INCENTIVES
UNCONSCIOUS BIAS
16. Areas of Research
HUMAN COMPUTER INTERACTION
CULTURAL ANTHROPOLOGY
KINESIOLOGY
PSYCHOLOGY
GAME THEORY
MACHINE LEARNING
Also: Computer Vision, Biology, Neurology, Anatomy, Optometry, Linguistics, Marketing, etc.
17. How Product Can Drive AI
• Understand that the costs and benefits associated with the confusion matrix define business value
• Communicate probabilities and confidences to stakeholders so they can make informed decisions
• Hypothesis driven, experimentally validated
• Think in terms of ROI, both for improving features and for improving confidence
18. Key Terms
Accuracy: When you ask the model for an inference, what percentage of the time is it right?
Precision: When the model says "X", how often is that correct?
Recall: When "X" occurs, how often does the model catch it?
Incidence Rate: How often does "X" occur?
Error Rate: How often is the model wrong? (The complement of accuracy: 1 − accuracy.)
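All five terms fall out of the same four confusion counts. A minimal sketch with invented numbers:

```python
# True positives, false positives, false negatives, true negatives (invented).
tp, fp, fn, tn = 40, 10, 20, 930
total = tp + fp + fn + tn

accuracy   = (tp + tn) / total   # how often the model is right overall
precision  = tp / (tp + fp)      # when it says "X", how often is that correct
recall     = tp / (tp + fn)      # when "X" occurs, how often it is caught
incidence  = (tp + fn) / total   # how often "X" occurs at all
error_rate = 1 - accuracy        # complement of accuracy

print(accuracy, precision, recall, incidence, error_rate)
# -> 0.97 0.8 0.666... 0.06 0.03
```

Note that a 97% accuracy here coexists with only 80% precision and 67% recall, because "X" is rare: accuracy alone hides the behaviour on the minority class.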
19. Error Rate v. Incidence Rate
• Given a 10% incidence rate and a 20% error rate, the maximum precision is 33%: even if the model catches every positive (10% true positives), its 20% of errors all become false positives, and 0.10 / (0.10 + 0.20) ≈ 33%.
• The only way to increase precision above that threshold is to increase accuracy.
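The 33% ceiling can be checked numerically. This sketch sweeps every way the 20% of errors could split between false negatives and false positives (false negatives can be at most the 10% incidence) and takes the best achievable precision:

```python
# Rates, not counts: 10% of items are positive, the model is wrong on 20%.
incidence, error_rate = 0.10, 0.20

best = 0.0
for i in range(0, 11):
    fn = i / 100              # false-negative rate, 0% .. 10%
    tp = incidence - fn       # positives actually caught
    fp = error_rate - fn      # remaining errors are false positives
    if tp > 0:
        best = max(best, tp / (tp + fp))

print(round(best, 3))  # -> 0.333, achieved when every positive is caught (fn = 0)
```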
20. Confusion Matrix
• How often do we mistake "A" for "B"?
• Each mistake can have a very different business cost
• Applies to all models (except regression... sort of)
• Probably the most important Product tool
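The "different business cost per mistake" point can be sketched by pairing the count matrix with a cost matrix and summing element-wise. All counts and dollar figures below are invented for illustration:

```python
import numpy as np

# Rows = actual class, columns = predicted class, for classes A and B.
counts = np.array([[90, 10],     # actual A: 90 correct, 10 mistaken for B
                   [ 5, 95]])    # actual B:  5 mistaken for A, 95 correct
costs  = np.array([[ 0.0,  1.0],   # calling an A a "B" costs $1
                   [20.0,  0.0]])  # calling a B an "A" costs $20 -- far worse

# Element-wise product, then sum: total business cost of this error profile.
expected_cost = float((counts * costs).sum())
print(expected_cost)  # -> 110.0
```

Even though the A→B mistake happens twice as often, the rarer B→A mistake dominates the cost, which is exactly why the confusion matrix, not raw accuracy, defines business value.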
21. Good Statement/Bad Statement
Good: "We believe the model's precision is between 80 and 90% with 95% confidence, based on a production experiment with 500 examples."
• Measured in production
• Real metric
• Communicates uncertainty
• High and transparent sample size
• Replicable
Bad: "The model's accuracy is 99%"
• No, no it isn't
• What was the sample size?
• How does it generalize?
• How did you measure that?
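One way a statement like "precision between 80 and 90% with 95% confidence" can be produced is a normal-approximation confidence interval for a proportion. A sketch with invented numbers (425 correct out of 500 sampled predictions); a real analysis might prefer a Wilson or exact interval:

```python
import math

successes, n = 425, 500
p_hat = successes / n                    # observed precision: 0.85
z = 1.96                                 # ~95% two-sided normal quantile

# Standard error of a proportion, scaled to the 95% level.
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"{low:.3f} .. {high:.3f}")  # roughly 0.819 .. 0.881
```

With n = 500 the interval comfortably fits inside 80–90%, which is what makes the sample size "high and transparent" rather than decorative.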
22. Good Requirement/Bad Requirement
Bad: "Just make it right"
• I can make anything right with enough data and compute
• Need to understand tradeoffs, constraints, and overall mission
Good: "The business needs the system to have a minimum precision of 0.95 and recall of 0.90, a maximum latency of 5s, and process 10 streams per GPU."
• Tells me false positives are worse for the business than false negatives
• Tells me about the compute available
• A real definition of done
• Can still be refined by adding information on data and mission