3. #MLSEV 3
Almost There!
• BigML will soon support
image data!
• We’re not quite there yet
• Follow our blog for updates
on progress
• While you wait, allow this talk
to serve as a preview
• Caution: Nothing here is final!
In This Session
• BigML’s view of image data processing
• Types of image feature extraction
• Some non-trivial examples of image processing problems
• A couple of issues unique to image problems (or not)
• A couple of solutions to those issues
• A little pet peeve
What’s Image Processing?
• Convolutional neural networks are powerful, but what are they doing?
• Spoiler: The same thing that all other computer vision feature extraction
methods have been doing for years, only better and with more automation
• Do more traditional vision methods still have a useful function?
Flashback #1
• You should in general try to reduce complex data to features that are either
numeric or categorical (i.e., a variable with a finite set of possible values)
• Aside: A good categorical feature should have no values that occur only once, and the set of
possible values should be small (fewer than 10 is good, fewer than 100 is maybe okay)
• Text data can be reduced to counts of informative words
• Strings representing a date can be reduced to the parts of the date (month, year,
day of month, day of week)
• Sometimes, you must do this yourself, but in some common cases it can be
automated (BigML does this for text and date time data)
Images Are Not That Special
• What about just using pixels as features?
• Even smallish images give 256 x 256 x 3 = about 200k numeric features
• Moreover, this ignores the fact that these features are related in a very specific way
• So: Feature engineering, just of a really complicated sort
• Even fancy CNNs for image classification ultimately come down to the same
thing: some computations that transform images into reasonably
sized tabular data
• After that’s done, we’re free to use any BigML algorithm on these
features
Okay, They’re a Little Bit Special
• I’ve glossed over the fact that there are
some problems that are specific to images:
• Object detection
• Image segmentation
• Superresolution
• Our initial release won’t support these
things immediately, but we plan on
supporting some of them in the future
• We’ve done customized work for
customers that includes things of this sort.
Featurizing Images
• So: We want to reduce our collection of
images to rows for the purposes of
classification / clustering / anomaly detection /
etc.
• As with any ML algorithm, there’s a tradeoff
between simple-and-fast and complex-and-slow
• The choice is mainly about what information in
the image you want to preserve
[1.2, 4.3, 84, -0.03, 4, 8.57, 2.05, …]
Method #1: Just a Tiny Image
• Scale image to 4x4
• RGB color gives you 16 x 3 = 48 features
• Preserves some color, some global detail, but discards all local detail
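The downscaling step can be sketched with plain numpy; block averaging stands in here for whatever resampling the platform actually uses:

```python
import numpy as np

def tiny_image_features(img, size=4):
    """Downscale an HxWx3 image to size x size by block averaging,
    then flatten into size * size * 3 numeric features."""
    h, w, c = img.shape
    # Crop so the dimensions divide evenly (a simplification for this sketch)
    h2, w2 = (h // size) * size, (w // size) * size
    blocks = img[:h2, :w2].reshape(size, h2 // size, size, w2 // size, c)
    return blocks.mean(axis=(1, 3)).ravel()

img = np.random.rand(256, 256, 3)   # stand-in for a real RGB image in [0, 1]
feats = tiny_image_features(img)
print(feats.shape)  # (48,)
```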
Method #2: Pixel Histogram
• For each color channel, bin pixels into one of 16 bins based on their intensity
• Again gives 16 x 3 = 48 Features
• Preserves color information, but discards all spatial information
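A minimal numpy sketch of the same idea, assuming pixel values in [0, 1] and normalized counts:

```python
import numpy as np

def color_histogram_features(img, bins=16):
    """Per color channel, histogram the pixel intensities into `bins`
    bins and normalize, giving bins * 3 features in total."""
    feats = []
    for ch in range(img.shape[-1]):
        counts, _ = np.histogram(img[..., ch], bins=bins, range=(0.0, 1.0))
        feats.append(counts / counts.sum())
    return np.concatenate(feats)

img = np.random.rand(32, 32, 3)
feats = color_histogram_features(img)
print(feats.shape)  # (48,)
```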
Method #3: Histogram of Gradients
• Compute the relative intensities of
the pixels above, below, left, and
right of each pixel
• Use this to compute a gradient angle
and intensity for each pixel
• Sum the gradients into a histogram
for each sub-image in an n-by-n grid
• Preserves global shape information,
but discards local detail and color
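The steps above can be sketched in numpy; the 4x4 grid and 9 angle bins are arbitrary choices for this sketch, not necessarily what any particular implementation uses:

```python
import numpy as np

def gradient_histogram_features(gray, grid=4, bins=9):
    """Per-pixel gradient angle and magnitude (from differences with
    neighboring pixels), summed into an angle histogram for each cell
    of a grid x grid partition of the image."""
    gy, gx = np.gradient(gray)                 # vertical / horizontal differences
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned angle in [0, pi)
    h, w = gray.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = (slice(i * h // grid, (i + 1) * h // grid),
                    slice(j * w // grid, (j + 1) * w // grid))
            hist, _ = np.histogram(ang[cell], bins=bins,
                                   range=(0.0, np.pi), weights=mag[cell])
            feats.append(hist)
    return np.concatenate(feats)

gray = np.random.rand(64, 64)   # stand-in for a grayscale image
feats = gradient_histogram_features(gray)
print(feats.shape)  # (144,) -- 16 cells x 9 angle bins
```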
Method #4: Wavelet Decomposition
• Run a high pass filter (edge detector) on the
image, compute the mean and variance of the
filtered image.
• Do this in both the horizontal and vertical
directions
• Reduce resolution by a factor of two and iterate
• Preserves local detail at a coarse level, but
mostly ignores global detail
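A crude numpy sketch of the iteration, with first differences standing in for a proper wavelet high-pass filter:

```python
import numpy as np

def highpass_pyramid_features(gray, levels=3):
    """At each level: horizontal and vertical first differences (a
    crude edge detector), recording the mean and variance of each
    filtered image; then halve the resolution and repeat."""
    feats = []
    for _ in range(levels):
        dx = np.diff(gray, axis=1)   # horizontal high-pass
        dy = np.diff(gray, axis=0)   # vertical high-pass
        feats += [dx.mean(), dx.var(), dy.mean(), dy.var()]
        # Downsample by a factor of two with block averaging
        h, w = (gray.shape[0] // 2) * 2, (gray.shape[1] // 2) * 2
        gray = gray[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.array(feats)

feats = highpass_pyramid_features(np.random.rand(64, 64))
print(feats.shape)  # (12,) -- 3 levels x 4 statistics
```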
Method #5: Pretrained CNN
• Strip off the fully-connected layers of an
ImageNet-trained CNN (everything after
the final pooling operation)
• Use these as features directly
• Preserves tons of useful information, but
only the information that was useful for
its particular training problem (classifying
natural images)
So What’s The Best Thing?
• This is where you’re really going to have
to think carefully about what the features
are letting the classifier “see”
• You also may want to take extraction
speed into account (more later)
• Lots of vision problems are maybe
simpler than you think
• The impact of proper feature
engineering here is enormous
A Toy Example #1: Anomaly Detection
Which of these images is anomalous?
• Consider what features you’ll get from the tiny image extractor
• What about from the color histogram extractor?
A Toy Example #2: Anomaly Detection
Which of these images is anomalous?
• What features are you going to get from the color histogram extractor?
• What about from the gradient histogram extractor?
So What’s The Best Thing?
• Using the right features makes the problem
completely trivial
• Using the wrong features makes the
problem impossible
• CNNs have gained traction in part because
they do this difficult step for you (in a way)
• But if you can manage it, there are a lot of
problems you can solve with less data and
computing power
But Wait, What About CNNs?
• Don’t worry, BigML will also offer fully
trainable CNNs
• If you train a deepnet on image data, it
will try a few common CNN
architectures and initializations and pick
one that works well for your problem
• Be aware, though: CNNs typically need far
more training data than a model built on
well-chosen features
Interesting Use Cases Are Out There!
• Right now, most SaaS ML providers are
focusing hard on the basic functionality of
image->label using a CNN
• This is good and fine, but we argue that
many real-world problems are more
complex than this because:
• You have more data than just one image per instance
• You don’t have the compute power to throw a CNN
at everything you see
Applications #1: Insurance Claim Estimate
• Suppose you’re an auto insurance company
• Develop software where you have the customer take a picture of
the damage to estimate the severity of the claim
• Simple regression problem: image -> $
[Diagram: car image -> features -> regression model -> $]
Applications #1: Insurance Claim Estimate
• Now, what if you have them take both a front and a side image?
• What if you have telemetry data from the car?
[Diagram: front image features + side image features + brake pedal state + steering wheel angle -> regression model -> $]
Applications #1: Insurance Claim Estimate
• How about the text of the police report?
• User’s mobile phone activity in the moments before the crash?
[Diagram: front image features + side image features + brake pedal state + steering wheel angle + police report word counts + accelerometer + screen on? -> regression model -> $]
Applications #2: Radarless Radar Gun
• You’re interested in the average speed of the cars outside your house
• Point a camera at the window for a while and train anomaly detectors on the tiles of the image
• When a big enough group of tiles goes off, you’ve probably got a car
• Track the anomalies across the image, and you can estimate the speed!
https://blog.bigml.com/2017/08/16/a-stupidly-easy-speed-detector/
Applications #2: Radarless Radar Gun
• The tiles get reduced to a 1x1 image in the detector, which makes it
incredibly fast
• But what if you wanted to reduce false positives by checking
whether the clump is actually a car?
• Solution: On the subset of instances where the anomaly detector
fires, crop and label a few dozen images to train a car classifier
• You’ll need different features!
• But also, you won’t need CNN-style object detection, which requires
pretty serious horsepower to get up to video speeds
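The per-tile reduction can be sketched in a few lines of numpy; the 8x8 tiling and frame size here are arbitrary stand-ins, not necessarily what the blog post used:

```python
import numpy as np

def tile_means(frame, tiles=8):
    """Reduce each tile of a grayscale frame to its mean intensity
    (the '1x1 image' per tile), so the per-tile anomaly detector
    only has to watch a single number change over time."""
    h, w = frame.shape
    h2, w2 = (h // tiles) * tiles, (w // tiles) * tiles
    f = frame[:h2, :w2].reshape(tiles, h2 // tiles, tiles, w2 // tiles)
    return f.mean(axis=(1, 3))

frame = np.random.rand(240, 320)   # stand-in for one video frame
means = tile_means(frame)
print(means.shape)  # (8, 8)
```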
Problem #1: Speed
• Consider: You’d like to deploy a full CNN
because performance is very good, but the
platform where you’re deploying lacks the
compute power
• The thing you’re trying to detect is very rare
generally (say, security footage), but you
need to make many predictions per second
• In vision tasks you often need this level of
speed to keep up with video capture
Solution #1: Model Cascade
• Use an increasingly accurate and increasingly
expensive chain of models as “gates”
• Start with a model that’s fast and has near-perfect recall, but
poor precision
• If something passes the first model, pass it on to a
fancier one with better precision
• Repeat as necessary
• In this way, only a few instances require
prediction on the most time-consuming
model
• This isn’t just for vision!
[Diagram: simple model -(pass)-> more complex model -(pass)-> most complex model -> accept; any stage can reject]
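The gating logic is a few lines; the models and thresholds below are hypothetical stand-ins for the fast gate and the expensive classifier:

```python
def cascade_predict(instance, stages):
    """`stages` is a list of (model, threshold) pairs ordered from
    cheapest to most expensive. A score below a stage's threshold
    rejects immediately, so only instances that pass every cheap
    gate pay for the most expensive model."""
    for model, threshold in stages:
        if model(instance) < threshold:
            return "reject"
    return "accept"

# Hypothetical stand-ins: a fast high-recall gate, then a slow precise model
fast_gate = lambda x: x["motion_score"]
slow_model = lambda x: x["car_likeness"]
stages = [(fast_gate, 0.1), (slow_model, 0.9)]

print(cascade_predict({"motion_score": 0.5, "car_likeness": 0.95}, stages))   # accept
print(cascade_predict({"motion_score": 0.05, "car_likeness": 0.95}, stages))  # reject
```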
Problem #2: Lack of Data
• Consider: You have a problem you think
image classifiers should be able to solve
• You’re worried about the sort of data
variability that would not be a problem for
humans (image noise, blur, lighting
conditions, etc.)
• You want to make your classifier robust
to these situations
Solution #2: Data Augmentation
• Images are nice because they give you an opportunity to create images that
are homologous to the ones you have, basically just using an image editor
• The idea is that you can do a lot of things to a picture of a cat (rotate it, blur
it, change the brightness, add some noise, crop a little bit out) and it’s still
obviously (to a human) a cat
• You can see how this will lead to your data multiplying very quickly. It’s not
as good as novel training data, but it helps.
• When you start thinking this way, you might be surprised that there are other
opportunities for this
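A minimal numpy sketch of a few label-preserving transformations; the particular parameters (brightness factors, noise scale) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Yield label-preserving variants of an HxWx3 image in [0, 1]:
    a flip, brightness shifts, and sensor-style noise. Rotations,
    blur, and small crops would extend the list the same way."""
    yield img[:, ::-1]                                               # horizontal flip
    yield np.clip(img * 1.2, 0.0, 1.0)                               # brighter
    yield np.clip(img * 0.8, 0.0, 1.0)                               # darker
    yield np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0)  # noise

img = np.random.rand(32, 32, 3)
augmented = list(augment(img))
print(len(augmented))  # 4 extra training images from one original
```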
Adversarial Attacks
• Every once in a while, you’ll see a story in the news about an adversarial attack on a CNN
• “If you add some carefully chosen noise, the system thinks it’s something else!”
• This is always a potential problem, even in non-CNN models; even in non-ML models
• If you think people are going to be out there trying to fool the system, you need to secure
yourself against that possibility
• ML itself may be able to play a role
Summary
• Image feature extraction comes in a lot of flavors,
but even the most elaborate CNNs are still
basically feature extractors
• Choosing the right features for image
representation is, if possible, even more crucial
than usual
• Allowing different image features and models to be
composed gives you the flexibility to attack
different problems in different ways