3. #MLSEV 3
Almost There!
• BigML will soon support
image data!
• We’re not quite there yet
• Follow our blog for updates
on progress
• While you wait, allow this talk
to serve as a preview
• Caution: Nothing here is final!
In This Session
• BigML’s view of image data processing
• Types of image feature extraction
• Some non-trivial examples of image processing problems
• A couple of issues unique to image problems (or not)
• A couple of solutions to those issues
• A little pet peeve
What’s Image Processing?
• Convolutional neural networks are powerful, but what are they doing?
• Spoiler: The same thing that all other computer vision feature extraction
methods have been doing for years, only better and with more automation
• Do more traditional vision methods still have a useful function?
Flashback #1
• You should in general try to reduce complex data to features that are either
numeric or categorical (i.e., a variable with a finite set of possible values)
• Aside: A good categorical feature should have no values that occur only once, and the set of
possible values should be small (fewer than 10 is good, fewer than 100 is maybe okay)
• Text data can be reduced to counts of informative words
• Strings representing a date can be reduced to the parts of the date (month, year,
day of month, day of week)
• Sometimes, you must do this yourself, but in some common cases it can be
automated (BigML does this for text and date time data)
Images Are Not That Special
• What about just using pixels as features?
• Even smallish images give 256 x 256 x 3 = about 200k numeric features
• Moreover, this ignores the fact that these features are related in a very specific way
• So: Feature engineering, just of a really complicated sort
• Even fancy CNNs for image classification ultimately come down to the same
thing: some computations that transform images into reasonably
sized tabular data
• After that’s done, we’re free to use any BigML algorithm on these
features
Okay, They’re a Little Bit Special
• I’ve glossed over the fact that there are
some problems that are specific to images:
• Object detection
• Image segmentation
• Superresolution
• Our initial release won’t support these
things immediately, but we plan on
supporting some of them in the future
• We’ve done customized work for
customers that includes things of this sort.
Featurizing Images
• So: We want to reduce our collection of
images to rows for the purposes of
classification / clustering / anomaly detection /
etc.
• As with any ML algorithm, there’s a tradeoff
between simple-and-fast and complex-and-slow
• The choice is mainly about what information in
the image you want to preserve
[1.2, 4.3, 84, -0.03, 4, 8.57, 2.05, …]
Method #1: Just a Tiny Image
• Scale image to 4x4
• RGB color gives you 16 x 3 = 48 features
• Preserves some color, some global detail, but discards all local detail
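The downscaling step can be sketched with plain numpy; block averaging stands in here for whatever resampling the platform actually uses:

```python
import numpy as np

def tiny_image_features(img, size=4):
    """Downscale an HxWx3 image to size x size by block averaging,
    then flatten into size * size * 3 numeric features."""
    h, w, c = img.shape
    # Crop so the dimensions divide evenly (a simplification for this sketch)
    h2, w2 = (h // size) * size, (w // size) * size
    blocks = img[:h2, :w2].reshape(size, h2 // size, size, w2 // size, c)
    return blocks.mean(axis=(1, 3)).ravel()

img = np.random.rand(256, 256, 3)   # stand-in for a real RGB image in [0, 1]
feats = tiny_image_features(img)
print(feats.shape)  # (48,)
```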
Method #2: Pixel Histogram
• For each color channel, bin pixels into one of 16 bins based on their intensity
• Again gives 16 x 3 = 48 Features
• Preserves color information, but discards all spatial information
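A minimal numpy sketch of the same idea, assuming pixel values in [0, 1] and normalized counts:

```python
import numpy as np

def color_histogram_features(img, bins=16):
    """Per color channel, histogram the pixel intensities into `bins`
    bins and normalize, giving bins * 3 features in total."""
    feats = []
    for ch in range(img.shape[-1]):
        counts, _ = np.histogram(img[..., ch], bins=bins, range=(0.0, 1.0))
        feats.append(counts / counts.sum())
    return np.concatenate(feats)

img = np.random.rand(32, 32, 3)
feats = color_histogram_features(img)
print(feats.shape)  # (48,)
```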
Method #3: Histogram of Gradients
• Compute the relative intensities of
the pixels above, below, left, and
right of each pixel
• Use this to compute a gradient angle
and intensity for each pixel
• Sum the gradients into a histogram
for each sub-image in an n-by-n grid
• Preserves global shape information,
but discards local detail and color
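The steps above can be sketched in numpy; the 4x4 grid and 9 angle bins are arbitrary choices for this sketch, not necessarily what any particular implementation uses:

```python
import numpy as np

def gradient_histogram_features(gray, grid=4, bins=9):
    """Per-pixel gradient angle and magnitude (from differences with
    neighboring pixels), summed into an angle histogram for each cell
    of a grid x grid partition of the image."""
    gy, gx = np.gradient(gray)                 # vertical / horizontal differences
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned angle in [0, pi)
    h, w = gray.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = (slice(i * h // grid, (i + 1) * h // grid),
                    slice(j * w // grid, (j + 1) * w // grid))
            hist, _ = np.histogram(ang[cell], bins=bins,
                                   range=(0.0, np.pi), weights=mag[cell])
            feats.append(hist)
    return np.concatenate(feats)

gray = np.random.rand(64, 64)   # stand-in for a grayscale image
feats = gradient_histogram_features(gray)
print(feats.shape)  # (144,) -- 16 cells x 9 angle bins
```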
Method #4: Wavelet Decomposition
• Run a high pass filter (edge detector) on the
image, compute the mean and variance of the
filtered image.
• Do this in both the horizontal and vertical
directions
• Reduce resolution by a factor of two and iterate
• Preserves local detail at a coarse level, but
mostly ignores global detail
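A crude numpy sketch of the iteration, with first differences standing in for a proper wavelet high-pass filter:

```python
import numpy as np

def highpass_pyramid_features(gray, levels=3):
    """At each level: horizontal and vertical first differences (a
    crude edge detector), recording the mean and variance of each
    filtered image; then halve the resolution and repeat."""
    feats = []
    for _ in range(levels):
        dx = np.diff(gray, axis=1)   # horizontal high-pass
        dy = np.diff(gray, axis=0)   # vertical high-pass
        feats += [dx.mean(), dx.var(), dy.mean(), dy.var()]
        # Downsample by a factor of two with block averaging
        h, w = (gray.shape[0] // 2) * 2, (gray.shape[1] // 2) * 2
        gray = gray[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.array(feats)

feats = highpass_pyramid_features(np.random.rand(64, 64))
print(feats.shape)  # (12,) -- 3 levels x 4 statistics
```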
Method #5: Pretrained CNN
• Strip off the fully-connected layers of an
ImageNet-trained CNN (everything after
the final pooling operation)
• Use these as features directly
• Preserves tons of useful information, but
only the information that was useful for
its particular training problem (classifying
natural images)
So What’s The Best Thing?
• This is where you’re really going to have
to think carefully about what the features
are letting the classifier “see”
• You also may want to take extraction
speed into account (more later)
• Lots of vision problems are maybe
simpler than you think
• The impact of proper feature
engineering here is enormous
A Toy Example #1: Anomaly Detection
Which of these images is anomalous?
• Consider what features you’ll get from the tiny image extractor
• What about from the color histogram extractor?
A Toy Example #2: Anomaly Detection
Which of these images is anomalous?
• What features are you going to get from the color histogram extractor?
• What about from the gradient histogram extractor?
So What’s The Best Thing?
• Using the right features makes the problem
completely trivial
• Using the wrong features makes the
problem impossible
• CNNs have gained traction in part because
they do this difficult step for you (in a way)
• But if you can manage it, there are a lot of
problems you can solve with less data and
computing power
But Wait, What About CNNs?
• Don’t worry, BigML will also offer fully
trainable CNNs
• If you train a deepnet on image data, it
will try a few common CNN
architectures and initializations and pick
one that works well for your problem
• Be aware, though: CNNs typically need far
more training data than a model built on
well-chosen features
Interesting Use Cases Are Out There!
• Right now, most SaaS ML providers are
focusing hard on the basic functionality of
image->label using a CNN
• This is good and fine, but we argue that
many real-world problems are more
complex than this because:
• You have more data than just one image per instance
• You don’t have the compute power to throw a CNN
at everything you see
Applications #1: Insurance Claim Estimate
• Suppose you’re an auto insurance company
• Develop software where you have the customer take a picture of
the damage to estimate the severity of the claim
• Simple regression problem: image -> $
[Diagram: car image -> features -> regression model -> $]
Applications #1: Insurance Claim Estimate
• Now, what if you have them take both a front and a side image?
• What if you have telemetry data from the car?
[Diagram: front image features + side image features + brake pedal state + steering wheel angle -> regression model -> $]
Applications #1: Insurance Claim Estimate
• How about the text of the police report?
• User’s mobile phone activity in the moments before the crash?
[Diagram: front image features + side image features + brake pedal state + steering wheel angle + police report word counts + accelerometer + screen on? -> regression model -> $]
Applications #2: Radarless Radar Gun
• You’re interested in the average speed of the cars outside your house
• Point a camera at the window for a while and train anomaly detectors on the tiles of the image
• When a big enough group of tiles goes off, you’ve probably got a car
• Track the anomalies across the image, and you can estimate the speed!
https://blog.bigml.com/2017/08/16/a-stupidly-easy-speed-detector/
Applications #2: Radarless Radar Gun
• The tiles get reduced to a 1x1 image in the detector, which makes it
incredibly fast
• But what if you wanted to reduce false positives by checking
whether the clump is actually a car?
• Solution: On the subset of instances where the anomaly detector
fires, crop and label a few dozen images to train a car classifier
• You’ll need different features!
• But also, you won’t need CNN-style object detection, which requires
pretty serious horsepower to get up to video speeds
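The per-tile reduction can be sketched in a few lines of numpy; the 8x8 tiling and frame size here are arbitrary stand-ins, not necessarily what the blog post used:

```python
import numpy as np

def tile_means(frame, tiles=8):
    """Reduce each tile of a grayscale frame to its mean intensity
    (the '1x1 image' per tile), so the per-tile anomaly detector
    only has to watch a single number change over time."""
    h, w = frame.shape
    h2, w2 = (h // tiles) * tiles, (w // tiles) * tiles
    f = frame[:h2, :w2].reshape(tiles, h2 // tiles, tiles, w2 // tiles)
    return f.mean(axis=(1, 3))

frame = np.random.rand(240, 320)   # stand-in for one video frame
means = tile_means(frame)
print(means.shape)  # (8, 8)
```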
Problem #1: Speed
• Consider: You’d like to deploy a full CNN
because performance is very good, but the
platform where you’re deploying lacks the
compute power
• The thing you’re trying to detect is very rare
generally (say, security footage), but you
need to make many predictions per second
• In vision tasks you often need this level of
speed to keep up with video capture
Solution #1: Model Cascade
• Use an increasingly accurate and increasingly
expensive chain of models as “gates”
• Start with a model that’s fast and has near-perfect recall, but
poor precision
• If something passes the first model, pass it on to a
fancier one with better precision
• Repeat as necessary
• In this way, only a few instances require
prediction on the most time-consuming
model
• This isn’t just for vision!
[Diagram: simple model -(pass)-> more complex model -(pass)-> most complex model -> accept; any stage can reject]
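The gating logic is a few lines; the models and thresholds below are hypothetical stand-ins for the fast gate and the expensive classifier:

```python
def cascade_predict(instance, stages):
    """`stages` is a list of (model, threshold) pairs ordered from
    cheapest to most expensive. A score below a stage's threshold
    rejects immediately, so only instances that pass every cheap
    gate pay for the most expensive model."""
    for model, threshold in stages:
        if model(instance) < threshold:
            return "reject"
    return "accept"

# Hypothetical stand-ins: a fast high-recall gate, then a slow precise model
fast_gate = lambda x: x["motion_score"]
slow_model = lambda x: x["car_likeness"]
stages = [(fast_gate, 0.1), (slow_model, 0.9)]

print(cascade_predict({"motion_score": 0.5, "car_likeness": 0.95}, stages))   # accept
print(cascade_predict({"motion_score": 0.05, "car_likeness": 0.95}, stages))  # reject
```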
Problem #2: Lack of Data
• Consider: You have a problem you think
image classifiers should be able to solve
• You’re worried about the sort of data
variability that would not be a problem for
humans (image noise, blur, lighting
conditions, etc.)
• You want to make your classifier robust
to these situations
Solution #2: Data Augmentation
• Images are nice because they give you an opportunity to create images that
are homologous to the ones you have, basically just using an image editor
• The idea is that you can do a lot of things to a picture of a cat (rotate it, blur
it, change the brightness, add some noise, crop a little bit out) and it’s still
obviously (to a human) a cat
• You can see how this will lead to your data multiplying very quickly. It’s not
as good as novel training data, but it helps.
• When you start thinking this way, you might be surprised that there are other
opportunities for this
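A minimal numpy sketch of a few label-preserving transformations; the particular parameters (brightness factors, noise scale) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Yield label-preserving variants of an HxWx3 image in [0, 1]:
    a flip, brightness shifts, and sensor-style noise. Rotations,
    blur, and small crops would extend the list the same way."""
    yield img[:, ::-1]                                               # horizontal flip
    yield np.clip(img * 1.2, 0.0, 1.0)                               # brighter
    yield np.clip(img * 0.8, 0.0, 1.0)                               # darker
    yield np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0)  # noise

img = np.random.rand(32, 32, 3)
augmented = list(augment(img))
print(len(augmented))  # 4 extra training images from one original
```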
Adversarial Attacks
• Every once in a while, you’ll see a story in the news about an adversarial attack on a CNN
• “If you add some carefully chosen noise, the system thinks it’s something else!”
• This is always a potential problem, even in non-CNN models; even in non-ML models
• If you think people are going to be out there trying to fool the system, you need to secure
yourself against that possibility
• ML itself may be able to play a role
Summary
• Image feature extraction comes in a lot of flavors,
but even the most elaborate CNNs are still
basically feature extractors
• Choosing the right features for image
representation is, if possible, even more crucial
than usual
• Allowing different image features and models to be
composed gives you the flexibility to attack
different problems in different ways