Classification of Satellite Images

Python based Transfer Learning approach
Johannes Oos
oosjoh@gmail.com

Project Overview - Purpose
“S. […] had a farm north of the railway about 160 miles from mine. In the middle of it was a great tooth of granite, which soared up for about 200 feet […]”
(Snow on the Equator, Tilman, 1937)

Project Overview - Data Flow
1. Collect: single pics via request → PIL → BW → MongoDB
2. Manually label (“0” / “1”) and store
3. Retraining script: pre-trained classifier → customized classifier
4. Collect, classify and store result (e.g. “0.91”)

Technology Overview
1. Collection – gmaps API
2. Storing – MongoDB
3. Labelling – Flask
4. Classifier – TensorFlow

Getting the data

Getting the data - source
No available data on this topic
→ Creation of a dataset
Gmaps API
→ Very good documentation
→ Picture size chosen: 320x320 pixels
→ Zoom: about 1 meter/pixel
PRO:
→ covers the entire planet
→ flexible zoom and picture sizes
→ high resolution
CON:
→ free quota only covers about 50x50 km per day
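As a rough cross-check of the "about 1 meter/pixel" figure, the ground resolution of a Web Mercator map can be estimated from latitude and zoom level. The sketch below uses the commonly cited formula for 256-pixel tiles. The zoom level 17 and the function name are assumptions, since the slides do not state which zoom value was requested.

```python
import math

# Ground resolution at the equator for zoom level 0 with 256-pixel tiles
# (equatorial circumference / 256), in meters per pixel.
EQUATOR_RES_ZOOM0 = 156543.03392


def meters_per_pixel(lat_deg, zoom):
    """Approximate ground resolution of a Web Mercator map (hypothetical helper)."""
    return EQUATOR_RES_ZOOM0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


if __name__ == "__main__":
    # 0.0: equator (Kenya straddles it); 38.7: latitude of the sample document.
    for lat in (0.0, 38.7):
        res = meters_per_pixel(lat, 17)
        print(f"lat {lat:5.1f}: {res:.2f} m/pixel, 320 px tile = {320 * res:.0f} m")
```

Under these assumptions the resolution shrinks with the cosine of the latitude, so the same zoom level covers less ground per picture further away from the equator.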

Getting the data - subsetting
Climbing Areas:
• Kenyan ones give too few samples
• Filled up with other (American) ones
Non-Climbing Areas:
• Make sure there is no climbing area in them by coincidence
• Selected mostly from that region
Problem: Positive samples are rare, but they also appear in areas suspected to be completely negative → careful labelling

Getting the data - storing
Storing geodata in MongoDB is generally straightforward:
• Very flexible
• Geo index for quick retrieval
• Geo queries for clear separation between multiple agents
Storing pics directly in MongoDB:
Advantages
• Easy to scale (sharding)
• Easy to back up (replica sets)
• File and other information in the same place (less complex)
Disadvantages
• Pictures have to be saved as binary
• Performance loss compared to file system (?)
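One way the geo queries could keep several collecting agents apart is to give each agent its own bounding box. The sketch below only builds plain dictionaries in MongoDB's `$geoWithin` / `$box` query syntax. The helper names, the strip layout, the sample coordinates, and the coordinate order inside `loc` are assumptions, and no database connection is made.

```python
def agent_boxes(x_min, y_min, x_max, y_max, n_agents):
    """Split a bounding box into n_agents equal strips along the x axis.

    Returns one [[x_lo, y_min], [x_hi, y_max]] corner pair
    (bottom-left, upper-right) per agent. Hypothetical helper.
    """
    width = (x_max - x_min) / n_agents
    boxes = []
    for i in range(n_agents):
        x_lo = x_min + i * width
        # Use the exact outer bound for the last strip to avoid float drift.
        x_hi = x_max if i == n_agents - 1 else x_min + (i + 1) * width
        boxes.append([[x_lo, y_min], [x_hi, y_max]])
    return boxes


def box_query(box, labelled=False):
    """Build a filter for documents whose `loc` lies inside `box`."""
    return {"loc": {"$geoWithin": {"$box": box}}, "labelled": labelled}


if __name__ == "__main__":
    # Assumed usage with pymongo (not executed here):
    #   collection.create_index([("loc", "2d")])
    #   for doc in collection.find(box_query(box)): ...
    for agent, box in enumerate(agent_boxes(36.0, -1.0, 38.0, 1.0, 4)):
        print(agent, box_query(box))
```

Neighbouring strips share an edge, so a point lying exactly on that edge may be returned to both agents; a real setup would need a rule for such ties.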

Getting the data - storing
Saving pics as binary: file → PIL → binary (and back to .jpg for classifier)

Getting the data – tile system

Labelling

Labelling
1. The unlabelled data in a MongoDB collection:
• {"loc": (80.0, 70.1), "pic": the binary-PIL-file, "labelled": False}
• A MongoDB document is very flexible and can be used without too much preparation
2. Show 5 pictures at a time in a Flask view with the labelling option; navigate with tab and enter to select the label
3. Store the labelled pictures in a different collection

Flask to show and label the pictures

MongoDB Analytics
statsPre
Out[7]: [8620, (8605, 0, 0), 15]

Labelling – Sample “CLIMB”

Labelling – Result
1. 8605 entries in MongoDB collection
2. Pictures of size 320 x 320
3. Single channel (BW: 0…255)
4. 809 labelled “1” (CLIMB) and 7796 labelled “0” (NO CLIMB)
5. Example document:
{u'_id': ObjectId('5a901c579f0fea0418f3006e'), u'label': u'0', u'labelled': True, u'length': 298.21614707161154, u'loc': [38.7126511, -109.33571780038419], u'pic': u'iVBORw0KGgoAAAANSUhEUgAAAUAAAAFACAQAAABnmW0hAAEAAElEQVR4nGT92Y9kZ5om+P3c1mO7uZn5GptHkAxumcwsVmVlVXV110z1TPd0YyA0IAjQhQboW0kX+k8ECAPNjQBpBN0MMBcaYDBST3ejqnqr7qwlF2aSDAbJC …….. }
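The `pic` field in the sample document starts with `iVBORw0KGgo`, which is what the PNG file signature looks like after base64 encoding. As a sketch of how the stored picture format can be checked, the snippet below decodes only the first 40 characters of that string (copied from the sample document) and reads the PNG header fields with the standard library. The function name is an assumption.

```python
import base64
import struct

# First 40 characters of the `pic` value from the sample document.
PIC_PREFIX = "iVBORw0KGgoAAAANSUhEUgAAAUAAAAFACAQAAABn"

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(b64_prefix):
    """Decode a base64 prefix and return (width, height, bit_depth, color_type)."""
    raw = base64.b64decode(b64_prefix)
    if raw[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    # Bytes 8..15: chunk length and chunk type; the first chunk must be IHDR.
    _length, chunk_type = struct.unpack(">I4s", raw[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # IHDR starts with width and height (big-endian uint32), then two single bytes.
    return struct.unpack(">IIBB", raw[16:26])


if __name__ == "__main__":
    print(read_png_header(PIC_PREFIX))
```

For the sample prefix this yields a 320 x 320 picture with 8 bits per sample, matching the picture size stated in the list. The colour type is 4, which the PNG specification defines as greyscale with an alpha channel.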

Choice of Classifier

Build Your Own:
• Capsule network
• Customized CNN architecture
• Feature engineering
Pre-trained Object Detection:
• GitHub - detection_model_zoo
Pre-trained Classifier:
• Inception v4
• Inception v3
• MobileNets
• VGG16
• AlexNet

An Analysis of Deep Neural Network Models for Practical Applications, Canziani et al. 2017

Classifier Architecture – mixed_1

Classifier Architecture – tower_1

Classifier Architecture
Rethinking the Inception Architecture for Computer Vision, Szegedy et al 2015

Retraining

Retraining
1. General Idea
2. How To
3. Input format
4. Size of graph
5. Duration of training
6. Evaluation

Retraining - general idea
Original: Output = f(g(h(Input)))
Retrained: Output = f*(g(h(Input*)))

Original vs. retrained network (figure: https://commons.wikimedia.org/wiki/File:Typical_cnn.png)
Retrained output classes: Climb / No Climb

Adjusted input → constant processing → adjusted output

[…] That means it [the penultimate layer] has to be a meaningful and
compact summary of the images, since it has to contain enough information
for the classifier to make a good choice in a very small set of values. The
reason our final layer retraining can work on new classes is that it turns out
the kind of information needed to distinguish between all the 1,000 classes
in ImageNet is often also useful to distinguish between new kinds of objects.
(www.tensorflow.org/)
→ Classification of top-view landscape images is actually out of scope, but works reasonably well
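The idea of keeping the feature-extracting layers constant and fitting only a new final layer can be illustrated without TensorFlow. The toy sketch below is not the retrain.py implementation: `frozen_features` is a made-up stand-in for the fixed layers g(h(·)), the data are random 2-D points, and only the weights of a logistic final layer f* are learned.

```python
import math
import random


def frozen_features(x):
    """Stand-in for the pre-trained layers g(h(.)); never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_final_layer(samples, labels, steps=3000, lr=0.5):
    """Fit weights and bias of a logistic final layer by gradient descent."""
    # Features are computed once and cached, since the frozen part never changes.
    feats = [frozen_features(x) for x in samples]
    w, b, n = [0.0, 0.0], 0.0, len(feats)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    f = frozen_features(x)
    return sigmoid(w[0] * f[0] + w[1] * f[1] + b)


def make_toy_data(n=200, seed=0):
    """Random points in the unit square; label 1 if x0 + x1 > 1 (with a margin)."""
    rng = random.Random(seed)
    samples, labels = [], []
    while len(samples) < n:
        x = (rng.random(), rng.random())
        if abs(x[0] + x[1] - 1.0) < 0.1:
            continue  # skip points too close to the class boundary
        samples.append(x)
        labels.append(1 if x[0] + x[1] > 1.0 else 0)
    return samples, labels


def accuracy(samples, labels, w, b):
    hits = sum((predict(x, w, b) > 0.5) == (y == 1) for x, y in zip(samples, labels))
    return hits / len(samples)


if __name__ == "__main__":
    xs, ys = make_toy_data()
    w, b = train_final_layer(xs, ys)
    print("training accuracy:", accuracy(xs, ys, w, b))
```

Caching the frozen features before the training loop mirrors why final-layer retraining is cheap: the expensive part of the network is evaluated once per picture, and only a small set of weights is updated afterwards.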

Retraining – HowTo
https://github.com/googlecodelabs/tensorflow-for-poets-2
python scripts/retrain.py \
  --output_graph=tf_files/new_graph.pb \
  --output_labels=tf_files/climb_labels.txt \
  --image_dir=tf_files/climb \
  --how_many_training_steps=5000

Retraining - Duration
INFO:tensorflow:2018-05-06 21:08:48.270479: Step 0: Train accuracy = 92.0%
INFO:tensorflow:2018-05-06 21:08:48.273313: Step 0: Cross entropy = 0.601665
INFO:tensorflow:2018-05-06 21:08:51.722497: Step 0: Validation accuracy = 93.0% (N=100)
.
INFO:tensorflow:Final test accuracy = 93.6% (N=776)
INFO:tensorflow:Froze 2 variables.
Converted 2 variables to const ops.
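To put the reported test accuracy into context, it helps to compare it with the share of the majority class, i.e. what a classifier that always answers "0" would score. The sketch below uses the label counts from the labelling result (809 vs. 7796). It assumes the test split has roughly the same class ratio as the whole dataset, which the log does not confirm.

```python
def majority_baseline(n_positive, n_negative):
    """Accuracy of always predicting the more frequent class."""
    total = n_positive + n_negative
    return max(n_positive, n_negative) / total


if __name__ == "__main__":
    baseline = majority_baseline(809, 7796)  # label counts from the labelling result
    reported = 0.936                         # final test accuracy from the log
    print(f"majority baseline: {baseline:.1%}")
    print(f"reported accuracy: {reported:.1%}")
    print(f"gain over baseline: {(reported - baseline) * 100:.1f} points")
```

With these counts the baseline is about 90.6%, so the 93.6% final test accuracy lies roughly three percentage points above a constant "0" prediction.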

Retraining - Log
https://github.com/googlecodelabs/tensorflow-for-poets-2
python scripts/retrain.py \
  --output_graph=tf_files/new_graph.pb \
  --output_labels=tf_files/climb_labels.txt \
  --image_dir=tf_files/climb \
  --how_many_training_steps=5000 \
  --summaries_dir=tf_files/training_summaries/

Classifier Evaluation

Evaluation - Tensorboard files

Evaluation - TensorBoard
(tFInc) bash-3.2$ cd training_summaries
(tFInc) bash-3.2$ ls
train validation
(tFInc) bash-3.2$ tensorboard --logdir=train
==> TensorBoard 1.7.0 at http://Johannes-MacBook-Air.local:6006 (Press CTRL+C to quit)

Evaluation – Scalars / Histograms

Usage of Classifier

Usage of Classifier
Timings at the start:
Collecting from gmaps: 0.172 sec
Label: 2.698 sec
Store label to MongoDB: 0.005 sec
Total: 2.875 sec
Timings after 10,000 pictures:
Collecting from gmaps: 0.207 sec
Label: 7.052 sec
Store label to MongoDB: 0.005 sec
Total: 7.265 sec
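The per-picture timings translate directly into a daily throughput, which shows where the bottleneck sits. The following sketch uses only the totals quoted on this slide and the 25,000 images/day mentioned for the search process. The helper names are made up for illustration, and a single sequential worker is assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pictures_per_day(seconds_per_picture):
    """How many pictures one sequential worker can process in 24 hours."""
    return SECONDS_PER_DAY / seconds_per_picture


def hours_for(n_pictures, seconds_per_picture):
    """Wall-clock hours needed to process n_pictures sequentially."""
    return n_pictures * seconds_per_picture / 3600


if __name__ == "__main__":
    for label, total in (("at the start", 2.875), ("after 10,000 pictures", 7.265)):
        print(f"{label}: {pictures_per_day(total):,.0f} pictures/day, "
              f"{hours_for(25_000, total):.1f} h for 25,000 pictures")
```

At 7.265 sec per picture a single sequential worker handles roughly 11,900 pictures per day, so the labelling step, not the download from gmaps, limits the throughput.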

Search Process
For picture in area:
    Get picture from gmaps
    Label with classifier
    Store result in collection
    Calculate next center
320 x 320 pixels @ ca. 1 meter/pixel
==> 12 images per sqkm
25,000 images/day ==> 20 km x 100 km searchable per day
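The coverage figures follow from simple tile arithmetic. The sketch below assumes square tiles on a regular grid and a flat-earth approximation of about 111,320 m per degree. The stride is a free parameter because the slides do not state how much neighbouring pictures overlap, and all function names are hypothetical.

```python
import math

METERS_PER_DEGREE = 111_320.0  # rough length of one degree of latitude


def tiles_per_sqkm(stride_m):
    """Number of tile centers per square kilometer on a regular grid."""
    return (1000.0 / stride_m) ** 2


def stride_for_density(tiles_sqkm):
    """Grid spacing in meters that yields the given tile density."""
    return 1000.0 / math.sqrt(tiles_sqkm)


def next_center_east(lat_deg, lon_deg, stride_m):
    """Center of the neighbouring tile to the east (small-distance approximation)."""
    dlon = stride_m / (METERS_PER_DEGREE * math.cos(math.radians(lat_deg)))
    return lat_deg, lon_deg + dlon


if __name__ == "__main__":
    print(f"non-overlapping 320 m tiles: {tiles_per_sqkm(320):.1f} per sqkm")
    print(f"stride for 12 tiles per sqkm: {stride_for_density(12):.0f} m")
    print(f"area for 25,000 images at 12 per sqkm: {25_000 / 12:.0f} sqkm")
    print(next_center_east(0.0, 37.0, stride_for_density(12)))
```

Non-overlapping 320 m tiles give about 9.8 pictures per sqkm, so the quoted 12 per sqkm would correspond to a stride of roughly 289 m, i.e. slightly overlapping pictures. Dividing 25,000 by 12 gives about 2,083 sqkm, which matches the 20 km x 100 km estimate.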

Places Found
In[2]: getRes()
Out[2]: ([18504, 7174, 4306, 1093, 572, 119, 59, 17], 75463)

Places Found
Flask output with links to the
highest rated locations

Improvement
1. Relabelling
2. More flexible Collection (Zoom, Stride, Datasource)
3. Result Visualization on a map
4. Bigger Search Area
