Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not
use or distribute these slides for commercial purposes. You may make copies of these
slides and use or distribute them for educational purposes as long as you
cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see
https://creativecommons.org/licenses/by-sa/2.0/legalcode
Model Resource
Management Techniques
Welcome
Dimensionality Reduction
Dimensionality Effect on
Performance
High-dimensional data
Before … when it was all about data mining
● Domain experts selected features
● Designed feature transforms
● Small number of more relevant features were enough
Now … data science
is about integrating
everything
● Data generation and storage is less of a problem
● Squeeze out the best from data
● More high-dimensional data having more features
A note about neural networks
● Yes, neural networks will perform a kind of automatic feature selection
● However, that’s not as efficient as a well-designed dataset and model
○ Much of the model can be largely “shut off” to ignore unwanted
features
○ Even unused parts of the model still consume space and compute resources
○ Unwanted features can still introduce unwanted noise
○ Each feature requires infrastructure to collect, store, and manage
High-dimensional spaces
Word embedding - An example
Figure: word embedding example. Input sentences such as ["I want to search for blood pressure result history", "Show blood pressure result for patient", ...] are tokenized; each vocabulary word (i=1, want=2, to=3, search=4, for=5, blood=6, pressure=7, result=8, history=9, show=10, patient=11, ..., LAST=20) indexes a row of the auto embedding weight matrix, and the embedding layer returns the learned vectors for the input layer.
Initialization and loading the dataset
import tensorflow as tf
from tensorflow import keras
import numpy as np
from keras.datasets import reuters
from keras.preprocessing import sequence
num_words = 1000
(reuters_train_x, reuters_train_y), (reuters_test_x, reuters_test_y) = \
    tf.keras.datasets.reuters.load_data(num_words=num_words)
n_labels = np.unique(reuters_train_y).shape[0]
Further preprocessing
from keras.utils import np_utils
reuters_train_y = np_utils.to_categorical(reuters_train_y, 46)
reuters_test_y = np_utils.to_categorical(reuters_test_y, 46)
reuters_train_x = tf.keras.preprocessing.sequence.pad_sequences(
    reuters_train_x, maxlen=20)
reuters_test_x = tf.keras.preprocessing.sequence.pad_sequences(
    reuters_test_x, maxlen=20)
Using all dimensions
from tensorflow.keras import layers
model2 = tf.keras.Sequential(
[
layers.Embedding(num_words, 1000, input_length= 20),
layers.Flatten(),
layers.Dense(256),
layers.Dropout(0.25),
layers.Activation('relu'),
layers.Dense(46),
layers.Activation('softmax')
])
Model compilation and training
model2.compile(loss="categorical_crossentropy", optimizer="rmsprop",
               metrics=['accuracy'])
model_1 = model2.fit(reuters_train_x, reuters_train_y,
                     validation_data=(reuters_test_x, reuters_test_y),
                     batch_size=128, epochs=20, verbose=0)
Example with a higher number of dimensions
Plots: model accuracy and model loss per epoch (train and validation) for the model with 1000-dimensional embeddings.
Word embeddings: 6 dimensions
from tensorflow.keras import layers
model = tf.keras.Sequential(
[
layers.Embedding(num_words, 6, input_length= 20),
layers.Flatten(),
layers.Dense(256),
layers.Dropout(0.25),
layers.Activation('relu'),
layers.Dense(46),
layers.Activation('softmax')
])
Word embeddings: fourth root of the size of the vocab
Plots: model accuracy and model loss per epoch (train and validation) for the model with 6-dimensional embeddings.
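As a quick illustration of the fourth-root rule of thumb referenced in the title above (this snippet is mine, not from the slides), the embedding width for the 1,000-word vocabulary used here works out to about 6:

import math

num_words = 1000  # vocabulary size used in the Reuters example above
# Rule of thumb: embedding dimension ~ vocabulary size ** 0.25
embedding_dim = math.ceil(num_words ** 0.25)
print(embedding_dim)  # 1000 ** 0.25 ~ 5.6, rounded up to 6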
Dimensionality Reduction
Curse of Dimensionality
Many ML methods rely on a distance measure (e.g., Euclidean distance): k-Nearest Neighbours, Support Vector Machines, recommendation systems.
● More dimensions → more features
● Risk of overfitting our models
● Distances grow more and more alike
● No clear distinction between clustered objects
● Concentration phenomenon for Euclidean distance
Why is high-dimensional data a problem?
Curse of dimensionality
“As we add more dimensions we also increase the processing power we
need to train the model and make predictions, as well as the amount of
training data required”
Badreesh Shetty
Why are more features bad?
● Redundant / irrelevant features
● More noise added than signal
● Hard to interpret and visualize
● Hard to store and process data
The performance of algorithms ~ the number of dimensions
Figure: classifier performance vs. number of features rises to a peak at the optimal dimensionality and then degrades.
Figure: a single feature taking 5 values spans 5 cells in 1-D; crossing it with a second feature gives a 5x5 grid of 25 cells in 2-D, and so on as dimensions are added.
Adding dimensions increases feature space volume
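A small illustrative calculation (not from the slides) of how quickly the number of cells, and hence the data needed to cover the space at a fixed density, grows with added dimensions:

values_per_feature = 5
# Cells needed to cover the feature space at the same per-dimension granularity
for n_dims in range(1, 6):
    print(n_dims, values_per_feature ** n_dims)
# 1 -> 5, 2 -> 25, 3 -> 125, 4 -> 625, 5 -> 3125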
Curse of dimensionality in the distance function
● New dimensions add non-negative terms to the sum
● Distance increases with the number of dimensions
● For a given number of examples, the feature space becomes
increasingly sparse
Euclidean distance
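A small simulation (a sketch, not part of the slides) of the concentration phenomenon: as dimensionality grows, the ratio between the farthest and nearest pairwise Euclidean distances shrinks toward 1, so distances become less and less informative.

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_points = 100
for n_dims in [2, 10, 100, 1000]:
    X = rng.random((n_points, n_dims))          # uniform points in the unit hypercube
    d = pdist(X, metric='euclidean')            # all pairwise distances
    print(n_dims, round(d.max() / d.min(), 2))  # max/min ratio approaches 1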
Increasing sparsity with higher dimensions
Figure: the same examples plotted against 1, 2, and 3 dimensions; with each added dimension the points fill an ever smaller fraction of the feature space.
The more features, the larger the hypothesis space
The smaller the hypothesis space
● the easier it is to find the correct hypothesis
● the fewer examples you need
The Hughes effect
Dimensionality Reduction
Curse of Dimensionality:
An example
Other ways dimensionality has an impact
● Runtime and system memory
requirements
● Solutions take longer to reach global
optima
● More dimensions raise the likelihood
of correlated features
More features require more training data
● More features aren’t better if they don’t add predictive information
● Number of training instances needed increases exponentially with each
added feature
● Reduces real-world usefulness of models
Model #1 (missing a single feature)
Keras model diagram: categorical inputs (sex, cp, fbs, restecg, exang, ca) pass through CategoryEncoding layers and numeric inputs (age, trestbps, chol, thalach, oldpeak, slope) through Normalization layers; the encoded features are concatenated and fed through Dense → Dropout → Dense.
Model #2 (adds a new feature)
Keras model diagram: the same inputs and preprocessing as Model #1, plus a new thal input that passes through a StringLookup layer before category encoding.
A new string categorical feature is added!
from tensorflow.python.keras.utils.layer_utils import count_params
# Number of training parameters in Model #1
>>> count_params(model_1.trainable_variables)
833
# Number of training parameters in Model #2 (with an added feature)
>>> count_params(model_2.trainable_variables)
1057
Comparing the two models’ trainable variables
What do ML models need?
● No hard and fast rule on how many features are required
● The number of features to use varies with the problem and the model
● Prefer uncorrelated features that carry the information needed to produce correct results
Manual Dimensionality
Reduction
Dimensionality Reduction
Increasing predictive performance
● Features must have information to produce correct results
● Derive features from inherent features
● Extract and recombine to create new features
Combining features
● Number of features grows very quickly
● Reduce dimensionality
Initial features by domain: images (pixels, contours, textures, etc.), audio (samples, spectrograms, etc.), time series (ticks, trends, reversals, etc.), genomics (DNA, marker sequences, genes, etc.), text (words, grammatical classes and relations, etc.)
Feature explosion
Why reduce dimensionality?
● Storage
● Computational cost
● Consistency
● Visualization
Major techniques for dimensionality reduction: feature engineering and feature selection
Need for manually crafting features
Certainly provides food for thought
Engineer features
● Tabular - aggregate, combine, decompose
● Text - extract context indicators
● Image - prescribe filters for relevant structures
Feature engineering is an iterative process:
● Come up with ideas to construct "better" features
● Devise features to reduce dimensionality
● Select the right features to maximize predictiveness
● Evaluate models using the chosen features
Manual Dimensionality
Reduction: case study
Dimensionality Reduction
CSV_COLUMNS = [
'fare_amount',
'pickup_datetime', 'pickup_longitude', 'pickup_latitude',
'dropoff_longitude', 'dropoff_latitude',
'passenger_count', 'key',
]
LABEL_COLUMN = 'fare_amount'
STRING_COLS = ['pickup_datetime']
NUMERIC_COLS = ['pickup_longitude', 'pickup_latitude',
'dropoff_longitude', 'dropoff_latitude',
'passenger_count']
DEFAULTS = [[0.0], ['na'], [0.0], [0.0], [0.0], [0.0], [0.0], ['na']]
DAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
Taxi Fare dataset
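A minimal sketch (not shown on the slides; the CSV file name and helper name are placeholders) of how these column definitions would typically be used to parse each CSV line with tf.io.decode_csv and split it into features and the label:

import tensorflow as tf

def features_and_labels(row):
    # Parse one CSV line using the DEFAULTS defined above
    columns = tf.io.decode_csv(row, record_defaults=DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    label = features.pop(LABEL_COLUMN)
    features.pop('key')  # 'key' is an identifier, not a predictive feature
    return features, label

# Example tf.data pipeline over a training CSV (file name is hypothetical)
dataset = (tf.data.TextLineDataset('taxi-train.csv')
           .skip(1)  # skip the header row
           .map(features_and_labels)
           .batch(32))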
Keras model diagram: the raw inputs (pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count) feed a DenseFeatures layer, followed by Dense layers h1 and h2 and the fare output.
Build the model in Keras
from tensorflow.keras import layers, models
from tensorflow.keras.metrics import RootMeanSquaredError as RMSE
dnn_inputs = layers.DenseFeatures(feature_columns.values())(inputs)
h1 = layers.Dense(32, activation='relu', name='h1')(dnn_inputs)
h2 = layers.Dense(8, activation='relu', name='h2')(h1)
output = layers.Dense(1, activation='linear', name='fare')(h2)
model = models.Model(inputs, output)
model.compile(optimizer='adam', loss='mse',
metrics=[RMSE(name='rmse'), 'mse'])
Build a baseline model using raw features
Train the model
Plot: baseline model RMSE per epoch (train and validation), staying roughly in the 100–125 range.
Increasing model performance with Feature Engineering
● Carefully craft features for the data types
○ Temporal (pickup date & time)
○ Geographical (latitude and longitude)
def parse_datetime(s):
    if type(s) is not str:
        s = s.numpy().decode('utf-8')
    return datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S %Z")

def get_dayofweek(s):
    ts = parse_datetime(s)
    return DAYS[ts.weekday()]

@tf.function
def dayofweek(ts_in):
    return tf.map_fn(
        lambda s: tf.py_function(get_dayofweek, inp=[s], Tout=tf.string),
        ts_in)
Handling temporal features
def euclidean(params):
    lon1, lat1, lon2, lat2 = params
    londiff = lon2 - lon1
    latdiff = lat2 - lat1
    return tf.sqrt(londiff * londiff + latdiff * latdiff)
Geolocational features
def scale_longitude(lon_column):
    return (lon_column + 78) / 8.

def scale_latitude(lat_column):
    return (lat_column - 37) / 8.
Scaling latitude and longitude
def transform(inputs, numeric_cols, string_cols, nbuckets):
    ...
    feature_columns = {
        colname: tf.feature_column.numeric_column(colname)
        for colname in numeric_cols
    }
Preparing the transformations
    for lon_col in ['pickup_longitude', 'dropoff_longitude']:
        transformed[lon_col] = layers.Lambda(scale_longitude,
                                             ...)(inputs[lon_col])
    for lat_col in ['pickup_latitude', 'dropoff_latitude']:
        transformed[lat_col] = layers.Lambda(
            scale_latitude,
            ...)(inputs[lat_col])
    ...
def transform(inputs, numeric_cols, string_cols, nbuckets):
    ...
    transformed['euclidean'] = layers.Lambda(
        euclidean,
        name='euclidean')([inputs['pickup_longitude'],
                           inputs['pickup_latitude'],
                           inputs['dropoff_longitude'],
                           inputs['dropoff_latitude']])
    feature_columns['euclidean'] = fc.numeric_column('euclidean')
    ...
Computing the Euclidean distance
def transform(inputs, numeric_cols, string_cols, nbuckets):
    ...
    latbuckets = np.linspace(0, 1, nbuckets).tolist()
    lonbuckets = np.linspace(0, 1, nbuckets).tolist()  # similarly for longitude
    b_plat = fc.bucketized_column(
        feature_columns['pickup_latitude'], latbuckets)
    b_dlat = fc.bucketized_column(
        feature_columns['dropoff_latitude'], latbuckets)
    b_plon = fc.bucketized_column(
        feature_columns['pickup_longitude'], lonbuckets)
    b_dlon = fc.bucketized_column(
        feature_columns['dropoff_longitude'], lonbuckets)
Bucketizing and feature crossing
    ploc = fc.crossed_column([b_plat, b_plon], nbuckets * nbuckets)
    dloc = fc.crossed_column([b_dlat, b_dlon], nbuckets * nbuckets)
    pd_pair = fc.crossed_column([ploc, dloc], nbuckets ** 4)
    feature_columns['pickup_and_dropoff'] = fc.embedding_column(pd_pair, 100)
Bucketizing and feature crossing
Keras model diagram: the same inputs now pass through Lambda layers (scale_pickup_latitude, scale_pickup_longitude, scale_dropoff_latitude, scale_dropoff_longitude, euclidean) before the DenseFeatures layer, followed by Dense layers h1 and h2 and the fare output.
Build a model with the engineered features
Train the new feature engineered model
Plots: RMSE per epoch (train and validation) for the improved, feature-engineered model falls to roughly 30–40, compared with the baseline model's RMSE of roughly 100–125.
Algorithmic Dimensionality
Reduction
Dimensionality Reduction
Linear dimensionality reduction
● Linearly project n-dimensional data onto a k-dimensional subspace
(k < n, often k << n)
● There are infinitely many k-dimensional subspaces we can project the
data onto
● Which one should we choose?
Figure: n input features (f1, f2, f3, ..., fn) are reduced by dimensionality reduction to a compact output representation (shown on a dark-to-bright scale from 0 to 1).
Projecting onto a line
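A minimal sketch (illustrative only; the random projection here is an assumption, whereas the methods below pick the subspace to optimize a specific criterion) of linearly projecting n-dimensional data onto a k-dimensional subspace:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # 500 examples, n = 50 features
k = 3                            # target dimensionality, k << n

# Orthonormal basis of a random k-dimensional subspace
W = np.linalg.qr(rng.normal(size=(50, k)))[0]
X_reduced = X @ W                # project each example onto the subspace
print(X_reduced.shape)           # (500, 3)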
Best k-dimensional subspace for projection
Classification: maximize separation among classes
Example: Linear discriminant analysis (LDA)
Regression: maximize correlation between projected data and response variable
Example: Partial least squares (PLS)
Unsupervised: retain as much data variance as possible
Example: Principal component analysis (PCA)
Principal Component Analysis
Dimensionality Reduction
Principal component analysis (PCA)
● PCA is a minimization of the
orthogonal distance
● Widely used method for unsupervised
& linear dimensionality reduction
● Accounts for variance of data in as
few dimensions as possible using
linear projections
Principal components (PCs)
● PCs maximize the variance of
projections
● PCs are orthogonal
● Gives the best axis to project
● Goal of PCA: Minimize total squared
reconstruction error
Figure: 2-D data with its 1st and 2nd principal vectors.
PCA Algorithm
Step 1 (first principal component): find a line such that, when the data is projected onto it, the projected variance is maximized.
Step 2 (second principal component): find a second line, orthogonal to the first, with maximum projected variance.
Step 3: repeat until we have k orthogonal lines.
from mlxtend.feature_extraction import PrincipalComponentAnalysis  # assumed source of this class
pca = PrincipalComponentAnalysis(n_components=2)
pca.fit(X)
X_pca = pca.transform(X)
Applying PCA on Iris
tot = sum(pca.e_vals_)
var_exp = [(i / tot) * 100 for i in sorted(pca.e_vals_, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
Plot the explained variance
loadings = pca.e_vecs_ * np.sqrt(pca.e_vals_)
PCA factor loadings
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets
# Load the data
digits = datasets.load_digits()
# Standardize the feature matrix
X = StandardScaler().fit_transform(digits.data)
PCA in scikit-learn
# Create a PCA that will retain 99% of the variance
pca = PCA(n_components=0.99, whiten=True)
# Conduct PCA
X_pca = pca.fit_transform(X)
PCA in scikit-learn
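A short follow-up (not on the slides) showing how to check how many components were kept to reach 99% of the variance and how much variance each one explains, using standard scikit-learn attributes:

# Number of components retained to reach 99% of the variance
print(pca.n_components_)
# Fraction of variance explained by each retained component
print(pca.explained_variance_ratio_[:5])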
When to use PCA?
Strengths
● A versatile technique
● Fast and simple
● Offers several variations and extensions (e.g., kernel/sparse PCA)
Weaknesses
● Result is not interpretable
● Requires setting threshold for cumulative explained variance
Other Techniques
Dimensionality Reduction
More dimensionality reduction algorithms
Unsupervised
● Latent Semantic Indexing/Analysis (LSI and LSA) (SVD)
● Independent Component Analysis (ICA)
Matrix
Factorization
● Non-Negative Matrix Factorization (NMF)
Latent
Methods
● Latent Dirichlet Allocation (LDA)
Singular value decomposition (SVD)
● SVD decomposes non-square matrices
● Useful for sparse matrices as produced by TF-IDF
● Removes redundant features from the dataset
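As a hedged sketch of how this is usually done in scikit-learn (the slides show no code for this step), TruncatedSVD applied to a sparse TF-IDF matrix is the standard implementation of LSA; the toy documents below are made up:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

tfidf = TfidfVectorizer().fit_transform(docs)   # sparse document-term matrix
svd = TruncatedSVD(n_components=2, random_state=0)
docs_reduced = svd.fit_transform(tfidf)         # low-rank LSA representation
print(docs_reduced.shape)  # (3, 2)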
Independent Components Analysis (ICA)
● PCA seeks directions in feature space that minimize reconstruction
error
● ICA seeks directions that are most statistically independent
● ICA addresses higher order dependence
How does ICA work?
● Assume there exist independent source signals: S(t) = [s1(t), s2(t), …, sN(t)]
● Observations are linear combinations of the sources: Y(t) = A S(t)
○ Both A and S are unknown
○ A is the mixing matrix
● Goal of ICA: recover the original signals S(t) from the observations Y(t)
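A minimal sketch (assuming scikit-learn's FastICA; the signals and mixing matrix are made up for illustration) of recovering independent sources S(t) from mixed observations Y(t):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two independent sources
A = np.array([[1.0, 0.5], [0.5, 2.0]])             # unknown mixing matrix (made up here)
Y = S @ A.T                                        # observed mixtures Y = A S

ica = FastICA(n_components=2, random_state=0)
S_recovered = ica.fit_transform(Y)                 # estimate the sources from Y alone
print(S_recovered.shape)  # (2000, 2)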
Comparing PCA and ICA
                                   PCA    ICA
Removes correlations                ✓      ✓
Removes higher-order dependence            ✓
All components treated fairly?             ✓
Orthogonality                       ✓
Non-negative Matrix Factorization (NMF)
● NMF models are interpretable and easier to understand
● NMF requires the sample features to be non-negative
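A short hedged sketch (not from the slides) of NMF on a non-negative matrix of word counts using scikit-learn; the toy documents are made up:

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

counts = CountVectorizer().fit_transform(docs)   # word counts are non-negative
nmf = NMF(n_components=2, init='nndsvd', random_state=0)
W = nmf.fit_transform(counts)                    # counts ~ W @ H, both non-negative
H = nmf.components_
print(W.shape, H.shape)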
Mobile, IoT, and Similar Use
Cases
Quantization & Pruning
Trends in adoption of smart devices
Factors driving this trend
● Demands move ML capability from cloud to on-device
● Cost-effectiveness
● Compliance with privacy regulations
Online ML inference
● To generate real-time predictions you can:
○ Host the model on a server
○ Embed the model in the device
● Is it faster on a server, or on-device?
● Mobile processing limitations?
Figure: the device sends a classification request to a server-hosted model and receives prediction results.
Inference on the cloud/server
Pros
● Lots of compute capacity
● Scalable hardware
● Model complexity handled by the server
● Easy to add new features and update the model
● Low latency and batch prediction
Cons
● Timely inference is needed
● Network connectivity
On-device inference
Pros
● Improved speed and performance
● No to-and-fro communication needed
Cons
● Less capacity
● Tight resource constraints
Model deployment options
Capabilities compared: on-device inference, on-device personalization, on-device training, cloud-based web service, pretrained models, custom models
ML Kit: ✓ ✓ ✓ ✓ ✓
Core ML: ✓ ✓ ✓ ✓ ✓
✓ ✓ ✓ ✓ ✓
* Also supported in TFX
Benefits and Process of
Quantization
Quantization & Pruning
Quantization
Why quantize neural networks?
● Neural networks have many parameters and take up space
● Shrinking model file size
● Reduce computational resources
● Make models run faster and use less power with low-precision
Figure (MobileNets: latency vs. accuracy trade-off): Top-1 accuracy vs. runtime in ms on a Pixel 2 big core (Snapdragon 835), comparing float and 8-bit quantized models.
Benefits of quantization
● Faster compute
● Low memory bandwidth
● Low power
● Integer operations supported across CPU/DSP/NPUs
Figure: the observed float32 range [min, max] (float32 can represent roughly -3e38 to 3e38) is mapped onto the int8 range [-127, 127] around 0.
The quantization process
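A tiny sketch (an illustration of symmetric int8 quantization, assumed here; not code from the slides) of mapping a float32 tensor onto the int8 range and back:

import numpy as np

w = np.random.randn(4, 4).astype(np.float32)     # float32 weights
scale = np.abs(w).max() / 127.0                  # map [-max|w|, max|w|] onto [-127, 127]
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale   # dequantize
print(np.abs(w - w_restored).max())              # small rounding error introduced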
What parts of the model are affected?
● Static values (parameters)
● Dynamic values (activations)
● Computation (transformations)
Trade-offs
● Optimizations impact model accuracy
○ Difficult to predict ahead of time
● In rare cases, models may actually gain some accuracy
● Undefined effects on ML interpretability
Choose the best model for the task
Post Training Quantization
Quantization & Pruning
● Reduced precision representation
● Incur small loss in model accuracy
● Joint optimization for model and
latency
Post-training quantization
Technique and benefits:
● Dynamic range quantization: 4x smaller, 2x–3x speedup
● Full integer quantization: 4x smaller, 3x+ speedup
● float16 quantization: 2x smaller, GPU acceleration
Post-training quantization
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
Post training quantization
Post-training integer quantization flow: TensorFlow (tf.Keras) SavedModel + calibration data → TF Lite Converter → INT8 TF Lite model.
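A hedged sketch of full integer post-training quantization with the TF Lite converter; the representative_dataset generator and calibration_batches are assumptions about how the calibration data would be supplied, not code from the slides:

import tensorflow as tf

def representative_dataset():
    # Yield a few batches of typical input data for calibration
    for input_batch in calibration_batches:      # assumed to be prepared elsewhere
        yield [tf.cast(input_batch, tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops (weights and activations quantized to int8)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_int8_model = converter.convert()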
Model accuracy
● Small accuracy loss incurred (mostly for smaller networks)
● Use the benchmarking tools to evaluate model accuracy
● If the accuracy drop is not within acceptable limits, consider
using quantization-aware training
Quantization Aware
Training
Quantization & Pruning
Quantization-aware training (QAT)
● Inserts fake quantization (FQ) nodes in the forward pass
● Rewrites the graph to emulate quantized inference
● Reduces the loss of accuracy due to quantization
● Resulting model contains all data to be quantized according to spec
Quantization-aware training flow: TensorFlow (tf.Keras) model → apply QAT + train model → convert & quantize (TF Lite) → INT8 TF Lite model.
Figure: adding the quantization emulation operations. A conv layer (input, weights, biases, ReLU6, output) is shown before and after rewriting; weight-quant (wt quant) nodes are inserted on the weights and activation-quant (act quant) nodes after the ReLU6.
import tensorflow_model_optimization as tfmot
model = tf.keras.Sequential([
...
])
QAT on entire model
# Quantize the entire model.
quantized_model = tfmot.quantization.keras.quantize_model(model)
# Continue with training as usual.
quantized_model.compile(...)
quantized_model.fit(...)
import tensorflow_model_optimization as tfmot
quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
model = tf.keras.Sequential([
...
# Only annotated layers will be quantized.
quantize_annotate_layer(Conv2D()),
quantize_annotate_layer(ReLU()),
Dense(),
...
])
# Quantize the model.
quantized_model = tfmot.quantization.keras.quantize_apply(model)
Quantize part(s) of a model
quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope
Quantize custom Keras layer
model = quantize_annotate_model(tf.keras.Sequential([
    quantize_annotate_layer(CustomLayer(20, input_shape=(20,)),
                            DefaultDenseQuantizeConfig()),
    tf.keras.layers.Flatten()
]))

# `quantize_apply` requires mentioning `DefaultDenseQuantizeConfig` with `quantize_scope`
with quantize_scope(
        {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig,
         'CustomLayer': CustomLayer}):
    # Use `quantize_apply` to actually make the model quantization aware.
    quant_aware_model = tfmot.quantization.keras.quantize_apply(model)
Quantize custom Keras layer
Model Optimization Results - Accuracy (Top-1)
Mobilenet-v1-1-224: 0.709 (original), 0.657 (post-training quantized), 0.70 (quantization-aware training)
Mobilenet-v2-1-224: 0.719 (original), 0.637 (post-training quantized), 0.709 (quantization-aware training)
Inception_v3: 0.78 (original), 0.772 (post-training quantized), 0.775 (quantization-aware training)
Resnet_v2_101: 0.770 (original), 0.768 (post-training quantized), N/A (quantization-aware training)
Model Optimization Results - Latency (ms)
Mobilenet-v1-1-224: 124 (original), 112 (post-training quantized), 64 (quantization-aware training)
Mobilenet-v2-1-224: 89 (original), 98 (post-training quantized), 54 (quantization-aware training)
Inception_v3: 1130 (original), 845 (post-training quantized), 543 (quantization-aware training)
Resnet_v2_101: 3973 (original), 2868 (post-training quantized), N/A (quantization-aware training)
Model Optimization Results - Size (MB)
Mobilenet-v1-1-224: 16.9 (original), 4.3 (optimized)
Mobilenet-v2-1-224: 14 (original), 3.6 (optimized)
Inception_v3: 95.7 (original), 23.9 (optimized)
Resnet_v2_101: 178.3 (original), 44.9 (optimized)
Pruning
Quantization & Pruning
Connection pruning
Figure: a network before and after pruning.
Model sparsity: larger models use more memory and are less efficient; sparse models use less memory and are more efficient.
Figure: before and after pruning, showing pruned synapses and pruned neurons.
Origins of weight pruning
The Lottery Ticket Hypothesis
Finding Sparse Neural Networks
“A randomly-initialized, dense neural network contains a subnetwork that is
initialized such that — when trained in isolation — it can match the test
accuracy of the original network after training for at most the same number
of iterations”
Jonathan Frankle and Michael Carbin
Pruning research is evolving
● The new method didn’t perform well at large scale
● The new method failed to identify the randomly initialized winners
● It’s an active area of research
Tensors with no sparsity (left), sparsity in blocks of 1x1 (center), and sparsity in blocks of 1x2 (right)
Eliminate connections based on their magnitude
Example of sparsity ramp-up function with a schedule to start pruning from step 0
until step 100, and a final target sparsity of 90%.
Apply sparsity with a pruning routine
Animation of pruning applied to a tensor
Black cells indicate where the non-zero weights exist
Sparsity increases with training
What’s special about pruning?
● Better storage and/or transmission
● Gain speedups in CPU and some ML accelerators
● Can be used in tandem with quantization to get additional benefits
● Unlock performance improvements
Pruning with TF Model Optimization Toolkit
Pruning flow: TensorFlow (tf.Keras) model → sparsify + train model → convert & quantize (TF Lite) → INT8 TF Lite model.
import tensorflow_model_optimization as tfmot
model = build_your_model()
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
initial_sparsity=0.50, final_sparsity=0.80,
begin_step=2000, end_step=4000)
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
model,
pruning_schedule=pruning_schedule)
...
model_for_pruning.fit(...)
Pruning with Keras
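A brief follow-up (standard tfmot usage rather than slide content; the compile settings and x_train/y_train are placeholders): the pruning schedule only advances when the UpdatePruningStep callback is passed to fit(), and the pruning wrappers are usually stripped before export:

model_for_pruning.compile(optimizer='adam',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])

model_for_pruning.fit(x_train, y_train, epochs=2,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before saving or serving the sparse model
final_model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)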
Results across different models & tasks
Image classification (Top-1 accuracy):
● Inception V3: 78.1% non-sparse; 78.0% at 50% sparsity, 76.1% at 75% sparsity, 74.6% at 87.5% sparsity
● Mobilenet V1 224: 71.04% non-sparse; 70.84% at 50% sparsity
Machine translation (BLEU):
● GNMT EN-DE: 26.77 non-sparse; 26.86 at 80%, 26.52 at 85%, 26.19 at 90% sparsity
● GNMT DE-EN: 29.47 non-sparse; 29.50 at 80%, 29.24 at 85%, 28.81 at 90% sparsity

Más contenido relacionado

La actualidad más candente

Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & MetricsSanghamitra Deb
 
Developing Recommendation System to provide a Personalized Learning experienc...
Developing Recommendation System to provide a PersonalizedLearning experienc...Developing Recommendation System to provide a PersonalizedLearning experienc...
Developing Recommendation System to provide a Personalized Learning experienc...Sanghamitra Deb
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesHJ van Veen
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningShubhmay Potdar
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RShirin Elsinghorst
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsKrishna Sankar
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntEugene Yan Ziyou
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
Lecture 5 machine learning updated
Lecture 5   machine learning updatedLecture 5   machine learning updated
Lecture 5 machine learning updatedVajira Thambawita
 
Using Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesUsing Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesScott Clark
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Hayim Makabee
 
Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Jeong-Gwan Lee
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 

La actualidad más candente (20)

Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & Metrics
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
Developing Recommendation System to provide a Personalized Learning experienc...
Developing Recommendation System to provide a PersonalizedLearning experienc...Developing Recommendation System to provide a PersonalizedLearning experienc...
Developing Recommendation System to provide a Personalized Learning experienc...
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with R
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Lecture 5 machine learning updated
Lecture 5   machine learning updatedLecture 5   machine learning updated
Lecture 5 machine learning updated
 
Using Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesUsing Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning Pipelines
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
 
Bayesian Global Optimization
Bayesian Global OptimizationBayesian Global Optimization
Bayesian Global Optimization
 
Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...Reward constrained interactive recommendation with natural language feedback ...
Reward constrained interactive recommendation with natural language feedback ...
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 

Similar a C3 w2

The Ring programming language version 1.9 book - Part 98 of 210
The Ring programming language version 1.9 book - Part 98 of 210The Ring programming language version 1.9 book - Part 98 of 210
The Ring programming language version 1.9 book - Part 98 of 210Mahmoud Samir Fayed
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Spark Summit
 
The Ring programming language version 1.5.1 book - Part 174 of 180
The Ring programming language version 1.5.1 book - Part 174 of 180 The Ring programming language version 1.5.1 book - Part 174 of 180
The Ring programming language version 1.5.1 book - Part 174 of 180 Mahmoud Samir Fayed
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANWuhyun Rico Shin
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx0567Padma
 
VCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxVCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxskilljiolms
 
AIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfAIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfssuserb4d806
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
Observations
ObservationsObservations
Observationsbutest
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecJosh Patterson
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple stepsRenjith M P
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Scaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlowScaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlowDatabricks
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentationPyDataParis
 

Similar a C3 w2 (20)

Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
The Ring programming language version 1.9 book - Part 98 of 210
The Ring programming language version 1.9 book - Part 98 of 210The Ring programming language version 1.9 book - Part 98 of 210
The Ring programming language version 1.9 book - Part 98 of 210
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
 
Keras and TensorFlow
Keras and TensorFlowKeras and TensorFlow
Keras and TensorFlow
 
The Ring programming language version 1.5.1 book - Part 174 of 180
The Ring programming language version 1.5.1 book - Part 174 of 180 The Ring programming language version 1.5.1 book - Part 174 of 180
The Ring programming language version 1.5.1 book - Part 174 of 180
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GAN
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
 
VCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxVCE Unit 01 (1).pptx
VCE Unit 01 (1).pptx
 
AIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfAIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdf
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
Observations
ObservationsObservations
Observations
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Scaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlowScaling Up AI Research to Production with PyTorch and MLFlow
Scaling Up AI Research to Production with PyTorch and MLFlow
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentation
 

Último

complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 

Último (20)

complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 

C3 w2

  • 1. Copyright Notice These slides are distributed under the Creative Commons License. DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute these slides for commercial purposes. You may make copies of these slides and use or distribute them for educational purposes as long as you cite DeepLearning.AI as the source of the slides. For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
  • 4. High-dimensional data Before. .. when it was all about data mining ● Domain experts selected features ● Designed feature transforms ● Small number of more relevant features were enough Now … data science is about integrating everything ● Data generation and storage is less of a problem ● Squeeze out the best from data ● More high-dimensional data having more features
  • 5. A note about neural networks ● Yes, neural networks will perform a kind of automatic feature selection ● However, that’s not as efficient as a well-designed dataset and model ○ Much of the model can be largely “shut off” to ignore unwanted features ○ Even unused parts of the consume space and compute resources ○ Unwanted features can still introduce unwanted noise ○ Each feature requires infrastructure to collect, store, and manage
  • 7. Word embedding - An example ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 10 ? ? ? ? ? ? ? ? 6 7 8 5 11 Auto Embedding Weight Matrix [“I want to search for blood pressure result history “, “Show blood pressure result for patient”,...] Input layer (Learned vectors) Embedding Layer i want to search for blood pressure result history show patient ... LAST 1 2 3 4 5 6 7 8 9 10 11 20
  • 8. Initialization and loading the dataset import tensorflow as tf from tensorflow import keras import numpy as np from keras.datasets import reuters from keras.preprocessing import sequence num_words = 1000 (reuters_train_x, reuters_train_y), (reuters_test_x, reuters_test_y) = tf.keras.datasets.reuters.load_data(num_words=num_words) n_labels = np.unique(reuters_train_y).shape[0]
  • 9. Further preprocessing from keras.utils import np_utils reuters_train_y = np_utils.to_categorical(reuters_train_y, 46) reuters_test_y = np_utils.to_categorical(reuters_test_y, 46) reuters_train_x = tf.keras.preprocessing.sequence.pad_sequences(reuters_train_x, maxlen=20) reuters_test_x = tf.keras.preprocessing.sequence.pad_sequences(reuters_test_x, maxlen=20)
  • 10. Using all dimensions from tensorflow.keras import layers model2 = tf.keras.Sequential( [ layers.Embedding(num_words, 1000, input_length= 20), layers.Flatten(), layers.Dense(256), layers.Dropout(0.25), layers.Activation('relu'), layers.Dense(46), layers.Activation('softmax') ])
  • 11. Model compilation and training model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy']) model_1 = model.fit(reuters_train_x, reuters_train_y, validation_data=(reuters_test_x , reuters_test_y), batch_size=128, epochs=20, verbose=0)
  • 12. Example with a higher number of dimensions Acc Loss epoch epoch model accuracy model loss train validation train validation 0.9 0.8 0.6 0.7 0.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 0.9 0.8 0.6 0.7 0.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 0.0 2.5 5.0
  • 13. Word embeddings: 6 dimensions from tensorflow.keras import layers model = tf.keras.Sequential( [ layers.Embedding(num_words, 6, input_length= 20), layers.Flatten(), layers.Dense(256), layers.Dropout(0.25), layers.Activation('relu'), layers.Dense(46), layers.Activation('softmax') ])
  • 14. Word embeddings: fourth root of the size of the vocab Acc Loss epoch epoch model accuracy model loss train validation train validation 0.65 060 0.50 0.55 0.45 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 2.6 2.4 2.0 2.2 1.8 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 0.0 2.5 5.0 0.40 0.35 1.4 1.6 1.2
  • 17. ● More dimensions → more features ● Risk of overfitting our models ● Distances grow more and more alike ● No clear distinction between clustered objects ● Concentration phenomenon for Euclidean distance Why is high-dimensional data a problem?
  • 18. Curse of dimensionality “As we add more dimensions we also increase the processing power we need to train the model and make predictions, as well as the amount of training data required” Badreesh Shetty
  • 19. Why are more features bad? ● Redundant / irrelevant features ● More noise added than signal ● Hard to interpret and visualize ● Hard to store and process data
  • 20. The performance of algorithms ~ the number of dimensions Optimal Dimensionality (# of features) Classifier Performance
  • 21. 1 2 3 4 5 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) 1-D 2-D ... ... Adding dimensions increases feature space volume
  • 22. Curse of dimensionality in the distance function ● New dimensions add non-negative terms to the sum ● Distance increases with the number of dimensions ● For a given number of examples, the feature space becomes increasingly sparse Euclidean distance
  • 23. Increasing sparsity with higher dimensions Dimension 3 Dimension 1 Dimension 2 Dimension 1 Dimension 1 Dimension 2 Dimension 3 Dimension 1 Dimension 2
  • 24. x x x x x x x x x x x x x x x The more the features, the larger the hypothesis space The lower the hypothesis space ● the easier it is to find the correct hypothesis ● the less examples you need The Hughes effect
  • 25. Dimensionality Reduction Curse of Dimensionality: An example
  • 26. How dimensionality impacts in other ways ● Runtime and system memory requirements ● Solutions take longer to reach global optima ● More dimensions raise the likelihood of correlated features
  • 27. More features require more training data ● More features aren’t better if they don’t add predictive information ● Number of training instances needed increases exponentially with each added feature ● Reduces real-world usefulness of models
  • 28. Model #1 (missing a single feature) sex:InputLayer cp:InputLayer fbs:InputLayer restecg : InputLayer exang:InputLayer slope:InputLayer category_encoding_6:CategoryEncoding category_encoding_1:CategoryEncoding category_encoding_2:CategoryEncoding category_encoding_3:CategoryEncoding category_encoding_4:CategoryEncoding normalization_5:Normalization concatenate: Concatenate dense: Dense dropout: dropout dense_1: Dense ca:InputLayer age:InputLayer trestbps:InputLayer chol:InputLayer thalach:InputLayer category_enconding_5:CategoryEncoding normalization:Normalization normalization_1:Normalization normalization_2:Normalization normalization_3:Normalization oldpeak:InputLayer normalization_4:Normalization
  • 29. Model #2 (adds a new feature). [Same model diagram with one addition: a new string categorical feature, thal:InputLayer feeding a StringLookup layer, is added]
  • 30. Comparing the two models' trainable variables
from tensorflow.python.keras.utils.layer_utils import count_params

# Number of training parameters in Model #1
>>> count_params(model_1.trainable_variables)
833

# Number of training parameters in Model #2 (with an added feature)
>>> count_params(model_2.trainable_variables)
1057
  • 31. What do ML models need? ● No hard and fast rule on how many features are required ● The number of features to use varies with the problem and the model ● Prefer uncorrelated data containing information to produce correct results
  • 33. Increasing predictive performance ● Features must have information to produce correct results ● Derive features from inherent features ● Extract and recombine to create new features
  • 34. Combining features: feature explosion ● Initial features: pixels, contours, textures, etc. (images); samples, spectrograms, etc. (audio); ticks, trends, reversals, etc. (time series); DNA, marker sequences, genes, etc. (genomics); words, grammatical classes and relations, etc. (text) ● The number of combined features grows very quickly ● Reduce dimensionality
  • 35. Why reduce dimensionality? Storage, computational cost, consistency, and visualization. Major techniques for dimensionality reduction: feature engineering and feature selection.
  • 36. Feature Engineering: an iterative process ● Need for manually crafting features (certainly provides food for thought): come up with ideas to construct "better" features ● Devise features to reduce dimensionality (tabular: aggregate, combine, decompose; text: extract context indicators; image: prescribe filters for relevant structures) ● Select the right features to maximize predictiveness ● Evaluate models using the chosen features
  • 37. Manual Dimensionality Reduction: case study Dimensionality Reduction
  • 38. Taxi Fare dataset
CSV_COLUMNS = [
    'fare_amount', 'pickup_datetime',
    'pickup_longitude', 'pickup_latitude',
    'dropoff_longitude', 'dropoff_latitude',
    'passenger_count', 'key',
]
LABEL_COLUMN = 'fare_amount'
STRING_COLS = ['pickup_datetime']
NUMERIC_COLS = ['pickup_longitude', 'pickup_latitude',
                'dropoff_longitude', 'dropoff_latitude',
                'passenger_count']
DEFAULTS = [[0.0], ['na'], [0.0], [0.0], [0.0], [0.0], [0.0], ['na']]
DAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
  • 40. Build a baseline model using raw features
from tensorflow.keras import layers, models
from tensorflow.keras.metrics import RootMeanSquaredError as RMSE

dnn_inputs = layers.DenseFeatures(feature_columns.values())(inputs)
h1 = layers.Dense(32, activation='relu', name='h1')(dnn_inputs)
h2 = layers.Dense(8, activation='relu', name='h2')(h1)
output = layers.Dense(1, activation='linear', name='fare')(h2)
model = models.Model(inputs, output)
model.compile(optimizer='adam', loss='mse', metrics=[RMSE(name='rmse'), 'mse'])
  • 41. Train the model. [Plot: baseline model RMSE vs. epoch for train and validation, dropping from roughly 125 to 100 over 5 epochs]
  • 42. Increasing model performance with Feature Engineering ● Carefully craft features for the data types ○ Temporal (pickup date & time) ○ Geographical (latitude and longitude)
  • 43. Handling temporal features
def parse_datetime(s):
    if type(s) is not str:
        s = s.numpy().decode('utf-8')
    return datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S %Z")

def get_dayofweek(s):
    ts = parse_datetime(s)
    return DAYS[ts.weekday()]

@tf.function
def dayofweek(ts_in):
    return tf.map_fn(
        lambda s: tf.py_function(get_dayofweek, inp=[s], Tout=tf.string),
        ts_in)
  • 44. Geolocational features
def euclidean(params):
    lon1, lat1, lon2, lat2 = params
    londiff = lon2 - lon1
    latdiff = lat2 - lat1
    return tf.sqrt(londiff * londiff + latdiff * latdiff)
  • 45. Scaling latitude and longitude
def scale_longitude(lon_column):
    return (lon_column + 78)/8.

def scale_latitude(lat_column):
    return (lat_column - 37)/8.
  • 46. Preparing the transformations
def transform(inputs, numeric_cols, string_cols, nbuckets):
    ...
    feature_columns = {
        colname: tf.feature_column.numeric_column(colname)
        for colname in numeric_cols
    }
    for lon_col in ['pickup_longitude', 'dropoff_longitude']:
        transformed[lon_col] = layers.Lambda(scale_longitude, ...)(inputs[lon_col])
    for lat_col in ['pickup_latitude', 'dropoff_latitude']:
        transformed[lat_col] = layers.Lambda(scale_latitude, ...)(inputs[lat_col])
    ...
  • 47. Computing the Euclidean distance
def transform(inputs, numeric_cols, string_cols, nbuckets):
    ...
    transformed['euclidean'] = layers.Lambda(
        euclidean, name='euclidean')([inputs['pickup_longitude'],
                                      inputs['pickup_latitude'],
                                      inputs['dropoff_longitude'],
                                      inputs['dropoff_latitude']])
    feature_columns['euclidean'] = fc.numeric_column('euclidean')
    ...
  • 48. Bucketizing and feature crossing
def transform(inputs, numeric_cols, string_cols, nbuckets):
    ...
    latbuckets = np.linspace(0, 1, nbuckets).tolist()
    lonbuckets = np.linspace(0, 1, nbuckets).tolist()  # similarly for longitude
    b_plat = fc.bucketized_column(feature_columns['pickup_latitude'], latbuckets)
    b_dlat = fc.bucketized_column(feature_columns['dropoff_latitude'], latbuckets)   # bucketize 'dropoff_latitude'
    b_plon = fc.bucketized_column(feature_columns['pickup_longitude'], lonbuckets)   # bucketize 'pickup_longitude'
    b_dlon = fc.bucketized_column(feature_columns['dropoff_longitude'], lonbuckets)  # bucketize 'dropoff_longitude'
  • 49. Bucketizing and feature crossing
ploc = fc.crossed_column([b_plat, b_plon], nbuckets * nbuckets)
dloc = fc.crossed_column([b_dlat, b_dlon], nbuckets * nbuckets)  # feature cross 'b_dlat' and 'b_dlon'
pd_pair = fc.crossed_column([ploc, dloc], nbuckets ** 4)
feature_columns['pickup_and_dropoff'] = fc.embedding_column(pd_pair, 100)
  • 50. Build a model with the engineered features. [Model diagram: the pickup/dropoff latitude and longitude inputs pass through scaling Lambda layers and a euclidean Lambda layer, are combined with passenger_count in a DenseFeatures layer, then flow through h1:Dense, h2:Dense, and fare:Dense]
  • 51. Train the new feature engineered model. [Plots: the improved model's RMSE falls to roughly 30-40 within 8 epochs, versus roughly 100-125 for the baseline, for both train and validation]
  • 53. Linear dimensionality reduction ● Linearly project n-dimensional data onto a k-dimensional subspace (k < n, often k << n) ● There are infinitely many k-dimensional subspaces we can project the data onto ● Which one should we choose?
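As a small illustration of what "linearly project onto a k-dimensional subspace" means in code, the sketch below (a toy example with invented sizes, not course material) projects n-dimensional data onto one arbitrarily chosen k-dimensional subspace with a single matrix multiplication; the techniques that follow are different ways of choosing that subspace well instead of at random.

import numpy as np

rng = np.random.default_rng(0)

n, k = 50, 3                       # original and reduced dimensionality
X = rng.normal(size=(200, n))      # 200 examples with n features

# Any n x k matrix with orthonormal columns defines one of the infinitely
# many k-dimensional subspaces we could project onto; here we pick one at random.
Q, _ = np.linalg.qr(rng.normal(size=(n, k)))

X_projected = X @ Q                # shape (200, k)
print(X_projected.shape)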
  • 54. Projecting onto a line. [Illustration: input features f1, f2, f3, ..., fn-1, fn are reduced to a single output value ranging from 0 (dark) to 1 (bright)]
  • 55. Best k-dimensional subspace for projection Classification: maximize separation among classes Example: Linear discriminant analysis (LDA) Regression: maximize correlation between projected data and response variable Example: Partial least squares (PLS) Unsupervised: retain as much data variance as possible Example: Principal component analysis (PCA)
  • 57. Principal component analysis (PCA) ● PCA is a minimization of the orthogonal distance ● Widely used method for unsupervised & linear dimensionality reduction ● Accounts for variance of data in as few dimensions as possible using linear projections
  • 58. Principal components (PCs) ● PCs maximize the variance of projections ● PCs are orthogonal ● Gives the best axis to project ● Goal of PCA: Minimize total squared reconstruction error 1st principal vector 2nd principal vector
  • 60. PCA Algorithm - First Principal Component Step 1 Find a line, such that when the data is projected onto that line, it has the maximum variance
  • 61. Step 2 Find a second line, orthogonal to the first, that has maximum projected variance PCA Algorithm - Second Principal Component
  • 62. Repeat until we have k orthogonal lines PCA Algorithm Step 3
  • 63. Applying PCA on Iris
# Assuming the mlxtend implementation, which exposes the e_vals_ / e_vecs_
# attributes used on the next slides
from mlxtend.feature_extraction import PrincipalComponentAnalysis

pca = PrincipalComponentAnalysis(n_components=2)
pca.fit(X)
X_pca = pca.transform(X)
  • 64. Plot the explained variance
tot = sum(pca.e_vals_)
var_exp = [(i / tot) * 100 for i in sorted(pca.e_vals_, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
  • 65. loadings = pca.e_vecs_ * np.sqrt(pca.e_vals_) PCA factor loadings
  • 66. PCA in scikit-learn
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

# Load the data
digits = datasets.load_digits()

# Standardize the feature matrix
X = StandardScaler().fit_transform(digits.data)
  • 67. PCA in scikit-learn
# Create a PCA that will retain 99% of the variance
pca = PCA(n_components=0.99, whiten=True)

# Conduct PCA
X_pca = pca.fit_transform(X)
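A short optional follow-up (not on the slide) to inspect what n_components=0.99 actually selected: the number of components scikit-learn kept and the variance they explain.

# How many of the original pixel features were kept to retain 99% of the variance?
print("Original number of features:", X.shape[1])
print("Reduced number of features:", pca.n_components_)
print("Variance retained:", pca.explained_variance_ratio_.sum())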
  • 68. When to use PCA? Strengths ● A versatile technique ● Fast and simple ● Offers several variations and extensions (e.g., kernel/sparse PCA) Weaknesses ● Result is not interpretable ● Requires setting threshold for cumulative explained variance
  • 70. More dimensionality reduction algorithms Unsupervised ● Latent Semantic Indexing/Analysis (LSI and LSA) (SVD) ● Independent Component Analysis (ICA) Matrix Factorization ● Non-Negative Matrix Factorization (NMF) Latent Methods ● Latent Dirichlet Allocation (LDA)
  • 71. Singular value decomposition (SVD) ● SVD decomposes non-square matrices ● Useful for sparse matrices as produced by TF-IDF ● Removes redundant features from the dataset
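A minimal sketch of this pattern (assuming scikit-learn and a small made-up corpus): build a sparse TF-IDF matrix and reduce it with TruncatedSVD, which is how LSA is typically run because it operates on sparse input directly.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Sparse TF-IDF matrix: documents x vocabulary terms
tfidf = TfidfVectorizer().fit_transform(corpus)

# Reduce to 2 latent "topic" dimensions (LSA)
svd = TruncatedSVD(n_components=2, random_state=0)
docs_2d = svd.fit_transform(tfidf)

print(docs_2d.shape)                  # (3, 2)
print(svd.explained_variance_ratio_)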
  • 72. Independent Components Analysis (ICA) ● PCA seeks directions in feature space that minimize reconstruction error ● ICA seeks directions that are most statistically independent ● ICA addresses higher order dependence
  • 73. How does ICA work? ● Assume there exist independent signals: S(t) = [s_1(t), s_2(t), ..., s_N(t)] ● Observations are linear combinations of the signals: Y(t) = A S(t) ○ Both A and S are unknown ○ A is the mixing matrix ● Goal of ICA: recover the original signals S(t) from Y(t)
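A hedged sketch of this setup using scikit-learn's FastICA (the signals, mixing matrix, and sizes below are invented for illustration): two independent sources are mixed by an unknown matrix A, and ICA recovers source estimates from the observed mixtures alone, up to ordering and scale.

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # independent source 1
s2 = np.sign(np.cos(3 * t))                # independent source 2
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.4, 1.2]])     # unknown mixing matrix
Y = S @ A.T                                # observed mixtures, Y(t) = A S(t)

ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(Y)         # recovered sources (up to scale and order)
print(S_estimated.shape)                   # (2000, 2)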
  • 74. Comparing PCA and ICA ● Removes correlations: PCA ✓, ICA ✓ ● Removes higher-order dependence: ICA ✓ ● All components treated fairly: ICA ✓ ● Orthogonality: PCA ✓
  • 75. Non-negative Matrix Factorization (NMF) ● NMF models are interpretable and easier to understand ● NMF requires the sample features to be non-negative
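A minimal sketch (assuming scikit-learn; the matrix sizes are arbitrary) of NMF factorizing a non-negative matrix V into non-negative factors W and H, so each sample reads as an additive combination of parts, which is what makes the result easier to interpret.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.uniform(size=(100, 20))            # non-negative data: 100 samples, 20 features

nmf = NMF(n_components=5, init='nndsvda', max_iter=500, random_state=0)
W = nmf.fit_transform(V)                   # per-sample weights, shape (100, 5)
H = nmf.components_                        # additive parts,     shape (5, 20)

print(np.linalg.norm(V - W @ H))           # reconstruction error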
  • 76. Mobile, IoT, and Similar Use Cases Quantization & Pruning
  • 77. Trends in adoption of smart devices
  • 78. Factors driving this trend ● Demands move ML capability from cloud to on-device ● Cost-effectiveness ● Compliance with privacy regulations
  • 79. Online ML inference ● To generate real-time predictions you can: ○ Host the model on a server ○ Embed the model in the device ● Is it faster on a server, or on-device? ● Mobile processing limitations?
  • 80. Mobile inference: inference on the cloud/server (the device sends a classification request; the server returns prediction results) Pros ● Lots of compute capacity ● Scalable hardware ● Model complexity handled by the server ● Easy to add new features and update the model ● Low latency and batch prediction Cons ● Requires a network round trip when timely inference is needed
  • 81. Mobile inference: on-device inference Pros ● Improved speed and performance ● No dependence on network connectivity ● No to-and-fro communication needed Cons ● Less compute capacity ● Tight resource constraints
  • 82. Model deployment options. [Table comparing deployment options such as ML Kit and Core ML across: on-device inference, on-device personalization, on-device training, cloud-based web service, pretrained models, and custom models] * Also supported in TFX
  • 83. Benefits and Process of Quantization Quantization & Pruning
  • 85. Why quantize neural networks? ● Neural networks have many parameters and take up space ● Shrinking model file size ● Reduce computational resources ● Make models run faster and use less power with low-precision
  • 86. MobileNets: Latency vs. Accuracy trade-off. [Plot: Top-1 accuracy vs. runtime (ms) on a Pixel 2 big core (Snapdragon 835), comparing float and 8-bit models]
  • 87. Benefits of quantization ● Faster compute ● Low memory bandwidth ● Low power ● Integer operations supported across CPU/DSP/NPUs
  • 88. The quantization process. [Diagram: the float32 range (about -3e38 to 3e38, with the observed min and max of the data) is mapped onto the int8 range -127 to 127]
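To show what that mapping amounts to numerically, here is an illustrative affine-quantization sketch (a simplification, not the exact TF Lite kernel): derive a scale and zero point from the observed min/max, then round each float into the int8 range.

import numpy as np

def quantize_int8(values, qmin=-127, qmax=127):
    """Map float values onto int8 using a scale and zero point derived from min/max."""
    vmin, vmax = float(values.min()), float(values.max())
    scale = (vmax - vmin) / (qmax - qmin)
    zero_point = int(round(qmin - vmin / scale))
    q = np.clip(np.round(values / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print("max quantization error:", np.abs(weights - dequantize(q, scale, zp)).max())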
  • 89. What parts of the model are affected? ● Static values (parameters) ● Dynamic values (activations) ● Computation (transformations)
  • 90. Trade-offs ● Optimizations impact model accuracy ○ Difficult to predict ahead of time ● In rare cases, models may actually gain some accuracy ● Undefined effects on ML interpretability
  • 91. Choose the best model for the task
  • 93. ● Reduced precision representation ● Incur small loss in model accuracy ● Joint optimization for model and latency Post-training quantization
  • 94. Technique Benefits Dynamic range quantization 4x smaller, 2x-3x speedup Full integer quantization 4x smaller, 3x+ speedup float16 quantization 2x smaller, GPU acceleration Post-training quantization
  • 95. Post training quantization
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
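Full integer quantization additionally needs a small representative dataset so the converter can calibrate activation ranges. A hedged sketch in the spirit of the TF Lite workflow follows; representative_samples is a hypothetical array of typical model inputs, not something defined on the slides.

import tensorflow as tf

def representative_data_gen():
    # representative_samples is a hypothetical array of a few hundred typical inputs
    for sample in representative_samples[:100]:
        yield [tf.cast(sample[None, ...], tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Enforce integer-only ops (e.g., for microcontrollers or the Edge TPU)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()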
  • 97. Model accuracy ● Small accuracy loss incurred (mostly for smaller networks) ● Use the benchmarking tools to evaluate model accuracy ● If the accuracy drop is not within acceptable limits, consider using quantization-aware training
  • 99. Quantization-aware training (QAT) ● Inserts fake quantization (FQ) nodes in the forward pass ● Rewrites the graph to emulate quantized inference ● Reduces the loss of accuracy due to quantization ● Resulting model contains all data to be quantized according to spec
  • 100. Quantization-aware training (QAT). [Pipeline: TensorFlow (tf.Keras) model → apply QAT + train model → convert & quantize (TF Lite) → INT8 TF Lite model]
  • 102. Adding the quantization emulation operations. [Diagram: input → conv (weights pass through a weight-quant node, plus biases) → ReLU6 → activation-quant node → output]
  • 103. QAT on entire model
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    ...
])

# Quantize the entire model.
quantized_model = tfmot.quantization.keras.quantize_model(model)

# Continue with training as usual.
quantized_model.compile(...)
quantized_model.fit(...)
  • 104. Quantize part(s) of a model
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer

model = tf.keras.Sequential([
    ...
    # Only annotated layers will be quantized.
    quantize_annotate_layer(Conv2D()),
    quantize_annotate_layer(ReLU()),
    Dense(),
    ...
])

# Quantize the model.
quantized_model = tfmot.quantization.keras.quantize_apply(model)
  • 105. Quantize custom Keras layer
quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

model = quantize_annotate_model(tf.keras.Sequential([
    quantize_annotate_layer(CustomLayer(20, input_shape=(20,)),
                            DefaultDenseQuantizeConfig()),
    tf.keras.layers.Flatten()
]))
  • 106. Quantize custom Keras layer
# `quantize_apply` requires mentioning `DefaultDenseQuantizeConfig` with `quantize_scope`
with quantize_scope(
        {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig,
         'CustomLayer': CustomLayer}):
    # Use `quantize_apply` to actually make the model quantization aware.
    quant_aware_model = tfmot.quantization.keras.quantize_apply(model)
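After quantization-aware training, the Keras model still computes in float with fake-quant nodes; converting it with the TF Lite converter produces the actually quantized model. A minimal sketch of that final step (the output file name is illustrative):

import tensorflow as tf

# quant_aware_model is the quantization-aware model produced and trained above
converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

# The result can be written out and served with the TF Lite interpreter
with open('quantized_model.tflite', 'wb') as f:
    f.write(quantized_tflite_model)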
  • 107. Model Optimization Results: Accuracy (Top-1: original / post-training quantized / quantization-aware training) ● Mobilenet-v1-1-224: 0.709 / 0.657 / 0.70 ● Mobilenet-v2-1-224: 0.719 / 0.637 / 0.709 ● Inception_v3: 0.78 / 0.772 / 0.775 ● Resnet_v2_101: 0.770 / 0.768 / N/A
  • 108. Model Optimization Results: Latency (ms: original / post-training quantized / quantization-aware training) ● Mobilenet-v1-1-224: 124 / 112 / 64 ● Mobilenet-v2-1-224: 89 / 98 / 54 ● Inception_v3: 1130 / 845 / 543 ● Resnet_v2_101: 3973 / 2868 / N/A
  • 109. Model Optimization Results: Size (MB: original / optimized) ● Mobilenet-v1-1-224: 16.9 / 4.3 ● Mobilenet-v2-1-224: 14 / 3.6 ● Inception_v3: 95.7 / 23.9 ● Resnet_v2_101: 178.3 / 44.9
  • 111. Connection pruning. [Diagram: a densely connected network before pruning vs. the same network with many connections removed after pruning]
  • 113. Origins of weight pruning. [Diagram: before pruning vs. after pruning, illustrating both pruning synapses (connections) and pruning neurons]
  • 114. The Lottery Ticket Hypothesis
  • 115. Finding Sparse Neural Networks “A randomly-initialized, dense neural network contains a subnetwork that is initialized such that — when trained in isolation — it can match the test accuracy of the original network after training for at most the same number of iterations” Jonathan Frankle and Michael Carbin
  • 116. Pruning research is evolving ● The new method didn’t perform well at large scale ● The new method failed to identify the randomly initialized winners ● It’s an active area of research
  • 117. Eliminate connections based on their magnitude. [Illustration: tensors with no sparsity (left), sparsity in blocks of 1x1 (center), and sparsity in blocks of 1x2 (right)]
  • 118. Apply sparsity with a pruning routine. [Plot: example of a sparsity ramp-up function with a schedule that starts pruning at step 0, continues until step 100, and reaches a final target sparsity of 90%]
  • 119. Sparsity increases with training. [Animation: pruning applied to a tensor; black cells indicate where non-zero weights remain]
  • 120. What’s special about pruning? ● Better storage and/or transmission ● Gain speedups in CPU and some ML accelerators ● Can be used in tandem with quantization to get additional benefits ● Unlock performance improvements
  • 121. Pruning with TF Model Optimization Toolkit ● INT8 TensorFlow (tf.Keras) Sparsify + Train model Convert & Quantize (TF Lite) TF Lite Model
  • 122. Pruning with Keras
import tensorflow_model_optimization as tfmot

model = build_your_model()

pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.50, final_sparsity=0.80,
    begin_step=2000, end_step=4000)

model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)
...
# Training a pruned model requires the pruning-step callback,
# e.g. callbacks=[tfmot.sparsity.keras.UpdatePruningStep()]
model_for_pruning.fit(...)
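Once training finishes, the pruning wrappers can be stripped before export so the sparsity benefit carries into the deployed model; a short sketch of that step, combined with post-training quantization as the earlier slide suggests:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Remove the pruning wrappers while keeping the sparse weights
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

# Convert to TF Lite; quantization can be applied on top of pruning for additional gains
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
pruned_tflite_model = converter.convert()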
  • 123. Results across different models & tasks ● Inception V3: non-sparse Top-1 accuracy 78.1%; sparse 78.0% at 50% sparsity, 76.1% at 75%, 74.6% at 87.5% ● Mobilenet V1 224: non-sparse Top-1 accuracy 71.04%; sparse 70.84% at 50% sparsity ● GNMT EN-DE: non-sparse BLEU 26.77; sparse 26.86 at 80% sparsity, 26.52 at 85%, 26.19 at 90% ● GNMT DE-EN: non-sparse BLEU 29.47; sparse 29.50 at 80% sparsity, 29.24 at 85%, 28.81 at 90%