2. Agenda
• What is machine learning?
• Why machine learning and why now?
• Machine learning terminology
• Overview of machine learning methods
• Machine learning to deep learning
• Summary and Q&A
iksinc@yahoo.com
4. What is Machine Learning?
• Machine learning deals with making computers learn to make predictions/decisions without being explicitly programmed. Instead, the system is shown a large number of examples of the underlying task and optimizes a performance criterion on them to achieve learning.
5. An Example of Machine Learning: Credit Default Prediction
We have historical data about businesses and their delinquency. The data consists of 100 businesses. Each business is characterized via two attributes: business age in months and number of days delinquent in payment. We also know whether each business defaulted or not. Using machine learning, we can build a model to predict the probability that a given business will default.
[Scatter plot of the 100 businesses; axis ranges 0-100 and 0-500]
6. Logistic Regression
• The model that is used here is called the logistic regression model. Let's look at the following expression, where x1, x2, ..., xk are the attributes:
p = e^(a0 + a1*x1 + ... + ak*xk) / (1 + e^(a0 + a1*x1 + ... + ak*xk))
• In our example, the attributes are business age and number of days of delinquency.
• The quantity p will always lie in the range 0-1 and thus can be interpreted as the probability of the outcome being default or no default.
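The expression above can be sketched directly in Python; this is a generic helper, not the deck's actual model (the coefficient and attribute values below are illustrative):

```python
import math

def logistic(a, x):
    """p = e^(a0 + a1*x1 + ... + ak*xk) / (1 + e^(same linear term)).

    a: coefficients [a0, a1, ..., ak]; x: attribute values [x1, ..., xk].
    """
    z = a[0] + sum(ai * xi for ai, xi in zip(a[1:], x))
    return math.exp(z) / (1 + math.exp(z))

# p always lies strictly between 0 and 1, so it can be read as a probability
print(logistic([0.0, 1.0], [0]))     # 0.5 when the linear term is zero
print(logistic([0.0, 1.0], [-10]))   # close to 0
print(logistic([0.0, 1.0], [10]))    # close to 1
```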
7. Logistic Regression
• By simple rewriting, we get: log(p / (1 - p)) = a0 + a1*x1 + a2*x2 + ... + ak*xk
• This ratio is called the log odds
• The parameters of the logistic model, a0, a1, ..., ak, are learned via an optimization procedure
• The learned parameters can then be deployed in the field to make predictions
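The rewriting above can be checked numerically: applying the logistic function to any linear term z and then taking log(p / (1 - p)) recovers z exactly.

```python
import math

def p_of(z):
    """Logistic function of the linear term z = a0 + a1*x1 + ... + ak*xk."""
    return math.exp(z) / (1 + math.exp(z))

z = 0.42  # an arbitrary value of the linear term
p = p_of(z)
log_odds = math.log(p / (1 - p))
print(abs(log_odds - z) < 1e-9)  # True: the log odds equal the linear term
```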
8. Model Details and Performance
Only in rare cases do we get a 100% accurate model.
[Plot of the predicted default probability for each of the 100 businesses; y axis 0 to 1.2]
9. Using the Model
• What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?
• Plug the model parameters in to calculate p. BUSAGE: 0.008; DAYSDELQ: 0.102; Intercept: -5.706
p = e^(0.008*26 + 0.102*58 - 5.706) / (1 + e^(0.008*26 + 0.102*58 - 5.706)) ≈ 0.603
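The calculation above, using the slide's learned parameters, can be reproduced in a few lines:

```python
import math

# Learned parameters from the slide: BUSAGE 0.008, DAYSDELQ 0.102, intercept -5.706
z = 0.008 * 26 + 0.102 * 58 - 5.706   # linear term for 26 months, 58 days delinquent
p = math.exp(z) / (1 + math.exp(z))   # logistic regression probability of default
print(round(p, 3))  # 0.603
```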
12. Buzz about Machine Learning
"Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence."
Three factors are making machine learning hot: cheap data, the algorithm economy, and cloud-based solutions.
13. Data is getting cheaper
For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours.
16. Cloud-Based Intelligence
Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production. Many open source machine learning and deep learning frameworks running in the cloud make it easy to leverage pre-trained, hosted models to tag images, recommend products, and do general natural language processing tasks.
20. Feature Vectors in ML
• A machine learning system builds models using properties of the objects being modeled. These properties are called features or attributes, and the process of measuring/obtaining such properties is called feature extraction. It is common to represent the properties of objects as feature vectors.
Example: an iris flower measured by sepal width, sepal length, petal width, and petal length gives the feature vector x = (x1, x2, x3, x4) = (2, 6, 6, 4); a second flower gives (3, 7, 7, 5).
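Feature extraction as described above can be sketched as a function that maps a measured object to a fixed-order list of values (the field names are hypothetical; the numbers are the slide's example vector):

```python
def feature_vector(flower):
    """Extract a fixed-order feature vector from a measured object."""
    return [flower["sepal_width"], flower["sepal_length"],
            flower["petal_width"], flower["petal_length"]]

iris = {"sepal_width": 2, "sepal_length": 6, "petal_width": 6, "petal_length": 4}
print(feature_vector(iris))  # [2, 6, 6, 4]
```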
21. Learning Styles
• Supervised Learning
– Training data comes with answers, called labels
– The goal is to produce labels for new data
22. Supervised Learning Models
• Classification models
– Predict whether a customer is likely to be lost to a competitor
– Tag objects in a given image
– Determine whether an incoming email is spam or not
23. Supervised Learning Models
• Regression models
– Predict the credit card balance of customers
– Predict the number of 'likes' for a posting
– Predict peak load for a utility given weather information
24. Learning Styles
• Unsupervised Learning
– Training data comes without labels
– The goal is to group data into different categories based on similarities
[Figure: grouped data]
25. Unsupervised Learning Models
• Segment/cluster customers into different groups
• Organize a collection of documents based on their content
• Make recommendations for products
26. Learning Styles
• Reinforcement Learning
– Training data comes without labels
– The learning system receives feedback from its operating environment to know how well it is doing
– The goal is to perform better
28. Walk Through an Example: Flower Classification
• Build a classification model to differentiate between two classes of flower
29. How Do We Go About It?
• Collect a large number of both types of flowers with the help of an expert
• Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.
31. We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.
32. Another possible boundary. This boundary cannot be expressed via an equation; however, a tree structure can be used to express it. Note that this boundary predicts the collected data more accurately.
33. Yet another possible boundary. This boundary predicts the collected data without any error. Is this a better boundary?
34. Model Complexity
• There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.
• A good design should avoid overfitting. How?
– Divide the entire data into three sets
• Training set (about 70% of the total data). Use this set to build the model.
• Test set (about 20% of the total data). Use this set to estimate the model accuracy after deployment.
• Validation set (remaining 10% of the total data). Use this set to determine the appropriate settings for the free parameters of the model. May not be required in some cases.
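The 70/20/10 split above can be sketched as a small helper (a minimal version; real projects often use library utilities such as scikit-learn's splitters instead):

```python
import random

def split_data(records, seed=0):
    """Shuffle and split records into ~70% train, ~20% test, ~10% validation."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    records = records[:]            # copy, so the caller's list is untouched
    rng.shuffle(records)
    n_train = int(0.7 * len(records))
    n_test = int(0.2 * len(records))
    train = records[:n_train]
    test = records[n_train:n_train + n_test]
    validation = records[n_train + n_test:]
    return train, test, validation

train, test, val = split_data(list(range(100)))
print(len(train), len(test), len(val))  # 70 20 10
```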
35. Measuring Model Performance
• True Positive: correctly identified as relevant
• True Negative: correctly identified as not relevant
• False Positive: incorrectly labeled as relevant
• False Negative: incorrectly labeled as not relevant
[Figure: cat vs. no-cat images illustrating true positive, true negative, false positive, and false negative]
36. Precision, Recall, and Accuracy
• Precision
– Percentage of positive labels that are correct
– Precision = (# true positives) / (# true positives + # false positives)
• Recall
– Percentage of positive examples that are correctly labeled
– Recall = (# true positives) / (# true positives + # false negatives)
• Accuracy
– Percentage of correct labels
– Accuracy = (# true positives + # true negatives) / (# of samples)
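The three definitions above translate directly into code (the counts below are hypothetical, just to exercise the formulas):

```python
def precision(tp, fp):
    """Fraction of positive labels that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of positive examples that are correctly labeled."""
    return tp / (tp + fn)

def accuracy(tp, tn, n_samples):
    """Fraction of all labels that are correct."""
    return (tp + tn) / n_samples

# Hypothetical confusion counts: 40 TP, 45 TN, 5 FP, 10 FN out of 100 samples
print(round(precision(40, 5), 3))   # 0.889
print(recall(40, 10))               # 0.8
print(accuracy(40, 45, 100))        # 0.85
```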
37. Sum-of-Squares Error for Regression Models
For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example and adding this number over all examples:
E = sum over all N examples of (predicted_n - target_n)^2
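The sum-of-squares error described above, as a one-liner (example values are made up):

```python
def sum_of_squares_error(predicted, target):
    """Sum over all examples of (predicted - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# Errors of 0.5, 0.5, and 0 give a total of 0.25 + 0.25 + 0 = 0.5
print(sum_of_squares_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # 0.5
```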
38. Bias and Variance
• Bias: expected difference between the model's prediction and the truth
• Variance: how much the model differs among training sets
• Model Scenarios
– High Bias: model makes inaccurate predictions on training data
– High Variance: model does not generalize to new datasets
– Low Bias: model makes accurate predictions on training data
– Low Variance: model generalizes to new datasets
40. Model Building Algorithms
• Supervised learning algorithms
– Linear methods
– k-NN classifiers
– Neural networks
– Support vector machines
– Decision trees
– Ensemble methods
41. Illustration of k-NN Model
• Predicted label of the test example with the 1-NN model: Versicolor
• Predicted label of the test example with the 3-NN model: Virginica
[Figure: test example plotted among the labeled training examples]
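A minimal k-NN classifier matching the idea above: find the k training examples nearest to the query and take the majority label. The toy points below are invented (they are not the slide's figure), arranged so that, as on the slide, 1-NN and 3-NN disagree:

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """train: list of (feature_tuple, label) pairs. Return the majority label
    among the k training examples closest (Euclidean distance) to query."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [((1.0, 1.0), "versicolor"), ((1.2, 0.9), "virginica"),
         ((1.3, 1.1), "virginica"), ((5.0, 5.0), "versicolor")]
query = (1.1, 1.0)
print(knn_predict(train, query, 1))  # versicolor (single nearest neighbor)
print(knn_predict(train, query, 3))  # virginica (majority of 3 nearest)
```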
42. Illustration of Decision Tree Model
• Petal width <= 0.8? Yes: Setosa. No: go to the next test.
• Petal length <= 4.75? Yes: Versicolor. No: Virginica.
The decision tree is automatically generated by a machine learning algorithm.
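The learned tree above is just nested if/else rules; written out directly (the thresholds are the slide's, the function name is mine):

```python
def classify_iris(petal_width, petal_length):
    """The slide's decision tree, expressed as code."""
    if petal_width <= 0.8:
        return "Setosa"
    if petal_length <= 4.75:
        return "Versicolor"
    return "Virginica"

print(classify_iris(0.2, 1.4))  # Setosa
print(classify_iris(1.3, 4.0))  # Versicolor
print(classify_iris(2.1, 5.5))  # Virginica
```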
43. Model Building Algorithms
• Unsupervised learning
– k-means clustering
– Agglomerative clustering
– Self-organizing feature maps
– Recommendation systems
44. K-means Clustering
Step 1: Choose the number of clusters, k, and the initial cluster centers.
45. K-means Clustering
Step 2: Assign data points to clusters based on their distance to the cluster centers.
46. K-means Clustering
Step 3: Update the cluster centers and reassign the data points.
The k-means clustering problem: minimize the sum of squared distances from data points to their cluster centers, i.e. minimize sum over n = 1..N of ||x_n - center_n||^2
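The three steps above can be sketched as a plain k-means loop (a minimal version with fixed iterations; production code would check for convergence and pick initial centers more carefully, e.g. k-means++):

```python
import math

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update each center to its cluster mean (keep it if the cluster is empty)
        centers = [tuple(sum(coord) / len(coord) for coord in zip(*cluster)) if cluster else center
                   for cluster, center in zip(clusters, centers)]
    return centers, clusters

# Two obvious groups of points; k = 2 with rough initial centers
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0, 0), (5, 5)])
print(centers)  # roughly (0.33, 0.33) and (10.33, 10.33)
```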
49. Steps Towards a Machine Learning Project
• Collect data
• Explore data via scatter plots and histograms. Remove duplicates and data records with missing values.
• Check for dimensionality reduction
• Build model (iterative process)
• Transport/integrate with an application
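The data-cleaning step above (removing duplicates and records with missing values) can be sketched in plain Python; the record fields below are hypothetical, and in practice a library such as pandas (`drop_duplicates`, `dropna`) would be used:

```python
def clean_records(records):
    """Drop exact duplicate records and records with missing values, preserving order."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(rec.items())
        if key in seen:
            continue  # skip exact duplicate
        if any(v is None for v in rec.values()):
            continue  # skip record with a missing value
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [{"age": 26, "delq": 58}, {"age": 26, "delq": 58}, {"age": 40, "delq": None}]
print(clean_records(raw))  # [{'age': 26, 'delq': 58}]
```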
51. Machine Learning Limitation
• Machine learning methods operate on manually designed features.
• The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.
[Pipeline diagram: Feature Extractor -> Trainable Classifier]
52. Processing Sensory Data is Hard
How do we bridge this gap between the pixels and meaning via machine learning?
53. Sensory Data Processing is Challenging
So why not build integrated learning systems that perform end-to-end learning, i.e. learn the representation as well as the classification from raw data, without any engineered features?
[Pipeline diagram: Feature Learner -> Trainable Classifier]
In a nutshell, deep learning is an approach to end-to-end learning, typically performed through a series of successive abstractions.
54. SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge. An example of deep learning capability.
55. Summary
• We have just skimmed the surface of machine learning
• The web is full of reading resources (free books, lecture notes, blogs, videos) to dig into machine learning
• Several open source software resources (R, RapidMiner, Scikit-learn, etc.) let you learn via experimentation
• Applications based on vision, speech, and natural language processing are excellent candidates for deep learning