Linear Models
Palacode Narayana Iyer Anantharaman
Narayana dot Anantharaman at gmail dot com
23 Aug 2017
Copyright 2016 JNResearch, All Rights Reserved
Linear Models
• Linear models constitute a hypothesis space that assumes the
output of a system is a linear function of the input
• Regression problems are those with continuous variables as output
• The term regression can be confusing, as some of us may think of
“regression testing”, which is a different concept. In
statistics and in machine learning, regression problems are simply
those that produce real-valued outputs.
• Classification problems are those that assign the given input one
or more labels from a finite label set
• Linear Regression is a machine learning technique that predicts an
output assuming a linear relation between the output and input
• Logistic Regression is a classification technique assuming a linear
decision boundary
• Though a sigmoid function is used in Logistic Regression, it is still
considered a linear model, as the decision surface is linear
Some Example applications of Linear Regression
• Time series predictions
• TV advertising: Given the TRP rating data across a number of TV programs, can we determine the serial to advertise?
• You are the director of a mega serial and you have the viewership information from past weeks. Can you predict the
future trends and do some intervention as needed?
• Evaluating trends and making forecasts
• Suppose your company offers infrastructure as a service over cloud and you are required to plan periodic upgrades to
the hardware. Given the customer acquisition rate, usage patterns, etc. can you predict the capacity upgrade?
• Assessing credit limits
• If you are a bank and have received a home loan application, can you determine the optimal loan amount and a
custom EMI scheme that maximizes mutual benefit?
• Predict the onset of a disease
• Given the historical data on patients with diabetes, predict the probability of a new patient getting this disease
Example: Banking Sector (Ref ToI, 12th July 2016)
Understanding Machine Learning Models
• What does the model compute?
• Hypothesis
• How do we learn the parameters?
• Training
• How do we ensure we learnt the right parameters?
• Validation
• Given a trained model, how do we predict?
• Inference
• What can affect the performance of the model?
• Overfitting, Underfitting
• How do we measure the performance of the model?
• Precision, Recall, F1 Score, Confusion Matrix
• What are the limitations of the model?
Measures
              Expected +   Expected -
Predicted +   tp           fp
Predicted -   fn           tn

Precision P = tp / (tp + fp)
Recall R = tp / (tp + fn)
F1 Score = 2PR / (P + R)
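The three measures above can be computed directly from the confusion-matrix counts. The following is a minimal sketch; the function name and the example counts are illustrative assumptions, not part of the original slides.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Guard against empty denominators (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(p, r, f1)
```

Note that tn does not appear in any of the three formulas, so these measures ignore correctly rejected negatives.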
Linear Regression Hypothesis
• A supervised machine learning hypothesis is a model that
attempts to behave as the true but unknown target function:
𝑓: 𝑋 → 𝑌
• Given the training data (inputs, targets), the model learns to
emulate f by maximizing the probability of training data
• Linear regression assumes f to be a linear function as given by:
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
• The values of 𝜃 are to be determined by the training procedure
such that the hypothesis $h_\theta(x)$ approximates 𝑓(𝑥)
[Figure: Training Set → Learning Algorithm → h. The patient's data is the input to h, which outputs the probability of disease.]
Estimating Model Parameters
• At a high level, we need to:
• Start with some initial model (that is some random values of theta)
• Estimate the output
• Determine the cost of misclassification (in the case of classification problems) or
inaccurate estimation (in the case of regression problems)
• Using the training data, iteratively modify the model parameters such that the
cost is minimized
• Output the model when the cost is minimum (that is when the iteration
converges)
• Once the model is estimated it can be used for predicting/estimating new inputs
(test data)
Cost Function for Linear Regression
• The cost function is often dictated by the requirements of the application.
Though several widely used cost functions (e.g. mean squared error) exist, one
may define a custom cost function if needed
• The most commonly used cost function for regression problems is the squared
error, defined as:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
• Our goal is to minimize J(𝜃) with respect to 𝜃
• J(𝜃) as a function of 𝜃 is a quadratic function. It is a convex function with a single
minimum point and is differentiable.
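As an illustration of the squared error cost, here is a minimal sketch in plain Python. The helper names `predict` and `squared_error_cost` and the toy data are assumptions made for this example only.

```python
def predict(theta, x):
    """Linear hypothesis: theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def squared_error_cost(theta, X, y):
    """J(theta) = 1/(2m) * sum over examples of (h(x) - y)^2."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


# Toy data generated from y = 1 + 2x (an assumption for illustration).
X = [[1.0], [2.0], [3.0]]
y = [3.0, 5.0, 7.0]

print(squared_error_cost([1.0, 2.0], X, y))  # perfect fit, so the cost is zero
print(squared_error_cost([0.0, 0.0], X, y))  # all-zero parameters cost more
```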
Finding the minimum of J(𝜃)
• We need to find 𝜃 by minimizing:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
• To determine the min we differentiate the cost function and set it to zero
• The above form of J(𝜃) is differentiable and can be solved analytically. For the ith training example
and for the parameter 𝜃𝑗, the derivative is:
$$\frac{\partial J}{\partial \theta_j} = \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
• In practice we minimize J(𝜃) iteratively (numerically) rather than solving for the minimum in closed form because:
• Analytical solutions involve matrix multiplications and for a large dataset with a large number of
training examples and features, there may be memory overflows
• For some cost functions, closed form solutions are not available.
• Gradient Descent algorithm is widely used to compute parameter updates
Batch Gradient Descent Algorithm
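$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
Repeat {
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
} for every j from 0 to n, updating all 𝜃j simultaneously (with x0 = 1 for the intercept 𝜃0)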
α is the learning rate, which is a hyper parameter that is usually set to a small fraction
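The update rule can be sketched in plain Python as follows. This is a minimal illustration, not the course's own code: the function names, the toy data, and the defaults for `alpha` and `iterations` are assumptions.

```python
def predict(theta, x):
    """Linear hypothesis with theta[0] as the intercept."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def batch_gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Fit theta by batch gradient descent on the squared error cost."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        # Errors are computed once per iteration from ALL m examples.
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        # Gradient for the intercept uses x_0 = 1.
        grad = [sum(errors) / m]
        for j in range(n):
            grad.append(sum(e * xi[j] for e, xi in zip(errors, X)) / m)
        # Simultaneous update of every parameter.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data generated from y = 1 + 2x (an assumption for illustration).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = batch_gradient_descent(X, y)
print(theta)  # approaches [1.0, 2.0]
```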
Learning Rate in Gradient descent
• The learning rate α must be chosen
carefully
• Small values of α require more
iterations for convergence, while a large
value of α may make the algorithm
overshoot the minimum point, so that J(𝜃) may
not decrease.
• To choose α, try values such as 0.001, 0.003 and so on
[Figure: J(𝜽) plotted against 𝜽]
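The effect of the learning rate can be seen on a one-parameter toy cost. The sketch below uses J(θ) = (θ - 3)^2 as a stand-in for the regression cost; this cost and the three values of α are assumptions chosen for illustration.

```python
def run_gradient_descent(alpha, steps=50, theta=0.0):
    """Minimise the toy cost J(theta) = (theta - 3)^2 and return the final theta."""
    for _ in range(steps):
        gradient = 2 * (theta - 3)  # dJ/dtheta
        theta -= alpha * gradient
    return theta


for alpha in (0.001, 0.1, 1.1):
    final_theta = run_gradient_descent(alpha)
    print(alpha, final_theta, (final_theta - 3) ** 2)
```

With α = 0.001 the parameter has barely moved after 50 steps, with α = 0.1 it is close to the minimum at θ = 3, and with α = 1.1 each step overshoots by more than the previous error, so the cost grows.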
Effect of initialization
• Does the parameter initialization have an impact on determining the
right model that has minimum cost?
• What happens if we set all parameters to zero during initialization?
Minimizing Loss: Gradient Descent
• Batch gradient descent
• Given m training examples, compute J(𝜃) as the total cost across all examples and use this to
perform parameter updates.
• Batch gradient descent uses ALL the examples in order to perform a single parameter update.
• Minibatch gradient descent
• Segment the m examples into batches of size b, compute J(𝜃) for a batch, update the
parameters, and repeat this for all m/b batches
• Mini batch gradient descent uses b examples for a single parameter update
• Stochastic Gradient Descent
• For each training example, compute the cost and update the parameters. This is equivalent to
minibatch with a batch size b = 1
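The three variants differ only in how many examples feed each parameter update, which the following sketch makes explicit through a single `batch_size` argument. The function names, toy data, and default hyperparameters are assumptions for illustration.

```python
import random


def predict(theta, x):
    """Linear hypothesis with theta[0] as the intercept."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def minibatch_gradient_descent(X, y, batch_size, alpha=0.05, epochs=2000, seed=0):
    """batch_size = len(X) gives batch GD, batch_size = 1 gives stochastic GD."""
    rng = random.Random(seed)
    n = len(X[0])
    theta = [0.0] * (n + 1)
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [predict(theta, X[i]) - y[i] for i in batch]
            grad = [sum(errors) / len(batch)]
            for j in range(n):
                grad.append(sum(e * X[i][j] for e, i in zip(errors, batch)) / len(batch))
            # One parameter update per batch of b examples.
            theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy, noise-free data generated from y = 1 + 2x (an assumption for illustration).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
for b in (1, 2, 4):
    print(b, minibatch_gradient_descent(X, y, batch_size=b))
```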
Applying Linear Regression to classification
• As described, linear regression produces the output as a continuous variable.
That is both h(x) as well as y are continuous variables.
• If we constrain y to be discrete, say -1 or 1, then the same linear regression model
can be applied to classification problems also.
• However the decision surfaces are inflexible
• Such models can be used to generate the initial model parameters for another, more
specialized classification algorithm
Linear Regression for non linear features
• What if the underlying target function is a non
linear function with respect to the features?
• We can still use linear models with a pre
processing step
• Suppose we have: 𝑋 = (𝑥1, 𝑥2) and the output
is a quadratic function of input.
• Form new features: $(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$ and
corresponding theta parameters.
• Train as before and get the new model
• For prediction, perform same transformation
and use the linear model as before
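The pre-processing step for the quadratic case can be sketched as a small function. The name `quadratic_features` is an assumption for this example; the same function must be applied at training and at prediction time.

```python
def quadratic_features(x1, x2):
    """Map (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2) for use with a linear model."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


print(quadratic_features(2.0, 3.0))  # [2.0, 3.0, 4.0, 9.0, 6.0]
```

The model stays linear in the parameters: it now has five theta values (plus the intercept) instead of two.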
[Figure: views plotted against time]
Case Study
• Problem
• Suppose videos of a TV serial get uploaded to YouTube on a daily basis
• Each day a new episode is uploaded
• Our goal is to predict the number of views of any given episode on day 3 after it was
uploaded.
• Assume that we don’t have access to the special analytics that YouTube may be providing to their
customers
• Solution
• Model the growth rate of the number of views with respect to time. This can be done by
visualizing the data and arriving at a suitable model.
• Perform the transformation of input vectors as needed
• Train the system with the data available from recent episodes
• Validate, test
Logistic Regression
• Used for addressing classification problems
• Examples:
• Email Spam classification
• Loan approvals
• Text classification
• Here the output is a discrete value. For binary classification, where the output is
either 0 or 1, we have:
𝑦 ∈ {0, 1}
Logistic Regression: 0 ≤ ℎ 𝜃(𝑥) ≤ 1
Logistic Regression Hypothesis
• Logistic Regression is a linear regression model
with a sigmoid at the output
• Sigmoid is a differentiable squashing function
whose output lies between 0 and 1
• ℎ 𝜃(x) is the probability of output y = 1 on the
given input x, parameterized by 𝜃
• Note: The logistic regression system outputs a valid
probability
• Question: Suppose a function f(x) outputs a
real number between 0 and 1 for any x. Can this
be considered as generating a valid probability
distribution?
$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
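A minimal sketch of the logistic hypothesis in plain Python follows. The function names are assumptions, and the two-branch form of the sigmoid is a common numerical-stability choice rather than something stated in the slides.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe because z < 0
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x[0] is assumed to be the constant input 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


print(sigmoid(0.0))                     # 0.5
print(hypothesis([-3.0, 1.0, 1.0], [1.0, 2.0, 2.0]))
```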
Logistic Regression: Decision Boundary
• Predict the output to be 1 if $h_\theta(x) \geq 0.5$, predict 0 if $h_\theta(x) < 0.5$
• This implies: predict 1 if $\theta^T x \geq 0$, predict 0 if $\theta^T x < 0$
• Example:
Let 𝜃 be the vector (-3, 1, 1). We have: $h_\theta(x) = g(-3x_0 + x_1 + x_2)$, with $x_0 = 1$
Predict y = 1 if $-3x_0 + x_1 + x_2 \geq 0$ (that is, $x_1 + x_2 \geq 3$), else predict y = 0
[Figure: decision boundary in the (x1, x2) plane, both axes marked 1 to 3]
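The example with 𝜃 = (-3, 1, 1) can be checked with a few lines of Python. The function name `predict_label` and the sample points are assumptions for illustration.

```python
def predict_label(theta, x1, x2):
    """Predict 1 when theta^T x >= 0, using x0 = 1 for the intercept term."""
    z = theta[0] * 1.0 + theta[1] * x1 + theta[2] * x2
    return 1 if z >= 0 else 0


theta = (-3.0, 1.0, 1.0)
print(predict_label(theta, 1, 1))  # 0, since 1 + 1 < 3
print(predict_label(theta, 2, 2))  # 1, since 2 + 2 >= 3
print(predict_label(theta, 1, 2))  # 1, the point lies exactly on the boundary
```

The boundary x1 + x2 = 3 is a straight line, which is why logistic regression counts as a linear model even though the sigmoid is non-linear.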
Logistic Regression Cost Function
The cost function for logistic regression is usually the cross entropy
error function.
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
Let’s take a brief digression to understand the cross entropy concept in the next 2 slides
Cost Function - intuition
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
• The cost can be split into two parts: cost incurred when
true output y = 0 and the cost when y = 1
$$\mathrm{Cost}(h_\theta(x), y) = -\log h_\theta(x) \quad \text{when } y = 1$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log\left(1 - h_\theta(x)\right) \quad \text{when } y = 0$$
• From the above it is easy to see:
Cost = 0 when y = 1 and $h_\theta(x) = 1$, but as $h_\theta(x) \to 0$, Cost → ∞
Similarly we can analyse the cost for the case where y = 0
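The two-part cost can be tabulated for a few predicted probabilities. This is a minimal sketch; the function name and the sample values of h are assumptions for illustration.

```python
import math


def cross_entropy_cost(h, y):
    """Per-example cost: -log(h) when y = 1, -log(1 - h) when y = 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


for h in (0.99, 0.5, 0.01):
    print(h, cross_entropy_cost(h, 1), cross_entropy_cost(h, 0))
```

A confident correct prediction costs almost nothing, while a confident wrong prediction is penalised heavily.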
Gradient Descent for logistic regression
• Using the cross entropy error function with the logistic regression
hypothesis function, we can show that the derivatives turn out to be
of the same form as linear regression
• Hence the same gradient descent algorithm works for logistic
regression, with $h_\theta(x)$ replaced by the sigmoid hypothesis $g(\theta^T x)$
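A minimal sketch of gradient descent for logistic regression follows. The update has the same form as for linear regression, with only the hypothesis changed. The function names, toy data, and hyperparameter defaults are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def train_logistic(X, y, alpha=0.5, iterations=2000):
    """Batch gradient descent on the cross entropy cost."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data: the first column is the constant x0 = 1 (an assumption for illustration).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
theta = train_logistic(X, y)
predictions = [1 if hypothesis(theta, xi) >= 0.5 else 0 for xi in X]
print(theta, predictions)
```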
Overfitting and Underfitting
• Consider the adjoining figure where:
• Let’s say the unknown target function is close to a Sine wave and the
training samples are shown as x marks
• We fit different polynomials: constant, linear, cubic and higher order
polynomials and the model is shown as a blue coloured curve
• We observe:
• Fit with constant or linear is too simplistic
• Fits with polynomials of high order, say, order = 9, are complex models
that try to pass through every sample in the training data.
Underfitting: High Bias Problem
• Fitting a simplistic (say, linear) model to a highly non linear underlying target
function would result in high training error as well as test error.
• Though the training samples show a non linearity, we pretend as if everything is
simple and try to fit the data with our biased view of the world! Due to this, we
say this is a “bias” problem.
• Suppose we need to do face recognition given an image, is it possible to use linear models
and hope for great accuracy?
• This is termed underfitting or high bias
Overfitting: High Variance Problem
• If we use a complex model (say, a high order
polynomial of degree > 5) we end up trying to account
for every training sample.
• This is like fitting the data rather than emulating the
underlying target process.
• This kind of model can’t see the wood for the trees!
• This approach is error prone because:
• This is very sensitive to noise and outliers
• Fitting very tightly to the training data is memorizing and not
generalizing!
Training and Test Errors
• When the model is too simple for a given
application, both the training and test errors
are high due to underfitting
• When the model is the right fit, the test error
is at the minimum
• If we increase the model complexity still
further, the training error continues to reduce
but the test error shoots up
[Figure: training error and test error plotted against model complexity]
If a model fits the training data well, it doesn’t
automatically mean that it would fit the test data well
How to avoid underfitting?
• Assuming we have an adequate amount of training data, we could
increase the model complexity
• Use a higher order polynomial to fit the data
• Use a more powerful model (Neural Nets, Deep Learning, etc)
• Add more relevant features
How to avoid overfitting?
• When we use a complex model such as the one with higher order polynomials,
we need to keep the weights (theta values) small.
• Without any constraints in choosing the parameters, complex models tend to
have larger weights when they attempt to fit every training example.
• This is similar to having too many knobs to turn and each knob set to whatever value without
constraints in order to fit the data
• One way to minimize the impact of overfitting is to use regularization
Regularization
• Small parameter values lead to a simpler hypothesis and reduce overfitting
• We can keep the parameters in check by introducing a regularization term to
the cost function as below:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j} \theta_j^2$$
where 𝜆 is the regularization coefficient
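$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j} \theta_j^2$$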
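The regularized squared error cost can be sketched as below. One assumption is made loudly here: the penalty skips the intercept 𝜃0, which is a common convention but is not stated in the slides. The function names and toy data are also illustrative.

```python
def predict(theta, x):
    """Linear hypothesis with theta[0] as the intercept."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def regularized_cost(theta, X, y, lam):
    """Squared error cost plus lam * sum of squared weights."""
    m = len(X)
    data_term = sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)
    # Assumption: the intercept theta[0] is left out of the penalty.
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return data_term + penalty


# Toy data generated from y = 1 + 2x (an assumption for illustration).
X = [[1.0], [2.0], [3.0]]
y = [3.0, 5.0, 7.0]
print(regularized_cost([1.0, 2.0], X, y, lam=0.0))  # perfect fit, no penalty
print(regularized_cost([1.0, 2.0], X, y, lam=0.1))  # same fit, penalised weight
```

Even a perfect fit now has non-zero cost when 𝜆 > 0, so the minimiser trades some training error for smaller weights.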
Multiclass Classification
• The logistic regression we discussed is a binary classifier
• Many real world applications may require a multi class classification
• Examples
• Text classification: Suppose we want to categorize a document into (Technical, General,
Sports, Movies, Business). If there is a news article that reads something like: “Sachin
Tendulkar was awarded Bharat Ratna”, we might label this both as general news and as an
item pertaining to sports.
One vs All Logistic Classifier
• To perform multiclass classification for K classes, build K
independent binary logistic classifiers, each with its own
parameter vector.
• The binary classifier for label j is trained to predict y =
j and to treat all other classes as not(y = j)
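The one-vs-all scheme can be sketched as follows: train one binary logistic classifier per class, then pick the class whose classifier reports the highest probability. The function names, the three-cluster toy data, and the hyperparameters are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probability(theta, x):
    """P(y = 1 | x) for one binary classifier; theta[0] is the intercept."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def train_binary(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for a single binary logistic classifier."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [probability(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(errors) / m]
        grad += [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def train_one_vs_all(X, labels, num_classes):
    """Classifier k sees label k as 1 and every other label as 0."""
    return [train_binary(X, [1 if lab == k else 0 for lab in labels])
            for k in range(num_classes)]


def predict_class(thetas, x):
    """Return the index of the classifier with the highest probability."""
    probs = [probability(theta, x) for theta in thetas]
    return probs.index(max(probs))


# Three hypothetical clusters in the (x1, x2) plane.
X = [[0, 0], [1, 0], [0, 1], [4, 0], [5, 0], [4, 1], [0, 4], [0, 5], [1, 4]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
thetas = train_one_vs_all(X, labels, num_classes=3)
predictions = [predict_class(thetas, x) for x in X]
print(predictions)
```

The K probabilities come from independent classifiers, so they need not sum to 1.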
[Figure: three plots, each with axes x1 and x2]
How does Naïve Bayes Relate to Logistic
Regression?
• Some interesting questions to think about are:
• Does logistic regression have equivalence to the Naïve Bayes Classifier?
• What kind of decision surface does the Naïve Bayes classifier produce?
• With a few simple mathematical steps the equivalence between the
two models can be demonstrated
Softmax Classifier
• A multiclass logistic regression allows labelling over K classes (instead of just
binary) by running K independent regressors in parallel.
• Thus, the outputs for any two classes i and j (i, j ∈ {1, …, K}, i ≠ j) are produced independently,
except that they share the same input.
• In many situations, it may be more useful to generate a probability distribution
over the output classes for a given input.
• Softmax classifier is a multinomial logistic regression classifier that addresses this
requirement
Softmax Hypothesis
• Recall that the logistic is a linear regression with a sigmoid discriminator output
• Let the real valued output from the linear transformation $\theta^T x$ be a score s
$$s = f(X; \theta), \quad \text{where } f \text{ is a linear function that transforms the input}$$
• The softmax output is a probability distribution given by:
$$P(Y = k \mid X) = \frac{\exp(s_k)}{\sum_j \exp(s_j)}$$
• It is easy to show that P(Y|X) as above is a valid probability distribution.
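A minimal softmax sketch in plain Python is given below. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged; it is an addition of this example, as are the function name and the sample scores.

```python
import math


def softmax(scores):
    """Turn real-valued scores into a probability distribution."""
    shift = max(scores)  # avoids overflow; cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Every output is positive and the outputs sum to 1, which is what makes P(Y|X) a valid distribution.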
Cost Function for Softmax
• The hypothesis expression of a softmax classifier is a probability distribution over
the output classes
• Given the training data, we would like to maximize the likelihood or equivalently,
the log likelihood
• Log is a monotonically increasing function, so maximizing log(x) is equivalent to maximizing x.
• We use log due to mathematical convenience
• Maximizing log likelihood is the same as minimizing the negative log likelihood (NLL)
• Thus, our cost function for the ith training example is:
$$L_i = -\log P(Y = y_i \mid X = x_i) = -\log\left(\frac{\exp(s_{y_i})}{\sum_z \exp(s_z)}\right)$$
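The per-example negative log likelihood can be computed directly from the scores. The sketch below uses the log-sum-exp form for numerical stability; the function name and the sample scores are assumptions for illustration.

```python
import math


def negative_log_likelihood(scores, true_class):
    """L_i = -log( exp(s_true) / sum_z exp(s_z) ), via log-sum-exp."""
    shift = max(scores)
    log_normalizer = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_normalizer - scores[true_class]


scores = [2.0, 1.0, 0.1]
print(negative_log_likelihood(scores, 0))  # low cost: class 0 has the top score
print(negative_log_likelihood(scores, 2))  # higher cost: class 2 scores lowest
```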
Code Walkthrough
Linear Models: Hands on
• We will work on 2 sample datasets:
• A toy dataset using a simulator
• Mobile devices specs dataset
• Our goal is to perform both regression and classification tasks on each of these
datasets
Steps
• Target function generator is a simulator
that generates linear decision surfaces.
Go through the code so that it can be
modified later
• linear_model_test.py implements
linear and logistic regression using
the Keras framework. Go through the code
in this file and run the code.
• The program will request user input:
(Number of samples and fraction of
data for validation). You can start with
1000 and 0.2
• Start with 2 dimensions and
increase the number subsequently
• For different configurations measure
the training as well as validation
accuracy
Steps
• Review the dataset
mobile_phones_data.csv from the
datasets directory.
• You can use the function
get_data_from_csv(…) for getting
relevant fields from the dataset for our
experiments
• Run the code mobile_linear_reg.py and
observe the accuracy. Modify the code
in order to get the value of accuracy
displayed on the screen
• You are required to modify the code
under the while True loop in order to
test the program.
Generative Versus Discriminative Models
• Key notion behind the generative models is that the
abstract (or hidden or latent) state generates the
observations. A given abstract state has a likelihood of
generating the given observations.
• The Naïve Bayes Classifier is an example of a generative model
• As opposed to the generative models, the
discriminative models take the observations as given
and try to figure out what hidden state might best
explain the observations.
• Logistic Regression is an example of a discriminative model
[Figure: two diagrams, each relating Y to X1 and X2]
Generative Versus Discriminative Models
• Generative models are characterized by joint distributions.
• We ask the question: What is the probability of the observations X and the abstract label Y to co occur?
That is: P(X, Y)
• Discriminative models are conditional models. They are characterized by the conditional
probability distribution
• Given the observations (features), what is the class label that is most likely?
• As we model the joint distribution in generative models, we need both the prior P(Y) as well as
the likelihood P(X|Y) in order to determine the posterior P(Y|X)
• Discriminative models do not need to model the prior P(Y) over hidden states, as they directly learn P(Y|X) from
the observations or features
Back to the earlier case study…
• How do we apply logistic or softmax regression to the twitter search problem we discussed?
• For a domain specific problem like this (where the labels are fixed and given), we can take advantage of some
prior knowledge, instead of depending only on observed data
• Example of prior knowledge: The term “democrats” is associated with US Congress, Sonia with Indian National
Congress, Internet with Mobile World Congress and so on.
Using Logistic for text classification problem
• A key issue to handle when using logistic or softmax for NLP applications is the
arbitrary variable length of sentences.
• When we use the words in a sentence as features (as we did in the case of Naïve
Bayes), the feature length n becomes a variable
• Logistic requires a fixed number of features and the associated theta parameters.
• Some options to consider to fit the input to a logistic:
• Regardless of the size of the input sentence, always form an n-feature vector. If the sentence
length is > n, chop off extra words and if it is < n, pad it suitably
• Split a long sentence in fixed size blocks of n
• As we observe, such pre-processing of input for the sake of fitting it into a
Logistic regression might degrade the prediction accuracy
• Solution: Use a different model
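The first option above, forcing every sentence to exactly n word features, can be sketched as follows. The function name and the `PAD` token are hypothetical choices for this example.

```python
def pad_or_truncate(words, n, pad_token="PAD"):
    """Return exactly n tokens: chop off extra words or pad a short sentence."""
    return (words + [pad_token] * n)[:n]


print(pad_or_truncate("the match was great".split(), 6))
print(pad_or_truncate("a very long sentence with many more words".split(), 6))
```

Truncation discards words that may carry the signal and padding adds uninformative features, which is one way this pre-processing can hurt accuracy.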
MaxEnt Models
• MaxEnt models support an elegant way to map a variable length input into a
fixed size feature vector.
• This is a powerful model that has equivalence to logistic regression
• Many NLP problems can be reformulated as classification problems
• E.g. Language Modelling, Tagging Problems (these will be covered in detail later)
• MaxEnt is widely used for various text processing tasks.
• Task is to estimate the probability of a class given the context
• The term context may refer to a single word or group of words
• The key idea is to use a set of “feature functions” that map the input into a
fixed length feature vector. Often these are Boolean functions.
• MaxEnt models are also called log-linear models
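Boolean feature functions of this kind might look like the sketch below. The three functions reuse the word and label associations from the earlier case study; the exact form of the functions, the label strings, and the name `feature_vector` are hypothetical.

```python
# Each feature function looks at the context (a set of words) and a candidate label.
# These particular functions are hypothetical examples, not a trained model.
FEATURE_FUNCTIONS = [
    lambda words, label: "democrats" in words and label == "US Congress",
    lambda words, label: "sonia" in words and label == "Indian National Congress",
    lambda words, label: "internet" in words and label == "Mobile World Congress",
]


def feature_vector(sentence, label):
    """Map a sentence of any length to a fixed-length 0/1 vector."""
    words = set(sentence.lower().split())
    return [1 if f(words, label) else 0 for f in FEATURE_FUNCTIONS]


print(feature_vector("Democrats win the vote", "US Congress"))
print(feature_vector("Internet of things dominates the show floor this year",
                     "Mobile World Congress"))
```

The vector length equals the number of feature functions, regardless of how many words the sentence has.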
Summary
• Linear models assume a linear decision boundary and are simple
models
• Though the models are simple, they are extensively used as:
• Stand alone classifiers
• As a layer or a building block in multi layer models (such as Neural Networks,
CNNs, RNNs)
• MaxEnt models are widely used in NLP applications
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
EMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptxEMOD_Optimization_Presentation.pptx
EMOD_Optimization_Presentation.pptx
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptx
 
Analysis and Design of Algorithms
Analysis and Design of AlgorithmsAnalysis and Design of Algorithms
Analysis and Design of Algorithms
 
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Regression.pptx
Regression.pptxRegression.pptx
Regression.pptx
 
A brief introduction to Searn Algorithm
A brief introduction to Searn AlgorithmA brief introduction to Searn Algorithm
A brief introduction to Searn Algorithm
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 

Más de ananth

Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networksananth
 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligenceananth
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligenceananth
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vecananth
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognitionananth
 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1ananth
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)ananth
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distanceananth
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2ananth
 
L05 word representation
L05 word representationL05 word representation
L05 word representationananth
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpananth
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 wordsananth
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionananth
 

Más de ananth (17)

Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligence
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distance
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 
L05 word representation
L05 word representationL05 word representation
L05 word representation
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlp
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introduction
 

Último

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 

Último (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 

Artificial Intelligence Course: Linear models

  • 1. Linear Models Palacode Narayana Iyer Anantharaman Narayana dot Anantharaman at gmail dot com 23 Aug 2017 Copyright 2016 JNResearch, All Rights Reserved
  • 2. Linear Models • Linear models constitute a space of hypotheses that assume the output of a system is a linear function of the input • Regression problems are those with continuous variables as output • The term regression is sometimes confusing, as some of us may think of regression as in “regression testing”, which is a different concept. In statistics and in machine learning, regression problems are simply those that produce real-valued outputs. • Classification problems are those that assign the given input one or more labels from a finite label set • Linear Regression is a machine learning technique that predicts an output assuming a linear relation between the output and the input • Logistic Regression is a classification technique assuming a linear decision boundary • Though a sigmoid function is used in Logistic Regression, it is still considered a linear model, as the decision surface is linear [Figures: Output plotted against input; axes x1 and x2]
  • 3. Some Example Applications of Linear Regression • Time series predictions • TV advertising: Given the TRP rating data across a number of TV programs, can we determine which serial to advertise on? • You are the director of a mega serial and you have the viewership information from past weeks. Can you predict the future trends and intervene as needed? • Evaluating trends and making forecasts • Suppose your company offers infrastructure as a service over the cloud and you are required to plan periodic upgrades to the hardware. Given the customer acquisition rate, usage patterns, etc., can you predict the capacity upgrade? • Assessing credit limits • If you are a bank and have received a home loan application, can you determine the optimal loan amount and a custom EMI scheme that maximizes mutual benefit? • Predict the onset of a disease • Given the historical data on patients with diabetes, predict the probability of a new patient getting this disease
  • 4. Example: Banking Sector (Ref ToI, 12th July 2016)
  • 5. Understanding Machine Learning Models • What does the model compute? • Hypothesis • How do we learn the parameters? • Training • How do we ensure we learnt the right parameters? • Validation • Given a trained model, how do we predict? • Inference • What can affect the performance of the model? • Overfitting, Underfitting • How do we measure the performance of the model? • Precision, Recall, F1 Score, Confusion Matrix • What are the limitations of the model?
  • 6. Measures
                     Expected +   Expected -
      Predicted +    tp           fp
      Predicted -    fn           tn
      Precision P = tp / (tp + fp)
      Recall R = tp / (tp + fn)
      F1 Score = 2PR / (P + R)
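As a minimal sketch of the three measures above, the function below computes them from confusion-matrix counts. The function name, the zero-division guards and the example counts (tp=8, fp=2, fn=4) are assumptions made for illustration; they are not from the slides.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    # Guard against empty denominators (an illustrative choice: report 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Note that true negatives do not enter any of the three measures, which is why they can be informative on imbalanced data where accuracy is not.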
  • 7. Linear Regression Hypothesis • A supervised machine learning hypothesis is a model that attempts to behave as the true but unknown target function $f: X \to Y$ • Given the training data (inputs, targets), the model learns to emulate $f$ by maximizing the probability of the training data • Linear regression assumes $f$ to be a linear function, given by $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$ • The values of $\theta$ are determined by the training procedure such that the hypothesis $h_\theta(x)$ approximates $f(x)$ [Figure: Training Set → Learning Algorithm → h; h maps the patient's data to the probability of disease]
  • 8. Estimating Model Parameters • At a high level, we need to: • Start with some initial model (that is, some random values of $\theta$) • Estimate the output • Determine the cost of misclassification (in the case of classification problems) or inaccurate estimation (in the case of regression problems) • Using the training data, iteratively modify the model parameters such that the cost is minimized • Output the model when the cost is minimum (that is, when the iteration converges) • Once the model is estimated, it can be used for predicting/estimating the outputs for new inputs (test data)
  • 9. Cost Function for Linear Regression • The cost function is often dictated by the requirements of the application. Though several widely used cost functions (e.g. mean squared error) exist, one may come up with one's own if needed • The most commonly used cost function for regression problems is the squared error, defined as: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ • Our goal is to minimize $J(\theta)$ with respect to $\theta$ • $J(\theta)$ is a quadratic function of $\theta$. It is a convex function with a single minimum point and is differentiable.
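The squared-error cost can be sketched directly from the formula. The helper names `predict` and `squared_error_cost`, the list-based data layout and the toy data (y = 2x) are assumptions made for this illustration.

```python
def predict(theta, x):
    """h_theta(x) = theta[0] + theta[1]*x[0] + ... + theta[n]*x[n-1]."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def squared_error_cost(theta, X, y):
    """J(theta) = 1/(2m) * sum over examples of (h_theta(x) - y)^2."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


# Hypothetical toy data generated by y = 2x (one feature, three examples).
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]

cost_perfect = squared_error_cost([0.0, 2.0], X, y)  # parameters that fit exactly
cost_zero = squared_error_cost([0.0, 0.0], X, y)     # all-zero parameters
print(cost_perfect, cost_zero)
```

The exact-fit parameters give a cost of zero, while any other choice gives a strictly larger value, which is what the minimization exploits.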
  • 10. Finding the minimum of J(θ) • We need to find $\theta$ by minimizing: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ • To determine the minimum, we differentiate the cost function and set the derivative to zero • The above form of $J(\theta)$ is differentiable and can be solved analytically. The contribution of the ith training example to the derivative with respect to the parameter $\theta_j$ is: $\frac{\partial J}{\partial \theta_j} = (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$ • In practice we use an iterative numerical method rather than the closed-form solution because: • Analytical solutions involve matrix multiplications, and for a large dataset with many training examples and features these may exceed the available memory • For some cost functions, closed-form solutions are not available. • The Gradient Descent algorithm is widely used to compute parameter updates
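For reference, the analytical solution mentioned on this slide is commonly written as the normal equation. The matrix notation is introduced here and is not on the slides: $X$ is assumed to be the $m \times (n+1)$ matrix whose rows are the training inputs with a leading 1 for the intercept, $y$ is the vector of targets, and $X^T X$ is assumed to be invertible.

```latex
\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y) = 0
\quad\Longrightarrow\quad
X^T X \, \theta = X^T y
\quad\Longrightarrow\quad
\theta = (X^T X)^{-1} X^T y
```

Forming and inverting $X^T X$ is the matrix work that becomes expensive when the number of features is large.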
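  • 11. Batch Gradient Descent Algorithm • Cost: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ • Repeat { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$ } for every j from 0 to n, taking $x_0^{(i)} = 1$ for the intercept $\theta_0$ and updating all $\theta_j$ simultaneously • $\alpha$ is the learning rate, a hyperparameter that is usually set to a small fraction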
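A minimal sketch of the batch update rule in plain Python follows. The function name, the default learning rate, the fixed iteration count (used instead of a convergence test) and the toy data are all assumptions chosen for illustration.

```python
def batch_gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Fit theta for h_theta(x) = theta_0 + theta_1*x_1 + ... by batch gradient descent."""
    m = len(X)
    rows = [[1.0] + list(x) for x in X]   # prepend x_0 = 1 for the intercept
    theta = [0.0] * len(rows[0])          # start from all-zero parameters
    for _ in range(iterations):
        # Prediction error h_theta(x_i) - y_i for every training example.
        errors = [sum(t * v for t, v in zip(theta, row)) - yi
                  for row, yi in zip(rows, y)]
        # Gradient component j: (1/m) * sum_i error_i * x_ij, using ALL examples.
        grad = [sum(e * row[j] for e, row in zip(errors, rows)) / m
                for j in range(len(theta))]
        # Simultaneous update of every theta_j.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical noiseless data generated by y = 1 + 2x.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = batch_gradient_descent(X, y)
print(theta)
```

The gradient is computed from the old parameter vector before any component is changed, which is what makes the update simultaneous.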
  • 12. Learning Rate in Gradient Descent • The learning rate $\alpha$ must be chosen carefully • Small values of $\alpha$ require more iterations for convergence, while a large value of $\alpha$ may make the algorithm overshoot the minimum point, so that $J(\theta)$ may not decrease. • To choose $\alpha$, try values such as 0.001, 0.003 and so on [Figure: $J(\theta)$ plotted against $\theta$]
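The effect of the learning rate can be seen on a toy problem. The sketch below assumes a hypothetical one-parameter cost J(θ) = θ², whose gradient is 2θ and whose minimum is at θ = 0; the function name and the three α values are illustrative choices.

```python
def descend(alpha, theta=1.0, steps=20):
    """Run gradient descent on the toy cost J(theta) = theta**2 (gradient 2*theta)."""
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta


slow = descend(0.001)     # tiny steps: still far from the minimum after 20 steps
good = descend(0.3)       # moderate steps: very close to the minimum at 0
diverged = descend(1.1)   # steps too large: overshoots and moves away from 0
print(slow, good, diverged)
```

Each step multiplies θ by (1 - 2α), so the iteration shrinks θ only when that factor has magnitude below 1; for α = 1.1 the factor is -1.2 and the iterates grow.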
  • 13. Effect of initialization • Does the parameter initialization have an impact on determining the right model that has minimum cost? • What happens if we set all parameters to zero during initialization?
  • 14. Minimizing Loss: Gradient Descent • Batch gradient descent • Given m training examples, compute $J(\theta)$ as the total cost across all examples and use this to perform parameter updates. • Batch gradient descent uses ALL the examples in order to perform a single parameter update. • Minibatch gradient descent • Segment the m examples into batches of size b, compute $J(\theta)$ for a batch, update the parameters, and do this for all m/b batches • Minibatch gradient descent uses b examples for a single parameter update • Stochastic Gradient Descent • For each training example, compute the cost and update the parameters. This is equivalent to minibatch gradient descent with a batch size b = 1
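The three variants differ only in how many examples feed each update, so one routine with a batch-size parameter covers them all. This is a sketch under stated assumptions: the function name, the per-epoch shuffling, the fixed seed, the hyperparameter defaults and the toy data are illustrative and not from the slides.

```python
import random


def minibatch_gradient_descent(X, y, batch_size, alpha=0.05, epochs=2000, seed=0):
    """Linear regression trained with b examples per update.

    batch_size = len(X) gives batch gradient descent; batch_size = 1 gives SGD.
    """
    rng = random.Random(seed)
    rows = [[1.0] + list(x) for x in X]   # prepend x_0 = 1 for the intercept
    theta = [0.0] * len(rows[0])
    indices = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(indices)              # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad = [0.0] * len(theta)
            for i in batch:
                error = sum(t * v for t, v in zip(theta, rows[i])) - y[i]
                for j in range(len(theta)):
                    grad[j] += error * rows[i][j]
            # One parameter update per batch, averaged over the batch.
            theta = [t - alpha * g / len(batch) for t, g in zip(theta, grad)]
    return theta


# Hypothetical noiseless data generated by y = 1 + 2x.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
results = {b: minibatch_gradient_descent(X, y, batch_size=b) for b in (1, 2, 4)}
for b, theta in results.items():
    print(b, theta)
```

On noiseless data all three settings reach essentially the same parameters; on noisy data, smaller batches give noisier but cheaper updates.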
  • 15. Applying Linear Regression to classification • As described, linear regression produces the output as a continuous variable. That is, both h(x) and y are continuous variables. • If we constrain y to be discrete, say -1 or 1, then the same linear regression model can also be applied to classification problems. • However, the decision surfaces are inflexible • Such models can be used to generate the initial model parameters for another, more specialized classification algorithm
  • 16. Linear Regression for non-linear features • What if the underlying target function is a non-linear function of the features? • We can still use linear models with a pre-processing step • Suppose we have $X = (x_1, x_2)$ and the output is a quadratic function of the input. • Form new features $(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$ and the corresponding theta parameters. • Train as before and get the new model • For prediction, perform the same transformation and use the linear model as before [Figure: views plotted against time]
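The pre-processing step can be sketched as a feature-mapping function applied before the usual linear hypothesis. The function names and the parameter values below are hypothetical; the parameters encode y = 1 + 2·x1² − x1·x2 purely as an example.

```python
def quadratic_features(x1, x2):
    """Map (x1, x2) to the expanded feature vector (x1, x2, x1^2, x2^2, x1*x2)."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def predict(theta, features):
    """Ordinary linear hypothesis applied to the expanded features."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))


# Hypothetical parameters: intercept 1, weight 2 on x1^2, weight -1 on x1*x2.
theta = [1.0, 0.0, 0.0, 2.0, 0.0, -1.0]
value = predict(theta, quadratic_features(2.0, 3.0))
print(value)
```

The model stays linear in the parameters, so the same training procedure applies unchanged; only the inputs are transformed, at both training and prediction time.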
  • 17. Case Study • Problem • Suppose we have videos of a TV serial that get uploaded to YouTube on a daily basis • Each day a new episode is uploaded • Our goal is to predict the number of views of any given episode on day 3 after it was uploaded. • Assume that we don’t have access to the special analytics that YouTube may be providing to its customers • Solution • Model the growth rate of the number of views with respect to time. This can be done by visualizing the data and arriving at a suitable model. • Perform the transformation of input vectors as needed • Train the system with the data available from recent episodes • Validate, test
  • 18. Logistic Regression • Used for addressing classification problems • Examples: • Email spam classification • Loan approvals • Text classification • Here the output is a discrete value. For binary classification, where the output is either 0 or 1, we have $y \in \{0, 1\}$ • Logistic Regression: $0 \le h_\theta(x) \le 1$
  • 19. Logistic Regression Hypothesis • Logistic Regression is a linear regression model with a sigmoid at the output: $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$ • The sigmoid is a differentiable squashing function whose output is bounded between 0 and 1 • $h_\theta(x)$ is the probability of output y = 1 on the given input x, parameterized by $\theta$ • Note: The logistic regression system outputs a valid probability • Question: Suppose a function f(x) outputs a real number between 0 and 1 for any x. Can this be considered as generating a valid probability distribution?
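A small sketch of the hypothesis follows. Splitting the sigmoid into two branches to avoid overflow for large negative z is an implementation choice made here, not something the slides specify, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), with theta[0] as the intercept (x_0 = 1)."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)


print(sigmoid(0.0), sigmoid(4.0), sigmoid(-4.0))
```

The output is 0.5 at z = 0 and approaches 1 or 0 as z becomes large and positive or negative, so it can be read as the estimated probability that y = 1.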
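  • 20. Logistic Regression: Decision Boundary • Predict the output to be 1 if $h_\theta(x) \ge 0.5$, predict 0 if $h_\theta(x) < 0.5$ • This implies: predict 1 if $\theta^T x \ge 0$, predict 0 if $\theta^T x < 0$ • Example: Let $\theta$ be the vector (-3, 1, 1). We have $\theta^T x = -3x_0 + x_1 + x_2$. Predict y = 1 if $-3x_0 + x_1 + x_2 \ge 0$, else predict y = 0. With $x_0 = 1$, the decision boundary is the line $x_1 + x_2 = 3$ [Figure: decision boundary in the (x1, x2) plane, both axes marked 1 to 3]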
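The decision rule with θ = (-3, 1, 1) can be checked on a few points. The function name and the three test points are chosen here for illustration, and x_0 = 1 is assumed for the intercept term.

```python
def predict_label(theta, x):
    """Predict 1 when theta^T x >= 0 (equivalently h_theta(x) >= 0.5), else 0."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1 if z >= 0 else 0


theta = [-3.0, 1.0, 1.0]                           # the example vector from the slide
points = [(1.0, 1.0), (2.0, 1.0), (3.0, 3.0)]      # hypothetical inputs (x1, x2)
labels = [predict_label(theta, p) for p in points]
print(labels)
```

The point (1, 1) lies below the line x1 + x2 = 3 and is labelled 0, (2, 1) lies exactly on the line and is labelled 1, and (3, 3) lies above it and is labelled 1.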
  • 21. Logistic Regression Cost Function • The cost function for logistic regression is usually the cross entropy error function: $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$ • Let’s take a brief digression to understand the cross entropy concept in the next 2 slides
  • 22. Cost Function - intuition • $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$ • The cost can be split into two parts: the cost incurred when the true output y = 1 and the cost when y = 0: $Cost(h_\theta(x), y) = -\log h_\theta(x)$ when y = 1, and $Cost(h_\theta(x), y) = -\log(1 - h_\theta(x))$ when y = 0 • From the above it is easy to see: Cost = 0 when y = 1 and $h_\theta(x) = 1$, but as $h_\theta(x) \to 0$, $Cost \to \infty$ • Similarly we can analyse the cost for the case where y = 0
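The behaviour described above can be reproduced numerically. In this sketch the function name is hypothetical, and clipping the probabilities by a small `eps` to keep the logarithm finite is an added safeguard that is not part of the slide's formula.

```python
import math


def cross_entropy_cost(probabilities, labels, eps=1e-12):
    """J = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ) over all examples."""
    m = len(labels)
    total = 0.0
    for h, y in zip(probabilities, labels):
        h = min(max(h, eps), 1.0 - eps)   # clip so log() never receives 0
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / m


labels = [1, 0]
low = cross_entropy_cost([0.99, 0.01], labels)    # confident and correct
high = cross_entropy_cost([0.01, 0.99], labels)   # confident and wrong
print(low, high)
```

Confident correct predictions cost close to zero, while confident wrong predictions are penalized heavily, matching the limits discussed on the slide.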
  • 23. Gradient Descent for logistic regression • Using the cross entropy error function with the logistic regression hypothesis function, we can show that the derivatives turn out to be of the same form as for linear regression • Hence the same gradient descent algorithm works for logistic regression, with $h_\theta(x)$ replaced by the sigmoid hypothesis $g(\theta^T x)$
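A short sketch of why the derivatives take the same form, writing $h^{(i)}$ as shorthand (introduced here) for $h_\theta(x^{(i)}) = g(\theta^T x^{(i)})$ and using the standard identity $g'(z) = g(z)(1 - g(z))$:

```latex
\frac{\partial h^{(i)}}{\partial \theta_j} = h^{(i)} \left(1 - h^{(i)}\right) x_j^{(i)}

\frac{\partial J}{\partial \theta_j}
  = -\frac{1}{m} \sum_{i=1}^{m}
    \left[ \frac{y^{(i)}}{h^{(i)}} - \frac{1 - y^{(i)}}{1 - h^{(i)}} \right]
    h^{(i)} \left(1 - h^{(i)}\right) x_j^{(i)}
  = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h^{(i)} \right) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

This matches the linear regression gradient term by term; only the definition of $h_\theta$ differs.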
  • 24. Overfitting and Underfitting • Consider the adjoining figure where: • Let’s say the unknown target function is close to a sine wave and the training samples are shown as x marks • We fit different polynomials (constant, linear, cubic and higher order) and each fitted model is shown as a blue curve • We observe: • A fit with a constant or a linear model is too simplistic • Polynomials of high order, say, order = 9, are complex models that try to pass through every sample in the training data.
  • 25. Underfitting: High Bias Problem • Fitting a simplistic (say, linear) model to a highly nonlinear underlying target function results in high training error as well as high test error. • Though the training samples show nonlinearity, we pretend that everything is simple and try to fit the data with our biased view of the world! This is why we call it a “bias” problem. • Suppose we need to do face recognition given an image: is it possible to use linear models and hope for great accuracy? • This is termed underfitting or high bias
  • 26. Overfitting: High Variance Problem • If we use a complex model (say, a polynomial of degree > 5) we end up trying to account for every training sample. • This is like fitting the data rather than emulating the underlying target process. • This kind of model can’t see the wood for the trees! • This approach is error prone because: • It is very sensitive to noise and outliers • Fitting very tightly to the training data is memorizing and not generalizing!
  • 27. Training and Test Errors • When the model is too simple for a given application, both the training and test errors are high due to underfitting • When the model is the right fit, the test error is at its minimum • If we increase the model complexity still further, the training error continues to fall but the test error shoots up [Figure: training error and test error plotted against model complexity]
  • 28. If a model fits the training data well, it doesn’t automatically mean that it would fit the test data well
  • 29. How to avoid underfitting? • Assuming we have an adequate amount of training data, we could increase the model complexity • Use a higher order polynomial to fit the data • Use a more powerful model (Neural Nets, Deep Learning, etc.) • Add more relevant features
  • 30. How to avoid overfitting? • When we use a complex model such as one with higher order polynomials, we need to keep the weights (theta values) small. • Without any constraints on the choice of parameters, complex models tend to have larger weights when they attempt to fit every training example. • This is similar to having too many knobs to turn, with each knob free to take any value in order to fit the data • One way to reduce overfitting is to use regularization
  • 31. Regularization • Small parameter values lead to a simpler hypothesis and reduce overfitting • We can keep the parameters in check by adding a regularization term to the cost function. For linear regression: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j}\theta_j^2$, where $\lambda$ is the regularization coefficient. For logistic regression: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j}\theta_j^2$
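The regularized cross entropy cost can be sketched as below. One assumption is made loudly here: the bias parameter $\theta_0$ is left out of the penalty, which is a common convention but is not stated on the slide. The function name `regularized_cost` and the argument name `lam` are also illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_cost(theta, xs, ys, lam):
    """Cross entropy cost plus (lam / 2m) * sum(theta_j^2).

    Assumption: theta[0] (the bias) is excluded from the penalty.
    """
    m = len(xs)
    data_term = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        data_term += y * math.log(h) + (1 - y) * math.log(1 - h)
    penalty = (lam / (2 * m)) * sum(t * t for t in theta[1:])
    return -data_term / m + penalty

xs = [[1.0, 1.0], [1.0, -1.0]]
ys = [1, 0]
plain = regularized_cost([0.0, 2.0], xs, ys, lam=0.0)
penalized = regularized_cost([0.0, 2.0], xs, ys, lam=1.0)
```

With `lam = 0` the function reduces to the unregularized cost; larger values of `lam` make large weights more expensive, pushing the optimizer toward simpler hypotheses.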
  • 32. Multiclass Classification • The logistic regression we discussed is a binary classifier • Many real world applications require multiclass classification • Examples • Text classification: Suppose we want to categorize a document into (Technical, General, Sports, Movies, Business). If there is a news article that reads something like: “Sachin Tendulkar was awarded Bharat Ratna”, we might label this both as general news and as an item pertaining to sports.
  • 33. One vs All Logistic Classifier • To perform multiclass classification over K classes, build K independent binary logistic classifiers, each with its own parameter vector. • The binary classifier for label j is trained to predict y = j and to treat all other classes as not(y = j) [Figure: three panels, each with axes $x_1$ and $x_2$]
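At prediction time, a one vs all classifier is commonly used by scoring the input with each of the K binary classifiers and choosing the class with the highest probability. The sketch below assumes the K parameter vectors have already been trained; the hand-picked values in `thetas` are purely hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_all_predict(thetas, x):
    """Return (best_class, scores) where scores[j] = P(y = j | x) from classifier j."""
    scores = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    best_class = max(range(len(scores)), key=scores.__getitem__)
    return best_class, scores

# Three hypothetical binary classifiers over inputs (x0 = 1, x1, x2).
thetas = [
    [0.0, 1.0, 0.0],    # class 0 fires when x1 is large
    [0.0, 0.0, 1.0],    # class 1 fires when x2 is large
    [0.0, -1.0, -1.0],  # class 2 fires when both are small
]

label, scores = one_vs_all_predict(thetas, [1.0, 2.0, 0.5])
```

The K scores are produced independently, so they do not in general sum to 1.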
  • 34. How does Naïve Bayes Relate to Logistic Regression? • Some interesting questions to think about are: • Is logistic regression equivalent to the Naïve Bayes classifier? • What kind of decision surface does the Naïve Bayes classifier produce? • With a few simple mathematical steps the equivalence between the two models can be demonstrated
  • 35. Softmax Classifier • A multiclass logistic regression allows labelling over K classes (instead of just two) by running K independent regressors in parallel. • Thus, the outputs for any two classes $i, j \in \{1, \dots, K\}$ with $i \ne j$ are produced independently, except that they share the same input. • In many situations, it may be more useful to generate a probability distribution over the output classes for a given input. • The softmax classifier is a multinomial logistic regression classifier that addresses this requirement
  • 36. Softmax Hypothesis • Recall that logistic regression is a linear regression with a sigmoid discriminator at the output • Let the real valued output from the linear transformation $\theta^T x$ be a score s: $s = f(X; \theta)$, where f is a linear function that transforms the input • The softmax output is a probability distribution given by: $P(Y = k \mid X) = \frac{\exp(s_k)}{\sum_j \exp(s_j)}$ • It is easy to show that P(Y|X) as above is a valid probability distribution: every term is positive and the terms sum to 1 over k.
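A short sketch of the softmax function is given below. Subtracting the maximum score before exponentiating is a standard numerical stability trick that does not appear on the slide; it leaves the result unchanged because the common factor cancels in the ratio.

```python
import math

def softmax(scores):
    """Map real valued scores s_k to P(Y = k | X) = exp(s_k) / sum_j exp(s_j)."""
    shift = max(scores)  # stability trick: avoids overflow for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])
```

Every entry of `probs` is positive and the entries sum to 1, which is the property that makes the output a valid distribution over the K classes.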
  • 37. Cost Function for Softmax • The hypothesis expression of a softmax classifier is a probability distribution over the output classes • Given the training data, we would like to maximize the likelihood or, equivalently, the log likelihood • Since log is a monotonically increasing function of its input, maximizing log(x) is equivalent to maximizing x. • We use the log for mathematical convenience • Maximizing the log likelihood is the same as minimizing the negative log likelihood (NLL) • Thus, our cost function for training sample i with input $x_i$ and true label $y_i$ is: $L_i = -\log P(Y = y_i \mid X = x_i) = -\log\left(\frac{\exp(s_{y_i})}{\sum_z \exp(s_z)}\right)$
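The per-sample negative log likelihood can be sketched directly from the scores. The version below uses the identity $-\log\frac{\exp(s_y)}{\sum_z \exp(s_z)} = \log\sum_z \exp(s_z) - s_y$ together with a max shift for numerical stability; both the shift and the name `softmax_nll` are choices made for this illustration.

```python
import math

def softmax_nll(scores, true_class):
    """L = -log( exp(s_true) / sum_z exp(s_z) ) for a single training sample."""
    shift = max(scores)
    log_sum_exp = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_sum_exp - scores[true_class]

low_loss = softmax_nll([5.0, 0.0, 0.0], true_class=0)   # true class has the top score
high_loss = softmax_nll([5.0, 0.0, 0.0], true_class=1)  # true class has a low score
```

The loss approaches zero as the score of the true class dominates the others, and grows as probability mass shifts to the wrong classes. Averaging it over all training samples gives the overall cost.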
  • 38. Code Walkthrough
  • 39. Linear Models: Hands on • We will work on 2 sample datasets: • A toy dataset using a simulator • Mobile devices specs dataset • Our goal is to perform both regression and classification tasks on each of these datasets
  • 40. Steps • The target function generator is a simulator that generates linear decision surfaces. Go through the code so that it can be modified later • linear_model_test.py implements linear and logistic regression using the Keras framework. Go through the code in this file and run it. • The program will request user input: the number of samples and the fraction of data for validation. You can start with 1000 and 0.2 • You can start with 2 dimensions and increase the number subsequently • For different configurations, measure the training as well as the validation accuracy
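For readers without access to the course files, the following is a hypothetical stand-in for a target function generator. It is not the code referred to in the slides: the function names, the uniform sampling range and the seed are all assumptions. It draws a random linear decision surface, labels random points by which side they fall on, and splits the data using the validation fraction.

```python
import random

def make_linear_dataset(num_samples, num_dims, seed=0):
    """Label random points by a randomly drawn linear decision surface."""
    rng = random.Random(seed)
    true_theta = [rng.uniform(-1, 1) for _ in range(num_dims + 1)]
    xs, ys = [], []
    for _ in range(num_samples):
        x = [1.0] + [rng.uniform(-1, 1) for _ in range(num_dims)]  # x0 = 1 is the bias
        z = sum(t * xi for t, xi in zip(true_theta, x))
        xs.append(x)
        ys.append(1 if z >= 0 else 0)
    return xs, ys, true_theta

def train_validation_split(xs, ys, validation_fraction):
    """Hold out the last fraction of the samples for validation."""
    n_val = round(len(xs) * validation_fraction)
    n_train = len(xs) - n_val
    return (xs[:n_train], ys[:n_train]), (xs[n_train:], ys[n_train:])

xs, ys, true_theta = make_linear_dataset(num_samples=1000, num_dims=2)
(train_x, train_y), (val_x, val_y) = train_validation_split(xs, ys, 0.2)
```

With 1000 samples and a validation fraction of 0.2, this yields 800 training and 200 validation samples, on which the training and validation accuracy can be compared.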
  • 41. Steps • Review the dataset mobile_phones_data.csv from the datasets directory. • You can use the function get_data_from_csv(…) for getting relevant fields from the dataset for our experiments • Run the code mobile_linear_reg.py and observe the accuracy. Modify the code in order to get the value of accuracy displayed on the screen • You are required to modify the code under the while True loop in order to test the program.
  • 42. Generative Versus Discriminative Models • The key notion behind generative models is that the abstract (or hidden or latent) state generates the observations. A given abstract state has a likelihood of generating the given observations. • The Naïve Bayes classifier is an example of a generative model • As opposed to generative models, discriminative models take the observations as given and try to figure out which hidden state might best explain the observations. • Logistic regression is an example of a discriminative model [Figure: two graphical models over the nodes Y, X1 and X2]
  • 43. Generative Versus Discriminative Models • Generative models are characterized by joint distributions. • We ask the question: What is the probability that the observations X and the abstract label Y co-occur? That is: P(X, Y) • Discriminative models are conditional models. They are characterized by the conditional probability distribution P(Y|X) • Given the observations (features), what is the class label that is most likely? • As we model the joint distribution in generative models, we need both the prior P(Y) and the likelihood P(X|Y) in order to determine the posterior P(Y|X) • Discriminative models do not need to model the prior P(Y) over the hidden states, as they directly learn P(Y|X) from the observations or features
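The relationship between the joint, the prior, the likelihood and the posterior mentioned on this slide is Bayes' rule. Written out, under the usual assumption that Y ranges over a finite label set:

```latex
P(Y \mid X) \;=\; \frac{P(X, Y)}{P(X)}
         \;=\; \frac{P(X \mid Y)\, P(Y)}{\sum_{y'} P(X \mid Y = y')\, P(Y = y')}
```

A generative classifier estimates the two factors in the numerator and picks the label that maximizes their product, whereas a discriminative classifier fits the left-hand side directly.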
  • 44. Back to the earlier case study… • How do we apply logistic or softmax regression to the Twitter search problem we discussed? • For a domain specific problem like this (where the labels are fixed and given), we can take advantage of some prior knowledge, instead of depending only on observed data • Example of prior knowledge: The term “democrats” is associated with US Congress, Sonia with Indian National Congress, Internet with Mobile World Congress and so on.
  • 45. Using Logistic for text classification problem • A key issue to handle when using logistic or softmax regression for NLP applications is the arbitrary, variable length of sentences. • When we use the words in a sentence as features (as we did in the case of Naïve Bayes), the feature length n becomes a variable • Logistic regression requires a fixed number of features and the associated theta parameters. • Some options to consider to fit the input to a logistic regression: • Regardless of the size of the input sentence, always form an n-feature vector. If the sentence length is > n, chop off the extra words and if it is < n, pad it suitably • Split a long sentence into fixed size blocks of n • As we observe, such preprocessing of the input for the sake of fitting it into a logistic regression might degrade the prediction accuracy • Solution: Use a different model
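The two preprocessing options listed on this slide can be sketched as follows. The helper names and the `"<pad>"` token are hypothetical choices; mapping the resulting tokens to numeric features is a separate step that is not shown.

```python
def to_fixed_length(tokens, n, pad_token="<pad>"):
    """Chop off words beyond n, or pad on the right until the length is n."""
    clipped = tokens[:n]
    return clipped + [pad_token] * (n - len(clipped))

def split_into_blocks(tokens, n, pad_token="<pad>"):
    """Split a long sentence into blocks of exactly n tokens, padding the last one."""
    return [to_fixed_length(tokens[i:i + n], n, pad_token)
            for i in range(0, len(tokens), n)]

short = to_fixed_length("congress meets today".split(), 5)
long_blocks = split_into_blocks("a b c d e f g".split(), 3)
```

Both options force a fixed length onto the input: truncation throws words away, padding adds tokens that carry no meaning, and block splitting separates words that belong together, which illustrates why accuracy can suffer.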
  • 46. MaxEnt Models • MaxEnt models support an elegant way to map a variable length input into a fixed size feature vector. • This is a powerful model that is equivalent to logistic regression • Many NLP problems can be reformulated as classification problems • E.g. Language Modelling, Tagging Problems (these will be covered in detail later) • MaxEnt is widely used for various text processing tasks. • The task is to estimate the probability of a class given the context • The term context may refer to a single word or a group of words • The key idea is to use a set of “feature functions” that map the input into a fixed length feature vector. Often these are Boolean functions. • MaxEnt models are also termed log-linear models
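The idea of Boolean feature functions can be illustrated with a simplified sketch. The keywords are borrowed from the prior knowledge example given earlier in the deck, while the specific functions and the length threshold are made up for illustration. In a full MaxEnt model the feature functions usually take the candidate label as an argument as well; here they look only at the input.

```python
# Each feature function inspects the whole (variable length) word list
# and answers one yes/no question about it.
FEATURE_FUNCTIONS = [
    lambda words: "democrats" in words,
    lambda words: "sonia" in words,
    lambda words: "internet" in words,
    lambda words: len(words) > 10,  # hypothetical length feature
]

def extract_features(sentence):
    """Map a sentence of any length to a vector with one entry per feature function."""
    words = sentence.lower().split()
    return [1.0 if f(words) else 0.0 for f in FEATURE_FUNCTIONS]

short_vec = extract_features("Democrats in Congress")
long_vec = extract_features("Sonia spoke about the internet at the congress session held in the city")
```

Both vectors have the same length regardless of how many words the sentence contains, so they can be fed to a classifier with a fixed number of parameters.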
  • 47. Summary • Linear models assume a linear decision boundary and are simple models • Though the models are simple, they are extensively used as: • Stand alone classifiers • Layers or building blocks in multi layer models (such as Neural Networks, CNNs, RNNs) • MaxEnt models are widely used in NLP applications