Lecture 5 machine learning updated

Introduction
• Machine Learning is concerned with the study of building computer
programs that automatically improve and/or adapt their performance
through experience.
• Machine learning can be thought of as “programming by example" .
• The goal of machine learning is to devise learning algorithms that do
the learning automatically without human intervention or assistance.
Rather than program the computer to solve the task directly, in
machine learning, we seek methods by which the computer will come
up with its own program based on examples that we provide.

Introduction
• A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E.
• A learning system is characterized by the following elements:
• task (or tasks) T
• experience E
• performance measure P

Introduction
• Ex: a learning system for playing tic-tac-toe game (or nuggets and
crosses) will have the following corresponding elements:
• T: Play tic-tac-toe;
• P: Percentage of games won (and eventually drawn);
• E: Playing against itself (can also be playing against others).

Introduction
• Some examples of machine learning problems are
• character (including digit) recognition
• handwriting recognition
• face detection
• spam filtering
• sound recognition
• spoken language understanding
• stock market prediction
• weather prediction
• medical diagnosis
• fraud detection
• fingerprint matching

Terminology
• Machine learning
• Data science
• Data mining
• Data Analysis
• Statistical learning
• Knowledge discovery in databases
• Pattern discovery

Data? Where?
• Google: processes 24 peta bytes of data per day.
• Facebook: 10 million photos uploaded every hour.
• Youtube: 1 hour of video uploaded every second.
• Twitter: 400 million tweets per day.
• Astronomy: Satellite data is in hundreds of PB.
• “By 2020 the digital universe will reach 44
zettabytes...” (That’s 44 trillion gigabytes!)

Data types
• Data comes in different sizes and also flavors (types):
• Texts
• Numbers
• Clickstreams
• Graphs
• Tables
• Images
• Transactions
• Videos
• Some or all of the above!

The data science process
Exploratory data analysis

Generic learning system
• A generic learning system can be defined by the following
components
• Goal: Defined with respect to the task to be performed by the system;
• Model: A mathematical function which maps perception to actions;
• Learning rules: Used to update the model parameters with new experience in
a way which optimizes the performance measures with respect to the goals.
Learning rules help the algorithm search for the best model;
• Experience: A set of perception (and possibly the corresponding actions).

Generic scheme of a learning system

The basic notions
• An example (sometimes also called an instance) is the object that is
being classified. For instance, if we consider the cancer tumor
patients classification, then the patients are the examples.
• An example is described by a set of attributes, also known as features
or variables. Ex: the cancer tumor patients classification, a patient
might be described by attributes such as gender, age, weight, tumor
size, tumor shape, blood pressure, etc.

The basic notions
• The label is the category that we are trying to predict. For instance, in
the cancer patient classification, the labels can be “avascular”,
“vascular” and “angiogenesis”.
Then,
• During training, the learning algorithm is supplied with labeled
examples,
while during testing, only unlabeled examples are provided.
In certain situation we can assume that only two labels are possible
that we might as well call 0 and 1 (for instance, cancer or not cancer).
This will make the things much simple.

Learning Steps
The main steps in a learning process are as follows:
• Data and assumptions
Data refers to the data available for the learning task and assumptions represent
what we can assume about the problem.
• Representation
We should define how to represent the examples to be classified. There are many
representation methods for the same data. The choice of representation may
determine whether the learning task is very easy or very difficult.
• Method and estimation
Method takes into account what are the possible hypotheses and estimation helps
to adjust the predictions based on the feedback (such as updating the parameters
when there is a mistake, etc).

Learning Steps
• Evaluation
This evaluates how well the method is working (for instance,
considering the ratio of wrong classified data and the whole dataset).
• Model selection
It tells us whether we can rethink the approach to do even better or to
make it more flexible, or, whether we should choose an entirely
different model that would be more suitable.

Learning Systems Classification
Classification Based on Goal, Tasks, Target Function
• Prediction: the system predicts the desired output for a given input based
on previous input/output pairs.
Example: prediction of a stock value given other (input) parameters
values like market index, interest rates, currency conversion, etc.
• Regression: the system estimates a function of many variables
(multivariate) or single variable (univariate) from scattered data.
Example: a simple univariate regression problem is x4+ x3+ x2+x+1

• Classification (categorization): the system classifies an object into one
of several categories (or classes) based on features of the object.
Example: A diagnosis system which has to classify a patient’s cancer
tumor into one of the three categories: avascular, vascular,
angiogenesis.
• Clustering: the system task is to organize a group of objects into
homogeneous segments.
Example: a satellite image analysis system which groups land areas
into forest, urban and water body, for better utilization of natural
resources.

• Planning: the system has to generate an optimal sequence of actions
to solve a particular problem.
Example: robot path planning (to perform a certain task or to move
from one place to another, etc).

Classification Based on the Model
• Decision trees
• Linear separators (perceptron model)
• Neural networks
• Genetic programming
• Evolutionary algorithms
• Graphical models
• Support vector machines
• Hidden Markov models

Classification Based on the Learning Rules
• gradient descent
• least square error
• expectation maximization
• margin maximization

Classification Based on Experience
• Supervised learning:
In supervised learning, the machine is given the desired outputs and its goal
is to learn to produce the correct output given a new input.
Examples : an automated vehicle where a set of vision inputs and the
corresponding steering actions are available to the learner.
• Unsupervised learning: (exploratory learning)
In unsupervised learning the goal of the machine is to build a model of
input that can be used for reasoning, decision making, predicting things, and
communicating.
Examples: Finding out malicious network attacks from a sequence of anomalous
data packets is an example of unsupervised learning

• Active learning:
Here not only a teacher is available, the learner has the freedom to ask
the teacher for suitable perception-action example pairs which will help the learner
to improve its performance.
Example: Consider a news recommender system which tries to learn a user’s
preference and categorize news articles as interesting or uninteresting to the user.
• Reinforcement learning:
In reinforcement learning, the machine can also produce actions which
affect the state of the world, and receive rewards (or punishments). The
goal is to learn to act in a way that maximizes rewards in the long term.
Example: a robot in an unknown terrain where its get a punishment when its hits
an obstacle and reward when it moves smoothly.

Machine Learning
• Learn models from data
• Machine learning
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Examples:
• Google
• Netflix
• Amazon

Supervised and unsupervised learning
• Unsupervised learning:
• Learning a model from unlabeled data.
• Supervised learning:
• Learning a model from labeled data.

Supervised and unsupervised learning

Unsupervised learning
Methods: K-means, gaussian mixtures, hierarchical clustering, spectral clustering, etc.

Unsupervised Learning • Clustering
• Dimensionality
reduction

Clustering examples
• Clustering of the population by their demographics.
• Clustering of geographic objects (mineral deposits, houses, etc.)
• Clustering of stars
• Audio signal separation
• Image segmentation

K-Means visualization
• https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
• Local file

Supervised learning
Methods: Support Vector Machines, neural networks, decision trees, K-nearest neighbors, naive
Bayes, etc.

Supervised learning
• Classification:

Supervised learning
• Non linear classification

Supervised learning
• Regression
Example: Income in function of age, weight of the fruit in function of its length.

Training and testing
Use generated model

K-nearest neighbours
• Not every ML method builds a model! (Ex: K-nearest neighbours)
• Our first ML method: KNN.
• Main idea: Uses the similarity between examples.
• Assumption: Two similar examples should have same labels.
• Assumes all examples (instances) are points in the d dimensional
space Rd.

• KNN uses the standard Euclidian distance to define nearest neighbors.

An approximate decision boundary for K = 3

• Pros:
• Simple to implement.
• Works well in practice.
• Does not require to build a model, make assumptions, tune parameters.
• Can be extended easily with news examples.
• Cons:
• Requires large space to store the entire training dataset.
• Slow! Given n examples and d features. The method takes O(n × d) to run.
• Suffers from the curse of dimensionality.

• Applications
• Information retrieval.
• Handwritten character classification using nearest neighbour in large
databases.
• Recommender systems (user like you may like similar movies).
• Breast cancer diagnosis.
• Medical data mining (similar patient symptoms).
• Pattern recognition in general.

Training and Testing
How can we be confident about f?

Avoid overfitting
In general, use simple models!
• Reduce the number of features manually or do feature selection.
• Do a model selection (ML).
• Use regularization (keep the features but reduce their importance by
setting small parameter values) (ML).
• Do a cross-validation to estimate the test error.

Confusion matrix
• Calculate number of errors that the classifier is doing.

Reinforcement Learning
• Agent interacts and learns from a stochastic environment.
• No supervisor, only reward signals.
• Ex:
• Automated vehicle control
• An unmanned helicopter learning to fly and perform stunts
• Game playing
• Playing backgammon, Atari breakout, Tetris, Tic Tac Toe
• Medical treatment planning
• Planning a sequence of treatments based on the effect of past treatments
• Chat bots
• Agent figuring out how to make a conversation

Reference
• ColumbiaX: CSMM.101x Artificial Intelligence (AI)

Linear Models
• Linear Regression
• Linear Classification: Perceptron

Linear Regression
Method 3
Gradient descent

Linear Regression
• Gradient descent

Decision trees
• The terminology Tree is graphic.
• However, a decision tree is grown from the root downward. The idea
is to send the examples down the tree, using the concept of
information entropy

Decision trees
Entropy  Control how a decision tree decides where to split the data.
Ex:
If all examples are same class  Entropy=0
If examples are evenly split between classes  Entropy=1.0
𝐸𝑛𝑡𝑟𝑜𝑝𝑦 =
𝑖
−(𝑃𝑖)𝑙𝑜𝑔2(𝑃𝑖)

Decision trees
• How to decide which attribute is best to split on based on entropy?

Decision trees
• At the first split starting from the root, we choose the attribute that has the max gain.
• Then, we re-start the same process at each of the children no des (if node not pure).

Basics
• How likely something is to happen.
• Ex:
• Tossing a Coin
• Throwing Dice

Terminology
• Experiment or Trial: an action where the result is uncertain.
• Sample Space: all the possible outcomes of an experiment.
• Sample Point: just one of the possible outcomes
• Event: a single result of an experiment

Probability: Complement
• P(A) means "Probability of Event A“
• P(A') means "Probability of the complement of Event A“
• P(A) + P(A') = 1

Probability: Types of Events
• Independent (each event is not affected by other events),
• Dependent (also called "Conditional", where an event is affected by
other events)[Conditional Probability]
• Mutually Exclusive (events can't happen at the same time)

Probability Tree Diagrams
The probability of getting at least one Head from two tosses is 0.25+0.25+0.25 = 0.75

Probability: Independent Events
• Independent Events are not affected by previous events.
• We can calculate the chances of two or more independent events by
multiplying the chances.
• P(A and B) = P(A) × P(B)
(Probability of A and B
equals the probability
of A times the probability
of B)

Dependent Events
• They can be affected by previous events

Dependent Events: Notation
• P(A) means "Probability Of Event A“
• Ex: Consider above example
• Event A is  get a Blue Marble first
• P(A) = 2/5
• Event B is  get a Blue Marble second
• If we got a Blue Marble first the chance is now 1/4
• If we got a Red Marble first the chance is now 2/4
• P(B|A) means "Event B given Event A“ (event A has already happened, now
what is the chance of event B?)
• P(B|A) is also called the "Conditional Probability" of B given A.

Dependent Events: Notation
• And in our case:
• P(B|A) = ¼
• So the probability of getting 2 blue marbles is:
• And we write it as

Mutually Exclusive Events
• When two events (call them "A" and "B") are Mutually Exclusive it is
impossible for them to happen together:
• P(A and B) = 0 ("The probability of A and B together equals 0 (impossible)")
• But the probability of A or B is the sum of the individual probabilities:
• P(A or B) = P(A) + P(B)
• If not Mutually Exclusive:
• P(A or B) = P(A) + P(B) − P(A and B)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

False Positives and False Negatives

Bayes' Theorem
• How computers learn about people?

Bayes' Theorem
• Bayes’ Theorem is a way of finding a probability when we know
certain other probabilities.
•P(A|B) is "Probability of A given B", the probability of A given that B happens
•P(A) is Probability of A
•P(B|A) is "Probability of B given A", the probability of B given that A happens
•P(B) is Probability of B

Bayes' Theorem: Example
• One of the famous uses for Bayes Theorem is False Positives and False
Negatives.
We want to know the chance of having the allergy when test says "Yes",
written P(Allergy|Yes)

But we can calculate it by adding up those with, and those without the allergy:

A special version of the Bayes' formula

Why probabilities?
• Why are we bringing here Bayes rule?

Lecture 5 machine learning updated

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Lecture 5 machine learning updated

Similar a Lecture 5 machine learning updated (20)

Más de Vajira Thambawita

Más de Vajira Thambawita (20)

Último

Último (20)

Lecture 5 machine learning updated