INT3405 Machine Learning
Lecture 1 - Introduction
Ta Viet Cuong, Le Duc Trong, Tran Quoc Long
VNU-UET
2022
Table of contents
What is Machine Learning
Probability & Random variable
Probability Distributions & Maximum Likelihood Estimation
Machine learning
Machine learning is the study of computer algorithms that allow
computer programs to automatically improve through experience
(Tom Mitchell)
▶ T: a task with clearly defined input and output
▶ P: a performance measure assessing how good an algorithm is
on the task
▶ E: experience (i.e., data) provided to the algorithm
Example
(Single) Face detection
▶ T: input = a 224×224 RGB image, output = (x1, y1, x2, y2), the top-left & bottom-right corners of the face in the input
▶ P: IoU (intersection over union; a code sketch follows this slide)
▶ E: a set of (millions of) (image, (x1, y1, x2, y2)) pairs
Exercises: Specify T, P, E for
▶ Predicting tomorrow's weather given geographic information, satellite images, and a trailing window of past weather.
▶ Answering questions expressed in free-form text.
▶ Identifying all people depicted in an image and drawing outlines around each.
▶ Recommending products that users are likely to enjoy while browsing.
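The IoU measure compares a predicted box against the ground-truth box. Below is a minimal Python sketch (the (x1, y1, x2, y2) corner format follows the slide; the helper name iou is ours):

    def iou(box_a, box_b):
        # Boxes are (x1, y1, x2, y2): top-left and bottom-right corners.
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)   # intersection area
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    print(iou((10, 10, 60, 60), (30, 30, 80, 80)))  # ≈ 0.22

IoU equals 1 for a perfect prediction and 0 for disjoint boxes, which makes it a natural performance measure P for this task.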
Types of Machine Learning
▶ Supervised learning: learn input-output relationship
(E = {(xi , yi )} where xi ’s are inputs and yi ’s are desired
targets)
▶ Unsupervised / Self-supervised learning: learn data features,
clusters or distribution (E = {xi }, inputs only, no targets)
▶ Reinforcement learning: learn a good action policy for an agent in an environment (E = {(s, a) → (s′, r)} where s, s′ are states, a is an action, r is a reward)
Key phases in Machine Learning
Phase            | Programming aspect
-----------------|----------------------------------------------
Data preparation | storing, retrieving, transforming data
Data modelling   | model libraries, machine learning algorithms
Training model   | optimization, fine-tuning, validation
Inference        | deploying, logging, testing, mobile, web, API
Prerequisites for Machine Learning
Math
▶ Linear Algebra
▶ Calculus
▶ Probability and Statistics
▶ Optimization
Programming
▶ Data structures and algorithms
▶ Python/C++
▶ Libraries: numpy, pandas, scikit-learn, pytorch
▶ Frameworks: jupyter, django, fastapi, Android, iOS
Probability
Definitions:
▶ Sample space: Ω is the set of all possible outcomes or results (of a random experiment).
▶ Event space: the set F ⊆ 2^Ω is a σ-algebra over Ω. Each element of F is an event (a subset of Ω).
▶ A σ-algebra must satisfy: (i) F ≠ ∅; (ii) A ∈ F ⇒ Ω \ A ∈ F; (iii) Ai ∈ F for all i ⇒ ∪_{i=1}^∞ Ai ∈ F.
▶ Probability measure: a function P : F → R⁺ satisfying:
▶ P(Ω) = 1, P(∅) = 0
▶ Ai ∈ F with Ai ∩ Aj = ∅ for all i ≠ j ⇒ P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) (countable additivity)
As a result, the probability of a random event is specified by a probability triple (Ω, F, P).
Probability
Example
Consider a random experiment: a closed box contains 100 marbles, of which 40 are red and 60 are blue. Take out one marble at random.
▶ Sample space: Ω is the set of 100 marbles in the box.
▶ Event space: F = {∅, Ω, red marble, blue marble}, i.e., F consists of 4 subsets of Ω. Notice that F is a σ-algebra over Ω.
▶ Probability measure: if every marble is equally likely to be drawn, then
▶ P(∅) = 0, P(Ω) = 1, P(red) = 0.4, P(blue) = 0.6
▶ Event ∅: no marble is taken (happens with probability 0).
▶ Event Ω: a red or blue marble is taken (happens with probability 1).
▶ Event red marble: the marble taken is red (probability 0.4).
▶ Event blue marble: the marble taken is blue (probability 0.6).
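This measure is easy to confirm by simulation (a sketch assuming uniformly random draws, with the counts from the slide):

    import random

    box = ["red"] * 40 + ["blue"] * 60            # the 100 marbles
    draws = [random.choice(box) for _ in range(100_000)]
    print(draws.count("red") / len(draws))        # ≈ 0.4 = P(red)
    print(draws.count("blue") / len(draws))       # ≈ 0.6 = P(blue)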
Probability
Bayes' theorem
Consider two events A, B with P(A) ≠ 0. Then

P(B|A) = P(A ∩ B) / P(A) = P(A|B) P(B) / P(A)

where
▶ P(B|A): the probability of event B occurring given that A is true (the posterior).
▶ P(A|B): the likelihood of A given a fixed B.
▶ P(B): the marginal or prior probability of B.
Independence
Two events A and B are independent iff P(A ∩ B) = P(A)P(B).
Probability
Example: COVID-19
▶ The test is correct on a sick person 90% of the time (true positive rate).
▶ The test is correct on a healthy person 99% of the time (true negative rate).
▶ 3% of the population have COVID-19.
Question: what is the probability that a random person who tests positive is actually sick?
▶ Event A: positive test result.
▶ Event B: has the disease.

P(A|B) × P(B) = 0.9 × 0.03 = 0.027
P(A) = P(A|B) × P(B) + P(A|¬B) × P(¬B) = 0.9 × 0.03 + 0.01 × 0.97 = 0.0367
⇒ P(B|A) = 0.027 / 0.0367 ≈ 73.57%
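The same computation in a few lines of Python (numbers from the slide; the variable names are ours):

    p_pos_given_sick = 0.90      # true positive rate, P(A|B)
    p_neg_given_healthy = 0.99   # true negative rate, P(¬A|¬B)
    p_sick = 0.03                # prevalence, P(B)

    # Total probability: P(A) = P(A|B)P(B) + P(A|¬B)P(¬B)
    p_pos = p_pos_given_sick * p_sick + (1 - p_neg_given_healthy) * (1 - p_sick)
    # Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
    print(p_pos_given_sick * p_sick / p_pos)   # ≈ 0.7357

Despite the accurate test, the low prevalence keeps the posterior well below the 90% true positive rate.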
Random variable
A random variable X is a measurable function from the sample space to the real numbers,

X : Ω → R

Examples:
▶ Randomly take 10 marbles (with replacement). The number of blue marbles among the 10 taken is a random variable.
▶ Pick one person at random from 100 people; that person's height is a random variable.
Types of random variables
▶ Discrete: X ∈ {1, 2, . . . , C}, with parameters θc = P(X = c), c = 1, 2, . . . , C
▶ Continuous: X ∈ R
▶ Cumulative distribution function (CDF): F(x) = P(X ≤ x)
▶ Probability density function (PDF): p(x) = F′(x)
▶ Bayes' formula for PDFs: p(x, y) = p(y|x)p(x) = p(x|y)p(y)
Properties of random variables
▶ Expectation

E[X] = Σ_c c P(X = c)  (discrete),   E[X] = ∫_R x p(x) dx  (continuous)

E[f(X)] = ∫_R f(x) p(x) dx

▶ Variance

V[X] = E[(X − E[X])²]
Properties of expectation
E[aX + bY + c] = aE[X] + bE[Y] + c
V[aX] = a² V[X]
V[X] = E[X²] − (E[X])²
V[X] = V[E[X|Y]] + E[V[X|Y]]   (law of total variance)

If X, Y are independent:
E[X · Y] = E[X] · E[Y]
V[X + Y] = V[X] + V[Y]
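A quick Monte Carlo sanity check of some of these identities (a sketch; the distributions chosen for X and Y are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(2.0, 3.0, size=1_000_000)     # X ~ N(2, 9)
    y = rng.uniform(0.0, 1.0, size=1_000_000)    # Y independent of X

    print(np.mean(2 * x + 3 * y + 1))            # ≈ 2E[X] + 3E[Y] + 1 = 6.5
    print(np.mean(x**2) - np.mean(x)**2)         # ≈ V[X] = 9
    print(np.var(x + y), np.var(x) + np.var(y))  # both ≈ 9 + 1/12, by independence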
Properties of expectation (cont.)
Law of total expectation:

E_Y[E_X[X|Y]] = ∫_R ( ∫_R x p(x|y) dx ) p(y) dy = ∫_R x p(x) dx = E[X]

Indeed,

∫_R ( ∫_R x p(x|y) dx ) p(y) dy = ∫_R ∫_R x p(x|y) p(y) dx dy
                                = ∫_R ∫_R x p(x, y) dx dy
                                = ∫_R ∫_R x p(y|x) p(x) dx dy
                                = ∫_R ( ∫_R p(y|x) dy ) x p(x) dx    [inner integral equals 1]
                                = ∫_R x p(x) dx
Bernoulli distribution
X ∈ {0, 1} with probability P(X = 1) = θ, written as X ∼ Ber(θ).
We also have P(X = 0) = 1 − θ.
▶ A biased coin: θ = probability of heads
▶ Binary classification: y|x ∼ Ber(θ(x)), i.e., the probability of class 1 is a function of the input x
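Sampling from Ber(θ) in numpy (a sketch; a Bernoulli variable is a binomial with n = 1, and θ = 0.7 is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(1)
    flips = rng.binomial(n=1, p=0.7, size=10_000)   # Ber(0.7) draws
    print(flips[:10])                               # e.g. [1 1 0 1 ...]
    print(flips.mean())                             # ≈ 0.7, the fraction of heads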
Parameter estimation
Toss a coin (i.e., sample) N times; heads (the value 1) comes up s times. What is the parameter θ of the coin (Bernoulli distribution)?
An intuitive guess: θ = s/N. Why does this number make sense?
Let xi ∈ {0, 1} be the value of the ith toss.
The probability of the data D = {x1, x2, . . . , xN} under the model X ∼ Ber(θ) is

L(θ) = P(D) = P(x1, x2, . . . , xN) = ∏_{i=1}^N P(xi) = ∏_{i=1}^N θ^{xi} (1 − θ)^{1−xi}
Maximum Likelihood Estimation (MLE)
L(θ) is the likelihood of θ with respect to the dataset D.
MLE: find the θ that maximizes L(θ).

ℓ(θ) = log L(θ) = Σ_{i=1}^N [ xi log θ + (1 − xi) log(1 − θ) ]

Setting the derivative to zero,

ℓ′(θ) = Σ_{i=1}^N [ xi/θ − (1 − xi)/(1 − θ) ] = 0
(1/θ) Σ_{i=1}^N xi = (1/(1 − θ)) Σ_{i=1}^N (1 − xi),   where Σ xi = s and Σ (1 − xi) = N − s
s(1 − θ) = (N − s)θ
θ_MLE = s/N
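A numerical check of the closed form θ_MLE = s/N (a sketch; θ = 0.3 and N = 10,000 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    theta_true, N = 0.3, 10_000
    x = rng.binomial(n=1, p=theta_true, size=N)   # N coin tosses
    s = x.sum()                                   # number of heads
    print(s / N)                                  # θ_MLE ≈ 0.3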
How good is the MLE?
▶ Unbiased: E[θ_MLE] = θ
▶ Variance goes to 0: V[θ_MLE] = θ(1 − θ)/N
▶ Consistent: P{|θ_MLE − θ| ≥ ϵ} → 0 as N → ∞
▶ Asymptotic normality: √N (θ_MLE − θ) →_d N(0, θ(1 − θ))
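Unbiasedness and the θ(1 − θ)/N variance can be observed empirically by repeating the whole experiment many times (a sketch; the values of θ, N, and the repetition count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    theta, N, reps = 0.3, 1_000, 5_000
    tosses = rng.binomial(n=1, p=theta, size=(reps, N))
    estimates = tosses.mean(axis=1)                   # one θ_MLE per repetition
    print(estimates.mean())                           # ≈ θ = 0.3 (unbiased)
    print(estimates.var(), theta * (1 - theta) / N)   # both ≈ 0.00021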
Binomial distribution
The number of heads s in N independent Bernoulli trials of tossing a coin follows a Binomial distribution:

X ∼ Bin(N, θ) ⇒ P(X = s) = C_N^s θ^s (1 − θ)^{N−s}

If we repeat the experiment n times and obtain the data D = {s1, s2, . . . , sn}, what is a sensible value of θ? (Hint: use MLE.)

L(θ) = P(D) = P(s1, s2, . . . , sn) = ∏_{i=1}^n P(si) = ∏_{i=1}^n C_N^{si} θ^{si} (1 − θ)^{N−si}
Binomial distribution (cont.)

ℓ(θ) = log L(θ) = const + Σ_{i=1}^n [ si log θ + (N − si) log(1 − θ) ]

ℓ′(θ) = Σ_{i=1}^n [ si/θ − (N − si)/(1 − θ) ] = 0
(1/θ) Σ_{i=1}^n si = (1/(1 − θ)) Σ_{i=1}^n (N − si)
(1 − θ) Σ si = θ (nN − Σ si)  ⇒  Σ si = θ n N
θ_MLE = (1/(nN)) Σ_{i=1}^n si = (1/n) Σ_{i=1}^n (si/N)
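The same numerical check for the binomial case (a sketch; θ, N, and n are arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    theta_true, N, n = 0.4, 20, 1_000
    s = rng.binomial(n=N, p=theta_true, size=n)   # n experiments of N tosses each
    print(s.sum() / (n * N))                      # θ_MLE ≈ 0.4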
Gaussian distribution
X ∼ N(x|µ, σ²) is a Gaussian distribution with density function

p(x) = (1/√(2πσ²)) exp( −(x − µ)² / (2σ²) )

▶ Regression: p(y|x) = N(y|µ(x), σ²), i.e., y = µ(x) + ϵ with ϵ ∼ N(ϵ|0, σ²)

Exercise: Given the data D = {x1, x2, . . . , xn}, what are reasonable values of the parameters µ and σ²?
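As a numerical hint for the exercise (the estimators below, the sample mean and the mean squared deviation, are the standard MLE answers; deriving them is the point of the exercise):

    import numpy as np

    rng = np.random.default_rng(5)
    data = rng.normal(loc=1.5, scale=2.0, size=100_000)   # D with µ = 1.5, σ² = 4

    mu_hat = data.mean()                          # candidate for µ
    sigma2_hat = ((data - mu_hat) ** 2).mean()    # candidate for σ² (divides by n, not n − 1)
    print(mu_hat, sigma2_hat)                     # ≈ 1.5 and ≈ 4.0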