Sweet AI
Statistics
Variables
Quantitative (plotted with a histogram)
  Discrete (e.g., number of students in a class)
  Continuous (e.g., weight)
  Interval (e.g., temperature)
  Ratio (e.g., height, age)
Categorical / Qualitative (plotted with a bar plot)
  Binary (e.g., spam/safe)
  Nominal (non-sortable: colors, genre)
  Ordinal (sortable: grades, product rating)
Probability
Independent event
Dependent event → conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Multiplication rule / Intersection
  Dependent events: $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
  Independent events: $P(A \cap B) = P(A)\,P(B)$
Addition rule / Union: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Complement rule: $P(A^{c}) = 1 - P(A)$
Bayes' theorem: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
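As a rough illustration of Bayes' theorem, the sketch below plugs numbers into $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$. The function name `bayes_posterior` and the spam figures are hypothetical and chosen only for the example; they do not come from the slides.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers (assumptions for illustration only):
# A = "mail is spam", B = "mail contains the word 'offer'".
# 20% of mail is spam, 50% of spam contains 'offer',
# and 14% of all mail contains 'offer'.
p_spam_given_offer = bayes_posterior(p_b_given_a=0.5, p_a=0.2, p_b=0.14)
print(round(p_spam_given_offer, 3))
```

Under these assumed inputs, seeing the word raises the probability of spam from the 20% prior to roughly 71%.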
Permutation (order matters)
n: number of items in the set, r: number of spots
  With repetition: $n^{r}$, e.g., AB, BA, AA, BB
  Without repetition: $\frac{n!}{(n-r)!}$, e.g., AB, BA
Combination (order doesn't matter)
  With repetition: $\frac{(n+r-1)!}{r!\,(n-1)!}$, e.g., AA, AB, BB
  Without repetition: $\frac{n!}{r!\,(n-r)!}$, e.g., AB
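The four counting formulas can be checked against a brute-force enumeration. This is a minimal sketch using only the Python standard library, with the two-letter set {A, B} and r = 2 taken from the slide's examples; the variable names are illustrative.

```python
import math
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

items = ["A", "B"]
n, r = len(items), 2

perm_rep = n ** r                    # permutations with repetition: n^r
perm_no_rep = math.perm(n, r)        # n! / (n - r)!
comb_rep = math.comb(n + r - 1, r)   # (n + r - 1)! / (r! (n - 1)!)
comb_no_rep = math.comb(n, r)        # n! / (r! (n - r)!)

# Enumerate the arrangements explicitly to confirm each count.
enumerated = {
    "perm_rep": ["".join(t) for t in product(items, repeat=r)],
    "perm_no_rep": ["".join(t) for t in permutations(items, r)],
    "comb_rep": ["".join(t) for t in combinations_with_replacement(items, r)],
    "comb_no_rep": ["".join(t) for t in combinations(items, r)],
}
print(perm_rep, perm_no_rep, comb_rep, comb_no_rep)
print(enumerated)
```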
Basic Concepts
Concept | Description
Population | The entire group that you want to draw conclusions about, e.g., all school students in the USA.
Sample | A smaller set randomly drawn from the population, e.g., 700 volunteer students from different schools in the USA.
Outlier / Noise / Anomaly | Data points that lie at an abnormal distance from the other observations; they can skew the model.
Variate | Univariate → one variable; Bivariate → two variables; Multivariate → more than two variables.
Sampling Methods
  Probability: simple random, systematic, stratified, cluster
  Non-probability: convenience, snowball, quota, purposive
Statistical Measures
  Central Tendency: Mean, Median, Mode
  Central Dispersion: Range, Variance, Standard Deviation, IQR
  Association: Covariance, Correlation
Basic Measurement Concepts
Central Tendency | Description | Example
Mean / Average ($\mu$, $\bar{x}$) | The sum of the numbers divided by how many numbers there are. Sensitive to outliers. | [4, 3, 7, 2, 3, 6]: (4 + 3 + 7 + 2 + 3 + 6) / 6 = 25 / 6 ≈ 4.17
Median (Med, M) | Sort the numbers and take the middle one (the average of the two middle numbers when the count is even). | [4, 3, 7, 2, 3, 6] → [2, 3, 3, 4, 6, 7] → (3 + 4) / 2 = 3.5
Mode | The most frequently occurring number. | [4, 3, 7, 2, 3, 6] → 3
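The three measures of central tendency can be reproduced with Python's standard `statistics` module. This is a small sketch on the example list used in the table, not part of the original deck.

```python
import statistics

data = [4, 3, 7, 2, 3, 6]

mean = statistics.mean(data)      # sum of values divided by their count
median = statistics.median(data)  # average of the two middle values here
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```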
Central Dispersion
Dispersion | Description | Example
Range | The difference between the largest and smallest numbers. | [4, 3, 7, 2, 3, 6]: 7 - 2 = 5
Variance ($\sigma^2$) | Shows how spread out the data points are and measures the width of the distribution around the mean. Population: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$. Sample: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$. | [4, 3, 7, 2, 3, 6]: $\mu = 25/6 \approx 4.17$ → deviations (-0.17, -1.17, 2.83, -2.17, -1.17, 1.83) → squared deviations sum to ≈ 18.83 → $\sigma^2 \approx 18.83 / 6 \approx 3.14$
Standard Deviation ($\sigma$) | How spread out the data is around the mean; also used to identify outliers. Data points several standard deviations from the mean (commonly two or three) may be considered unusual. $\sigma = \sqrt{\sigma^2}$ | [4, 3, 7, 2, 3, 6]: $\sigma \approx \sqrt{3.14} \approx 1.77$
Standard Error (SE) | The population standard deviation describes how spread out individual values are around the population mean. The standard error estimates how far a sample mean is likely to be from the population mean: $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ ($\sigma$: population standard deviation, n: number of data points in the sample). Report the result as mean ± SE.
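The dispersion measures can be sketched with the standard library as well. The example list is the one from the table. Using the list's own population standard deviation as $\sigma$ in the standard-error line is an assumption made purely for illustration, since the slides give no separate population value.

```python
import math
import statistics

data = [4, 3, 7, 2, 3, 6]

data_range = max(data) - min(data)
pop_variance = statistics.pvariance(data)    # divides by N
sample_variance = statistics.variance(data)  # divides by n - 1
pop_sd = statistics.pstdev(data)             # square root of pop_variance

# Assumption for illustration: treat pop_sd as the known population sigma.
standard_error = pop_sd / math.sqrt(len(data))

print(f"range={data_range}")
print(f"population variance={pop_variance:.2f}, sample variance={sample_variance:.2f}")
print(f"population sd={pop_sd:.2f}, SE={standard_error:.2f}")
```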
Central Dispersion
Concept | Description | Example
Quartiles | Sort all data points in ascending order, find the median, then find the median of each half: Q1 (lower / first), Q2 (median / middle / second), Q3 (upper / third). | [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26] → [2, 3, 3 | 4, 6, 7 | 8, 12, 19 | 19, 24, 26], so Q1 = 3.5, Q2 = 7.5, Q3 = 19
Interquartile Range (IQR) | IQR = Q3 - Q1. A data point is an outlier if it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
Percentiles | 99 values that split the sample into 100 equal-size subsamples.
[Figure: box plot marking Q1, Q3, the IQR, both whiskers, and the fences at 1.5 × IQR]
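The quartile and fence rules can be sketched as follows, using the "median of each half" method described in the table. The helper name `quartiles` is hypothetical. Note that libraries often use other quartile conventions, so their results may differ slightly from this sketch.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd count, the middle value is excluded from both halves.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


data = [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
```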
Association
Association | Description
Covariance | Measures how two variables vary together.
  Positive value → the two variables move in the same direction
  Negative value → the two variables move in opposite directions
  Closer to zero indicates a weak relationship; farther from zero indicates a stronger one (the magnitude depends on the units of the variables)
  Population: $\mathrm{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$
  Sample: $\mathrm{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$
Pearson Correlation Coefficient / Pearson's r | Measures the strength and direction of a linear relationship between two variables.
  -1 (strong negative relationship) ≤ r ≤ +1 (strong positive relationship)
  r = 0 → no linear correlation
  $\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
Image from U of Wisconsin.
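The covariance and Pearson formulas can be written out directly. This is a minimal pure-Python sketch; the function names and the perfectly linear toy data are assumptions chosen so that the result is easy to verify by hand.

```python
import math


def covariance(x, y, sample=True):
    """Sample (N - 1) or population (N) covariance of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (len(x) - 1 if sample else len(x))


def pearson_r(x, y):
    """Pearson correlation: co-deviation sum over the root of the squared-deviation sums."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


# Toy data (an assumption for illustration): y is exactly 2 * x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov_sample = covariance(x, y)
cov_population = covariance(x, y, sample=False)
r = pearson_r(x, y)
print(cov_sample, cov_population, r)
```

The covariance changes if the variables are rescaled, while r stays within [-1, 1], which is why r is usually the better gauge of strength.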
Distribution
Credit: Harold Toomey, WyzAnt Tutor
Distribution
Discrete → probability mass function
Continuous → probability density function
Hypothesis Testing
Hypotheses
  Alternative (H1/Ha): e.g., the male salary is higher than the female salary for the same job position
  Null (H0): e.g., the male salary is equal to the female salary for the same job position
  Non-Directional
  Directional
  Statistical
Hypothesis Testing
Hypothesis Test
Parametric Test
  Regression: Simple Linear Regression, Multiple Linear Regression, Logistic Regression
  Comparison: t-test, ANOVA, MANOVA
  Correlation: Pearson's r
Non-Parametric Test
  Spearman's r, Chi-square test, ANOSIM, Wilcoxon, Sign test
Hypothesis Testing
1. State H0 & H1. H0: Men are, on average, not getting a higher salary than women. Ha: Men are, on average, getting a higher salary than women.
2. Collect testing samples. Equal proportions of men and women from a variety of industries within one country.
3. Select & execute a statistical test. One-tailed t-test.
4. Infer the results (reject / fail to reject H0). An average difference of 20k with a p-value of 0.002, which is consistent with H1.
Terminology | Description
Significance level ($\alpha$) | A threshold for deciding whether a test statistic is statistically significant. Statistical significance means that a relationship between variables is highly unlikely to be caused by chance alone. $\alpha$ is the area inside the tail(s) of the H0 distribution.
  $\alpha$ = 1 - (confidence level / 100) → common choices of $\alpha$: 0.01, 0.05, 0.1
P-value (probability value) | Measures the plausibility of the null hypothesis and is used to decide whether H0 should be rejected: the probability of observing a sample statistic at least as extreme as the one obtained, given that H0 is true, P(sample statistic | H0 true).
  0 ≤ p-value ≤ 1
  p-value ≥ $\alpha$: results are not statistically significant, fail to reject H0 (the null must fly!)
  p-value < $\alpha$: results are statistically significant, reject H0 (the null must go!)
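The relationship between the confidence level, $\alpha$, and the reject / fail-to-reject decision can be sketched in a few lines. The function names are hypothetical, and the p-value 0.002 is the one quoted in the salary example.

```python
def alpha_from_confidence(confidence_percent: float) -> float:
    """alpha = 1 - (confidence level / 100)."""
    return 1 - confidence_percent / 100


def decide(p_value: float, alpha: float) -> str:
    """Apply the rule: p < alpha rejects H0, otherwise fail to reject."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


alpha = alpha_from_confidence(95)   # a 95% confidence level
decision_low_p = decide(0.002, alpha)
decision_high_p = decide(0.2, alpha)
print(round(alpha, 2), decision_low_p, decision_high_p)
```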
Basic Concepts
H0 is ... | True | False
Rejected | Type I Error ($\alpha$) | Correct
Not Rejected | Correct | Type II Error ($\beta$)
https://www.abtasty.com
Z-test
• Used to test whether two groups of data differ from each other when the population standard deviation is known
• Normal distribution formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• To calculate the percentile of a data point, standardize the normal distribution to a standard normal distribution
• The standard normal distribution has $\mu = 0$ and $\sigma = 1$
• How to determine x's percentile/probability, or how far from typical a result is:
1. Standardize the values of the normal distribution and calculate the z-score from the population $\mu$ and population $\sigma$
  - for a single raw datum x: $z = \frac{x - \mu}{\sigma}$
  - for the mean of n independent and identically distributed samples: $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
  - for a proportion: $Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$
    $\hat{p}$: observed sample proportion, $p$: hypothesized population proportion, n: sample size
2. Look up the z-score in a z-table to map it to the area under the normal distribution curve and obtain the p-value
3. Compare the p-value with $\alpha$: if p-value ≥ $\alpha$, fail to reject H0; otherwise reject H0
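The three steps above can be sketched with the standard library's `statistics.NormalDist`, which stands in for the z-table lookup. The function names, the sample figures (mean 103 from n = 100 against $\mu$ = 100, $\sigma$ = 15), and the choice of a two-sided p-value are assumptions for illustration.

```python
import math
from statistics import NormalDist


def z_single(x, mu, sigma):
    """z-score of one raw data point."""
    return (x - mu) / sigma


def z_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean from n i.i.d. observations."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


def z_proportion(p_hat, p, n):
    """z-score of an observed proportion against a hypothesized one."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


def two_sided_p_value(z):
    """Area in both tails of the standard normal beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Step 1: standardize (hypothetical numbers).
z = z_mean(x_bar=103, mu=100, sigma=15, n=100)
# Step 2: map the z-score to a p-value.
p_value = two_sided_p_value(z)
# Step 3: compare with alpha.
alpha = 0.05
decision = "fail to reject H0" if p_value >= alpha else "reject H0"
print(z, round(p_value, 4), decision)
```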
Z-test
[Figure: z-table; source: www.z-table.com]
Student t-test
• Used to test whether two groups of data differ from each other when the population standard deviation is unknown
• Assumptions:
  • Normal distribution
  • Similar variance for each group/sample
  • Same number of data points in each group/sample (20-30); for larger samples, use the z-test
• H0: There is no difference between groups
• H1: There is a difference between groups
Type of t-test | Description | Formula | Degrees of freedom
One-sample t-test | Tests whether a population mean is equal to some value $\mu$. $\bar{x}$: sample mean, $\mu$: population mean, s: sample standard deviation, n: sample size | $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ | df = n - 1
Dependent / Paired-samples t-test | Tests whether two population means are equal by sampling the same population twice. d: difference between paired observations, n: number of pairs | $t = \frac{\sum d}{\sqrt{\frac{n \sum d^2 - (\sum d)^2}{n - 1}}}$ | df = n - 1
Independent two-sample t-test / unpaired-samples t-test | Tests whether two population means are equal, using two independent samples of different size with unequal variance. $s_1^2$, $s_2^2$: sample variances | $t = \frac{\text{signal}}{\text{noise}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ | df = n1 + n2 - 2
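The three t-value formulas in the table can be written out directly with the standard library. This is a sketch under stated assumptions: the function names and the small data sets are hypothetical, and the degrees of freedom follow the table as given.

```python
import math
import statistics


def t_one_sample(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    t = (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))
    return t, n - 1


def t_paired(first, second):
    """t = sum(d) / sqrt((n * sum(d^2) - sum(d)^2) / (n - 1)), df = n - 1."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1)), n - 1


def t_independent(sample1, sample2):
    """t = (x_bar1 - x_bar2) / sqrt(s1^2 / n1 + s2^2 / n2), df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    noise = math.sqrt(
        statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2
    )
    signal = statistics.mean(sample1) - statistics.mean(sample2)
    return signal / noise, n1 + n2 - 2


# Hypothetical data for illustration only.
t1, df1 = t_one_sample([5, 6, 7, 8, 9], mu=6)
t2, df2 = t_paired([10, 12, 9, 11], [12, 14, 10, 14])
t3, df3 = t_independent([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
print((t1, df1), (t2, df2), (t3, df3))
```

The paired formula is algebraically the same as dividing the mean difference by its standard error, which is how many libraries compute it.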
Student t-test
[Figure: t-table indexed by degrees of freedom (df); source: stanford.edu]
|t-value| < critical value → do not reject H0
|t-value| > critical value → reject H0
1. Calculate the t-value and df
2. Decide on a one- or two-tailed test and the level of confidence
3. Look up the critical value in the t-table and decide whether to reject or fail to reject H0
t-test vs. z-test
• The t-test and z-test are both used to assess whether a difference in means is statistically significant.
Start
  σ known
    Sample size < 30: is the population highly skewed? No → t-test; Yes → sign test
    Sample size >= 30: is the population highly skewed? No → z-test; Yes → alternative methods
  σ not known
    Is the population highly skewed? No → t-test; Yes → alternative methods
Analysis Of Variance (ANOVA)
• ANOVA determines the effects of one or more categorical independent variables on one numerical dependent variable.
ANOVA | Description
One-way | 1 independent categorical variable on a single dependent variable
Two-way | 2 independent categorical variables on a single dependent variable
N-way | Multiple independent categorical variables
Analysis Of Variance (ANOVA)
1. Calculate the variance between groups and within groups
2. Calculate the degrees of freedom
3. Compute the F-value: $F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{SS_{BG}/df_1}{SS_{WG}/df_2}$, where df1 = m - 1, df2 = m(n - 1), n: number of samples in each group, m: number of groups
4. Find the critical value (F-score) from an F-distribution table using df1, df2, and a selected alpha: http://www.socr.ucla.edu/Applets.dir/F_Table.html
5. Compare the F-value with the critical value: if F-value < F-critical, fail to reject H0; otherwise reject H0
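Steps 1 to 3 can be sketched for a one-way ANOVA with equal group sizes. The sketch uses the standard degrees of freedom for that design, df1 = m - 1 between groups and df2 = m(n - 1) within groups. The function name and the three small groups are hypothetical, and the critical-value lookup in steps 4 and 5 is left to an F table.

```python
def one_way_anova_f(groups):
    """Return (F, df1, df2) for equal-sized groups of numeric values."""
    m = len(groups)       # number of groups
    n = len(groups[0])    # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = sum(sum(g) for g in groups) / (m * n)
    group_means = [sum(g) / n for g in groups]

    # Sum of squares between groups and within groups.
    ss_between = sum(n * (gm - grand_mean) ** 2 for gm in group_means)
    ss_within = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    )

    df1 = m - 1
    df2 = m * (n - 1)
    f_value = (ss_between / df1) / (ss_within / df2)
    return f_value, df1, df2


# Hypothetical data for illustration only.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df1, df2)
```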
[Figure: t-test vs. z-test]
[Figure: Hypothesis Testing; source: towardsdatascience.com]
[Figure: Hypothesis Testing]
Python Library
Type of Test | SciPy Code

Check whether the data follow a Gaussian distribution
    from scipy.stats import shapiro, normaltest
    stat, p = shapiro(data)  # p > 0.05: no evidence against a Gaussian distribution

Check for a linear relationship between two samples
    from scipy.stats import pearsonr
    stat, p = pearsonr(data1, data2)  # p > 0.05: no significant linear correlation

Check for a monotonic relationship between two samples
    from scipy.stats import spearmanr, kendalltau
    stat, p = spearmanr(data1, data2)  # p > 0.05: no significant monotonic correlation

Check for a relationship between two categorical variables
    from scipy.stats import chi2_contingency
    stat, p, dof, expected = chi2_contingency(table)  # p > 0.05: more likely independent

Determine a z-score or percentile
    from scipy import stats
    stats.norm.cdf(z)  # percentile (area to the left) for a z-score
    stats.norm.ppf(p)  # z-score for a percentile p

Check whether the means of two independent, normally distributed samples are significantly different (Student t-test)
    from scipy.stats import ttest_ind
    stat, p = ttest_ind(data1, data2)  # p > 0.05: no significant difference in means

Check whether the means of two paired, normally distributed samples are significantly different (Student t-test)
    from scipy.stats import ttest_rel
    stat, p = ttest_rel(data1, data2)  # p > 0.05: no significant difference in means

Check whether the means of two or more independent, normally distributed samples are significantly different (ANOVA)
    from scipy.stats import f_oneway
    stat, p = f_oneway(data1, data2, data3)  # p > 0.05: no significant difference in means

Check whether the distributions of two independent samples are equal (Mann-Whitney U test)
    from scipy.stats import mannwhitneyu
    stat, p = mannwhitneyu(data1, data2)  # p > 0.05: more likely the same distribution

Más contenido relacionado

La actualidad más candente

One-way ANOVA research paper
One-way ANOVA research paperOne-way ANOVA research paper
One-way ANOVA research paper
Jose Dela Cruz
 

La actualidad más candente (20)

Applied Statistics In Business
Applied Statistics In BusinessApplied Statistics In Business
Applied Statistics In Business
 
Introduction to Regression Analysis and R
Introduction to Regression Analysis and R   Introduction to Regression Analysis and R
Introduction to Regression Analysis and R
 
Statistical tests/prosthodontic courses
Statistical tests/prosthodontic coursesStatistical tests/prosthodontic courses
Statistical tests/prosthodontic courses
 
Chapter14
Chapter14Chapter14
Chapter14
 
PG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsPG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statistics
 
Ds vs Is discuss 3.1
Ds vs Is discuss 3.1Ds vs Is discuss 3.1
Ds vs Is discuss 3.1
 
Reporting a single sample t-test
Reporting a single sample t-testReporting a single sample t-test
Reporting a single sample t-test
 
Regression analysis in R
Regression analysis in RRegression analysis in R
Regression analysis in R
 
Statistics - Basics
Statistics - BasicsStatistics - Basics
Statistics - Basics
 
Applied statistics part 5
Applied statistics part 5Applied statistics part 5
Applied statistics part 5
 
Bbs10 ppt ch03
Bbs10 ppt ch03Bbs10 ppt ch03
Bbs10 ppt ch03
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
T test statistics
T test statisticsT test statistics
T test statistics
 
Applied statistics part 3
Applied statistics part 3Applied statistics part 3
Applied statistics part 3
 
Applied statistics part 4
Applied statistics part  4Applied statistics part  4
Applied statistics part 4
 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysis
 
Anova (1)
Anova (1)Anova (1)
Anova (1)
 
Presentation on Regression Analysis
Presentation on Regression AnalysisPresentation on Regression Analysis
Presentation on Regression Analysis
 
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of DataPG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
 
One-way ANOVA research paper
One-way ANOVA research paperOne-way ANOVA research paper
One-way ANOVA research paper
 

Similar a BasicStatistics.pdf

Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014
RSS6
 
Chi square test social research refer.ppt
Chi square test social research refer.pptChi square test social research refer.ppt
Chi square test social research refer.ppt
Snehamurali18
 
Quantitative_analysis.ppt
Quantitative_analysis.pptQuantitative_analysis.ppt
Quantitative_analysis.ppt
mousaderhem1
 
marketing research & applications on SPSS
marketing research & applications on SPSSmarketing research & applications on SPSS
marketing research & applications on SPSS
ANSHU TIWARI
 
Elementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyElementary statistics for Food Indusrty
Elementary statistics for Food Indusrty
Atcharaporn Khoomtong
 
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxWeek 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
cockekeshia
 

Similar a BasicStatistics.pdf (20)

Medical statistics2
Medical statistics2Medical statistics2
Medical statistics2
 
Stat2013
Stat2013Stat2013
Stat2013
 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014
 
Applied Statistics And Doe Mayank
Applied Statistics And Doe MayankApplied Statistics And Doe Mayank
Applied Statistics And Doe Mayank
 
MPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptxMPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptx
 
Chi square test social research refer.ppt
Chi square test social research refer.pptChi square test social research refer.ppt
Chi square test social research refer.ppt
 
Quantitative_analysis.ppt
Quantitative_analysis.pptQuantitative_analysis.ppt
Quantitative_analysis.ppt
 
marketing research & applications on SPSS
marketing research & applications on SPSSmarketing research & applications on SPSS
marketing research & applications on SPSS
 
Chi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & groupChi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & group
 
Marketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptxMarketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptx
 
Elementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyElementary statistics for Food Indusrty
Elementary statistics for Food Indusrty
 
elementary statistic
elementary statisticelementary statistic
elementary statistic
 
Stat topics
Stat topicsStat topics
Stat topics
 
Medical Statistics Part-II:Inferential statistics
Medical Statistics Part-II:Inferential  statisticsMedical Statistics Part-II:Inferential  statistics
Medical Statistics Part-II:Inferential statistics
 
Chi2 Anova
Chi2 AnovaChi2 Anova
Chi2 Anova
 
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxWeek 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptx
 
Proportion test using Chi square
Proportion test using Chi squareProportion test using Chi square
Proportion test using Chi square
 
Data analysis
Data analysisData analysis
Data analysis
 
Quantitative Data analysis
Quantitative Data analysisQuantitative Data analysis
Quantitative Data analysis
 

Último

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 

Último (20)

7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
 

BasicStatistics.pdf

  • 2. Sweet AI Variables Variables Quantitative (Histogram) Discrete (number of students in a class) Continuous (weight) Interval (Temp) Ratio (Height, Age) Categorical/ Qualitative (Bar plot) Binary (spam/safe) Nominal (non-sortable: colors, genre) Ordinal (sortable: grades, product rating)
  • 3. Sweet AI Probability Probability Independent event Dependent event Conditional probability P(A|B) = P A∩𝐵 𝑃(𝐵) Multiplication rule/ Intersection Depended event: P A ∩ 𝐵 = 𝑃 𝐴 ∗ 𝑃 𝐵 𝐴 𝑜𝑟 𝑃 𝐵 ∗ 𝑃(𝐴|𝐵) Indepenedent event: P A ∩ 𝐵 = 𝑃 𝐴 ∗ 𝑃(𝐵) Addition rule/ Union P A ∪ 𝐵 = 𝑃 𝐴 + 𝑃 𝐵 − P A ∩ 𝐵 Complement rule 𝑃(𝐴 ) = 1 − 𝑃(𝐴) Bayes Theorem P(A|B) = 𝑃 B 𝐴 𝑃(𝐴) 𝑃(𝐵) Permutation (order matter) n: number of set, r: number of spots Repetition nr ex: AB, BA, AA, BB No repetition 𝑛! (𝑛−𝑟)! ex: AB, BA Combination (order doesn’t matter) Repetition (𝑛+𝑟 −1)! 𝑟!(𝑛−1)! ex: AA, BA, BB No repetition 𝑛! 𝑟!(𝑛−𝑟)! ex: AB
  • 4. Sweet AI Basic Concepts Concept Description Population The entire dataset that you want to draw conclusions about. e.g., all the school’s students of the USA Sample A smaller set randomly drawn from the population. e.g., 700 volunteer students from different schools in the USA Outlier/ Noise/ Anomalies Datapoints that are at abnormal distance from the other observations, and they can skew the model. Variate Univariate à one variable Bivariate à two variable Multivariate à more than two variables Sampling Methods Probability Simple random Systematic Stratified Cluster Non-probability Convenience Snowball Quota Purposive
  • 5. Sweet AI Statistical Measures Statistical Measures Central Tendency Mean Median Mode Central Dispersion Range Variance Standard Deviation IQR Association Covariance Correlation
  • 6. Sweet AI Basic Measurement Concepts Central Tendency Description Example Mean/ Average ( 𝜇, ̅ 𝑥 ) The total of the numbers divide by the number of numbers. Sensitive to outlier. [4, 3, 7, 2, 3, 6]: 4 + 3 + 7 + 2 + 3 + 6 = 25 / 7 à 4.16 Median ( Med, M) Sort the numbers and find the middle number [4, 3, 7, 2, 3, 6]: [2, 3, 3, 4, 6, 7] à 3.5 Mode The most common occurring number [4, 3, 7, 2, 3, 6]: à 3
  • 7. Sweet AI Central Dispersion Dispersion Description Example Range The difference between smallest and largest number [4, 3, 7, 2, 3, 6]: 7 – 2 à 5 Variance (𝜎2 ) Shows how spread-out the data points are, and measures the width of the distribution around mean 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛: 𝜎2 = ∑/01 2 (#$ % &) ( 𝑆𝑎𝑚𝑝𝑙𝑒: 𝑆2 = ∑/01 2 (#$ % #) ( %) [4, 3, 7, 2, 3, 6]: 𝜇 = 5 à dist(-1, -2, 2, -3, -2, 1)2 à 𝜎2 = 23/6 = 3.83 Standard Deviation (𝜎) How spread out the data is around the mean and used to identify outliers. data points that are more than one sd from mean might be consider unusual 𝜎 = 𝜎2 [4, 3, 7, 2, 3, 6]: 𝜎 = 3.83 = 1.95 Standard Error (SE) Population Sd estimates how spread-out individual values are from the population mean. Standard error estimates the accuracy of a sample and how far a sample mean is likely to be from the population mean. 𝑆𝐸 ̅ 𝑥 = * + (𝜎: population standard deviation, n: # datapoints in the sample) à return the result as mean ±𝑆𝐸
  • 8. Sweet AI Concept Description Example Quartiles All datapoints are considered and sorted ascendingly, find median, then find median of two other sets: • Q1: lower/ first • Q2: median/ middle/ second • Q3: upper/ third • [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26] • [ 2,3,3 | 4,6,7 | 8,12,19 | 19,24,26] Interquartile Range (IQR) IQR = Q3 – Q1 Outlier = Q1 – 1.5 x IQR Outlier = Q3 + 1.5 x IQR Percentiles 99 values that split the sample into 100 equal size subsamples Central Dispersion Q1 Q3 IQR Whisker Whisker Fence at 1.5 IQR
  • 9. Sweet AI Association Association Description Covariance Measures the relationship between two variable in two dimensions. Positive value à two variables move in the same direction Negative value à two variables move in the inverse direction Closer to zero indicates weak relationship Farther from 0 indicates stronger relationship Pearson Correlation Coefficient/ Pearson’s r Measure the strength and direction of a linear relationship two variables. -1 (strong negative relationship) < r < +1 (strong positive relationship) P = 0 à no correlation 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛: 𝐶𝑜𝑣 𝑋, 𝑌 = ∑(𝑋𝑖 − D 𝑋)(𝑌𝑖 − D 𝑌) 𝑁 𝑆𝑎𝑚𝑝𝑙𝑒: 𝐶𝑜𝑣 𝑋, 𝑌 = ∑(𝑋𝑖 − D 𝑋)(𝑌𝑖 − D 𝑌) 𝑁 − 1 Image from U of Wisconsin. 𝜌𝑋, 𝑌 = 𝑐𝑜𝑣(𝑋, 𝑌) 𝜎𝑋𝜎𝑌 = ∑(𝑥𝑖 − ̅ 𝑥)(𝑦𝑖 − 1 𝑦) ∑(𝑥𝑖 − ̅ 𝑥)2 ∑ 𝑦𝑖 − 1 𝑦 2
  • 10. Sweet AI Distribution Credit: Harold Toomey, WyzAnt Tutor
  • 11. Sweet AI Distribution Discrete/ mass function Continuous/ density function
  • 12. Sweet AI Hypothesis Testing Hypotheses Alternative (H1/Ha) e.g., a male salary is higher than a female salary for a same job position Null (H0) e.g., a male salary is equal to a female salary for a same job position Non-Directional Directional Statistical
  • 13. Sweet AI Hypothesis Testing Hypothesis Test Parametric Test Regression Simple Linear Regression Multiple Linear Regression Logistic Regression Comparison t-test ANOVA MANOVA Correlation Pearson's r Non-Parametric Test Spearman's r Chi square test ANOSIM Wilcoxon Sign test
  • 14. Sweet AI Hypothesis Testing State H0 & H1 Collect testing samples Select & Execute Statistical Test Infer the results (Reject/fail to Reject H0) Ho: Men are, on average, not getting higher salary than women. Ha: Men are, on average, getting higher salary than women. Equal proportion of men & women in a variety of industries in scope of a country One-tail t-test Average diff 20k and p- value 0.002, which is consistent with H1
  • 15. Terminology Description Significance level /confidence level(𝛼) A threshold to decide whether a test statistic is statistically significant. Statical significance means high likely a relationship between variables is not caused by chance. 𝛼 is lays in the area inside the tail(s) of the H0 𝛼 = 1 – (confidence level /100) à Common practice 𝛼 : 0.01, 0.05, 0.1 P-value (probability Value) Determines plausibility of null hypotheses, whether H0 should be rejected or not! P(Sample statistics| H0 True) 0 < p-value < 1 P-value ≥ 𝛼 : results are not statically significant, H0 not rejected/failed, the null must fly! P-value < 𝛼 : results are statically significant, H0 rejected/failed, the null must go! Sweet AI Basic Concepts H0 is ... True False Rejected Type I Error 𝛼 Correct Not Rejected Correct Type II Error 𝛽 https://www.abtasty.com
  • 16. Z-test
      • Used to test if two groups of data are different from each other when we know the standard deviation of the population
      • Normal distribution formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
      • To calculate the percentile of a data point, we standardize the normal distribution to the standard normal distribution
      • The standard normal distribution has $\mu = 0$ and $\sigma = 1$
      • How to determine x's percentile/probability, or how far from typical a result is:
        1. Standardize the values of the normal distribution by calculating the z-score from the population $\mu$ and population $\sigma$:
           for a single raw datum x: $z = \frac{x - \mu}{\sigma}$
           for the mean $\bar{x}$ of n independent and identically distributed samples: $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
           for a proportion: $Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$, where $\hat{p}$ is the observed sample proportion, $p$ the hypothesized population proportion, and n the sample size
        2. Look up the z-table to map the z-score to the area under the normal distribution curve and obtain the p-value
        3. Compare the p-value with $\alpha$: if p-value ≥ $\alpha$, fail to reject H0; otherwise reject H0
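The three z-score formulas and the z-table lookup can be sketched with the standard library alone; `statistics.NormalDist` stands in for the printed table. The function names and the example numbers (population mean 100, σ = 15, sample mean 103, n = 100) are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # plays the role of the z-table


def z_single(x, mu, sigma):
    """z-score of a single raw datum."""
    return (x - mu) / sigma


def z_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean, using the standard error sigma / sqrt(n)."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


def z_proportion(p_hat, p0, n):
    """z-score of an observed proportion against a hypothesized p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def p_value(z, two_tailed=True):
    """Tail area beyond |z|; doubled for a two-tailed test."""
    upper_tail = 1 - STD_NORMAL.cdf(abs(z))
    return 2 * upper_tail if two_tailed else upper_tail


z = z_mean(x_bar=103, mu=100, sigma=15, n=100)
alpha = 0.05
print(f"z = {z:.2f}, percentile = {STD_NORMAL.cdf(z):.4f}, p = {p_value(z):.4f}")
print("Reject H0" if p_value(z) < alpha else "Fail to reject H0")
```

The one-tailed branch of `p_value` returns the tail in the direction of the observed z, so it only suits a hypothesis stated in that same direction.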
  • 18. Student t-test
      • Used to test if two groups of data are different from each other when we don't know the standard deviation of the population
      • Assumptions: normal distribution; similar variance for each group/sample; same number of data points in each group/sample (20-30; for larger samples we should use a z-test)
      • H0: There is no difference between groups
      • H1: There is a difference between groups
      Type of t-test | Description | Formula | Degrees of freedom
      One-sample t-test | Tests if a population mean is equal to some value $\mu$. $\bar{x}$: sample mean, $\mu$: population mean, s: sample standard deviation, n: sample size | $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ | df = n - 1
      Dependent/paired-samples t-test | Tests whether two population means are equal by sampling the same population twice. d: difference between paired observations, n: number of pairs | $t = \frac{\sum d}{\sqrt{\frac{n \sum d^2 - (\sum d)^2}{n - 1}}}$ | df = n - 1
      Independent two-sample t-test/unpaired samples t-test | Tests if two population means are equal, using two independent samples that may differ in size and variance. $s_1^2, s_2^2$: sample variances | $t = \frac{\text{signal}}{\text{noise}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ | df = n1 + n2 - 2 (with unequal variances, the Welch-Satterthwaite approximation is normally used for df instead)
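The three t-value formulas in the table can be written out directly using only the standard library. This is a sketch: the function names and the `before`/`after` measurements are invented, and the degrees of freedom returned for the independent test follow the slide (n1 + n2 - 2) rather than the Welch-Satterthwaite approximation.

```python
import math
from statistics import mean, stdev, variance


def t_one_sample(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


def t_paired(before, after):
    """t = sum(d) / sqrt((n * sum(d^2) - sum(d)^2) / (n - 1)), df = n - 1."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


def t_independent(sample1, sample2):
    """t = (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2), df as on the slide."""
    n1, n2 = len(sample1), len(sample2)
    noise = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t = (mean(sample1) - mean(sample2)) / noise
    return t, n1 + n2 - 2


# Hypothetical paired measurements on the same four subjects.
before = [10, 12, 9, 11]
after = [12, 14, 10, 15]
print(t_paired(before, after))
print(t_one_sample([a - b for b, a in zip(before, after)], 0))
```

The two printed t-values agree, which illustrates that a paired t-test is a one-sample t-test on the differences against a mean of zero.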
  • 19. Student t-test: decision rule (figure source: stanford.edu)
      1. Calculate the t-value and the degrees of freedom (df)
      2. Decide on a one- or two-tailed test and the level of confidence
      3. Look up the critical value from the t-table and decide whether to reject or fail to reject H0
      t-value < critical value → do not reject H0
      t-value > critical value → reject H0
      (for a two-tailed test, compare the absolute value of the t-value)
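Instead of a printed t-table, the critical value can be taken from the inverse CDF of the t distribution in SciPy. The helper names below are hypothetical, and the one-tailed branch assumes an upper-tailed alternative.

```python
from scipy import stats


def critical_t(alpha, df, two_tailed=True):
    """Critical value from the t distribution (replaces the t-table lookup)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return stats.t.ppf(1 - tail_area, df)


def reject_h0(t_value, alpha, df, two_tailed=True):
    """Compare the t-value with the critical value as on the slide."""
    crit = critical_t(alpha, df, two_tailed)
    if two_tailed:
        return bool(abs(t_value) > crit)
    return bool(t_value > crit)  # upper-tailed test assumed


print(round(critical_t(0.05, df=10), 3))                    # two-tailed
print(round(critical_t(0.05, df=10, two_tailed=False), 3))  # one-tailed
print(reject_h0(3.0, alpha=0.05, df=10))
```

A one-tailed test puts all of α in one tail, so its critical value is smaller than the two-tailed one at the same α.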
  • 20. t-test vs. z-test
      • The t-test and z-test are both used to determine whether a difference observed in a set of data is statistically significant.
      Flowchart:
      σ known, sample size < 30: is the population highly skewed? No → t-test; Yes → sign test
      σ known, sample size ≥ 30: is the population highly skewed? No → z-test; Yes → alternative methods
      σ not known: is the population highly skewed? No → t-test; Yes → alternative methods
  • 21. Analysis of Variance (ANOVA)
      • ANOVA determines the effect of one or more categorical independent variables on one numerical dependent variable.
      One-way: 1 independent categorical variable on a single dependent variable
      Two-way: 2 independent categorical variables on a single dependent variable
      N-way: multiple independent categorical variables
  • 22. Analysis of Variance (ANOVA)
      1. Calculate the variance between groups and within groups
      2. Calculate the degrees of freedom
      3. Compute the F-value: $F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{SSBG / df_1}{SSWG / df_2}$, with $df_1 = m - 1$ and $df_2 = (n - 1)m$, where n is the number of samples in each group and m is the number of groups
      4. Find the critical value (F-score) from the F distribution table using df1, df2, and a selected alpha: http://www.socr.ucla.edu/Applets.dir/F_Table.html
      5. Compare the F-value with the critical value: if F-value < F-critical, fail to reject H0; otherwise H0 is rejected
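The five ANOVA steps can be traced on a tiny data set. This sketch assumes equal group sizes and uses the standard degrees of freedom for one-way ANOVA (m - 1 between groups, m(n - 1) within groups); the three groups are invented, and `one_way_anova` is a hypothetical helper, not a SciPy function.

```python
from scipy import stats


def one_way_anova(groups):
    """Return (F, df1, df2) for equally sized groups."""
    m = len(groups)     # number of groups
    n = len(groups[0])  # samples per group (assumed equal)
    grand_mean = sum(sum(g) for g in groups) / (m * n)
    group_means = [sum(g) / n for g in groups]
    # Step 1: sums of squares between groups and within groups.
    ssbg = sum(n * (gm - grand_mean) ** 2 for gm in group_means)
    sswg = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    # Step 2: degrees of freedom.
    df1, df2 = m - 1, m * (n - 1)
    # Step 3: ratio of the two mean squares.
    return (ssbg / df1) / (sswg / df2), df1, df2


groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical measurements
f_value, df1, df2 = one_way_anova(groups)

# Step 4: critical value from the F distribution instead of a printed table.
alpha = 0.05
f_critical = stats.f.ppf(1 - alpha, df1, df2)

# Step 5: compare.
print(f"F = {f_value:.2f}, F-critical = {f_critical:.2f}")
print("Reject H0" if f_value > f_critical else "Fail to reject H0")
```

`scipy.stats.f_oneway(*groups)` computes the same F statistic and also returns the p-value directly, which makes the table lookup unnecessary in practice.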
  • 24. Hypothesis Testing (figure). Source: towardsdatascience.com
  • 26. Python Library
      Type of test | SciPy code
      Check whether data follow a Gaussian distribution | from scipy.stats import shapiro (or normaltest); stat, p = shapiro(data)  # p > 0.05: no evidence against a Gaussian distribution
      Check for a linear relationship between two samples | from scipy.stats import pearsonr; stat, p = pearsonr(data1, data2)  # p > 0.05: no significant linear correlation
      Check for a monotonic relationship between two samples | from scipy.stats import spearmanr (or kendalltau); stat, p = spearmanr(data1, data2)  # p > 0.05: no significant monotonic correlation
      Check for a relationship between two categorical variables | from scipy.stats import chi2_contingency; stat, p, dof, expected = chi2_contingency(table)  # p > 0.05: consistent with independence
      Convert between z-score and percentile | from scipy import stats; stats.norm.cdf(z) or stats.norm.ppf(p)
      Test whether the means of two independent, normally distributed samples differ significantly (Student t-test) | from scipy.stats import ttest_ind; stat, p = ttest_ind(data1, data2)  # p > 0.05: no significant difference in means
      Test whether the means of two paired, normally distributed samples differ significantly (Student t-test) | from scipy.stats import ttest_rel; stat, p = ttest_rel(data1, data2)  # p > 0.05: no significant difference in means
      Test whether the means of two or more independent, normally distributed samples differ significantly (ANOVA) | from scipy.stats import f_oneway; stat, p = f_oneway(data1, data2, data3)  # p > 0.05: no significant difference in means
      Test whether the distributions of two independent samples are equal (Mann-Whitney U test) | from scipy.stats import mannwhitneyu; stat, p = mannwhitneyu(data1, data2)  # p > 0.05: no significant difference in distributions
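To try several of the calls in this table end to end, the following sketch runs them on synthetic data. The arrays, the random seed, and the 2x2 contingency table are assumptions made for illustration; only the SciPy function names come from the slide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic samples (invented for illustration).
data1 = rng.normal(loc=50, scale=5, size=100)
data2 = 0.8 * data1 + rng.normal(loc=0, scale=2, size=100)  # linearly related to data1
data3 = rng.normal(loc=55, scale=5, size=100)

# Normality check (Shapiro-Wilk).
_, p_normal = stats.shapiro(data1)

# Linear and monotonic association.
r, p_linear = stats.pearsonr(data1, data2)
rho, p_monotonic = stats.spearmanr(data1, data2)

# Association between two categorical variables, from a 2x2 count table.
table = [[30, 10], [20, 40]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Distribution comparison without a normality assumption.
_, p_u = stats.mannwhitneyu(data1, data3)

# z-score <-> percentile conversions.
area = stats.norm.cdf(1.96)  # area to the left of z = 1.96
z = stats.norm.ppf(0.975)    # z-score at the 97.5th percentile

for name, p in [("shapiro", p_normal), ("pearsonr", p_linear),
                ("spearmanr", p_monotonic), ("chi2", p_chi2),
                ("mannwhitneyu", p_u)]:
    print(f"{name}: p = {p:.4g}")
```

In each case a p-value above the chosen α means only that the test found no significant evidence against H0, not that H0 has been proven.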