SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
Correlation Between Continuous
& Categorical Variables
CIVL 7012/8012
2
Association between variables
2
• Continuous and continuous variable
– Pearson’s correlation coefficient
• Categorical and categorical variable
– Chi-square test
– Cramer’s V
– Bonferroni correction
• Categorical and Continuous variable
– Point biserial correlation
3
Association between Continuous
Variables
• Compute Pearson’s correlation coefficient (r)
– r<0.3, weak correlation
– 0.3<r<0.7, moderate correlation
– r>0.7, high correlation
4
Association between categorical
variables
• Pearson’s correlation coefficient can not be
applied.
• What are some of the methods
• How to compute them
• What will be the conclusion
5
Set up hypothesis
• Null hypothesis: Assumes that there is no
association between the two variables.
• Alternative hypothesis: Assumes that there is
an association between the two variables.
6
Categorical variable
Observed Male Female
Married 456 516
Widowed 58 123
Divorced 142 172
Separated 29 50
Never married 188 207
Example: Two categorical variables: marital status and gender
Question: How do we measure degree of association?
Since these are categorical variables Pearson’s correlation coefficient will not work
Reference: https://peterstatistics.com
7
Pearson Chi-square test for independence
• Calculate estimated values
Expected Male Female
Married 437.1747 534.8253
Widowed 81.40804 99.59196
Divorced 141.2272 172.7728
Separated 35.53168 43.46832
Never married 177.6584 217.3416
𝐸𝑖,𝑗 =
𝑅𝑖 × 𝐶𝑗
𝑁
Observed Male Female
Married 456 516
Widowed 58 123
Divorced 142 172
Separated 29 50
Never married 188 207
8
Calculate chi-sq for each pair
(O-E)2/E Male Female
Married 0.810646 0.662634
Widowed 6.730738 5.501811
Divorced 0.004229 0.003457
Separated 1.2007 0.981471
Never married 0.601988 0.492074
𝑂𝑖,𝑗 − 𝐸𝑖,𝑗
2
𝐸𝑖,𝑗
Pearson Chi-square value (sum of all cells): 16.98975
9
Degrees of freedom and significance
• Degrees of freedom = (r-1) *(c-1)
– In this example: (5-1)*(2-1) = 4
• Significance: Chi-square (16.98975, 4) = 0.00194
• Reject null hypothesis
• Conclusion: there is an association between the
two variables.
10
Cramer’s V (1)
Cramer's V= 𝑠𝑞𝑟𝑡 (𝜒2
/ [𝑛 𝑞 − 1 ])
• q= min (# of rows, # of columns)
• Cramer’s V interpretation
– 0: The variables are not associated
– 1: The variables are perfectly associated
– 0.25: The variables are weakly associated
– .75: The variables are moderately associated
11
Cramer’s V (2)
• In this case
– Not associated
Observed> Male Female Total
Married 456 516 972
Widowed 58 123 181
Divorced 142 172 314
Separated 29 50 79
Never married 188 207 395
Total 873 1068 1941
Pearson Chi-square value: 16.98975
# of rows (r) 5
# of cols (c) 2
q 2
Cramer's V 0.093558
12
Bonferroni correction
Observed Male Female Total
Married 456 516 972
Widowed 58 123 181
Divorced 142 172 314
Separated 29 50 79
Never married 188 207 395
Total 873 1068 1941
Expected Male Female
Married 437.1747 534.8253
Widowed 81.40804 99.59196
Divorced 141.2272 172.7728
Separated 35.53168 43.46832
Never married 177.6584 217.3416
Adjusted Residuals
(O-E)2/E Male Female
Married 1.717883 -1.71788
Widowed -3.67295 3.672949
Divorced 0.095753 -0.09575
Separated -1.50823 1.508229
Never married 1.172004 -1.172
𝜒2
𝐴𝑑𝑗𝑢𝑠𝑡𝑒𝑑 𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙_𝑖, 𝑗 =
𝑂𝑖,𝑗 − 𝐸𝑖,𝑗
𝐸𝑖,𝑗 ∗ 1 −
𝑅𝑖
𝑛
1 −
𝐶𝑗
𝑛
Significance level 0.05
# of tests 10
Adjusted sig level 0.005
Only widowed male and female has significance association
13
Correlation between continuous and
categorial variables
• Point Biserial correlation
– product-moment correlation in which one variable
is continuous and the other variable is binary
(dichotomous)
– Categorical variable does not need to have
ordering
– Assumption: continuous data within each group
created by the binary variable are normally
distributed with equal variances and possibly
different means
14
Point Biserial correlation
• Suppose you want to find the correlation
between
– a continuous random variable Y and
– a binary random variable X which takes the values
zero and one.
• Assume that n paired observations (Yk, Xk), k
= 1, 2, …, n are available.
– If the common product-moment correlation r is
calculated from these data, the resulting
correlation is called the point-biserial correlation.
15
Point Biserial correlation
• Point biserial correlation is defined by
16
Hypothesis test

Más contenido relacionado

Similar a L17_CategoricalVariableAssociation.pdf

4Correlational analysis presentation .pdf
4Correlational analysis presentation .pdf4Correlational analysis presentation .pdf
4Correlational analysis presentation .pdf
MitikuTeka1
 

Similar a L17_CategoricalVariableAssociation.pdf (13)

Correlation and Regression Analysis.pptx
Correlation and Regression Analysis.pptxCorrelation and Regression Analysis.pptx
Correlation and Regression Analysis.pptx
 
Karl pearson's coefficient of correlation
Karl pearson's coefficient of correlationKarl pearson's coefficient of correlation
Karl pearson's coefficient of correlation
 
Unit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdfUnit 1 Correlation- BSRM.pdf
Unit 1 Correlation- BSRM.pdf
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptx
 
Correlation.pptx
Correlation.pptxCorrelation.pptx
Correlation.pptx
 
Correlation continued
Correlation continuedCorrelation continued
Correlation continued
 
CORRELATION.pptx
CORRELATION.pptxCORRELATION.pptx
CORRELATION.pptx
 
Inferential statistics correlations
Inferential statistics correlationsInferential statistics correlations
Inferential statistics correlations
 
Correlation Computations Thiyagu
Correlation Computations ThiyaguCorrelation Computations Thiyagu
Correlation Computations Thiyagu
 
Measures of Dispersion
Measures of DispersionMeasures of Dispersion
Measures of Dispersion
 
Correlation
CorrelationCorrelation
Correlation
 
4Correlational analysis presentation .pdf
4Correlational analysis presentation .pdf4Correlational analysis presentation .pdf
4Correlational analysis presentation .pdf
 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysis
 

Último

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Último (20)

Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 

L17_CategoricalVariableAssociation.pdf

  • 1. Correlation Between Continuous & Categorical Variables CIVL 7012/8012
  • 2. 2 Association between variables 2 • Continuous and continuous variable – Pearson’s correlation coefficient • Categorical and categorical variable – Chi-square test – Cramer’s V – Bonferroni correction • Categorical and Continuous variable – Point biserial correlation
  • 3. 3 Association between Continuous Variables • Compute Pearson’s correlation coefficient (r) – r<0.3, weak correlation – 0.3<r<0.7, moderate correlation – r>0.7, high correlation
  • 4. 4 Association between categorical variables • Pearson’s correlation coefficient can not be applied. • What are some of the methods • How to compute them • What will be the conclusion
  • 5. 5 Set up hypothesis • Null hypothesis: Assumes that there is no association between the two variables. • Alternative hypothesis: Assumes that there is an association between the two variables.
  • 6. 6 Categorical variable Observed Male Female Married 456 516 Widowed 58 123 Divorced 142 172 Separated 29 50 Never married 188 207 Example: Two categorical variables: marital status and gender Question: How do we measure degree of association? Since these are categorical variables Pearson’s correlation coefficient will not work Reference: https://peterstatistics.com
  • 7. 7 Pearson Chi-square test for independence • Calculate estimated values Expected Male Female Married 437.1747 534.8253 Widowed 81.40804 99.59196 Divorced 141.2272 172.7728 Separated 35.53168 43.46832 Never married 177.6584 217.3416 𝐸𝑖,𝑗 = 𝑅𝑖 × 𝐶𝑗 𝑁 Observed Male Female Married 456 516 Widowed 58 123 Divorced 142 172 Separated 29 50 Never married 188 207
  • 8. 8 Calculate chi-sq for each pair (O-E)2/E Male Female Married 0.810646 0.662634 Widowed 6.730738 5.501811 Divorced 0.004229 0.003457 Separated 1.2007 0.981471 Never married 0.601988 0.492074 𝑂𝑖,𝑗 − 𝐸𝑖,𝑗 2 𝐸𝑖,𝑗 Pearson Chi-square value (sum of all cells): 16.98975
  • 9. 9 Degrees of freedom and significance • Degrees of freedom = (r-1) *(c-1) – In this example: (5-1)*(2-1) = 4 • Significance: Chi-square (16.98975, 4) = 0.00194 • Reject null hypothesis • Conclusion: there is an association between the two variables.
  • 10. 10 Cramer’s V (1) Cramer's V= 𝑠𝑞𝑟𝑡 (𝜒2 / [𝑛 𝑞 − 1 ]) • q= min (# of rows, # of columns) • Cramer’s V interpretation – 0: The variables are not associated – 1: The variables are perfectly associated – 0.25: The variables are weakly associated – .75: The variables are moderately associated
  • 11. 11 Cramer’s V (2) • In this case – Not associated Observed> Male Female Total Married 456 516 972 Widowed 58 123 181 Divorced 142 172 314 Separated 29 50 79 Never married 188 207 395 Total 873 1068 1941 Pearson Chi-square value: 16.98975 # of rows (r) 5 # of cols (c) 2 q 2 Cramer's V 0.093558
  • 12. 12 Bonferroni correction Observed Male Female Total Married 456 516 972 Widowed 58 123 181 Divorced 142 172 314 Separated 29 50 79 Never married 188 207 395 Total 873 1068 1941 Expected Male Female Married 437.1747 534.8253 Widowed 81.40804 99.59196 Divorced 141.2272 172.7728 Separated 35.53168 43.46832 Never married 177.6584 217.3416 Adjusted Residuals (O-E)2/E Male Female Married 1.717883 -1.71788 Widowed -3.67295 3.672949 Divorced 0.095753 -0.09575 Separated -1.50823 1.508229 Never married 1.172004 -1.172 𝜒2 𝐴𝑑𝑗𝑢𝑠𝑡𝑒𝑑 𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙_𝑖, 𝑗 = 𝑂𝑖,𝑗 − 𝐸𝑖,𝑗 𝐸𝑖,𝑗 ∗ 1 − 𝑅𝑖 𝑛 1 − 𝐶𝑗 𝑛 Significance level 0.05 # of tests 10 Adjusted sig level 0.005 Only widowed male and female has significance association
  • 13. 13 Correlation between continuous and categorial variables • Point Biserial correlation – product-moment correlation in which one variable is continuous and the other variable is binary (dichotomous) – Categorical variable does not need to have ordering – Assumption: continuous data within each group created by the binary variable are normally distributed with equal variances and possibly different means
  • 14. 14 Point Biserial correlation • Suppose you want to find the correlation between – a continuous random variable Y and – a binary random variable X which takes the values zero and one. • Assume that n paired observations (Yk, Xk), k = 1, 2, …, n are available. – If the common product-moment correlation r is calculated from these data, the resulting correlation is called the point-biserial correlation.
  • 15. 15 Point Biserial correlation • Point biserial correlation is defined by