Logistic Regression in Case-Control Study Using R: A Statistical Tool
Satish Gupta
What is R?
 The R statistical programming language is a free open
source package.
 The language is very powerful for writing programs.
 Many statistical functions are already built in.
 Contributed packages expand the functionality to
cutting edge research.
Getting Started
 Go to www.r-project.org
 Downloads: CRAN (Comprehensive R Archive
Network)
 Set your Mirror: location close to you.
 Select Windows 95 or later, MacOS or UNIX
platforms
Basic operators and calculations
Comparison operators
 equal: ==
 not equal: !=
 greater/less than: > <
 greater/less than or equal: >= <=
Example: 1 == 1 # Returns TRUE
Basic operators and calculations
Logical operators
 AND: &
x <- 1:10; y <- 10:1 # Creates the sample vectors 'x' and 'y'.
x > y & x > 5 # Returns TRUE where both comparisons return TRUE.
 OR: |
x == y | x != y # Returns TRUE where at least one comparison is
TRUE.
 NOT: !
!(x > y) # The '!' sign returns the negation (opposite) of a logical
vector; the parentheses make explicit that the comparison is evaluated first.
Basic operators and calculations
Calculations
 Four basic arithmetic functions: addition, subtraction,
multiplication and division
1 + 1; 1 - 1; 1 * 1; 1 / 1 # Returns results of basic arithmetic
calculations.
 Calculations on vectors
x <- 1:10; sum(x); mean(x); sd(x); sqrt(x) # Calculates for
the vector x its sum, mean, standard deviation and square root.
x <- 1:10; y <- 1:10; x + y # Calculates the sum for each element
in the vectors x and y.
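The operators and vector calculations above can be tried together in one short session. This is a minimal sketch in base R; the result names (`both`, `either`, `negated`, `total`, `avg`, `sdev`) are illustrative and not part of the original slides.

```r
# Sample vectors, as on the slides
x <- 1:10
y <- 10:1

# Element-wise logical operators
both    <- x > y & x > 5     # TRUE where both comparisons hold
either  <- x == y | x != y   # TRUE everywhere: each pair is either equal or not
negated <- !(x > y)          # negation of the comparison x > y

# Summaries of a vector
total <- sum(x)
avg   <- mean(x)
sdev  <- sd(x)               # sample standard deviation

print(both)
print(c(total = total, mean = avg, sd = sdev))
```

Note that `&` and `|` work element by element on whole vectors, which is why each result has the same length as `x`.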
R-Graphics
R provides comprehensive graphics utilities for
visualizing and exploring scientific data. It includes:
 Scatter plots
 Line plots
 Bar plots
 Pie charts
 Heatmaps
 Venn diagrams
 Density plots
 Box plots
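A few of the plot types listed above can be drawn with base R graphics. The sketch below uses the built-in `mtcars` dataset purely for illustration (it is not the data used later in these slides). The `pdf(NULL)` call is an assumption made so that the script also runs without a display; in an interactive session it can be dropped.

```r
# Built-in example dataset shipped with R (illustration only)
data(mtcars)

pdf(NULL)  # null graphics device: nothing is written to disk

# Scatter plot of two continuous variables
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")

# Box plots of a continuous variable by group
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# Bar plot of group counts
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Count")

# Density plot of a continuous variable
plot(density(mtcars$mpg), main = "Density of mpg")

invisible(dev.off())  # close the device
```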
Data handling in R
 Load data: mydata = read.csv("/path/mydata.csv")
 See data on screen: print(mydata)
 See top part of data: head(mydata)
 Specific number of rows and column of data:
mydata[1:10,1:3]
 To get a type of data: class(mydata)
 Changing class of data: newdata = as.matrix(mydata)
 Summary of data: summary(mydata)
 Selecting (KEEPING) variables (columns)
newdata = mydata[c(1,3:5)]
Data handling in R
 Selecting observations
newdata = subset(mydata, age >= 20 | age < 10,
select = c(ID, weight))
newdata = subset(mydata, sex == "Male" & age > 25,
select = weight:income)
 Excluding (DROPPING) variables (columns)
newdata = mydata[c(-3,-5)]
mydata$v3 = NULL
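The selection commands above can be tested on a small data frame. This is a minimal sketch: the data frame below is hypothetical and stands in for `mydata.csv`, and the result names (`young_or_adult`, `older_males`, `no_sex`) are illustrative.

```r
# Hypothetical data frame standing in for mydata.csv
mydata <- data.frame(
  ID     = 1:6,
  age    = c(8, 15, 22, 30, 41, 9),
  sex    = c("Male", "Female", "Male", "Male", "Female", "Female"),
  weight = c(25, 50, 70, 82, 60, 28),
  income = c(0, 0, 1200, 3000, 2500, 0)
)

print(head(mydata, 3))   # first three rows
print(class(mydata))     # type of the object

# Keep rows where age >= 20 or age < 10, and only the ID and weight columns
young_or_adult <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# Keep males older than 25, and the columns from weight through income
older_males <- subset(mydata, sex == "Male" & age > 25, select = weight:income)

# Drop the third column (sex) by negative index
no_sex <- mydata[-3]
```

Inside `subset()`, column names can be used directly without the `mydata$` prefix, which keeps the conditions short.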
R-Library
 R offers many tools, distributed as "packages", for different kinds of analysis, including the analysis of genetic and
genomic data.
 Depending on where a package is hosted, it can be
installed from one of two sources:
Using CRAN (Comprehensive R Archive Network) as:
install.packages("package_name")
Using Bioconductor as:
install.packages("BiocManager") # BiocManager replaces the retired biocLite() script
BiocManager::install("package_name")
R-Library
 To list, load and explore packages:
library() # Lists all packages that are installed on the system.
library(genetics) # Loads the "genetics" package for genetic data analysis.
library(help = genetics) # Lists all functions/objects of the "genetics"
package.
?function_name # Opens the documentation of a function.
What is Logistic Regression?
 Logistic regression describes the relationship between
a dichotomous response variable and a set of
explanatory variables.
 Logistic regression is often used because the
relationship between the dependent variable (DV, a discrete variable) and
a predictor is non-linear.
Logistic Regression
 A General Model:
$$\operatorname{logit}(p_{disease}) = \log\left(\frac{p_{disease}}{1 - p_{disease}}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_J X_J$$
Where:
p_disease is the probability that an individual has a particular
disease.
β0 is the intercept
β1, β2, …, βJ are the coefficients (effects) of genetic factors
X1, X2, …, XJ are the variables of genetic factors
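The general model can be made concrete with a small numeric sketch. The coefficient values below are hypothetical (they are not estimated from any data), and the helper names `logit` and `inv_logit` are illustrative. The example shows how a linear predictor is turned into a probability, and why exp(β1) is read as an odds ratio.

```r
# Logit: maps a probability in (0, 1) to the log-odds scale
logit <- function(p) log(p / (1 - p))

# Inverse logit: maps a linear predictor back to a probability
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical coefficients for a single predictor X1 (assumed values)
beta0 <- -1.0
beta1 <- 0.7

x1  <- c(0, 1, 2)
eta <- beta0 + beta1 * x1    # linear predictor = logit(p_disease)
p   <- inv_logit(eta)        # implied disease probabilities

# exp(beta1) is the odds ratio for a one-unit increase in X1
odds_ratio <- exp(beta1)
odds       <- p / (1 - p)

print(round(p, 3))
print(odds[2] / odds[1])     # same value as odds_ratio
```

Base R also provides `qlogis()` and `plogis()`, which compute the same logit and inverse logit transformations.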
Assumptions
 Logistic regression does not assume normality,
linearity, or homogeneity of variance
for the independent variables. It does assume that the log-odds of the outcome are linear in the predictors.
 Because it does not impose these requirements, it is
preferred to discriminant analysis when the data do
not satisfy these assumptions.
Questions ??
 What is the relative importance of each predictor variable?
 How does each predictor variable affect the outcome?
 Does a predictor variable make the solution better or
worse or have no effect?
 Are there interactions among predictors?
 Does adding interactions among predictors
(continuous or categorical) improve the model?
 What is the strength of association between the outcome
variable and a set of predictors?
 Often in model comparison you want non-significant
differences so strength of association is reported for
even non-significant effects.
Types of Logistic Regression
 Unconditional logistic regression
 Conditional logistic regression
** Rule of thumb
 Use conditional logistic regression if matching has been done,
and unconditional if there has been no matching.
 When in doubt, use conditional because it always gives
unbiased results. The unconditional method is said to
overestimate the odds ratio if it is not appropriate.
Data Format
Status  Matset  Se_Quartiles  GPX1  GPX4  SEP15  TXN2
1       1       <60           CT    TT    AG     AG
0       1       >60 – 70      CC    CC    GG     GG
1       2       <60           TT    CC    AG     AA
0       2       >70 – 80      CC    CT    GG     GG
1       3       >80           CC    CC    AA     AA
0       3       >60 – 70      CT    TT    GG     GG
1       4       <60           CC    CC    AA     AG
0       4       >70 – 80      TT    TT    GG     GG
1       5       >80           CC    CC    AG     AA
0       5       <60           CC    CC    GG     GG
1       6       >70 – 80      CT    TT    AA     AA
0       6       >80           CC    CC    GG     AG
1       7       >60 – 70      TT    CC    AA     AG
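The layout above can be reproduced as an R data frame to check its structure before modelling. This is a minimal sketch: only the rows shown in the excerpt are typed in (with four of the seven columns), the quartile labels are written with plain hyphens, and the object names (`lung_head`, `set_sizes`, `cases_per_set`) are illustrative.

```r
# Rows of the excerpt shown above, typed in by hand for illustration
lung_head <- data.frame(
  Status = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Matset = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7),
  Se_Quartiles = c("<60", ">60-70", "<60", ">70-80", ">80", ">60-70",
                   "<60", ">70-80", ">80", "<60", ">70-80", ">80", ">60-70"),
  GPX1 = c("CT", "CC", "TT", "CC", "CC", "CT", "CC",
           "TT", "CC", "CC", "CT", "CC", "TT")
)

# Quartiles and genotypes are categorical: store them as factors.
# The first level becomes the reference category in a regression model.
lung_head$Se_Quartiles <- factor(lung_head$Se_Quartiles,
                                 levels = c("<60", ">60-70", ">70-80", ">80"))
lung_head$GPX1 <- factor(lung_head$GPX1, levels = c("CC", "CT", "TT"))

# In this excerpt each matched set holds one case (Status = 1)
# followed by its control (Status = 0)
set_sizes     <- table(lung_head$Matset)
cases_per_set <- tapply(lung_head$Status, lung_head$Matset, sum)

print(set_sizes)       # set 7 is cut off by the excerpt
print(cases_per_set)
```

Setting the factor levels explicitly fixes which category serves as the reference (here `<60` and `CC`), matching the reference categories reported in the output tables.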
Data and Library loading
 Load and use data in R (Using Lung cancer data from
PLoS One 2013, 8(3):e59051).
lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)
 Load the library and use data for analysis
library(epicalc)
use(lung)
Data Analysis
 Performing conditional logistic regression (Case vs. Control)
clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)
clogistic.display(clogit_lung)
Variable              OR (95% CI)           P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                           <0.001
  >60 – 70            0.4 (0.15 – 1.09)     0.074
  >70 – 80            0.11 (0.03 – 0.33)    <0.001
  >80                 0.10 (0.03 – 0.34)    <0.001
Data Analysis
 Performing conditional logistic regression (Case vs. Control),
clogit_lung = clogit(Status ~ GPX1+ strata(Matset), data = .data)
clogistic.display(clogit_lung)
Variable              OR (95% CI)           P (Wald's test)   P (LR-test)
GPX1: ref.=CC                                                 0.032
  CT                  0.44 (0.22 – 0.86)    0.017
  TT                  0.42 (0.13 – 1.38)    0.151
Data Analysis
 Performing conditional logistic regression (Case vs. Control),
clogit_lung = clogit(Status ~ Se_Quartiles + GPX1+ strata(Matset), data = .data)
clogistic.display(clogit_lung)
Variable              crude OR (95% CI)     adj. OR (95% CI)      P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                                                 <0.001
  >60 – 70            0.4 (0.15 – 1.09)     0.32 (0.11 – 0.96)    0.042
  >70 – 80            0.11 (0.03 – 0.33)    0.09 (0.02 – 0.3)     <0.001
  >80                 0.1 (0.03 – 0.34)     0.05 (0.01 – 0.23)    <0.001
GPX1: ref.=CC                                                                       0.006
  CT                  0.44 (0.22 – 0.86)    0.26 (0.11 – 0.65)    0.004
  TT                  0.42 (0.13 – 1.38)    0.44 (0.09 – 2.18)    0.313
Quartiles: environmental factor. GPX1: genetic factor.
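The odds ratios and confidence intervals in these tables come from exponentiating the model coefficients. The sketch below shows that step using `clogit()` from the survival package directly, without the display helper used on the slides. The data are simulated and hypothetical (1:1 matched pairs with a single binary exposure named `exposed`); they are not the lung cancer data, and the numbers carry no scientific meaning.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)
n_sets <- 200

# Hypothetical 1:1 matched data: one case followed by one control per set
sim <- data.frame(
  Matset = rep(seq_len(n_sets), each = 2),
  Status = rep(c(1, 0), times = n_sets)
)

# Hypothetical binary exposure, simulated to be less common among cases
sim$exposed <- rbinom(nrow(sim), size = 1,
                      prob = ifelse(sim$Status == 1, 0.3, 0.5))

# Conditional logistic regression, stratified on the matched set
fit <- clogit(Status ~ exposed + strata(Matset), data = sim)

# Odds ratio and Wald 95% confidence interval on the odds scale
or_table <- cbind(OR = exp(coef(fit)), exp(confint(fit)))
print(round(or_table, 2))

# For 1:1 matching with a binary exposure, the conditional estimate
# equals the ratio of the two kinds of discordant pairs
case_exp <- sim$exposed[sim$Status == 1]
ctrl_exp <- sim$exposed[sim$Status == 0]
n_case_only <- sum(case_exp == 1 & ctrl_exp == 0)
n_ctrl_only <- sum(case_exp == 0 & ctrl_exp == 1)
print(n_case_only / n_ctrl_only)
```

Only pairs in which case and control differ in exposure contribute to the estimate, which is why the matched-set identifier must enter the model through `strata()`.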
Data Analysis
 Performing unconditional logistic regression (Case vs.
Control),
ulogit_lung = glm(Status ~ Se_Quartiles , family=binomial, data =
.data)
logistic.display(ulogit_lung)
Variable              OR (95% CI)           P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                           <0.001
  >60 – 70            0.41 (0.17 – 1.02)    0.054
  >70 – 80            0.13 (0.05 – 0.34)    <0.001
  >80                 0.17 (0.07 – 0.42)    <0.001
Data Analysis
 Performing unconditional logistic regression (Case vs.
Control),
ulogit_lung = glm(Status ~ GPX1 , family=binomial, data = .data)
logistic.display(ulogit_lung)
Variable              OR (95% CI)           P (Wald's test)   P (LR-test)
GPX1: ref.=CC                                                 0.034
  CT                  0.45 (0.24 – 0.85)    0.014
  TT                  0.44 (0.14 – 1.36)    0.156
Data Analysis
 Performing unconditional logistic regression (Case vs.
Control),
ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family = binomial, data =
.data)
logistic.display(ulogit_lung)
Variable              crude OR (95% CI)     adj. OR (95% CI)      P (Wald's test)   P (LR-test)
Quartiles: ref.=<60                                                                 <0.001
  >60 – 70            0.41 (0.17 – 1.02)    0.43 (0.17 – 1.08)    0.074
  >70 – 80            0.13 (0.05 – 0.34)    0.13 (0.05 – 0.34)    <0.001
  >80                 0.17 (0.07 – 0.42)    0.15 (0.06 – 0.39)    <0.001
GPX1: ref.=CC                                                                       0.024
  CT                  0.45 (0.24 – 0.85)    0.40 (0.20 – 0.80)    0.01
  TT                  0.44 (0.14 – 1.36)    0.42 (0.12 – 1.41)    0.161
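The unconditional results can be derived the same way from a `glm()` fit using base R only. The sketch below is run on simulated, hypothetical unmatched data in which genotype is generated independently of case status, so the estimated odds ratios should be near 1; it illustrates the calculation, not the findings above. The column names of `or_table` are illustrative.

```r
set.seed(2)
n <- 400

# Hypothetical unmatched case-control data: 200 cases, 200 controls
sim <- data.frame(Status = rep(c(1, 0), each = n / 2))
sim$GPX1 <- factor(
  sample(c("CC", "CT", "TT"), size = n, replace = TRUE,
         prob = c(0.6, 0.3, 0.1)),
  levels = c("CC", "CT", "TT")   # CC is the reference genotype
)

# Unconditional logistic regression
fit <- glm(Status ~ GPX1, family = binomial, data = sim)

# Wald intervals: estimate +/- z * standard error, then exponentiate
est <- coef(summary(fit))
z   <- qnorm(0.975)
or_table <- cbind(
  OR     = exp(est[, "Estimate"]),
  lower  = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
  upper  = exp(est[, "Estimate"] + z * est[, "Std. Error"]),
  p_wald = est[, "Pr(>|z|)"]
)
print(round(or_table[-1, ], 3))   # drop the intercept row

# With one categorical predictor, the OR equals the cross-product ratio
tab   <- table(sim$GPX1, sim$Status)
or_ct <- (tab["CT", "1"] * tab["CC", "0"]) / (tab["CC", "1"] * tab["CT", "0"])
print(or_ct)
```

The cross-product check at the end is a useful sanity test: for a single categorical predictor, the crude odds ratio from the model must match the one computed by hand from the 2 x 2 counts.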
Something More
 Changing the default reference
GPX1 = relevel(GPX1, ref = "TT")
pack()
 Saving the result
result = clogistic.display(clogit_lung)
write.csv(result$table, file = "path/result.csv")
write.table(result$table, file = "path/result.xls", sep = "\t")
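The effect of `relevel()` and the difference between the two export functions can be seen on a small example. This is a minimal sketch in base R: the genotype vector and `result_table` are hypothetical placeholders, and a temporary file is used instead of a real output path.

```r
# Hypothetical genotype vector; CC is the default reference (first level)
GPX1 <- factor(c("CC", "CT", "TT", "CC", "CT", "CC"),
               levels = c("CC", "CT", "TT"))
print(levels(GPX1))

# Move TT to the front so that it becomes the reference category
GPX1_tt <- relevel(GPX1, ref = "TT")
print(levels(GPX1_tt))

# Placeholder table standing in for result$table
result_table <- data.frame(term = c("CT", "TT"),
                           note = c("example", "example"))

# write.csv() always uses a comma; write.table() lets you set the separator
out_file <- tempfile(fileext = ".tsv")
write.table(result_table, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the tab-separated file back to confirm the round trip
read_back <- read.delim(out_file)
print(read_back)
```

Releveling changes only the order of the levels, not the data values, so every odds ratio in a refitted model is then expressed relative to TT.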
Summary: regression models
 Regression models can be used to describe the
average effect of predictors on outcomes in your data
set.
 They can tell how likely it is that the effect is just due
to chance.
 They can look at each predictor “adjusting for” the
others (estimating what would happen if all others
were held constant.)
Thanks to,
Prof. Virasakdi Chongsuvivatwong
Epidemiology Unit,
Faculty of Medicine,
Prince of Songkla University, Thailand

Más contenido relacionado

La actualidad más candente

4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regressionA M
 
5.7 poisson regression in the analysis of cohort data
5.7 poisson regression in the analysis of  cohort data5.7 poisson regression in the analysis of  cohort data
5.7 poisson regression in the analysis of cohort dataA M
 
correlation.final.ppt (1).pptx
correlation.final.ppt (1).pptxcorrelation.final.ppt (1).pptx
correlation.final.ppt (1).pptxChieWoo1
 
missing-data-and-multiple-imputation-in-clinical-epidemiolog
missing-data-and-multiple-imputation-in-clinical-epidemiolog missing-data-and-multiple-imputation-in-clinical-epidemiolog
missing-data-and-multiple-imputation-in-clinical-epidemiolog simbycris
 
bio statistics for clinical research
bio statistics for clinical researchbio statistics for clinical research
bio statistics for clinical researchRanjith Paravannoor
 
Reporting chi square goodness of fit test of independence in apa
Reporting chi square goodness of fit test of independence in apaReporting chi square goodness of fit test of independence in apa
Reporting chi square goodness of fit test of independence in apaKen Plummer
 
Data Analysis with SPSS : One-way ANOVA
Data Analysis with SPSS : One-way ANOVAData Analysis with SPSS : One-way ANOVA
Data Analysis with SPSS : One-way ANOVADr Ali Yusob Md Zain
 
Lecture - ANCOVA 4 Slides.pdf
Lecture - ANCOVA 4 Slides.pdfLecture - ANCOVA 4 Slides.pdf
Lecture - ANCOVA 4 Slides.pdfmuhammad shahid
 
Multiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionMultiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionKaushik Rajan
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)MikeBlyth
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionDrZahid Khan
 
Tutorial repeated measures ANOVA
Tutorial   repeated measures ANOVATutorial   repeated measures ANOVA
Tutorial repeated measures ANOVAKen Plummer
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regressionAkhilesh Joshi
 
Simple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepSimple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepDan Wellisch
 

La actualidad más candente (20)

Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
spss teaching
spss teachingspss teaching
spss teaching
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regression
 
5.7 poisson regression in the analysis of cohort data
5.7 poisson regression in the analysis of  cohort data5.7 poisson regression in the analysis of  cohort data
5.7 poisson regression in the analysis of cohort data
 
In Anova
In  AnovaIn  Anova
In Anova
 
correlation.final.ppt (1).pptx
correlation.final.ppt (1).pptxcorrelation.final.ppt (1).pptx
correlation.final.ppt (1).pptx
 
missing-data-and-multiple-imputation-in-clinical-epidemiolog
missing-data-and-multiple-imputation-in-clinical-epidemiolog missing-data-and-multiple-imputation-in-clinical-epidemiolog
missing-data-and-multiple-imputation-in-clinical-epidemiolog
 
bio statistics for clinical research
bio statistics for clinical researchbio statistics for clinical research
bio statistics for clinical research
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Reporting chi square goodness of fit test of independence in apa
Reporting chi square goodness of fit test of independence in apaReporting chi square goodness of fit test of independence in apa
Reporting chi square goodness of fit test of independence in apa
 
Data Analysis with SPSS : One-way ANOVA
Data Analysis with SPSS : One-way ANOVAData Analysis with SPSS : One-way ANOVA
Data Analysis with SPSS : One-way ANOVA
 
Lecture - ANCOVA 4 Slides.pdf
Lecture - ANCOVA 4 Slides.pdfLecture - ANCOVA 4 Slides.pdf
Lecture - ANCOVA 4 Slides.pdf
 
Multiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionMultiple Regression and Logistic Regression
Multiple Regression and Logistic Regression
 
On p-values
On p-valuesOn p-values
On p-values
 
Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)Logistic regression (blyth 2006) (simplified)
Logistic regression (blyth 2006) (simplified)
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Tutorial repeated measures ANOVA
Tutorial   repeated measures ANOVATutorial   repeated measures ANOVA
Tutorial repeated measures ANOVA
 
Data management through spss
Data management through spssData management through spss
Data management through spss
 
multiple linear regression
multiple linear regressionmultiple linear regression
multiple linear regression
 
Simple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepSimple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-Step
 

Destacado

Destacado (15)

ACCUPASS活動通 行銷廣告版位說明
ACCUPASS活動通 行銷廣告版位說明ACCUPASS活動通 行銷廣告版位說明
ACCUPASS活動通 行銷廣告版位說明
 
Spatial Data Science with R
Spatial Data Science with RSpatial Data Science with R
Spatial Data Science with R
 
Confounder and effect modification
Confounder and effect modificationConfounder and effect modification
Confounder and effect modification
 
手把手教你 R 語言分析實務
手把手教你 R 語言分析實務手把手教你 R 語言分析實務
手把手教你 R 語言分析實務
 
R統計軟體簡介
R統計軟體簡介R統計軟體簡介
R統計軟體簡介
 
Bias and confounding
Bias and confoundingBias and confounding
Bias and confounding
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Dummy variable
Dummy variableDummy variable
Dummy variable
 
CM KaggleTW Share
CM KaggleTW ShareCM KaggleTW Share
CM KaggleTW Share
 
R programming
R programmingR programming
R programming
 
Antenatal care
Antenatal careAntenatal care
Antenatal care
 
Variables
VariablesVariables
Variables
 
Variables
 Variables Variables
Variables
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
SAMPLING AND SAMPLING ERRORS
SAMPLING AND SAMPLING ERRORSSAMPLING AND SAMPLING ERRORS
SAMPLING AND SAMPLING ERRORS
 

Similar a Logistic Regression in Case-Control Study

Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysisRaman Kannan
 
Interpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxInterpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxGairuzazmiMGhani
 
Data mining with R- regression models
Data mining with R- regression modelsData mining with R- regression models
Data mining with R- regression modelsHamideh Iraj
 
Statistics for Data Analytics
Statistics for Data AnalyticsStatistics for Data Analytics
Statistics for Data AnalyticsABHISHEKDAHALE
 
Accounting serx
Accounting serxAccounting serx
Accounting serxzeer1234
 
Accounting serx
Accounting serxAccounting serx
Accounting serxzeer1234
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learningAkhilesh Joshi
 
Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...Adrian Olszewski
 
analysis part 02.pptx
analysis part 02.pptxanalysis part 02.pptx
analysis part 02.pptxefrembeyene4
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
 
[M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization [M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization Andrea Rubio
 
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATADETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATAIJCSEA Journal
 

Similar a Logistic Regression in Case-Control Study (20)

Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysis
 
Interpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptxInterpreting Logistic Regression.pptx
Interpreting Logistic Regression.pptx
 
Data mining with R- regression models
Data mining with R- regression modelsData mining with R- regression models
Data mining with R- regression models
 
Statistics for Data Analytics
Statistics for Data AnalyticsStatistics for Data Analytics
Statistics for Data Analytics
 
Accounting serx
Accounting serxAccounting serx
Accounting serx
 
Accounting serx
Accounting serxAccounting serx
Accounting serx
 
Gene expression profiling ii
Gene expression profiling  iiGene expression profiling  ii
Gene expression profiling ii
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
ML MODULE 2.pdf
ML MODULE 2.pdfML MODULE 2.pdf
ML MODULE 2.pdf
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learning
 
Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...Logistic regression vs. logistic classifier. History of the confusion and the...
Logistic regression vs. logistic classifier. History of the confusion and the...
 
analysis part 02.pptx
analysis part 02.pptxanalysis part 02.pptx
analysis part 02.pptx
 
working with python
working with pythonworking with python
working with python
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
 
[M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization [M3A3] Data Analysis and Interpretation Specialization
[M3A3] Data Analysis and Interpretation Specialization
 
Quality data management
Quality data managementQuality data management
Quality data management
 
Quality data management
Quality data managementQuality data management
Quality data management
 
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATADETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
DETECTION OF RELIABLE SOFTWARE USING SPRT ON TIME DOMAIN DATA
 

Último

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 

Último (20)

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 

Logistic Regression in Case-Control Study

  • 1. Logistic Regression in Case- Control study using – A statistical tool Satish Gupta
  • 2. What is R?  The R statistical programming language is a free open source package.  The language is very powerful for writing programs.  Many statistical functions are already built in.  Contributed packages expand the functionality to cutting edge research.
  • 3. Getting Started  Go to www.r-project.org  Downloads: CRAN (Comprehensive R Archive Network)  Set your Mirror: location close to you.  Select Windows 95 or later, MacOS or UNIX platforms
  • 5. Basic operators and calculations Comparison operators  equal: ==  not equal: !=  greater/less than: > <  greater/less than or equal: >= <= Example: 1 == 1 # Returns TRUE
  • 6. Basic operators and calculations Logical operators  AND: & x <- 1:10; y <- 10:1 # Creates the sample vectors 'x' and 'y'. x > y & x > 5 # Returns TRUE where both comparisons return TRUE.  OR: | x == y | x != y # Returns TRUE where at least one comparison is TRUE.  NOT: ! !x > y # The '!' sign returns the negation (opposite) of a logical vector.
  • 7. Basic operators and calculations Calculations  Four basic arithmetic functions: addition, subtraction, multiplication and division 1 + 1; 1 - 1; 1 * 1; 1 / 1 # Returns results of basic arithmetic calculations.  Calculations on vectors x <- 1:10; sum(x); mean(x), sd(x); sqrt(x) # Calculates for the vector x its sum, mean, standard deviation and square root. x <- 1:10; y <- 1:10; x + y # Calculates the sum for each element in the vectors x and y.
  • 8. R-Graphics R provides comprehensive graphics utilities for visualizing and exploring scientific data. It includes:  Scatter plots  Line plots  Bar plots  Pie charts  Heatmaps  Venn diagrams  Density plots  Box plots
  • 9. Data handling in R  Load data: mydata = read.csv(“/path/mydata.csv”)  See data on screen: data(mydata)  See top part of data: head(mydata)  Specific number of rows and column of data: mydata[1:10,1:3]  To get a type of data: class(mydata)  Changing class of data: newdata = as.matrix(mydata)  Summary of data: summary(mydata)  Selecting (KEEPING) variables (columns) newdata = mydata[c(1,3:5)]
  • 10. Data handling in R  Selecting observations newdata= subset(mydata, age>=20 | age <10, select=c(ID, weight) newdata= subset(mydata, sex==“Male” & age >25, select=weight:income)  Excluding (DROPPING) variables (columns) newdata = mydata[c(-3,-5)] mydata$v3 = NULL
  • 11. R-Library  There are many tools defined as “package” are present in R for different kind of analysis including data from genetics and genomics.  Depending upon the availability of library, it can be downloaded from two sources Using CRAN (Comprehensive R Archive Network) as: install.packages(“package_name”) Using Bioconductor as: source("http://bioconductor.org/biocLite.R") biocLite(“package_name”)
  • 12. R-Library
      - To load and explore a package:
          library()                  # Lists all libraries/packages that are available on a system.
          library(genetics)          # Loads the package for genetics data analysis.
          library(help = genetics)   # Lists all functions/objects of the "genetics" package.
          ?function_name             # Opens the documentation of a function.
  • 13. What is Logistic Regression?
      - Logistic regression describes the relationship between a dichotomous response variable and a set of explanatory variables.
      - Logistic regression is often used because the relationship between the dependent variable (DV, a discrete variable) and a predictor is non-linear.
  • 14. A General Model: Logistic Regression
      $$\operatorname{logit}(p_{disease}) = \log\left(\frac{p_{disease}}{1 - p_{disease}}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_J X_J$$
      Where:
      - $p_{disease}$ is the probability that an individual has a particular disease.
      - $\beta_0$ is the intercept.
      - $\beta_1, \beta_2, \ldots, \beta_J$ are the coefficients (effects) of the genetic factors.
      - $X_1, X_2, \ldots, X_J$ are the variables for the genetic factors.
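To make the model concrete, the sketch below evaluates the logit equation for a single predictor. The coefficient values and the 0/1/2 coding of X1 are assumptions chosen only for illustration; `plogis()` and `qlogis()` are base R's inverse-logit and logit functions.

```r
# Hypothetical coefficients, chosen only for illustration.
beta0 <- -1.0   # intercept
beta1 <-  0.8   # effect of a single genetic factor X1

x1 <- c(0, 1, 2)                 # assumed coding, e.g. number of risk alleles
log_odds <- beta0 + beta1 * x1   # linear predictor = logit(p_disease)
p_disease <- plogis(log_odds)    # inverse logit: 1 / (1 + exp(-log_odds))

# qlogis() is the logit itself, so it recovers the linear predictor.
stopifnot(isTRUE(all.equal(qlogis(p_disease), log_odds)))

# exp(beta1) is the odds ratio for a one-unit increase in X1.
odds_ratio <- exp(beta1)
print(round(p_disease, 3))
print(round(odds_ratio, 3))
```

The model is linear on the log-odds scale, which is why exponentiated coefficients are reported as odds ratios in the result tables.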
  • 15. Assumptions
      - Logistic regression does not assume normality, linearity, or homogeneity of variance for the independent variables.
      - Because it does not impose these requirements, it is preferred to discriminant analysis when the data do not satisfy these assumptions.
  • 16. Questions
      - What is the relative importance of each predictor variable?
      - How does each predictor variable affect the outcome?
      - Does a predictor variable make the solution better or worse, or have no effect?
      - Are there interactions among predictors?
      - Does adding interactions among predictors (continuous or categorical) improve the model?
      - What is the strength of association between the outcome variable and a set of predictors?
      - In model comparison you often want non-significant differences, so strength of association is reported even for non-significant effects.
  • 17. Types of Logistic Regression
      - Unconditional logistic regression
      - Conditional logistic regression
      Rules of thumb:
      - Use conditional logistic regression if matching has been done, and unconditional if there has been no matching.
      - When in doubt, use conditional because it always gives unbiased results. The unconditional method is said to overestimate the odds ratio when it is not appropriate.
  • 18. Data Format
          Status  Matset  Se_Quartiles  GPX1  GPX4  SEP15  TXN2
          1       1       <60           CT    TT    AG     AG
          0       1       >60 – 70      CC    CC    GG     GG
          1       2       <60           TT    CC    AG     AA
          0       2       >70 – 80      CC    CT    GG     GG
          1       3       >80           CC    CC    AA     AA
          0       3       >60 – 70      CT    TT    GG     GG
          1       4       <60           CC    CC    AA     AG
          0       4       >70 – 80      TT    TT    GG     GG
          1       5       >80           CC    CC    AG     AA
          0       5       <60           CC    CC    GG     GG
          1       6       >70 – 80      CT    TT    AA     AA
          0       6       >80           CC    CC    GG     AG
          1       7       >60 – 70      TT    CC    AA     AG
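A data set in this layout can be represented as a data frame with one row per subject. The sketch below types in the first four rows of the table, assuming Status codes cases as 1 and controls as 0; the category labels use a plain ASCII hyphen, and the explicit factor levels are an assumption made so that "<60" and "CC" become the reference categories.

```r
# First four rows of the table, entered by hand (ASCII hyphens in the labels).
lung <- data.frame(
  Status       = c(1, 0, 1, 0),                       # assumed: 1 = case, 0 = control
  Matset       = c(1, 1, 2, 2),                       # matched-set identifier
  Se_Quartiles = c("<60", ">60-70", "<60", ">70-80"),
  GPX1         = c("CT", "CC", "TT", "CC")
)

# Explicit level order puts the intended reference category first.
lung$Se_Quartiles <- factor(lung$Se_Quartiles,
                            levels = c("<60", ">60-70", ">70-80", ">80"))
lung$GPX1 <- factor(lung$GPX1, levels = c("CC", "CT", "TT"))

# In the rows shown, every matched set pairs one case with one control.
pair_check <- table(lung$Matset, lung$Status)
print(pair_check)
print(levels(lung$Se_Quartiles))
```

Checking the set-by-status table before modelling is a cheap way to catch matched sets that lost their case or control during data cleaning.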
  • 19. Data and Library loading
      - Load and use data in R (using lung cancer data from PLoS One 2013, 8(3):e59051).
          lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)
      - Load the library and use the data for analysis
          library(epicalc)
          use(lung)
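The file-loading step can be checked without the original data by writing a small tab-delimited file and reading it back. This is a sketch with invented rows and a temporary file path; `read.delim()` is base R's reader whose default separator is the tab character, so it is equivalent to `read.csv(..., sep = "\t")` here.

```r
# Write a tiny tab-delimited file to a temporary location (hypothetical rows).
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(Status = c(1, 0), Matset = c(1, 1), GPX1 = c("CT", "CC"))
write.table(toy, file = tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back; stringsAsFactors = TRUE turns genotype columns into factors.
lung <- read.delim(tmp, header = TRUE, stringsAsFactors = TRUE)
str(lung)
```

If the separator is wrong, the whole header arrives as a single column name, so inspecting `str(lung)` right after loading is a quick sanity check.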
  • 20. Data Analysis
      - Performing conditional logistic regression (Case vs. Control)
          clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)
          clogistic.display(clogit_lung)

                                OR(95%CI)             P(Wald's test)   P(LR-test)
          Quartiles: ref.=<60                                          <0.001
            >60 – 70            0.4 (0.15 – 1.09)     0.074
            >70 – 80            0.11 (0.03 – 0.33)    <0.001
            >80                 0.10 (0.03 – 0.34)    <0.001
  • 21. Data Analysis
      - Performing conditional logistic regression (Case vs. Control)
          clogit_lung = clogit(Status ~ GPX1 + strata(Matset), data = .data)
          clogistic.display(clogit_lung)

                                OR(95%CI)             P(Wald's test)   P(LR-test)
          GPX1: ref.=CC                                                0.032
            CT                  0.44 (0.22 – 0.86)    0.017
            TT                  0.42 (0.13 – 1.38)    0.151
  • 22. Data Analysis
      - Performing conditional logistic regression (Case vs. Control)
          clogit_lung = clogit(Status ~ Se_Quartiles + GPX1 + strata(Matset), data = .data)
          clogistic.display(clogit_lung)

                                crude OR(95%CI)       adj. OR(95%CI)        P(Wald's test)   P(LR-test)
          Quartiles: ref.=<60                                                                <0.001
            >60 – 70            0.4 (0.15 – 1.09)     0.32 (0.11 – 0.96)    0.042
            >70 – 80            0.11 (0.03 – 0.33)    0.09 (0.02 – 0.3)     <0.001
            >80                 0.1 (0.03 – 0.34)     0.05 (0.01 – 0.23)    <0.001
          GPX1: ref.=CC                                                                      0.006
            CT                  0.44 (0.22 – 0.86)    0.26 (0.11 – 0.65)    0.004
            TT                  0.42 (0.13 – 1.38)    0.44 (0.09 – 2.18)    0.313

          (Quartiles is the environmental factor; GPX1 is the genetic factor.)
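The conditional analysis can be reproduced in miniature with `clogit()` from the survival package, which is the fitting function the slides call. The sketch below uses synthetic 1:1 matched pairs, not the lung cancer data; the helper `make_pairs()` and the pair counts are invented for illustration. With a single binary exposure, the conditional odds ratio equals the ratio of the two kinds of discordant pairs, which gives a hand-checkable answer.

```r
library(survival)  # provides clogit() and strata()

# Hypothetical helper: n matched pairs with fixed exposure for case and control.
make_pairs <- function(n, case_exp, ctrl_exp, first_set) {
  sets <- seq(first_set, length.out = n)
  data.frame(
    set      = rep(sets, each = 2),
    status   = rep(c(1, 0), times = n),               # case first, then control
    exposure = rep(c(case_exp, ctrl_exp), times = n)
  )
}

# Synthetic data: 30 + 10 discordant pairs, 60 concordant pairs.
d <- rbind(
  make_pairs(30, 1, 0, 1),    # case exposed, control unexposed
  make_pairs(10, 0, 1, 31),   # case unexposed, control exposed
  make_pairs(20, 1, 1, 41),   # both exposed (no information on the OR)
  make_pairs(40, 0, 0, 61)    # neither exposed (no information on the OR)
)

fit <- clogit(status ~ exposure + strata(set), data = d)
or  <- exp(coef(fit))      # conditional odds ratio
ci  <- exp(confint(fit))   # Wald 95% confidence interval on the OR scale

print(or)
print(ci)
print(30 / 10)             # ratio of discordant pairs, for comparison
```

Concordant pairs drop out of the conditional likelihood, which is why only the 40 discordant pairs determine the estimate.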
  • 23. Data Analysis
      - Performing unconditional logistic regression (Case vs. Control)
          ulogit_lung = glm(Status ~ Se_Quartiles, family = binomial, data = .data)
          logistic.display(ulogit_lung)

                                OR(95%CI)             P(Wald's test)   P(LR-test)
          Quartiles: ref.=<60                                          <0.001
            >60 – 70            0.41 (0.17 – 1.02)    0.054
            >70 – 80            0.13 (0.05 – 0.34)    <0.001
            >80                 0.17 (0.07 – 0.42)    <0.001
  • 24. Data Analysis
      - Performing unconditional logistic regression (Case vs. Control)
          ulogit_lung = glm(Status ~ GPX1, family = binomial, data = .data)
          logistic.display(ulogit_lung)

                                OR(95%CI)             P(Wald's test)   P(LR-test)
          GPX1: ref.=CC                                                0.034
            CT                  0.45 (0.24 – 0.85)    0.014
            TT                  0.44 (0.14 – 1.36)    0.156
  • 25. Data Analysis
      - Performing unconditional logistic regression (Case vs. Control)
          ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family = binomial, data = .data)
          logistic.display(ulogit_lung)

                                crude OR(95%CI)       adj. OR(95%CI)        P(Wald's test)   P(LR-test)
          Quartiles: ref.=<60                                                                <0.001
            >60 – 70            0.41 (0.17 – 1.02)    0.43 (0.17 – 1.08)    0.074
            >70 – 80            0.13 (0.05 – 0.34)    0.13 (0.05 – 0.34)    <0.001
            >80                 0.17 (0.07 – 0.42)    0.15 (0.06 – 0.39)    <0.001
          GPX1: ref.=CC                                                                      0.024
            CT                  0.45 (0.24 – 0.85)    0.40 (0.20 – 0.80)    0.01
            TT                  0.44 (0.14 – 1.36)    0.42 (0.12 – 1.41)    0.161
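The odds ratios and Wald intervals in these tables come from exponentiating the `glm()` coefficients and their confidence limits. The sketch below shows that step with base R only, on a synthetic 2x2 data set (the counts are invented), so the fitted odds ratio can be checked against the cross-product ratio.

```r
# Synthetic unmatched data: 40 exposed cases, 20 exposed controls,
# 60 unexposed cases, 80 unexposed controls (invented counts).
d <- data.frame(
  status   = c(rep(1, 40), rep(0, 20), rep(1, 60), rep(0, 80)),
  exposure = c(rep(1, 60), rep(0, 140))
)

fit <- glm(status ~ exposure, family = binomial, data = d)

or      <- exp(coef(fit))["exposure"]                  # crude odds ratio
wald_ci <- exp(confint.default(fit))["exposure", ]     # Wald 95% CI on the OR scale

print(or)
print(wald_ci)
print((40 * 80) / (20 * 60))   # cross-product ratio from the 2x2 table
```

`confint.default()` gives Wald intervals, matching the "P(Wald's test)" column; plain `confint()` on a glm uses profile likelihood and can differ slightly.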
  • 26. Something More
      - Changing the default reference
          GPX1 = relevel(GPX1, ref = "TT")
          pack()
      - Saving the result
          result = clogistic.display(clogit_lung)
          write.csv(result$table, file = "path/result.csv")                # write.csv() always uses a comma; it ignores 'sep'.
          write.table(result$table, file = "path/result.xls", sep = "\t")  # Tab-delimited output.
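Changing the reference category only reorders the factor levels; the observations themselves are untouched. A minimal sketch with an invented genotype vector:

```r
# Hypothetical genotype vector; levels default to alphabetical order.
GPX1 <- factor(c("CC", "CT", "TT", "CC", "CT", "CC"))
print(levels(GPX1))     # first level is the reference used in a model

# Move "TT" to the front so it becomes the reference category.
GPX1_tt <- relevel(GPX1, ref = "TT")
print(levels(GPX1_tt))

# The data values are unchanged; only the level order differs.
stopifnot(identical(as.character(GPX1_tt), as.character(GPX1)))
```

After releveling, odds ratios in a refitted model are expressed relative to TT instead of CC, so the same data yield reciprocal-looking estimates.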
  • 27. Summary: regression models
      - Regression models can be used to describe the average effect of predictors on outcomes in your data set.
      - They can tell how likely it is that an effect is due to chance alone.
      - They can look at each predictor "adjusting for" the others (estimating what would happen if all others were held constant).
  • 28. Thanks to, Prof. Virasakdi Chongsuvivatwong Epidemiology Unit, Faculty of Medicine, Prince of Songkla University, Thailand

Editor's notes

  1. Coefficients are estimated by maximum likelihood estimation (MLE).
  2. To test hypotheses in logistic regression, we use the likelihood ratio test and the Wald test.
  3. For a difference between two means, a confidence interval that includes 0 indicates no significant difference between the means of the two populations at the given level of confidence. The width of the interval shows how uncertain we are about the difference; a very wide interval may indicate that more data should be collected before anything definite can be said. For an odds ratio, the null value is 1.0 rather than 0: a confidence interval that includes 1.0 means the association between the exposure and the outcome could have arisen by chance alone and is not statistically significant.
  4. family = binomial specifies the variance and link functions: the variance is binomial and the link is the logit function.
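The link between a confidence interval that excludes 1.0 and a significant Wald test (note 3) can be checked numerically. In this sketch the log odds ratio and its standard error are hypothetical values, not taken from the lung cancer analysis.

```r
# Hypothetical estimate: log odds ratio and its standard error.
beta <- log(0.44)
se   <- 0.35

z  <- qnorm(0.975)                        # about 1.96 for a 95% interval
ci <- exp(beta + c(-1, 1) * z * se)       # Wald CI, back-transformed to the OR scale

includes_one <- ci[1] <= 1 && ci[2] >= 1  # does the interval cover the null value?
p_wald <- 2 * pnorm(-abs(beta / se))      # two-sided Wald p-value

print(round(ci, 2))
print(includes_one)
print(round(p_wald, 3))
```

The interval is built on the log scale and then exponentiated, which is why odds ratio intervals are asymmetric around the point estimate.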