1. Logistic regression
Dr. Khaled Mahmoud Abd Elaziz
Lecturer of public health and preventive
medicine, Faculty of Medicine-
Ain Shams University
2. Logistic regression is very similar to linear
regression.
When do we use logistic regression?
We use it when we have a binary outcome of
interest and a number of explanatory
variables.
Outcome:
e.g. the presence or absence of a
symptom, or the presence or absence of a disease
4. From the equation of the logistic
regression model we can:
1- Determine which explanatory
variables influence the outcome,
that is, which variables have the
highest odds ratio (OR) for producing
the outcome
(1 = has the disease, 0 = doesn't have the
disease)
5. From the equation of the logistic
regression model we can:
2- Use an individual's values of
the explanatory variables to
estimate the probability that he or
she will have a particular outcome
6. We start the logistic regression model by
creating a binary variable to represent
the outcome (the dependent variable):
1 = has the disease, 0 = doesn't have the
disease.
We take the probability p that an individual
falls in the highest coded category (has
the disease) as the dependent variable.
We use the logit (logistic) transformation in
the regression equation.
7. The logit is the natural logarithm of the odds of
'disease':
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$$
The logistic regression equation:
$$\operatorname{logit}(p) = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_kX_k$$
X = explanatory variables
p = estimated value of the true probability that an
individual with a particular set of values for X has
the disease. p corresponds to the proportion with
the disease; it has an underlying binomial
distribution.
a = estimated constant (intercept)
b = estimated logistic regression coefficients
The exponential of a particular coefficient, for
example $e^{b_1}$, is an estimate of the odds ratio.
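The logit transformation and its inverse can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides; the function names `logit` and `inverse_logit` are chosen here for illustration only.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p); p must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a value on the logit (log-odds) scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# A probability of 0.5 corresponds to odds of 1, so its logit is 0.
print(logit(0.5))
# Applying the inverse to a logit recovers the original probability.
print(inverse_logit(logit(0.2)))
```

The inverse function is what turns a fitted value of a + b1X1 + ... back into a probability for an individual.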
8. For a particular value of X1, the exponentiated
coefficient $e^{b_1}$ is the estimated odds of the disease
at X1 + 1 relative to the estimated odds at X1, while
adjusting for all other X's in the equation.
As the logistic regression is fitted on a log
scale, the effects of the X's are multiplicative on
the odds of the disease. This means that
their combined effect is the product of their
separate effects.
This is unlike linear regression, where the
effects of the X's on the dependent variable
are additive.
9. Plain English:
1- Take the variables that are significant in the
univariate analysis.
2- Set the P value threshold below which a variable
will be entered into the model, e.g. 0.05 or
0.1.
3- If none of the variables is significant in the
univariate analysis, don't bother doing logistic
regression. There is no question here about
those variables as predictors of the disease.
11. Plain English:
4- The idea of doing a logistic regression is that we have
too many variables that are significantly associated with the
outcome we are looking at, and we want to know
which is the stronger predictor of the disease
outcome.
Logistic regression is a mathematical model that describes
the relationship between an outcome and one or more
explanatory variables.
5- In the output of the statistical program we look at the
odds ratio and its CI and the significance of each
variable, and we adjust the model to select the best
combination of explanatory variables.
12. Example:
A study was done to test the relationship
between HHV-8 infection and sexual
behavior in men. The men were asked about
their history of sexually transmitted diseases
(gonorrhea, syphilis, HSV-2, and HIV).
The explanatory variables were the presence
of each of the four infections, coded as 0 if the
patient had no history and 1 if the patient had
a history of that infection, and the patient's age
in years.
13. Dependent outcome: HHV-8 infection
Variable    Parameter estimate   P        OR      95% CI
Intercept   -2.2242              0.006
Gonorrhea    0.5093              0.243    1.664   0.71-3.91
Syphilis     1.1924              0.093    3.295   0.82-13.8
HSV-2        0.7910              0.0410   2.206   1.03-4.71
HIV          1.6357              0.0067   5.133   1.57-16.73
Age          0.0062              0.76     1.006   0.97-1.05
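As a check on how the OR column relates to the parameter estimates, the sketch below exponentiates each coefficient from the table. It is an illustration added here, not output from the original study; the dictionary name `coefficients` is an arbitrary choice.

```python
import math

# Parameter estimates copied from the results table above.
coefficients = {
    "Gonorrhea": 0.5093,
    "Syphilis": 1.1924,
    "HSV2": 0.7910,
    "HIV": 1.6357,
    "Age": 0.0062,
}

# The odds ratio for each variable is the exponential of its coefficient.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, or_value in odds_ratios.items():
    print(f"{name:10s} OR = {or_value:.3f}")
```

The printed values agree with the OR column of the table to three decimal places.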
14. Example:
Chi-square for covariates = 24.5, P = 0.002,
indicating that at least one of the covariates
is significantly associated with HHV-8
serostatus.
HSV-2 is positively associated with HHV-8
infection (P = 0.04).
HIV is positively associated with HHV-8
infection (P = 0.007).
15. Those with a history of HSV-2 have 2.21
times the odds of being HHV-8 positive
compared with those with a negative history,
after adjusting for the other infections.
Those with a history of HIV have 5.13 times
the odds of being HHV-8 positive compared with
those with a negative history, after adjusting
for the other infections.
16. The multiplicative effect of the model suggests that a
man who is seropositive for both HSV-2 and HIV
is estimated to have 2.206 × 5.133 = 11.3
times the odds of HHV-8 infection compared
with a man who is negative for both, after adjusting for
the other two infections.
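The multiplicative effect can be verified numerically: adding two coefficients on the log-odds scale gives the same result as multiplying the two odds ratios. The sketch below is an added illustration using the HSV-2 and HIV coefficients from the results table; it assumes, as the fitted equation does, that the model contains no HSV-2 by HIV interaction term.

```python
import math

b_hsv2 = 0.7910  # coefficient for HSV-2 from the results table
b_hiv = 1.6357   # coefficient for HIV from the results table

# Adding the coefficients on the log-odds scale ...
combined_from_sum = math.exp(b_hsv2 + b_hiv)
# ... is equivalent to multiplying the individual odds ratios.
combined_from_product = math.exp(b_hsv2) * math.exp(b_hiv)

print(round(combined_from_sum, 1))
print(round(combined_from_product, 1))
```

Both lines print the same combined odds ratio, which rounds to the 11.3 quoted on the slide.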
In this example gonorrhea had a significant
chi-square in the univariate analysis, but when
entered in the model it was not significant
(no indication of an independent relationship
between a history of gonorrhea and HHV-8
seropositivity).
17. There is no significant relationship
between HHV-8 seropositivity and
age. The odds ratio (1.006) indicates that the
estimated odds of HHV-8 seropositivity
increase by 0.6% for each additional
year of age.
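For a continuous variable such as age, the coefficient applies per unit (here, per year), so the effect over several years compounds multiplicatively. The sketch below is an added illustration; the helper name `percent_change_in_odds` is hypothetical, and the 10-year figure is an extrapolation from the fitted coefficient rather than a result reported in the study.

```python
import math

b_age = 0.0062  # coefficient for age (per year) from the results table

def percent_change_in_odds(coefficient, units=1):
    """Percent change in the odds for a `units`-step increase in the variable."""
    return (math.exp(coefficient * units) - 1.0) * 100.0

print(f"{percent_change_in_odds(b_age):.1f}% per year")
print(f"{percent_change_in_odds(b_age, units=10):.1f}% per 10 years")
```

Because age was not statistically significant in this model, these percentages describe the point estimate only.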
18. What is the probability that a 51-year-old man has
HHV-8 infection if he has a positive history of gonorrhea
and HSV-2 but not of the two
other diseases (syphilis and HIV)?
Add up the regression coefficients that apply to him:
$$z = a + b_{\text{gonorrhea}} + b_{\text{HSV2}} + (b_{\text{age}} \times \text{age})$$
$$z = -2.2242 + 0.5093 + 0.7910 + (0.0062 \times 51) = -0.6077$$
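19. The probability for this person, where z is the
sum of coefficients calculated above (-0.6077):
$$P = \frac{e^{z}}{1 + e^{z}}$$
$$P = \frac{e^{-0.6077}}{1 + e^{-0.6077}} = 0.35$$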
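The whole worked example can be reproduced with a short Python sketch. This is an added illustration under the assumption that the model is exactly the fitted equation from the results table; the function name `predicted_probability` and the dictionary keys are hypothetical names chosen here.

```python
import math

# Parameter estimates from the results table.
intercept = -2.2242
coefficients = {"gonorrhea": 0.5093, "syphilis": 1.1924,
                "hsv2": 0.7910, "hiv": 1.6357, "age": 0.0062}

def predicted_probability(patient):
    """Return (z, p): the linear predictor and the predicted probability."""
    z = intercept + sum(coefficients[k] * patient[k] for k in coefficients)
    p = math.exp(z) / (1.0 + math.exp(z))
    return z, p

# 51-year-old man with a history of gonorrhea and HSV-2, but not syphilis or HIV.
patient = {"gonorrhea": 1, "syphilis": 0, "hsv2": 1, "hiv": 0, "age": 51}
z, p = predicted_probability(patient)
print(f"z = {z:.4f}, p = {p:.2f}")
```

Changing the values in `patient` gives the estimated probability for any other combination of infection history and age.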