(Pre-requisite)

(AST405) Lifetime data analysis

Author

Md Rasel Biswas

0 Parametric Regression Models

0.1 Linear Regression Model

Independent two-sample t-test

  • Population A

    • Response Y1N(μ1,σ2)

    • Random sample {y11,y12,,y1n1}

  • Population B

    • Response Y2N(μ2,σ2)

    • Random sample {y21,y22,,y2n2}

Objective H0:μ1=μ2


  • Test statistic t=y¯1y¯2se(y¯1y¯2)=y¯1y¯2sp(1/n1)+(1/n2)tn1+n22
    • sp2=(n11)s12+(n21)s22n1+n22

    • y¯i=(1/ni)j=1niyij

  • (1α)100% confidence interval for (μ1μ2) (y¯1y¯2)±tn1+n22,1α/2se(y¯1y¯2)

Melanoma mortality data

  • Incidence of melanoma can be related to the amount of sunshine, equivalently to the latitude of the area

  • Data were collected on malignant melanoma of the skin of white males during the period 1950–69 for each state of the US

  • Data on mortality rate (per 10 million), 1965 population (in a million), latitude, longitude, and the whether the state borders on the ocean are recorded


Download mordat data

glimpse(mordat)
Rows: 49
Columns: 6
$ state     <chr> "Alabama", "Arizona", "Arkansas", "California", "Colorado", …
$ mortality <dbl> 219, 160, 170, 182, 149, 159, 200, 177, 197, 214, 116, 124, …
$ latitude  <dbl> 33.0, 34.5, 35.0, 37.5, 39.0, 41.8, 39.0, 39.0, 28.0, 33.0, …
$ longitude <dbl> 87.0, 112.0, 92.5, 119.5, 105.5, 72.8, 75.5, 77.0, 82.0, 83.…
$ pop       <dbl> 3.46, 1.61, 1.96, 18.60, 1.97, 2.83, 0.50, 0.76, 5.80, 4.36,…
$ ocean     <fct> 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, …

  • For a state, does “contiguous to the ocean” (Yes or No) significantly affect melanoma mortality rate? contiguous to ocean={Yesocean=1Noocean=0

Distribution of mortality rate by contiguous to ocean

Independent two-sample t-test

test1 <- t.test(mortality ~ ocean, var.equal = T, 
            data = mordat)
broom::tidy(test1) |>
  select(1:8) |> kable(digits = 3)
estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
-31.487 138.741 170.227 -3.684 0.001 47 -48.68 -14.293
  • “Contiguity to the ocean” has a significant effect on average mortality rate (p<.001)

Simple linear regression model

  • Objective is to compare average mortality rate (Y) between two group of states (ocean=0 or ocean=1)

(1)E(Y|ocean=0)=μ0

(2)E(Y|ocean=1)=μ1

  • Expressing and in one equation (3)E(Y|x1)=μ0+(μ1μ0)x1 where x1={1if ocean=10if ocean=0

  • Consider a model for mortality (Y) on ocean (x1) E(Y|x1)=μ0+(μ1μ0)x1=β0+β1x1 where β0=μ0andβ1=(μ1μ0)

  • Comparison between two groups of states H0:β1=0H0:μ1=μ2


  • Another way of defining simple linear model, let Y(x) be response corresponding to a subject with predictor x and assume (Y|x)=Y(x)N(μ(x),σ2)
  • Parametric regression model μ(x)=E(Y|x)=β0+β1xV(Y|x)=σ2

  • Data {(yi,xi),i=1,,n}

  • Assume y’s are independent and YiN(μ(xi),σ2)

  • Simple linear regression model

μ(x)=E(Y|x)=β0+β1xV(Y|x)=σ2

  • Both maximum likelihood and ordinary least square methods can be used to estimate the parameters of linear regression models

Estimation of parameters

  • Log-likelihood function (β0,β1)=logi=1n(1/σ)ϕ(yiμ(xi)σ)=nlogσ+i=1nlogϕ(yiβ0β1xiσ)

  • MLEs (β^0,β^1)=argmaxΘ(β0,β1)


  • Estimating the effect of “contiguous to ocean” (x1) on mortality rate (Y) using simple linear regression model E(Y|x1)=β0+β1x1

  • Fitting a linear model for mortality with “contiguous to ocean” as the only predictor
mod_oc <- lm(mortality ~ ocean, data = mordat)
tbl_regression(mod_oc, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) 139 127, 150 <0.001
ocean


    0
    1 31 14, 49 <0.001
1 CI = Confidence Interval
  • “Contiguity to the ocean” has a significant effect on average mortality rate (p<.001)

  • Fitted model E(Y|x1)=138.741+31.487x1={138.741if x1=0170.227if x1=1

  • Inland states (ocean=0) are expected to have about 31 fewer melanoma deaths (per 10 million population) compared to other states

t-test and simple linear regression model

  • For binary predictors, inference based on independent two-sample t-test and simple linear regression mode are similar

  • The method of simple linear regression model for binary predictor can also be used for continuous predictor

Simple linear regression model with a continuous predictor

  • Model I: a model for mortality (Y) on latitude (x2) E(Y|x2)=β0+β2x2V(Y|x2)=σ2

Scatter plot of mortality vs latitude

mod_l1 <- lm(mortality ~ latitude, data = mordat)
tbl_regression(mod_l1, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) 389 341, 437 <0.001
latitude -6.0 -7.2, -4.8 <0.001
1 CI = Confidence Interval
  • Latitude has a significant effect on the melanoma mortality rate

  • For one-degree increase in latitude (i.e., moving toward the north), melanoma mortality rate decrease by 6 deaths per 10 million population


ANOVA table for Model I

tidy(aov(mod_l1)) |> kable(digits = 3)
term df sumsq meansq statistic p.value
latitude 1 36464.20 36464.200 99.797 0
Residuals 47 17173.06 365.384 NA NA

Fit of Model I

Another fit of the model “Mortality on latitude”

  • Model I: A model for mortality on latitude (x2) E(Y|x2)=β0+β2x2

  • Model II: A model for mortality on latitude (x) E(Y|x)=β0+βxx


mod_l2 <- lm(mortality ~ latitudef, 
             data = mordat %>% 
                 mutate(latitudef = factor(latitude)))
tbl_regression(mod_l2, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) 197 163, 231 <0.001
latitudef


    28
    31.2 -7.0 -55, 41 0.8
    31.5 32 -16, 80 0.2
    32.8 10 -38, 58 0.7
    33 20 -22, 61 0.3
    33.8 -19 -67, 29 0.4
    34.5 -37 -85, 11 0.12
    35 -42 -83, -0.35 0.048
    35.5 -6.5 -48, 35 0.7
    36 -11 -59, 37 0.6
    37.5 -23 -64, 18 0.3
    37.8 -50 -98, -2.5 0.040
    38.5 -49 -90, -7.4 0.024
    38.8 -61 -109, -13 0.015
    39 -21 -58, 16 0.2
    39.5 -55 -103, -7.5 0.026
    40 -73 -121, -25 0.005
    40.2 -58 -96, -19 0.006
    40.8 -65 -113, -17 0.010
    41.5 -75 -123, -27 0.004
    41.8 -49 -90, -7.9 0.022
    42.2 -62 -103, -20 0.006
    43 -54 -95, -13 0.013
    43.5 -80 -128, -32 0.002
    43.8 -68 -116, -20 0.008
    44 -53 -94, -11 0.015
    44.5 -84 -125, -43 <0.001
    44.8 -111 -159, -63 <0.001
    45.2 -80 -128, -32 0.002
    46 -81 -129, -33 0.002
    47 -88 -136, -40 0.001
    47.5 -81 -122, -40 <0.001
1 CI = Confidence Interval

ANOVA table for Model II

tidy(aov(mod_l2)) |> kable(digits = 4)
term df sumsq meansq statistic p.value
latitudef 31 49326.799 1591.1871 6.2755 1e-04
Residuals 17 4310.467 253.5569 NA NA

Comparison between Model I and Model II (LRT)

tidy(anova(mod_l1, mod_l2)) |> kable(digits = 3)
term df.residual rss df sumsq statistic p.value
mortality ~ latitude 47 17173.065 NA NA NA NA
mortality ~ latitudef 17 4310.467 30 12862.6 1.691 0.128

Fit of Model II

Comparison between the fits of Models I and II

Residuals of Models I and II

Which model (I or II) do you prefer for analyzing the data?


Estimate and corresponding confidence intervals obtained using Models I and II

Multiple linear regression model

  • A model for mortality rate (Y) on ocean (x1) and latitude (x2) E(Y|x1,x2)=β0+β1x1+β2x2V(Y|x1,x2)=σ2

mod_2 <- lm(mortality ~ ocean + latitude, 
            data = mordat)
tbl_regression(mod_2, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) 361 317, 404 <0.001
ocean


    0
    1 20 11, 30 <0.001
latitude -5.5 -6.5, -4.4 <0.001
1 CI = Confidence Interval
  • Both contiguity to ocean and latitude have significant effects on the mortality rate

  • On average, inland states have about 20 fewer melanoma deaths compared to other states provided latitude is fixed

  • For one-degree increase of latitude, the average mortality rate decrease by 5 deaths per 10 million population provided contiguity to the ocean is fixed


Comparison of the fits of the simple (blue line) and multiple linear regression models for mortality

0.2 Logistic Regression Model

Contingency table

  • Data obtained from different types of study designs (e.g., prospective, retrospective, and cross-sectional) can be expressed in such contingency table to examine the exposure-disease relationship
Disease
Exposure Yes No Total
Yes a b n_1
No c d n_2
Total m_1 m_2 n

Prospective study design

  • Let Y1 and Y2 be the number of exposed and non-exposed diseased subjects, which follow two independent binomial distributions Y1B(n1,p1)andY2B(n2,p2)

    • p1=Pr(D=1|E=1)

    • p2=Pr(D=1|E=0)

  • Estimate of model parameters p^1=(a/n1)andp^2=(c/n2)


  • Risk difference RD=p1p2RD^=p^1p^2se(RD^)=[(p^1q^1/n1)+(p^2q^2/n2)]1/2

  • Risk ratio RR=(p1/p2)RR^=(p^1/p^2)se(logRR^)=[q^1n1p^1+q^2n2p^2]1/2

  • Odds probability o(x)=p(x)1p(x)p(x)=o(x)1+o(x)

  • Odds ratio OR=p1/q1p2/q2OR^=p^1/q^1p^2/q^2=adbcse(logOR^)=[1a+1b+1c+1d]1/2

Case-control study design

  • Let Y1 and Y2 be the number of exposed subjects in case and control groups, respectively, following two independent binomial distributions Y1B(m1,p1)andY2B(m2,p2)
    • p1=Pr(E=1|D=1)

    • p2=Pr(E=1|D=0)

  • Estimate of parameters p^1=(a/m1)andp^2=(b/m2)

  • Odds ratio OR=p1/q1p2/q2OR^=p^1/q^1p^2/q^2=adbc

    • Disease-odds-ratio = Exposure-odds-ratio

    • For case-control study, the RR can be approximated by OR if the disease is rare

Cross-sectional study

  • Data of dichotomous disease-exposure study can be considered as an outcome of n tosses of a four-faced dye with the faces {ED},{ED¯},{E¯D},{E¯D¯}

    • Estimators of odds ratio OR^=adbc

    • Multinomial distribution is associated with the outcome of a cross-sectional study

Western Collaborative Group Study (WCGS)

  • A large epidemiological study designed to investigate the association between the Type-A behavior pattern and coronary heart disease (CHD)

  • Type-A behavior is composed of competitiveness, excessive drive, and an enhanced sense of time urgency

  • Download wcgs data


head(wcgs) |> select(1:10) |> kable()
age arcus behpat bmi chd69 chol dbp dibpat height id
50 1 A1 31.32101 0 249 90 Type A 67 2343
51 0 A1 25.32858 0 194 74 Type A 73 3656
59 1 A1 28.69388 0 258 94 Type A 70 3526
51 1 A1 22.14871 0 173 80 Type A 69 22057
44 0 A1 22.31303 0 214 80 Type A 71 12927
47 0 A1 27.11768 0 206 76 Type A 64 16029
  • Outcome: chd69
  • Exposure: dibpat

CHD
dibpat 1 0 total $\hat{p}$ $\hat{o}$
Type A 177 1411 1588 0.111 0.125
Type B 78 1486 1564 0.050 0.053
  • RD^=0.1110.05=0.061

  • Subjects with Type-A behavior have about 6.1 % higher risk of developing CHD compared with others


CHD
dibpat 1 0 total $\hat{p}$ $\hat{o}$
Type A 177 1411 1588 0.111 0.125
Type B 78 1486 1564 0.050 0.053
  • RR^=(0.111/0.05)=2.22

  • The risk of developing CHD for Type-A subjects is about 2.2 times the risk for Type-B subjects

  • In other words, the risk of developing CHD is about 122% higher for Type-A subjects compared to Type-B subjects


CHD
dibpat 1 0 total $\hat{p}$ $\hat{o}$
Type A 177 1411 1588 0.111 0.125
Type B 78 1486 1564 0.050 0.053
  • OR^=(0.125/0.053)=2.358

  • The odds of developing CHD for Type-A subjects is about 2.4 times that the risk for Type-B subjects

  • In other words, the odds of developing CHD is about 136% higher for Type-A subjects compared to Type-B subjects


RD^±se(RD^)=0.061±0.01logRR^±se(logRR^)=0.798±0.131logOR^±se(logOR^)=0.858±0.141

  • Obtain confidence intervals for RD, RR, and OR

Regression model for binary response

  • Let Y(x) be binary response obtained from a subject with predictor value x and assume Y(x)B(1,p(x)) where p(x)=Pr(Y(x)=1)=Pr(Y=1|x)=E(Y|x)

Regression models for p(x)

  • Assume Y(x)B(1,p(x)) (Model I)p(x)=β0+β1x
    • <p(x)<

    • Linear probability model


  • Assume Y(x)B(1,p(x)) (Model II)logp(x)=β0+β1xp(x)=exp(β0+β1x)

    • 0<p(x)<

    • Log-binomial model


  • Assume Y(x)B(1,p(x)) logitp(x)=logp(x)1p(x)=β0+β1x(Model III)p(x)=exp(β0+β1x)1+exp(β0+β1x)

    • 0<p(x)<1

    • Logistic regression model (related to the distribution function of logistic distribution)


  • Assume Y(x)B(1,p(x)) Φ1(p(x))=β0+β1x(Model IV)p(x)=Φ(β0+β1x)

    • 0<p(x)<1

    • Probit regression model


  • Assume Y(x)B(1,p(x)) log[log(1p(x))]=β0+β1x(Model V)p(x)=1exp[eβ0+β1x]

    • 0<p(x)<1

    • Regression model with complementary log-log link (related to the distribution function of extreme-value distribution)


Model I: p(x)=β0+β1x

  • Probability of developing CHD among subjects with Type-A (x=1) and Type-B behavior (x=0) p(1)=P(Y=1|x=1)=β0+β1p(0)=P(Y=1|x=0)=β0

    • Effect of behavior type on CHD RD=p(1)p(0)=β1

Data {(yi,xi),i=1,,n}

y’s are independent and assume YiB(1,p(xi))

  • Linear probability model p(xi)=β0+β1xi

Log-likelihood function (β0,β1)=logi=1nyip(xi)(1yi)1p(xi)=i=1n{(β0+β1xi)logyi+(1β0β1xi)log(1yi)}

  • MLEs (β^0,β^1)=argmaxΘ(β0,β1)

glm() funciton

R function glm() is used to fit a generalized linear model

glm(formula, data, family)
  • family specifies the distribution of response and link function, which can be either a string or a function, and commonly used family and link function in glm()

    • binomial("logit") binomial distribution and logit link function

    • binomial("log") binomial distribution and log link function

    • gaussian("identity") Gaussian distribution and identity link function

    • poisson("log") Poisson distribution and log link function

Model I: p(x)=β0+β1x

bmod_1 <- glm(chd69 ~ dibpat, family = binomial("identity"), 
              data = wcgs)
tbl_regression(bmod_1, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) 0.05 0.04, 0.06 <0.001
dibpat


    Type B
    Type A 0.06 0.04, 0.08 <0.001
1 CI = Confidence Interval
  • Since response chd69 is a binary variable and regression function is assumed to be identically linked with the parameter of the response distribution, a binomial("identity") is used as family

  • Effect of behavior type on CHD RD^=β^1=0.062

Model II: logp(x)=β0+β1x

  • Probability of developing CHD among subjects with Type-A (x=1) and Type-B (x=0) behavior p(1)=P(Y=1|x=1)=exp(β0+β1)p(0)=P(Y=1|x=0)=exp(β0)

  • Effect of Type-A behavior RR=p(1)p(0)=exp(β1)logRR=β1


Log-likelihood function (β0,β1)=logi=1nyip(xi)(1yi)1p(xi)=i=1n{exp(β0+β1xi)logyi+[1exp(β0+β1xi)]log(1yi)}

  • MLEs (β^0,β^1)=argmaxΘ(β0,β1)

bmod_2 <- glm(chd69 ~ dibpat, family = binomial("log"), 
              data = wcgs)
tbl_regression(bmod_2, intercept = T)
Characteristic log(RR)1 95% CI1 p-value
(Intercept) -3.0 -3.2, -2.8 <0.001
dibpat


    Type B
    Type A 0.80 0.55, 1.1 <0.001
1 RR = Relative Risk, CI = Confidence Interval
  • Since response chd69 is a binary variable and regression function is assumed to be linked with the log-transformed parameter of the response distribution, a binomial("log") is used as family

  • Effect of behavior type on CHD logRR^=β^1=0.804RR^=exp(β^1)=2.235

Model III: logitp(x)=β0+β1x

  • Probability of developing CHD among subjects with Type-A (x=1) and Type-B (x=0) behavior p(1)=P(Y=1|x=1)=exp(β0+β1)1+exp(β0+β1)p(0)=P(Y=1|x=0)=exp(β0)1+exp(β0)

  • Effect of behavior type OR=p(1)/[1p(1)]p(0)/[1p(0)]=exp(β0+β1)exp(β0)=exp(β1)logOR=β1

Log-likelihood function (β0,β1)=logi=1nyip(xi)(1yi)1p(xi)=i=1n{exp(β0+β1xi)1+exp(β0+β1xi)logyi+log(1yi)1+exp(β0+β1xi)}

  • MLEs (β^0,β^1)=argmaxΘ(β0,β1)

bmod_3 <- glm(chd69 ~ dibpat, family = binomial("logit"), 
              data = wcgs)
tbl_regression(bmod_3, intercept = T)
Characteristic log(OR)1 95% CI1 p-value
(Intercept) -2.9 -3.2, -2.7 <0.001
dibpat


    Type B
    Type A 0.87 0.60, 1.2 <0.001
1 OR = Odds Ratio, CI = Confidence Interval
  • Since response chd69 is a binary variable and the regression function is assumed to be linked with a logit-transformed parameter of the response distribution, a binomial("logit") is used as family

  • Effect of behavior type on CHD logOR^=β^1=0.871OR^=exp(β^1)=2.39

Binary response and continuous predictor

  • Effect of age (x2) on CHD (Model I) p(x)=β0+β1x2
mod_1a <- glm(chd69 ~ age, family = binomial("identity"), 
              data = wcgs)
tbl_regression(mod_1a, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) -0.17 -0.25, -0.10 <0.001
age 0.01 0.00, 0.01 <0.001
1 CI = Confidence Interval
  • Age has a significant effect on CHD, and for an increase of 10 years, risk of developing CHD increase by 0.055

  • What is the risk of developing CHD for a subject of age 45?
predict.glm(mod_1a, newdata = list(age = 45), 
            type = "response")
         1 
0.07393095 

  • Predicting linear predictor for a subject of age 45
predict.glm(mod_1a, newdata = list(age = 45), 
            type = "link")
         1 
0.07393095 

Binary response and continuous predictor

  • Effect of age (x2) on CHD (Model II) logp(x)=β0+β1x2
mod_2a <- glm(chd69 ~ age, family = binomial("log"), 
              data = wcgs)
tbl_regression(mod_2a, intercept = T)
Characteristic log(RR)1 95% CI1 p-value
(Intercept) -5.7 -6.7, -4.8 <0.001
age 0.07 0.05, 0.09 <0.001
1 RR = Relative Risk, CI = Confidence Interval
  • Age has a significant effect on CHD, and the risk of developing CHD for a subject John is 1.07 (=e0.068) times that of a subject who is one year younger than John

  • For one year increase of age, the risk of developing CHD is increased by 7%


What is the risk of developing CHD for a subject of age 45?

predict.glm(mod_2a, newdata = list(age = 45), 
            type = "response")
         1 
0.06891375 

Predicting linear predictor for a subject of age 45

predict.glm(mod_2a, newdata = list(age = 45), 
            type = "link")
      1 
-2.6749 

Binary response and continuous predictor

Effect of age (x2) on CHD (Model III) logitp(x)=β0+β1x2

mod_3a <- glm(chd69 ~ age, family = binomial("logit"), 
              data = wcgs)
tbl_regression(mod_3a, intercept = T)
Characteristic log(OR)1 95% CI1 p-value
(Intercept) -6.0 -7.1, -4.9 <0.001
age 0.07 0.05, 0.10 <0.001
1 OR = Odds Ratio, CI = Confidence Interval
  • Age has a significant effect on CHD, and the odds of developing CHD for a subject John is 1.078 (=e0.075) times that of a subject who is one year younger than John

  • For one year increase of age, the odds of developing CHD is increased by 7.8%


  • What is the risk of developing CHD for a subject of age 45?
predict.glm(mod_3a, newdata = list(age = 45), 
            type = "response")
         1 
0.06913919 
  • Predicting linear predictor for a subject of age 45
predict.glm(mod_3a, newdata = list(age = 45), 
            type = "link")
        1 
-2.599988 

Comparisons among the fits of the regression models I, II, and III with age as the predictor

Multiple logistic regression models

  • Model for binary response CHD (Y) with predictors behavior type (x1) and (x2) p(x)=β0+β1x1+β2x2
mod_1b <- glm(chd69 ~ dibpat + age, family = binomial("identity"),
              data = wcgs)
tbl_regression(mod_1b, intercept = T)
Characteristic Beta 95% CI1 p-value
(Intercept) -0.15 -0.22, -0.08 <0.001
dibpat


    Type B
    Type A 0.05 0.03, 0.07 <0.001
age 0.00 0.00, 0.01 <0.001
1 CI = Confidence Interval
  • The effect of behavior type on CHD is constant over age, and the effect of age is constant over two levels of behavior types

  • Estimated risks for the subjects John (age 40, Type-A), Allan (age 40, Type-A), Tom (age 50, Type-B), and Steve (age 40, Type-B)
# A tibble: 4 × 3
  subject   age dibpat
* <chr>   <dbl> <chr> 
1 John       40 Type A
2 Allan      40 Type B
3 Tom        50 Type A
4 Steve      50 Type B

predict.glm(mod_1b, newdata = ndat1b,
            type = "response")
      John      Allan        Tom      Steve 
0.07782783 0.02795346 0.12257767 0.07270330 
  • Estimate of the effect of behavior type in terms of RD, RR, and OR from the pairs John-Allan and Tom-Steve

Estimate of RD, OR, and RR for assessing the effects of behavior type on CHD at different values of age using linear probability model

  • Model for binary response CHD (Y) with predictors behavior type (x1) and (x2) logp(x)=β0+β1x1+β2x2
mod_2b <- glm(chd69 ~ dibpat + age, family = binomial("log"),
              data = wcgs)
tbl_regression(mod_2b, intercept = T)
Characteristic log(RR)1 95% CI1 p-value
(Intercept) -5.9 -6.8, -4.9 <0.001
dibpat


    Type B
    Type A 0.74 0.48, 1.0 <0.001
age 0.06 0.04, 0.08 <0.001
1 RR = Relative Risk, CI = Confidence Interval
  • The Effect of behavior type on CHD is constant over age, and the effect of age is constant over two levels of behavior types

ndat1b
# A tibble: 4 × 3
  subject   age dibpat
* <chr>   <dbl> <chr> 
1 John       40 Type A
2 Allan      40 Type B
3 Tom        50 Type A
4 Steve      50 Type B
predict.glm(mod_2b, newdata = ndat1b,
            type = "response")
      John      Allan        Tom      Steve 
0.06893735 0.03298930 0.12762092 0.06107175 
  • Estimate of the effect of behavior type in terms of RD, RR, and OR from the pairs John-Allan and Tom-Steve

Estimate of RD, OR, and RR for assessing the effects of behavior type on CHD at different values of age using log-binomial model

  • Model for binary response CHD (Y) with predictors behavior type (x1) and (x2) logitp(x)=β0+β1x1+β2x2
mod_3b <- glm(chd69 ~ dibpat + age, family = binomial("logit"),
              data = wcgs)
tbl_regression(mod_3b, intercept = T)
Characteristic log(OR)1 95% CI1 p-value
(Intercept) -6.2 -7.3, -5.1 <0.001
dibpat


    Type B
    Type A 0.81 0.53, 1.1 <0.001
age 0.07 0.05, 0.09 <0.001
1 OR = Odds Ratio, CI = Confidence Interval
  • The effect of behavior type on CHD is constant over age, and the effect of age is constant over two levels of behavior types

ndat1b
# A tibble: 4 × 3
  subject   age dibpat
* <chr>   <dbl> <chr> 
1 John       40 Type A
2 Allan      40 Type B
3 Tom        50 Type A
4 Steve      50 Type B
predict.glm(mod_3b, newdata = ndat1b,
            type = "response")
      John      Allan        Tom      Steve 
0.06885560 0.03198807 0.12857668 0.06185677 
  • Estimate of the effect of behavior type in terms of RD, RR, and OR from the pairs John-Allan and Tom-Steve

Estimate of RD, OR, and RR for assessing the effects of behavior type on CHD at different values of age using logistic regression model

Interaction

  • Consider a model for binary response CHD with three predictors “behavior type” (x1), “age” (x2), and “arcus senilis” (x3)

    • age (x2) is a continuous predictor

    • behavior type (x1) and arcus senilis (x2) are binary predictor

x1={1‘Type-A‘0‘Type-B‘x3={1arcus senilis = Yes0arcus senilis = No


  • Logistic regression model without the interaction term logitp(x)=β0+β1x1+β2x2+β3x3

    • OR^=exp(β2) Effect of behavior type on the odds of CHD after adjusting for age and arcus senilis

mod_3c <- glm(chd69 ~  dibpat + age + arcus, 
              data = wcgs, family = binomial("logit"))
tbl_regression(mod_3c, intercept = T)
Characteristic log(OR)1 95% CI1 p-value
(Intercept) -6.0 -7.1, -5.0 <0.001
dibpat


    Type B
    Type A 0.80 0.52, 1.1 <0.001
age 0.06 0.04, 0.09 <0.001
arcus


    0
    1 0.32 0.04, 0.59 0.023
1 OR = Odds Ratio, CI = Confidence Interval
  • OR^=2.225=exp(0.8)

  • Odds of developing CHD for a subject with Type-A behavior is 2.2 times than that of a subject with Type-B behavior provided age and arcus senilis is constant



  • Logistic regression model with an interaction term between “behavior type” and “arcus senilis” logitp(x)=β0+β1x1+β2x2+β3x3+θx1x3={(β0+β3)+(β1+θ)x1+β2x2for x3=1β0+β1x1+β2x2for x3=0

    • If θ0 Association between behavior type and CHD is different in different groups of arcus senilis OR={exp(β1+θ)for x3=1exp(β1)for x3=0

mod_3d <- glm(chd69 ~  dibpat + age + arcus + arcus:dibpat, 
              data = wcgs, family = binomial("logit"))
tbl_regression(mod_3d, intercept = T)
Characteristic log(OR)1 95% CI1 p-value
(Intercept) -6.2 -7.3, -5.1 <0.001
dibpat


    Type B
    Type A 0.99 0.63, 1.4 <0.001
age 0.06 0.04, 0.09 <0.001
arcus


    0
    1 0.64 0.17, 1.1 0.007
dibpat * arcus


    Type A * 1 -0.48 -1.0, 0.09 0.10
1 OR = Odds Ratio, CI = Confidence Interval
  • The interaction of arcus senilis and behavior type is significant, so the effect of behavior type on CHD is different between subjects with and without arcus senilis

  • Association between behavior type and CHD among subjects with and without arcus senilis OR^={1.664for x3=12.693for x3=0