Chapter 6

(AST405) Lifetime data analysis

Author

Md Rasel Biswas

6 Parametric Regression Models

6.1 Log-location-scale (Accelerated Failure Time) Regression Models

Linear regression model

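  • Distributional assumption for the response: $(Y \mid x) = Y(x) \sim N(\mu(x), \sigma^2)$
  • Regression model for the parameters: $\mu(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = x'\beta$ and $\mathrm{var}(Y \mid x) = \sigma^2$

  • Instead of the parameters, the linear regression model can be defined in terms of other functions, such as the survivor function
    $$S_Y(y) = \Pr(Y > y) = 1 - \Phi\left(\frac{y - \mu(x)}{\sigma}\right) \tag{6.1}$$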

Regression models for lifetimes

  • Similar to continuous and binary responses, regression analysis of lifetimes involves specifying the distribution of a lifetime $T$ given a $p$-dimensional covariate vector $x$, written $(T \mid x) = T(x)$

  • For parametric regression models for lifetimes T, parameters (e.g. scale and shape parameters) need to be defined as a function of measured covariates (linear predictors)

  • It requires selecting a link function (e.g. identity, log, logit, etc.) for relating model parameters with linear predictors

  • Similar to linear and logistic regression models, maximum likelihood method of estimation is used to estimate parameters of the model

Log-location-scale AFT model

  • For a lifetime that follows a distribution in the log-location-scale family, the survivor function of lifetime $T$ for a given covariate vector $x$ is defined as
    $$S(t \mid x) = S_0^*\left(\left[\frac{t}{\alpha(x)}\right]^{\delta}\right) \tag{6.2}$$

    • Scale parameter $\alpha(x)$ is defined as a function of the covariate vector $x$

    • Shape parameter $\delta$ does not depend on $x$

    • The survivor function of the corresponding standardized distribution, $S_0^*(x) = S_0(\log x)$, was defined earlier


  • For a log-lifetime that follows a distribution in the location-scale family, the survivor function of log-lifetime $Y$ for a given covariate vector $x$ is defined as
    $$S(y \mid x) = S_0\left(\frac{y - u(x)}{b}\right) \tag{6.3}$$

    • Location parameter $u(x)$ is defined as a function of the covariate vector $x$

    • Scale parameter $b$ does not depend on $x$

  • The model (6.3) for log-lifetime is similar to the linear regression model (6.1) with $\mu(x) = u(x)$, $\sigma = b$, and $\Phi(x) = 1 - S_0(x)$


  • The model for lifetime (6.2) or log-lifetime (6.3) is known as the accelerated failure time (AFT) model
    $$S(t \mid x) = S_0^*\left(\left[\frac{t}{\alpha(x)}\right]^{\delta}\right), \qquad S(y \mid x) = S_0\left(\frac{y - u(x)}{b}\right)$$

  • Models for the parameters $\alpha(x)$ and $u(x)$ are defined so that the associated parametric restrictions, $\alpha(x) > 0$ and $-\infty < u(x) < \infty$, are satisfied, e.g.
    $$u(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = x'\beta, \qquad \alpha(x) = \exp(x'\beta)$$


  • The AFT model can also be expressed as
    $$\frac{Y - u(x)}{b} = Z \;\Rightarrow\; Y = u(x) + bZ \tag{6.4}$$

    • $Z \sim S_0(z)$, i.e. $Z$ follows a standardized location-scale distribution with location 0 and scale 1, e.g. the standard normal or standard extreme-value distribution
  • The linear regression model (6.1) can also be expressed as $Y = \mu(x) + \sigma Z$, $Z \sim N(0, 1)$


  • In the AFT model defined in terms of the distribution of lifetime $T$, covariates alter the time scale

    • If $\alpha(x) = \exp(x'\beta) > 1$, the effect of the covariate vector is to lengthen the time to failure (decelerate time)

    • If $\alpha(x) = \exp(x'\beta) < 1$, the effect of the covariate vector is to shorten the time to failure (accelerate time)


  • The accelerated failure time model is a general model for survival data, in which explanatory variables measured on an individual are assumed to act multiplicatively on the time-scale

  • Log-location-scale AFT models are a special case of AFT models where the log of survival time follows a location-scale distribution.

  • AFT models assume that covariates accelerate or decelerate the time to event.


  • The following example is described in Collett (2015)

  • Suppose patients are randomized to receive one of the two treatments A (standard) and B (new)

  • Under an accelerated failure time model, the survival time of an individual on the new treatment is taken to be a multiple of the survival time for an individual on the standard treatment.

  • Thus, the effect of the new treatment is to “speed up” or “slow down” the passage of time


  • For a specific time $t$: $S(t \mid \text{trt} = B) = S(t/\alpha \mid \text{trt} = A)$

  • One interpretation of this model is that the lifetime of an individual on the new treatment (B) is α times the lifetime that the individual would have experienced under the standard treatment (A)

  • When the end-point of concern is the death of a patient

    • $\alpha > 1$: the new treatment promotes longevity
    • $\alpha < 1$: the new treatment is worse (it accelerates death)
  • The quantity $\alpha$ is therefore termed the acceleration factor


  • The acceleration factor can also be interpreted in terms of the median survival times of patients on the standard and new treatments, $t_A(50)$ and $t_B(50)$, which satisfy $S_B\{t_B(50)\} = S_A\{t_A(50)\} = 0.50$
  • Under the AFT model, $S_B\{t_B(50)\} = S_A\{t_B(50)/\alpha\}$, so $t_B(50) = \alpha\, t_A(50)$
  • That is, the median survival time of a patient on the new treatment is $\alpha$ times that of a patient on the standard treatment

  • Under the AFT model, the survivor functions for covariate vectors $x_1$ and $x_2$ can be compared as $S(t \mid x_1) = S(ct \mid x_2)$
    • If $c > 1$, subjects with covariate vector $x_2$ survive longer than subjects with covariate vector $x_1$

    • If $c < 1$, subjects with covariate vector $x_2$ survive for a shorter time than subjects with covariate vector $x_1$


  • Under the AFT model with $S_1(t) = S_2(ct)$ for some $c > 0$, the mean survival time $\mu_2$ of Population 2 can be expressed in terms of $\mu_1$, the mean survival time of Population 1, as
    $$\mu_2 = \int_0^\infty S_2(t)\,dt = c\int_0^\infty S_2(cu)\,du = c\int_0^\infty S_1(u)\,du = c\mu_1$$

  • In general, let $\varphi_j$ be the population quantity such that $S_j(\varphi_j) = \theta$ for some $\theta \in (0, 1)$, $j = 1, 2$; then $S_2(\varphi_2) = \theta = S_1(\varphi_1) = S_2(c\varphi_1)$
  • Hence $\varphi_2 = c\varphi_1$, i.e., under the AFT model, the mean survival time, the median survival time, and every other quantile of Population 2 are all $c$ times those of Population 1
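The scaling results above can be checked numerically. The following is a minimal sketch in base R, assuming two hypothetical Weibull populations with a common shape and scales related by a factor `c_fac` (all numerical values are chosen only for illustration).

```r
# Two Weibull populations with common shape delta and scales alpha1, alpha2 = c * alpha1,
# so that S1(t) = S2(c * t). All values are hypothetical.
delta  <- 1.5
alpha1 <- 10
c_fac  <- 2
alpha2 <- c_fac * alpha1

S1 <- function(t) pweibull(t, shape = delta, scale = alpha1, lower.tail = FALSE)
S2 <- function(t) pweibull(t, shape = delta, scale = alpha2, lower.tail = FALSE)

# Check S1(t) = S2(c * t) on a small grid of times
t_grid <- c(1, 5, 10, 20)
same_curve <- isTRUE(all.equal(S1(t_grid), S2(c_fac * t_grid)))

# Mean survival time as the integral of the survivor function
mu1 <- integrate(S1, lower = 0, upper = Inf)$value
mu2 <- integrate(S2, lower = 0, upper = Inf)$value

# Median survival times
med1 <- qweibull(0.5, shape = delta, scale = alpha1)
med2 <- qweibull(0.5, shape = delta, scale = alpha2)

# Both ratios should equal c_fac (up to numerical integration error)
c(mean_ratio = mu2 / mu1, median_ratio = med2 / med1)
```

Both ratios agree with the factor `c_fac`, as the derivation predicts.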

Figure: Comparison between two log-location-scale density functions with covariate vectors $x_1$ and $x_2$, where $u(x_2) > u(x_1)$

Figure: Comparison between two log-location-scale survival functions with covariate vectors $x_1$ and $x_2$, where $u(x_2) > u(x_1)$

Figure: Comparison between two log-location-scale survival functions with the same scale parameters, but different location parameters

Figure: Comparison between two log-location-scale survival functions with the same location parameters, but different scale parameters

Proportional hazards model

  • There are two approaches to regression modeling for lifetimes

    1. AFT model, where the effects of covariates are assessed by comparing the corresponding time scales

    2. Hazards model, where the effects of covariates on the hazard function are studied


  • The most common hazards model is the proportional hazards model (Cox 1972), where the hazard function for lifetime $T$ given $x$ is defined as $h(t \mid x) = h_0(t)\, r(x)$

    • $r(x)$ is a positive-valued function of the linear predictor, e.g. $r(x) = \exp(x'\beta)$, which does not include an intercept term

    • $h_0(t)$ is a positive-valued function, known as the baseline hazard function, i.e. $h(t \mid x = 0) = h_0(t)$

    • h0(t) could be either fully parametric or unspecified


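If we take two individuals with covariates $x_1$ and $x_2$:

$$\frac{h(t \mid x_1)}{h(t \mid x_2)} = \frac{h_0(t)\, e^{\beta' x_1}}{h_0(t)\, e^{\beta' x_2}} = e^{\beta'(x_1 - x_2)}$$

This ratio does not depend on time $t$; this is exactly the proportional hazards property.


$$h(t \mid x) = h_0(t)\, e^{x'\beta}$$

  • For a binary predictor $x$ (1 = male, 0 = female), the hazard ratio is
    $$\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \frac{h_0(t)\, e^{\beta}}{h_0(t)} = e^{\beta} \;\Rightarrow\; h(t \mid x = 1) = h(t \mid x = 0)\, e^{\beta}$$

  • $\beta > 0$: the hazard of the event is higher for males than for females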


  • Under the proportional hazards model, the cumulative hazard function is
    $$H(t \mid x) = \int_0^t h(u \mid x)\,du = r(x)\int_0^t h_0(u)\,du = r(x)\, H_0(t)$$

  • Under the proportional hazards model, the survivor function is
    $$S(t \mid x) = e^{-H(t \mid x)} = e^{-r(x) H_0(t)} = [S_0(t)]^{r(x)}$$

    • $S_0(t)$ is the baseline survivor function and $r(x) > 0$

    • Interpret the survival probabilities for the following cases: (a) $r(x) > 1$ and (b) $r(x) < 1$
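The relation between the baseline and covariate-specific survivor functions can be explored numerically. The following is a minimal sketch in base R, assuming a hypothetical Weibull baseline and the two illustrative values $r(x) = 0.5$ and $r(x) = 1.5$; the function names `S0`, `S_ph`, and `H_ph` are introduced here only for illustration.

```r
# Hypothetical Weibull baseline survivor function
S0 <- function(t) pweibull(t, shape = 1.5, scale = 10, lower.tail = FALSE)

# Survivor and cumulative hazard functions under proportional hazards
S_ph <- function(t, r) S0(t)^r
H_ph <- function(t, r) -log(S_ph(t, r))

t_grid <- c(2, 5, 10, 20)

# r < 1 raises the survival curve above the baseline; r > 1 lowers it
surv_table <- rbind(baseline = S0(t_grid),
                    r_low    = S_ph(t_grid, 0.5),
                    r_high   = S_ph(t_grid, 1.5))
colnames(surv_table) <- paste0("t=", t_grid)
round(surv_table, 3)

# The cumulative hazard ratio is constant in t and equals r
H_ph(t_grid, 1.5) / H_ph(t_grid, 1)
```

Since $0 < S_0(t) < 1$, raising it to a power below one increases the survival probability at every $t$, and a power above one decreases it.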


Figure: Under the proportional hazards model, comparison between the baseline survivor function $S_0(t)$ and $S_1(t \mid x) = [S_0(t)]^{0.5}$

Figure: Under the proportional hazards model, comparison between the baseline survivor function $S_0(t)$ and $S_1(t \mid x) = [S_0(t)]^{1.5}$

Figure: Under the proportional hazards model, comparison between the cumulative hazard functions $H(t \mid x_1)$ and $H(t \mid x_2) = 1.5\, H(t \mid x_1)$

Parametric proportional hazards model

  • Depending on whether the baseline hazard function $h_0(t)$ is fully parametric or not, a PH model $h(t \mid x) = h_0(t)\, r(x)$ can be either parametric or semi-parametric

    • The PH model is parametric if $h_0(t) = h_1(\alpha, t)$ for some parameter vector $\alpha$

    • The PH model is semi-parametric if $h_0(t)$ is unspecified


  • The Weibull model can be defined as both an AFT and a PH model

Weibull regression model

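  • Weibull as an AFT model: $S(t \mid x) = \exp\left(-[t/\alpha(x)]^{\delta}\right)$, where
    $$\alpha(x) = \exp(x'\beta_{AFT}) \tag{6.5}$$

  • Weibull as a PH model:
    $$h(t \mid x) = \frac{\delta}{\alpha(x)}\left[\frac{t}{\alpha(x)}\right]^{\delta - 1} = \left(\delta t^{\delta - 1}\right)[\alpha(x)]^{-\delta} = h_1(\delta, t)\, r(x)$$

    • Assume
      $$r(x) = \exp(x'\beta_{PH}) = [\alpha(x)]^{-\delta} \;\Rightarrow\; \exp(-x'\beta_{PH}/\delta) = \alpha(x) \tag{6.6}$$

  • Equating the expressions for $\alpha(x)$ from the AFT (6.5) and PH (6.6) Weibull models, we can show
    $$\exp(-x'\beta_{PH}/\delta) = \alpha(x) = \exp(x'\beta_{AFT}) \;\Rightarrow\; \beta_{PH} = -\delta\,\beta_{AFT} = -\frac{1}{b}\,\beta_{AFT}$$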
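The link between the two parameterizations can be verified numerically. The following is a minimal sketch in base R, assuming a hypothetical shape parameter and hypothetical AFT coefficients for a single binary covariate; R's `dweibull()` and `pweibull()` use `shape` for $\delta$ and `scale` for $\alpha(x)$.

```r
# Hypothetical Weibull AFT model with one binary covariate x
delta    <- 1.5            # shape; the log-scale parameter is b = 1 / delta
beta_aft <- c(3, 0.7)      # (intercept, slope) on the AFT scale

alpha <- function(x) exp(beta_aft[1] + beta_aft[2] * x)

# Hazard function h(t | x) = f(t | x) / S(t | x)
haz <- function(t, x) {
  dweibull(t, shape = delta, scale = alpha(x)) /
    pweibull(t, shape = delta, scale = alpha(x), lower.tail = FALSE)
}

# Hazard ratio for x = 1 versus x = 0 at several times
t_grid <- c(1, 10, 50)
hr <- haz(t_grid, 1) / haz(t_grid, 0)

# Coefficient implied on the PH scale: beta_PH = -delta * beta_AFT
beta_ph <- -delta * beta_aft[2]

rbind(hazard_ratio = hr, exp_beta_ph = exp(beta_ph))
```

The hazard ratio is the same at every time point and equals $\exp(\beta_{PH})$, so a covariate that lengthens lifetime (positive AFT coefficient) lowers the hazard (negative PH coefficient).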

AFT and PH model

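  • For covariate vectors $x_1$ and $x_2$, and constants $c > 0$ and $r(x) > 0$:
    • AFT model: $S(t \mid x_2) = S(ct \mid x_1)$
    • PH model: $S(t \mid x_2) = [S(t \mid x_1)]^{r(x_2)/r(x_1)}$, equivalently $H(t \mid x_2) = [r(x_2)/r(x_1)]\, H(t \mid x_1)$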

6.2 Inference for Log-location-scale AFT Models

Likelihood methods

  • Data $\{(y_i, \delta_i, x_i),\ i = 1, \ldots, n\}$

    • Log-lifetime or log-censoring time $y_i = \log t_i$

    • Censoring indicator $\delta_i = I(i\text{th observation is a failure})$

    • $x_i = (1, x_{i1}, \ldots, x_{ip})'$ is a vector of covariates


  • Assume $Y_i$ follows a location-scale distribution with location parameter $u(x_i; \beta)$ and scale parameter $b$

  • Regression model $u(x_i; \beta) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} = x_i'\beta$

    • Vector of regression parameters $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$

    • The covariate vector $x_i$ may contain both categorical and quantitative variables; for numerically accurate computation, quantitative variables are centered


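The log-likelihood function
$$\ell(\beta, b) = -r \log b + \sum_{i=1}^{n}\left[\delta_i \log f_0(z_i) + (1 - \delta_i) \log S_0(z_i)\right] \tag{6.7}$$

  • $r = \sum_{i=1}^{n} \delta_i$

  • $z_i = \dfrac{y_i - u(x_i; \beta)}{b}$

  • $u(x_i; \beta) = x_i'\beta$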


Score functions

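Elements of the $(p+2)$-dimensional score vector
$$U_j(\beta, b) = \frac{\partial \ell(\beta, b)}{\partial \beta_j},\ j = 0, 1, \ldots, p, \qquad U_b(\beta, b) = \frac{\partial \ell(\beta, b)}{\partial b}$$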

  • Homework: Obtain the expressions of score function (Eq. 6.3.3 and 6.3.4 of textbook)

Information matrix

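Elements of the observed information matrix
$$I(\beta, b) = \begin{bmatrix} -\dfrac{\partial^2 \ell}{\partial\beta\,\partial\beta'} & -\dfrac{\partial^2 \ell}{\partial\beta\,\partial b} \\ -\dfrac{\partial^2 \ell}{\partial b\,\partial\beta'} & -\dfrac{\partial^2 \ell}{\partial b^2} \end{bmatrix}$$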

  • Homework: Obtain the expressions of information matrix (Eq. 6.3.5, 6.3.6 and 6.3.7 of textbook)

MLEs

$$(\hat\beta, \hat b) = \arg\max_{(\beta, b) \in \Theta} \ell(\beta, b)$$

  • Iterative procedures (e.g. the Newton-Raphson method) are used to obtain the MLEs of $\beta$ and $b$

  • In large samples, the MLEs $(\hat\beta, \hat b)$ approximately follow a $(p+2)$-variate normal distribution with mean $(\beta, b)$ and variance matrix $\hat V = [I(\hat\beta, \hat b)]^{-1}$

  • Large-sample tests and confidence intervals can be obtained using the sampling distribution of $(\hat\beta, \hat b)$

Test of hypothesis

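Let $\beta = (\beta_1', \beta_2')'$, where $\beta_1$ is a $k$-dimensional vector of regression parameters with $k < p$, and consider $H_0: \beta_1 = \beta_{10}$

  • Likelihood ratio test statistic $\Lambda_1 = 2\ell(\hat\beta_1, \hat\beta_2, \hat b) - 2\ell(\beta_{10}, \tilde\beta_2, \tilde b)$

    • $(\tilde\beta_2, \tilde b) = \arg\max_{(\beta_2, b)} \ell(\beta_{10}, \beta_2, b)$

    • Under $H_0$, $\Lambda_1 \sim \chi^2_{(k)}$


  • Wald statistic $\Lambda_2 = (\hat\beta_1 - \beta_{10})'\, V_{11}^{-1}\, (\hat\beta_1 - \beta_{10})$

    • $V_{11} = \widehat{\mathrm{var}}(\hat\beta_1)$ is a $k \times k$ matrix and $\hat V = [I(\hat\beta, \hat b)]^{-1} = \begin{bmatrix} V_{11} & V_{12} \\ V_{12}' & V_{22} \end{bmatrix}$

    • Under $H_0$, $\Lambda_2 \sim \chi^2_{(k)}$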

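Null hypothesis $H_0: \beta_j = 0$

  • Test statistic $Z_j = \hat\beta_j / \mathrm{se}(\hat\beta_j)$

  • $100(1 - \alpha)\%$ confidence interval for $\beta_j$: $\hat\beta_j \pm z_{1-\alpha/2}\, \mathrm{se}(\hat\beta_j)$

  • For small samples, the LRT statistic can be used to test the hypothesis and to obtain a confidence interval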

Quantiles

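  • The $p$th quantile of $Y$ given $x$: $y_p(x) = x'\beta + b\, w_p$

  • Estimate and corresponding SE of the $p$th quantile:
    $\hat y_p(x) = x'\hat\beta + \hat b\, w_p$ and, using the delta method, $\mathrm{se}(\hat y_p(x)) = (a'\hat V a)^{1/2}$

    • $a = (x', w_p)'$ and $w_p = S_0^{-1}(1 - p)$
  • $100(1 - \alpha)\%$ confidence interval for $y_p$: $\hat y_p \pm z_{1-\alpha/2}\, \mathrm{se}(\hat y_p(x))$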

Survival probability

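  • We are interested in a confidence interval for $S(y_0)$, which can be expressed in terms of the parameters of the location-scale distribution as
    $$S(y_0) = S_0\left(\frac{y_0 - x'\beta}{b}\right) \;\Rightarrow\; S_0^{-1}(S(y_0)) = \frac{y_0 - x'\beta}{b} = \psi(x)$$

  • Estimate and the corresponding SE of $\psi(x)$: $\hat\psi(x) = (y_0 - x'\hat\beta)/\hat b$ and, using the delta method, $\mathrm{se}(\hat\psi(x)) = [a'\hat V a]^{1/2}$

    • $a = -(1/\hat b)\,(x', \hat\psi(x))'$
  • $100(1 - \alpha)\%$ confidence interval for $\psi(x)$: $\hat\psi(x) \pm z_{1-\alpha/2}\, \mathrm{se}(\hat\psi(x))$


  • Wald-type $100(1 - \alpha)\%$ confidence interval for $S(y_0)$: writing $L = \hat\psi(x) - z_{1-\alpha/2}\, \mathrm{se}(\hat\psi(x))$ and $U = \hat\psi(x) + z_{1-\alpha/2}\, \mathrm{se}(\hat\psi(x))$,
    $$L < \psi(x) < U \;\Rightarrow\; L < S_0^{-1}(S(y_0)) < U \;\Rightarrow\; S_0(U) < S(y_0) < S_0(L)$$
    where the inequalities reverse in the last step because $S_0$ is a decreasing function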

6.3 Weibull AFT

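  • Distributional assumption: $T(x) = (T \mid x) \sim \mathrm{Weib}(\alpha(x), \delta)$, equivalently $Y(x) = (Y \mid x) = (\log T \mid x) \sim \mathrm{EV}(u(x), b)$

  • Regression model for the parameters: $u(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = x'\beta$ and $\alpha(x) = \exp(x'\beta)$

    • $x = (1, x_1, \ldots, x_p)'$

    • $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$


  • Regression model for the response: $Y(x) = x'\beta + bZ$

    • $Z \sim \mathrm{EV}(0, 1)$

    • $f_0(z) = \exp(z - e^z)$

    • $S_0(z) = \exp(-e^z)$


  • Log-likelihood function
    $$\ell(\beta, b) = -r \log b + \sum_{i=1}^{n}\left[\delta_i \log f_0(z_i) + (1 - \delta_i) \log S_0(z_i)\right] = -r \log b + \sum_{i=1}^{n}\left(\delta_i z_i - e^{z_i}\right)$$
    • $z_i = (y_i - x_i'\beta)/b$
  • We can now obtain the score functions, information matrix, and MLEs of $\beta$ and $b$ (following the likelihood methods of Section 6.2)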

  • We’ve already seen that the Weibull model implies a proportional hazard model

  • It is the only parametric model that is both an AFT model and a Proportional Hazards (PH) model at the same time
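The extreme-value log-likelihood given above, $\ell(\beta, b) = -r\log b + \sum_i (\delta_i z_i - e^{z_i})$, can be maximized directly. The following is a minimal sketch in base R on simulated data, not the textbook's algorithm: it uses the quasi-Newton BFGS method in `optim()` rather than Newton-Raphson, optimizes over $\log b$ to keep $b > 0$, and all names (`negloglik`, `beta_true`, `b_true`) and numerical values are assumptions made for illustration.

```r
# Negative log-likelihood of the extreme-value AFT model for log-lifetimes y,
# failure indicators d, and design matrix X; par = (beta, log b).
negloglik <- function(par, y, d, X) {
  k    <- ncol(X)
  beta <- par[seq_len(k)]
  b    <- exp(par[k + 1])                 # optimise over log b so that b > 0
  z    <- as.vector(y - X %*% beta) / b
  ll   <- -sum(d) * log(b) + sum(d * z - exp(z))
  -ll
}

# Simulate right-censored data from Y = beta0 + beta1 * x + b * Z, Z ~ EV(0, 1)
set.seed(1)
n         <- 500
x1        <- rbinom(n, 1, 0.5)
X         <- cbind(1, x1)
beta_true <- c(4, 1)
b_true    <- 0.8
Z         <- log(rexp(n))                 # log of a unit exponential is standard extreme value
y_fail    <- as.vector(X %*% beta_true) + b_true * Z
y_cens    <- log(runif(n, 0, 400))        # independent censoring times (log scale)
y         <- pmin(y_fail, y_cens)
d         <- as.numeric(y_fail <= y_cens)

# Maximise the log-likelihood; the Hessian of -l is the observed information
fit <- optim(c(mean(y), 0, 0), negloglik, y = y, d = d, X = X,
             method = "BFGS", hessian = TRUE)

est <- c(beta0 = fit$par[1], beta1 = fit$par[2], b = exp(fit$par[3]))
se  <- sqrt(diag(solve(fit$hessian)))     # SEs for (beta0, beta1, log b)
est
```

Working with $\log b$ mirrors the `Log(scale)` row reported by `survreg()`; the standard errors obtained from the inverse Hessian therefore refer to $(\beta_0, \beta_1, \log b)$, not to $b$ itself.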

Leukemia survival times

  • Data on survival times for 33 leukemia patients are available, where survival times are in weeks from diagnosis

  • Data on two covariates are also available

    • White blood cell count (WBC) at diagnosis

    • Binary variable AG indicates a positive (AG=1) or negative (AG=0) test related to white blood cell characteristics



tab6_1
# A tibble: 33 × 5
    time   wbc    AG status   lwbc
   <dbl> <dbl> <int>  <dbl>  <dbl>
 1    65  2.3      1      1  0.833
 2   140  0.75     1      0 -0.288
 3   100  4.3      1      1  1.46 
 4   134  2.6      1      1  0.956
 5    16  6        1      1  1.79 
 6   106 10.5      1      0  2.35 
 7   121 10        1      1  2.30 
 8     4 17        1      1  2.83 
 9    39  5.4      1      1  1.69 
10   121  7        1      0  1.95 
# ℹ 23 more rows

  • Consider the Weibull AFT model with covariates $x_1 = \text{AG}$ and $x_2 = \log(\text{wbc})$
    $$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + bZ \tag{6.8}$$

    • $Z \sim \mathrm{EV}(0, 1)$

Fit Weibull regression model using R

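library(survival)  # Surv(), survreg()
library(broom)     # tidy(), augment()
library(dplyr)     # mutate(), select(), slice()
# Weibull AFT model on the lifetime scale
mod62 <- survreg(Surv(time, status) ~ AG + lwbc, 
                 data = tab6_1, dist = "weibull")

# Equivalent extreme-value model for log-lifetime
mod62E <- survreg(Surv(log(time), status) ~ AG + lwbc, 
                 data = tab6_1, dist = "extreme")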

MLEs of model parameters

tidy(mod62, conf.int = TRUE) |> 
  mutate(p.value = scales::pvalue(p.value))
# A tibble: 4 × 7
  term        estimate std.error statistic p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl> <chr>      <dbl>     <dbl>
1 (Intercept)    3.84      0.534     7.19  <0.001     2.79     4.89  
2 AG             1.18      0.427     2.76  0.006      0.340    2.01  
3 lwbc          -0.366     0.150    -2.45  0.014     -0.660   -0.0731
4 Log(scale)     0.112     0.147     0.765 0.444     NA       NA     

Fitted model with $x_1 = \text{AG}$ and $x_2 = \log(\text{wbc})$

$$\hat Y = 3.841 + 1.177\, x_1 - 0.366\, x_2 + 1.119\, Z, \qquad \hat b = \exp(0.112) = 1.119$$


  • Variance matrix of the estimated parameters
vcov(mod62) %>% round(3)
            (Intercept)     AG   lwbc Log(scale)
(Intercept)       0.286 -0.130 -0.067      0.003
AG               -0.130  0.182  0.016      0.005
lwbc             -0.067  0.016  0.022     -0.005
Log(scale)        0.003  0.005 -0.005      0.021

Summary of Weibull AFT model fit
term    estimate  std.error  statistic  p.value  conf.low  conf.high
β0         3.841      0.534      7.188   <0.001     2.794      4.889
β1         1.177      0.427      2.757    0.006     0.340      2.014
β2        -0.366      0.150     -2.449    0.014    -0.660     -0.073
log b      0.112      0.147      0.765    0.444        NA         NA
  • AG and WBC have significant effects on leukemia survival times. Positive AG and low WBC count are associated with more prolonged survival

  • Since $\log b$ is not significantly different from zero, i.e. there is not enough evidence to reject $H_0: \log b = 0$ (equivalently $b = 1$), an exponential AFT model would be appropriate for analyzing these data
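The Wald statistics and confidence limits in the summary table can be recomputed from the reported estimates and standard errors. The following is a minimal sketch in base R using the rounded values shown above, so the results can differ from the table in the third decimal place; the vector names are chosen here for illustration.

```r
# Rounded estimates and standard errors from the summary table
est <- c(beta0 = 3.841, beta1 = 1.177, beta2 = -0.366, log_b = 0.112)
se  <- c(0.534, 0.427, 0.150, 0.147)

# Wald Z statistics and two-sided p-values for H0: parameter = 0
z <- est / se
p <- 2 * pnorm(-abs(z))

# 95% Wald confidence intervals
ci <- cbind(lower = est - qnorm(0.975) * se,
            upper = est + qnorm(0.975) * se)

round(cbind(z = z, p = p, ci), 3)
```

The row for `log_b` is the test of $H_0: \log b = 0$, i.e. of the exponential model within the Weibull family.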


Interpretations

$\exp(\hat\beta_1) = \exp(1.177) = 3.246$

  • A specific quantile (say the median) of the lifetime of a patient with a positive AG value (i.e. $x_1 = 1$) is 3.2 times that of a patient with a negative AG value (i.e. $x_1 = 0$), provided the WBC value remains constant

  • Note this interpretation is true for any quantile (Why?)


$\exp(\hat\beta_2) = \exp(-0.366) = 0.693$

  • A specific quantile (say the median) of the lifetime of a patient decreases by 30.7 percent with a one-unit increase in log(WBC) [i.e. when the WBC count is multiplied by $e \approx 2.718$], provided the AG value remains constant
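One way to see why the interpretation holds for every quantile is sketched below, starting from the quantile formula $y_p(x) = x'\beta + b\,w_p$ of Section 6.2. The notation $t_p(x)$ for the $p$th quantile of the lifetime is introduced here only for illustration.

```latex
\begin{aligned}
t_p(x) &= \exp\{y_p(x)\} = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\,\exp(b\,w_p), \\[4pt]
\frac{t_p(x_1 = 1, x_2)}{t_p(x_1 = 0, x_2)}
  &= \frac{\exp(\beta_0 + \beta_1 + \beta_2 x_2)\,\exp(b\,w_p)}{\exp(\beta_0 + \beta_2 x_2)\,\exp(b\,w_p)}
   = \exp(\beta_1) \quad \text{for every } p \in (0, 1).
\end{aligned}
```

The factor $\exp(b\,w_p)$, which is the only part that depends on $p$, cancels because $b$ does not depend on the covariates.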

Fitted values

augment(mod62, type.predict = "response") |> 
  select(1:4) |> 
  slice(1:3)
# A tibble: 3 × 4
  `Surv(time, status)`    AG   lwbc .fitted
                <Surv> <int>  <dbl>   <dbl>
1                  65      1  0.833   111. 
2                 140+     1 -0.288   168. 
3                 100      1  1.46     88.6

augment(mod62, type.predict = "link") |> 
  mutate(.fittedE = exp(.fitted)) |> 
  select(2:4, .fittedE) |> 
  slice(1:3)
# A tibble: 3 × 4
     AG   lwbc .fitted .fittedE
  <int>  <dbl>   <dbl>    <dbl>
1     1  0.833    4.71    111. 
2     1 -0.288    5.12    168. 
3     1  1.46     4.48     88.6
  • Estimate of the location parameter for a subject with AG = 1 and log(wbc) = 0.833

$$\hat u = \hat\beta_0 + \hat\beta_1 (1) + \hat\beta_2 (0.833) = (3.841) + (1.177)(1) + (-0.366)(0.833) = 4.713$$


#predict(object = mod62, newdata = tibble(AG = 1, lwbc = .833), 
#        type = "response")
augment(x = mod62, newdata = tibble(AG = 1, lwbc = .833), 
        type.predict = "response")
# A tibble: 1 × 4
     AG  lwbc .fitted .se.fit
  <dbl> <dbl>   <dbl>   <dbl>
1     1 0.833    111.    41.3
  • Estimate of the scale parameter for a subject with AG = 1 and log(wbc) = 0.833

$$\hat\alpha = \exp\{\hat\beta_0 + \hat\beta_1 (1) + \hat\beta_2 (0.833)\} = \exp\{(3.841) + (1.177)(1) + (-0.366)(0.833)\} = \exp(4.713) = 111.399$$


LRT

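  • Likelihood ratio test for $H_0: \beta_1 = 0$: $\Lambda_1(0) = 2\ell(\hat\beta_0, \hat\beta_1, \hat\beta_2, \log\hat b) - 2\ell(\tilde\beta_0, 0, \tilde\beta_2, \log\tilde b)$

    • $\Lambda_1(0) \sim \chi^2_{(1)}$
  • The corresponding Z statistic: $Z = \mathrm{sign}(\hat\beta_1)\, \Lambda_1^{1/2} \sim N(0, 1)$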


Estimate of model parameters under H0:β1=0

# mod62a <- update(mod62, formula = . ~ . - AG)
mod62a <- survreg(Surv(time, status) ~  lwbc, 
                 data = tab6_1, dist = "weibull")
tidy(mod62a)
# A tibble: 3 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    4.85      0.500      9.71 2.67e-22
2 lwbc          -0.500     0.165     -3.03 2.41e- 3
3 Log(scale)     0.222     0.146      1.52 1.28e- 1

LRTa <- anova(mod62a, mod62)
Terms      Resid. Df    -2*LL  Df  Deviance  Pr(>Chi)
lwbc              30  271.931  NA        NA        NA
AG + lwbc         29  265.013   1     6.918     0.009
  • $\Lambda_1(0) = 6.918 \Rightarrow Z = \sqrt{6.918} = 2.63$

Comparison between Wald- and LRT-type Z statistics
term    estimate    Wald     LRT
β0         3.841   7.188      NA
β1         1.177   2.757    2.63
β2        -0.366  -2.449   -2.46
log b      0.112   0.765      NA
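The likelihood ratio quantities for $H_0: \beta_1 = 0$ can be reproduced from the two $-2\log L$ values in the `anova()` output. The following is a minimal sketch in base R using those reported (rounded) values; the variable names are chosen here for illustration.

```r
# -2 * log-likelihood values reported by anova(mod62a, mod62)
m2ll_reduced <- 271.931   # model with lwbc only (H0: beta1 = 0)
m2ll_full    <- 265.013   # model with AG + lwbc
beta1_hat    <- 1.177     # its sign is needed for the signed-root statistic

# Likelihood ratio statistic and its chi-square(1) p-value
lambda <- m2ll_reduced - m2ll_full
p_lrt  <- pchisq(lambda, df = 1, lower.tail = FALSE)

# LRT-type Z statistic: signed square root of the LRT statistic
z_lrt <- sign(beta1_hat) * sqrt(lambda)

round(c(lambda = lambda, p_value = p_lrt, z = z_lrt), 3)
```

The signed root is slightly smaller in magnitude than the Wald statistic for $\beta_1$, which is common in samples of this size.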

Quantiles

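$$\hat y_p = x'\hat\beta + \log(-\log(1 - p))\, \hat b$$

  • Consider a subject with covariate values $x_1 = 1$ and $x_2 = \log(10)$; the linear predictor is $\hat u = x'\hat\beta = \hat\beta_0 + \hat\beta_1 + \log(10)\,\hat\beta_2 = 4.175$

  • Median survival time of a patient with covariate values $x_1 = 1$ and $x_2 = \log(10)$: $\hat y_{.50} = 4.175 + (-0.367)(1.119) = 3.765 \Rightarrow \hat t_{.50} = \exp(3.765) = 43.163$ weeks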

  • Homework: Obtain a 95% confidence interval of the median survival time of a patient with covariate values x1=1 and x2=log(10)
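A delta-method interval for the median can be sketched from the printed estimates and variance matrix. The following is a minimal sketch in base R under two assumptions: the values are the rounded ones shown earlier (so the result is only approximate), and the variance matrix is for $(\beta_0, \beta_1, \beta_2, \log b)$, as reported by `vcov()`, so the gradient is taken with respect to $\log b$ rather than $b$.

```r
# Estimates and variance matrix as printed earlier (rounded to 3 decimals);
# the parameter order is (beta0, beta1, beta2, log b).
theta <- c(3.841, 1.177, -0.366, 0.112)
V <- matrix(c( 0.286, -0.130, -0.067,  0.003,
              -0.130,  0.182,  0.016,  0.005,
              -0.067,  0.016,  0.022, -0.005,
               0.003,  0.005, -0.005,  0.021),
            nrow = 4, byrow = TRUE)

x  <- c(1, 1, log(10))      # intercept, AG = 1, lwbc = log(10)
p  <- 0.5
wp <- log(-log(1 - p))      # p-th quantile of the standard extreme-value distribution
b  <- exp(theta[4])

# Point estimate of the p-th quantile of log-lifetime
y_hat <- sum(x * theta[1:3]) + b * wp

# Gradient of y_p with respect to (beta, log b): d(b * wp) / d(log b) = b * wp
a  <- c(x, b * wp)
se <- sqrt(drop(t(a) %*% V %*% a))

# 95% interval on the log scale, then transformed to weeks
ci_log  <- y_hat + c(-1, 1) * qnorm(0.975) * se
ci_time <- exp(ci_log)
c(median = exp(y_hat), lower = ci_time[1], upper = ci_time[2])
```

Using the unrounded output of `coef()` and `vcov()` from the fitted model object in place of the hard-coded values would give a more accurate interval.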


Survival probability

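$$\hat S(y_0) = \exp\left\{-\exp\left[(y_0 - x'\hat\beta)/\hat b\right]\right\}$$
For a patient with covariate values $x_1 = 1$ and $x_2 = \log(10)$, obtain $S(\log 10)$:
$$\hat S(\log 10) = \exp\left\{-\exp\left[(\log 10 - 4.175)/1.119\right]\right\} \approx 0.829$$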

  • Homework: Obtain the 95% CI for S(log10)
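The Wald-type interval for a survival probability described in Section 6.2 can be sketched in the same way. The following is a minimal sketch in base R, again assuming the rounded estimates and the `vcov()` matrix for $(\beta_0, \beta_1, \beta_2, \log b)$ printed earlier, so the gradient of $\psi(x)$ is taken with respect to $\log b$; the results are approximate.

```r
# Estimates and variance matrix as printed earlier (rounded to 3 decimals);
# the parameter order is (beta0, beta1, beta2, log b).
theta <- c(3.841, 1.177, -0.366, 0.112)
V <- matrix(c( 0.286, -0.130, -0.067,  0.003,
              -0.130,  0.182,  0.016,  0.005,
              -0.067,  0.016,  0.022, -0.005,
               0.003,  0.005, -0.005,  0.021),
            nrow = 4, byrow = TRUE)

x  <- c(1, 1, log(10))            # intercept, AG = 1, lwbc = log(10)
y0 <- log(10)                     # log of t0 = 10 weeks
b  <- exp(theta[4])
S0 <- function(z) exp(-exp(z))    # standard extreme-value survivor function

# Standardised residual psi(x) and the estimated survival probability
psi   <- (y0 - sum(x * theta[1:3])) / b
S_hat <- S0(psi)

# Gradient of psi with respect to (beta, log b)
a  <- c(-x / b, -psi)
se <- sqrt(drop(t(a) %*% V %*% a))

# Interval (L, U) for psi, mapped through the decreasing function S0
psi_ci <- psi + c(-1, 1) * qnorm(0.975) * se
S_ci   <- rev(S0(psi_ci))         # (S0(U), S0(L))
c(S_hat = S_hat, lower = S_ci[1], upper = S_ci[2])
```

Because $S_0$ is decreasing, the upper limit for $\psi(x)$ gives the lower limit for the survival probability, which is why the limits are reversed in the last step.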

6.4 Log-normal AFT

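  • Distributional assumption: $T(x) = (T \mid x) \sim \text{log-Norm}(\mu(x), \sigma^2)$, equivalently $Y(x) = (Y \mid x) = (\log T \mid x) \sim N(\mu(x), \sigma^2)$

  • Regression model for the parameters: $\mu(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = x'\beta$

  • Regression model for the response: $Y(x) = x'\beta + \sigma Z$

    • $Z \sim N(0, 1)$

    • $f_0(z) = \phi(z)$

    • $S_0(z) = 1 - \Phi(z)$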

Times to pulmonary exacerbation

  • Patients with cystic fibrosis are susceptible to an accumulation of mucus in the lungs, which leads to pulmonary exacerbation and deterioration of lung function

  • A clinical trial was conducted to investigate the efficacy of the new drug DNase-1

    • Subjects are randomly assigned to a new treatment or a placebo
  • Time of interest is the time to first exacerbation after randomization, and data on fev (forced expiratory volume at the time of randomization) are also measured



# A tibble: 761 × 13
      id   trt  time   fev  inst entry.dt   end.dt     ivstart ivstop time0
   <int> <int> <dbl> <dbl> <int> <date>     <date>       <dbl>  <dbl> <dbl>
 1     1     1   168  28.8     1 1992-03-20 1992-09-04      NA     NA   168
 2     2     1   169  64       1 1992-03-24 1992-09-09      NA     NA   169
 3     3     0    65  67.2     1 1992-03-24 1992-09-08      65     75   168
 4     4     1   168  57.6     1 1992-03-26 1992-09-10      NA     NA   168
 5     5     0   171  57.6     1 1992-03-24 1992-09-11      NA     NA   171
 6     6     1   166  25.6     1 1992-03-27 1992-09-09      NA     NA   166
 7     7     0   168  86.4     1 1992-03-27 1992-09-11      NA     NA   168
 8     8     0    90  32       1 1992-03-28 1992-09-10      90    104   166
 9     9     1   169  86.4     2 1992-02-27 1992-08-14      NA     NA   169
10    10     0     8  28.8     2 1992-03-06 1992-08-22       8     22   169
# ℹ 751 more rows
# ℹ 3 more variables: status <dbl>, fevm <dbl>, visit <int>

  • Assume survival time $T(x)$ follows a log-normal distribution with scale parameter $\alpha(x)$ and shape parameter $\delta$

  • Consider the following AFT model for log survival time: $Y(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \sigma Z$

    • $Z \sim N(0, 1)$

    • $x_1 = I(\text{trt} = 1)$

    • $x_2 = \text{fev} - \text{mean}(\text{fev})$


  • R code for fitting the AFT model
mod63a <- survreg(Surv(log(time), status) ~ trt + fevm, 
                  dist = "gaussian",
                  data = tab1_4) 

tidy(mod63a)
# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   5.09     0.0684      74.4  0       
2 trt           0.336    0.0951       3.53 4.19e- 4
3 fevm          0.0159   0.00197      8.09 5.91e-16
4 Log(scale)    0.137    0.0408       3.36 7.84e- 4

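  • AFT model: $Y = x'\beta + bZ \;\Rightarrow\; T = \exp(x'\beta)\exp(bZ)$

    • $x'\beta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
  • For a binary predictor $x_j$, taking the remaining terms of the linear predictor to be zero,
    $$T = \exp(x'\beta)\exp(bZ) = \begin{cases} \exp(bZ) & \text{for control} \\ \exp(\beta_j)\exp(bZ) & \text{for treatment} \end{cases}$$

  • It can be shown that $T_{\text{trt}} = \exp(\beta_j)\, T_{\text{control}}$


  • $\hat\beta_{\text{trt}} = 0.336 \Rightarrow \exp(\hat\beta_{\text{trt}}) = 1.399$

    • Treatment increases the time to first pulmonary exacerbation by about 40% compared to the control when fev is fixed
  • $\hat\beta_{\text{fev}} = 0.016 \Rightarrow \exp(\hat\beta_{\text{fev}}) = 1.016$

    • A one-unit increase in fev results in about a 1.6% increase in the time to first exacerbation, provided treatment is held constant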

6.5 Log-logistic AFT

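  • Distributional assumptions: $T(x) = (T \mid x) \sim \text{log-logistic}(\alpha(x), \delta)$, equivalently $Y(x) = (Y \mid x) = (\log T \mid x) \sim \text{logistic}(u(x), b)$

  • Regression model for the parameters: $u(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = x'\beta$ and $\alpha(x) = e^{u(x)}$

  • Regression model for the response: $Y(x) = x'\beta + bZ$

    • $Z \sim \text{Logistic}(0, 1)$

    • $f_0(z) = \dfrac{e^z}{[1 + e^z]^2}$

    • $S_0(z) = [1 + e^z]^{-1}$


  • Lifetime distribution: $T(x) \sim \text{Log-Logistic}(\alpha(x), \delta)$

  • The survivor function:
    $$S(t \mid x) = \frac{1}{1 + (t/\alpha(x))^{\delta}} \;\Rightarrow\; \frac{1 - S(t \mid x)}{S(t \mid x)} = (t/\alpha(x))^{\delta}$$

    • $(t/\alpha(x))^{\delta}$ is the odds of failure by time $t$ for a subject with covariate vector $x$

  • For two subjects with covariate vectors $x_1$ and $x_2$,
    $$\frac{[1 - S(t \mid x_2)]/S(t \mid x_2)}{[1 - S(t \mid x_1)]/S(t \mid x_1)} = \left[\frac{\alpha(x_1)}{\alpha(x_2)}\right]^{\delta}, \quad \text{independent of } t$$

  • A model of the form
    $$\frac{1 - S(t \mid x)}{S(t \mid x)} = (t/\alpha(x))^{\delta} \;\Rightarrow\; \log\frac{1 - S(t \mid x)}{S(t \mid x)} = \delta\log(t) - \delta\log\alpha(x)$$
    is known as the proportional odds model


  • Consider the model $\log\alpha(x) = \beta_0 + \beta_1 x$ with a binary covariate $x$:
    $$\frac{[1 - S(t \mid x = 1)]/S(t \mid x = 1)}{[1 - S(t \mid x = 0)]/S(t \mid x = 0)} = e^{-\delta\beta_1} = e^{\beta_1^*}, \quad \text{where } \beta_1^* = -\delta\beta_1$$

    • The odds of failure by time $t$ for a subject with $x = 1$ are $\exp(\beta_1^*)$ times the odds of failure for a subject with $x = 0$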

Times to pulmonary exacerbation

  • R code for fitting the AFT model
mod63b <- survreg(Surv(log(time), status) ~ trt + fevm, 
                  dist = "logistic",
                  data = tab1_4) 

tidy(mod63b)
# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   5.08     0.0600      84.6  0       
2 trt           0.293    0.0861       3.41 6.55e- 4
3 fevm          0.0145   0.00181      8.00 1.20e-15
4 Log(scale)   -0.489    0.0466     -10.5  8.08e-26

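  • $\hat\beta_{\text{trt}} = 0.293 \Rightarrow \exp(\hat\beta_{\text{trt}}) = 1.341$

    • Treatment increases the time to first pulmonary exacerbation by about 34% compared to the control when fev is fixed
  • $\hat\beta_{\text{fev}} = 0.014 \Rightarrow \exp(\hat\beta_{\text{fev}}) = 1.015$

    • A one-unit increase in fev results in a 1.5% increase in lifetime, provided treatment is held constant

  • Interpret the treatment effect in terms of the odds of failure
    $$\frac{[1 - S(t \mid \text{trt} = 1, \text{fev} = x)]/S(t \mid \text{trt} = 1, \text{fev} = x)}{[1 - S(t \mid \text{trt} = 0, \text{fev} = x)]/S(t \mid \text{trt} = 0, \text{fev} = x)} = \exp(-\hat\delta\hat\beta_1) = 0.62$$

    • $\hat\delta = 1/\hat b = \exp(-\log\hat b) = \exp(0.489) = 1.631$

    • The odds of failure are 38% lower in the treatment group than in the control group, provided the fev value is fixed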
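The time ratio and the odds ratio for treatment can be recomputed from the reported log-logistic fit. The following is a minimal sketch in base R using the rounded `trt` and `Log(scale)` estimates from the `tidy(mod63b)` output; the variable names are chosen here for illustration.

```r
# Rounded estimates from the log-logistic AFT fit
b_trt     <- 0.293     # treatment coefficient on the log-time (AFT) scale
log_scale <- -0.489    # Log(scale) = log b

# Shape parameter of the log-logistic distribution: delta = 1 / b
delta <- exp(-log_scale)

# Multiplicative effect of treatment on survival time (acceleration factor)
time_ratio <- exp(b_trt)

# Odds of failure by time t, treatment versus control
odds_ratio <- exp(-delta * b_trt)

round(c(delta = delta, time_ratio = time_ratio, odds_ratio = odds_ratio), 3)
```

A time ratio above one and an odds ratio below one describe the same benefit of treatment on two different scales.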



Comparison between normal and logistic regression models in analysing time to pulmonary exacerbation data
                  Normal           Logistic
term             est     se       est     se
(Intercept)    5.093  0.068     5.078  0.060
trt            0.336  0.095     0.293  0.086
fevm           0.016  0.002     0.014  0.002
Log(scale)     0.137  0.041    -0.489  0.047

Other regression models

  • Additive hazards model $h(t \mid x) = h_0(t; \alpha) + r(x; \beta)$

6.6 Graphical methods and model assessment

  • Graphical methods are helpful in summarizing information and suggesting possible models

  • These methods also provide ways to check assumptions concerning the form of a lifetime distribution and its relationship to covariates

  • Exploratory analysis of a lifetime distribution given covariates is helpful for selecting an appropriate model for the analysis


  • For a single quantitative covariate, a plot of lifetime or log-lifetime against the covariate or a function of it could indicate the nature of the relationship between lifetime and the covariate

  • If the proportion of censoring is small, such a plot is helpful; different symbols can be used for censored and failure times

  • When there is more than one quantitative covariate and censoring is light, one can consider grouping individuals so that, within a group, individuals have similar values of the important covariates

  • Suppose there are $J$ such groups, and let $\hat S_j$ be the Kaplan-Meier estimate for group $j = 1, \ldots, J$


AFT model $S(t \mid x) = S_0\left[\dfrac{\log t - u(x)}{b}\right]$

  • If $u(x)$ is approximately constant for individuals within each group $j = 1, \ldots, J$, and if an AFT model is appropriate, the plots of $\log[-\log \hat S_j(t)]$ vs $\log t$ should be roughly parallel in the horizontal direction (i.e. shifted along the $\log t$ axis)

Proportional hazards model $S(t \mid x) = [S_0(t)]^{r(x)}$

  • If $r(x)$ is approximately constant for individuals within each group $j = 1, \ldots, J$, and if a proportional hazards model is appropriate, the plots of $\log[-\log \hat S_j(t)]$ vs $\log t$ should be roughly parallel in the vertical direction

  • If the plots of $\log[-\log \hat S_j(t)]$ vs $\log t$ are roughly linear, then Weibull models are suggested

  • If, in addition to being linear, the plots are parallel, then Weibull models with a constant shape parameter are suggested; in that case, both AFT and PH models can be considered

  • Statistical analysis of data is an iterative process involving exploration, model fitting, and model assessment
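The graphical check described above can be sketched on simulated data. The following is a minimal sketch, assuming the `survival` package is installed and using two hypothetical groups generated from Weibull distributions with a common shape; `plot()` with `fun = "cloglog"` draws $\log[-\log \hat S_j(t)]$ against $t$ on a log axis.

```r
library(survival)

# Simulated data: two groups, Weibull lifetimes with a common shape (hypothetical values)
set.seed(405)
n      <- 300
group  <- rep(c(0, 1), each = n)
delta  <- 1.5                              # common Weibull shape
alpha  <- exp(3 + 0.7 * group)             # alpha(x) = exp(beta0 + beta1 * x)
t_fail <- rweibull(2 * n, shape = delta, scale = alpha)
t_cens <- runif(2 * n, 0, 200)             # light, independent censoring
time   <- pmin(t_fail, t_cens)
status <- as.numeric(t_fail <= t_cens)

# Kaplan-Meier estimate within each group
km <- survfit(Surv(time, status) ~ group)

# log[-log S_j(t)] against log t: roughly straight, parallel lines suggest
# a Weibull model with a common shape parameter
plot(km, fun = "cloglog", lty = 1:2,
     xlab = "t (log scale)", ylab = "log[-log S(t)]")
legend("topleft", legend = c("group 0", "group 1"), lty = 1:2)

# Rough numerical check: the slope of each line estimates the shape delta
s  <- summary(km)
ok <- s$surv > 0 & s$surv < 1
slopes <- tapply(seq_along(s$time)[ok], s$strata[ok], function(i) {
  coef(lm(log(-log(s$surv[i])) ~ log(s$time[i])))[2]
})
slopes
```

With real data, the groups would be formed from the observed covariates, as described above, rather than simulated; the least-squares slopes are only an informal summary of the plot and not a substitute for the maximum likelihood fit.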

References

Collett, David. 2015. Modelling Survival Data in Medical Research. Chapman and Hall/CRC. https://doi.org/10.1201/b18041.
Cox, David R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (2): 187–202.