Chapter 6
(AST405) Lifetime data analysis
6 Parametric Regression Models
6.1 Log-location-scale (Accelerated Failure Time) Regression Models
Linear regression model
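- Distributional assumption for the response: $Y \mid \mathbf{x} \sim N(\mu(\mathbf{x}), \sigma^2)$
- Regression model for the parameters: $\mu(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}$, and $\sigma$ does not depend on $\mathbf{x}$
- Instead of the parameters, the linear regression model can be defined in terms of other functions, such as the survivor function
$$S(y \mid \mathbf{x}) = \Pr(Y \ge y \mid \mathbf{x}) = 1 - \Phi\left(\frac{y - \mu(\mathbf{x})}{\sigma}\right) \tag{6.1}$$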
Regression models for lifetimes
- Similar to continuous and binary responses, regression analysis of lifetimes involves specifying the distribution of a lifetime $T$ given a $p$-dimensional (say) covariate vector $\mathbf{x}$
- For parametric regression models for lifetimes, the parameters (e.g. scale and shape parameters) need to be defined as functions of the measured covariates (linear predictors)
- This requires selecting a link function (e.g. identity, log, logit) that relates the model parameters to the linear predictors
- As in linear and logistic regression models, the maximum likelihood method is used to estimate the parameters of the model
Log-location-scale AFT model
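- For a lifetime $T$ that follows a distribution of the log-location-scale family, the survivor function of $T$ for a given covariate vector $\mathbf{x}$ is defined as
$$S(t \mid \mathbf{x}) = S_0^{*}\left[\left(\frac{t}{\alpha(\mathbf{x})}\right)^{\delta}\right] \tag{6.2}$$
  - Scale parameter $\alpha(\mathbf{x})$ is defined as a function of the covariate vector
  - Shape parameter $\delta$ does not depend on $\mathbf{x}$
  - Survivor function of the corresponding standardized distribution, $S_0^{*}(\cdot)$, is defined earlier
- For a log-lifetime $Y = \log T$ that follows a distribution of the location-scale family, the survivor function of $Y$ for a given covariate vector $\mathbf{x}$ is defined as
$$S(y \mid \mathbf{x}) = S_0\left(\frac{y - u(\mathbf{x})}{b}\right) \tag{6.3}$$
  - Location parameter $u(\mathbf{x}) = \log \alpha(\mathbf{x})$ is defined as a function of the covariate vector
  - Scale parameter $b = 1/\delta$ does not depend on $\mathbf{x}$
- The model (Equation 6.3) for log-lifetime has the same form as the linear regression model (Equation 6.1), with $u(\mathbf{x})$ in place of the mean $\mu(\mathbf{x})$, $b$ in place of $\sigma$, and $S_0$ in place of $1 - \Phi$
- The model for lifetime (Equation 6.2) or log-lifetime (Equation 6.3) is known as the accelerated failure time (AFT) model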
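- Models for the scale parameter $\alpha(\mathbf{x})$ and the location parameter $u(\mathbf{x})$ are defined so that the associated parametric restrictions, $\alpha(\mathbf{x}) > 0$ and $-\infty < u(\mathbf{x}) < \infty$, are satisfied, e.g.
$$u(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}, \qquad \alpha(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta})$$
- The AFT model can also be expressed as
$$Y = u(\mathbf{x}) + bZ \tag{6.4}$$
  - $Z \sim S_0(z)$, i.e. $Z$ follows a standardized location-scale distribution, e.g. the standard normal or extreme-value distribution with location 0 and scale 1
- The linear regression model (Equation 6.1) can also be expressed in the form of Equation 6.4, with $u(\mathbf{x}) = \mu(\mathbf{x})$, $b = \sigma$ and $Z \sim N(0, 1)$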
- In the AFT model defined in terms of the distribution of lifetime $T$, covariates alter the time scale
  - If $\alpha(\mathbf{x}) > 1$, the effect of the covariate vector is to lengthen the time to failure (decelerate time)
  - If $\alpha(\mathbf{x}) < 1$, the effect of the covariate vector is to shorten the time to failure (accelerate time)
The accelerated failure time model is a general model for survival data, in which explanatory variables measured on an individual are assumed to act multiplicatively on the time-scale
Log-location-scale AFT models are a special case of AFT models where the log of survival time follows a location-scale distribution.
AFT models assume that covariates accelerate or decelerate the time to event.
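The multiplicative action on the time scale can be checked numerically. The sketch below assumes a Weibull lifetime with hypothetical shape and scale values (they are not taken from any data set in these notes) and verifies that the survivor function at scale $\alpha$ equals the baseline survivor function evaluated at $t/\alpha$.

```r
# Weibull survivor function: S(t) = exp(-(t / alpha)^delta)
surv_weibull <- function(t, alpha, delta) {
  pweibull(t, shape = delta, scale = alpha, lower.tail = FALSE)
}

delta   <- 1.5              # hypothetical shape parameter
alpha_x <- 2                # hypothetical scale parameter alpha(x)
t_grid  <- c(0.5, 1, 2, 4)  # times at which the two curves are compared

# Survivor function for a subject with scale alpha(x) ...
s_x <- surv_weibull(t_grid, alpha = alpha_x, delta = delta)
# ... equals the baseline survivor function on the rescaled time t / alpha(x)
s_base_rescaled <- surv_weibull(t_grid / alpha_x, alpha = 1, delta = delta)
all.equal(s_x, s_base_rescaled)

# Every quantile is stretched by the same factor alpha(x); shown here for the median
q_x <- qweibull(0.5, shape = delta, scale = alpha_x)
q_0 <- qweibull(0.5, shape = delta, scale = 1)
q_x / q_0
```

Because $\alpha(\mathbf{x}) = 2 > 1$ in this illustration, the subject's time scale is decelerated relative to the baseline.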
The following example is described in Collett (2015)
- Suppose patients are randomized to receive one of two treatments, $S$ (standard) and $N$ (new)
- Under an accelerated failure time model, the survival time of an individual on the new treatment is taken to be a multiple of the survival time for an individual on the standard treatment
- Thus, the effect of the new treatment is to "speed up" or "slow down" the passage of time
- For a specific time $t$,
$$S_N(t) = S_S(t / \phi)$$
  where $\phi$ is an unknown positive constant
- One interpretation of this model is that the lifetime of an individual on the new treatment ($T_N$) is $\phi$ times the lifetime that the individual would have experienced under the standard treatment ($T_S$)
- When the end-point of concern is the death of a patient
  - $\phi > 1$: the new treatment is promoting longevity
  - $\phi < 1$: the new treatment is worse (accelerating death)
- The quantity $\phi^{-1}$ is therefore termed the acceleration factor
- The acceleration factor can also be interpreted in terms of the median survival times of patients on the new and standard treatments, $t_N(50)$ and $t_S(50)$
- Under the AFT model, $S_N\{t_N(50)\} = S_S\{t_N(50)/\phi\} = 0.5$ and $S_S\{t_S(50)\} = 0.5$, so that $t_N(50) = \phi\, t_S(50)$
- That is, the median survival time of a patient on the new treatment is $\phi$ times that of a patient on the standard treatment
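- Under the AFT model, the survivor functions with covariate vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ can be compared as
$$S(t \mid \mathbf{x}_2) = S(c\,t \mid \mathbf{x}_1), \qquad c = \frac{\alpha(\mathbf{x}_1)}{\alpha(\mathbf{x}_2)}$$
  - If $c < 1$, subjects with covariate vector $\mathbf{x}_2$ survive longer than subjects with covariate vector $\mathbf{x}_1$
  - If $c > 1$, subjects with covariate vector $\mathbf{x}_2$ survive shorter than subjects with covariate vector $\mathbf{x}_1$
- Under the AFT model $S_2(t) = S_1(ct)$ for $t \ge 0$, the mean survival time of Population 2, $\mu_2$, can be expressed in terms of $\mu_1$, the mean survival time of Population 1, as
$$\mu_2 = \int_0^\infty S_2(t)\,dt = \int_0^\infty S_1(ct)\,dt = \frac{\mu_1}{c}$$
- In general, let $\theta_j$ be a population quantity such that $S_j(\theta_j) = p$ for some $p \in (0, 1)$ and $j = 1, 2$
- Then $\theta_2 = \theta_1 / c$, i.e., under the AFT model, the expected survival time and the median survival time of Population 2 are all $1/c$ times those of Population 1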
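The scaling of means and medians can be verified numerically. The following sketch assumes a Weibull distribution for Population 1 and a hypothetical constant `c_factor`; it computes means by integrating the survivor functions and medians by root finding.

```r
# Population 1: Weibull survivor function (shape and scale are hypothetical)
S1 <- function(t) pweibull(t, shape = 2, scale = 10, lower.tail = FALSE)

# Population 2 under the AFT relation S_2(t) = S_1(c * t)
c_factor <- 0.5
S2 <- function(t) S1(c_factor * t)

# Mean survival time = area under the survivor function
mu1 <- integrate(S1, lower = 0, upper = Inf)$value
mu2 <- integrate(S2, lower = 0, upper = Inf)$value

# Median survival time = solution of S(t) = 0.5
med1 <- uniroot(function(t) S1(t) - 0.5, interval = c(0, 100), tol = 1e-8)$root
med2 <- uniroot(function(t) S2(t) - 0.5, interval = c(0, 100), tol = 1e-8)$root

# Both ratios should agree with 1 / c
c(mean_ratio = mu2 / mu1, median_ratio = med2 / med1, expected = 1 / c_factor)
```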
Proportional hazards model
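- There are two approaches to regression modeling for lifetimes
  - AFT model, where the effects of covariates are assessed by comparing the corresponding time scales
  - Hazards model, where the effects of covariates on the hazard function are studied
- The most common hazards model is the proportional hazards model (Cox 1972), where the hazard function for lifetime $T$ given $\mathbf{x}$ is defined as
$$h(t \mid \mathbf{x}) = h_0(t)\, r(\mathbf{x})$$
  - $r(\mathbf{x})$ is a positive-valued function of the linear predictor, e.g. $r(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta})$, which does not include the intercept term
  - $h_0(t)$ is a positive-valued function, known as the baseline hazard function, i.e. $h_0(t) = h(t \mid \mathbf{x} = \mathbf{0})$; it could be either fully parametric or unspecified
- If you take two individuals with covariates $\mathbf{x}_1$ and $\mathbf{x}_2$,
$$\frac{h(t \mid \mathbf{x}_1)}{h(t \mid \mathbf{x}_2)} = \exp\{(\mathbf{x}_1 - \mathbf{x}_2)'\boldsymbol{\beta}\}$$
- This ratio does not depend on time $t$; this is exactly the proportional hazards property
- For a binary predictor $x$ (1 = male, 0 = female), the hazard ratio can be defined as $h(t \mid x = 1) / h(t \mid x = 0) = e^{\beta}$; when $\beta > 0$, the hazard of the event is higher for males than for females
- Under the proportional hazards model, the cumulative hazard function is defined as
$$H(t \mid \mathbf{x}) = H_0(t) \exp(\mathbf{x}'\boldsymbol{\beta})$$
- Under the proportional hazards model, the survivor function is defined as
$$S(t \mid \mathbf{x}) = \left[S_0(t)\right]^{\exp(\mathbf{x}'\boldsymbol{\beta})}$$
  where $S_0(t) = \exp\{-H_0(t)\}$ is the baseline survivor function
- Exercise: interpret how the survival probabilities change with the sign of the linear predictor $\mathbf{x}'\boldsymbol{\beta}$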
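The relation between $S(t \mid \mathbf{x})$ and the baseline survivor function under proportional hazards can be derived from $S = \exp(-H)$. The steps below are a short worked derivation, assuming the common choice $r(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta})$:

$$
\begin{aligned}
S(t \mid \mathbf{x}) &= \exp\{-H(t \mid \mathbf{x})\}
  = \exp\left\{-\int_0^t h_0(s)\, e^{\mathbf{x}'\boldsymbol{\beta}}\, ds\right\} \\
 &= \exp\left\{-H_0(t)\, e^{\mathbf{x}'\boldsymbol{\beta}}\right\}
  = \left[\exp\{-H_0(t)\}\right]^{e^{\mathbf{x}'\boldsymbol{\beta}}}
  = \left[S_0(t)\right]^{e^{\mathbf{x}'\boldsymbol{\beta}}}
\end{aligned}
$$

Since $0 \le S_0(t) \le 1$, an exponent larger than 1 (positive linear predictor) lowers the survival probability at every $t$, and an exponent smaller than 1 raises it.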
Parametric proportional hazards model
- Depending on whether the baseline hazard function $h_0(t)$ is fully parametric or not, a PH model could be either parametric or semi-parametric
  - A PH model is parametric if $h_0(t) = h_0(t; \boldsymbol{\theta})$ for some parameter vector $\boldsymbol{\theta}$
  - A PH model is semi-parametric if $h_0(t)$ is unspecified
- The Weibull model can be defined as both an AFT and a PH model
Weibull regression model
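- Weibull as an AFT model
$$S(t \mid \mathbf{x}) = \exp\left[-\left(\frac{t}{\alpha(\mathbf{x})}\right)^{\delta}\right], \qquad h(t \mid \mathbf{x}) = \delta\, t^{\delta - 1}\, \alpha(\mathbf{x})^{-\delta} \tag{6.5}$$
  where $\alpha(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\beta})$
- Weibull as a PH model
$$h(t \mid \mathbf{x}) = h_0(t) \exp(\mathbf{x}'\boldsymbol{\gamma}) \tag{6.6}$$
  - Assume a Weibull baseline hazard $h_0(t) = \delta\, t^{\delta - 1} \alpha_0^{-\delta}$
  - Assume $r(\mathbf{x}) = \exp(\mathbf{x}'\boldsymbol{\gamma})$, with no intercept term
- Equating the expressions of $h(t \mid \mathbf{x})$ from the AFT (Equation 6.5) and PH (Equation 6.6) Weibull models, we can show that the AFT intercept is absorbed into the baseline hazard, $\beta_0 = \log \alpha_0$, and that for each remaining covariate
$$\gamma_j = -\delta\, \beta_j = -\beta_j / b$$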
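The link between the AFT and PH coefficients of a Weibull model can be checked numerically. This sketch uses hypothetical parameter values and a single binary covariate; it computes the Weibull hazard as density over survivor function and confirms that the hazard ratio is constant in $t$ and equal to $\exp(-\delta \beta_1)$.

```r
# Weibull hazard h(t) = f(t) / S(t) for scale alpha and shape delta
weibull_hazard <- function(t, alpha, delta) {
  dweibull(t, shape = delta, scale = alpha) /
    pweibull(t, shape = delta, scale = alpha, lower.tail = FALSE)
}

# Hypothetical AFT parameters: alpha(x) = exp(beta0 + beta1 * x)
beta0 <- 4
beta1 <- 1.2
delta <- 0.9
t_grid <- c(1, 10, 50, 200)

h_x1 <- weibull_hazard(t_grid, alpha = exp(beta0 + beta1), delta = delta)  # x = 1
h_x0 <- weibull_hazard(t_grid, alpha = exp(beta0), delta = delta)          # x = 0
hazard_ratio <- h_x1 / h_x0

# PH coefficient implied by the AFT coefficient
gamma1 <- -delta * beta1
all.equal(hazard_ratio, rep(exp(gamma1), length(t_grid)))
```

A positive AFT coefficient (longer lifetimes) therefore corresponds to a negative PH coefficient (lower hazard).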
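AFT and PH model
- A regression model is both an AFT and a PH model only if its baseline survivor function has the Weibull form $S_0(t) = \exp(-\lambda t^{\delta})$ for some constants $\lambda > 0$ and $\delta > 0$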
6.2 Inference for Log-location-scale AFT Models
Likelihood methods
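- Data $\{(y_i, \delta_i, \mathbf{x}_i),\ i = 1, \ldots, n\}$
  - $y_i$ is the log-lifetime or log-censoring time
  - $\delta_i$ is the censoring indicator (1 = lifetime observed, 0 = censored)
  - $\mathbf{x}_i$ is a vector of covariates
- Assume $Y_i$ follows a location-scale distribution with location parameter $u(\mathbf{x}_i)$ and scale parameter $b$
- Regression model $u(\mathbf{x}_i) = \mathbf{x}_i'\boldsymbol{\beta}$
  - Vector of regression parameters $\boldsymbol{\beta}$ is $p \times 1$ (it includes the intercept when $\mathbf{x}_i$ contains a constant 1)
  - Covariate vector $\mathbf{x}_i$ contains both categorical and quantitative variables, and for accurate computation, quantitative variables are centered
The log-likelihood function
$$\ell(\boldsymbol{\beta}, b) = -r \log b + \sum_{i=1}^{n}\left[\delta_i \log f_0(z_i) + (1 - \delta_i) \log S_0(z_i)\right], \qquad z_i = \frac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{b}, \quad r = \sum_{i=1}^{n} \delta_i$$
Score functions
- Elements of $U(\boldsymbol{\beta}, b)$ are $\partial \ell / \partial \beta_j$ $(j = 1, \ldots, p)$ and $\partial \ell / \partial b$
- Homework: Obtain the expressions of the score functions (Eq. 6.3.3 and 6.3.4 of textbook)
Information matrix
- Elements of the observed information matrix $I(\boldsymbol{\beta}, b)$ are $-\partial^2 \ell / \partial \beta_j \partial \beta_k$, $-\partial^2 \ell / \partial \beta_j \partial b$ and $-\partial^2 \ell / \partial b^2$
- Homework: Obtain the expressions of the information matrix (Eq. 6.3.5, 6.3.6 and 6.3.7 of textbook)
MLEs
- Iterative procedures (e.g. the Newton-Raphson method) are used to obtain the MLEs $\hat{\boldsymbol{\beta}}$ and $\hat{b}$
- In large samples, the MLEs $(\hat{\boldsymbol{\beta}}', \hat{b})'$ approximately follow a $(p+1)$-variate normal distribution with mean $(\boldsymbol{\beta}', b)'$ and variance matrix $I(\hat{\boldsymbol{\beta}}, \hat{b})^{-1}$
- Large-sample tests and confidence intervals can be obtained using the sampling distribution of $(\hat{\boldsymbol{\beta}}', \hat{b})'$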
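As an illustration of maximising a censored log-likelihood of this form, the sketch below simulates data from an extreme-value (Weibull) AFT model and maximises the log-likelihood numerically. All values are hypothetical, and the optimiser is R's general-purpose `optim()` with the quasi-Newton BFGS method rather than a hand-coded Newton-Raphson iteration; the scale is optimised on the log scale so that $b > 0$.

```r
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)                 # hypothetical binary covariate
beta_true <- c(4, 1)                   # hypothetical intercept and slope
b_true <- 0.8                          # hypothetical scale parameter

# Y = beta0 + beta1 * x + b * Z, where Z = log(unit exponential) is standard extreme value
y_full <- beta_true[1] + beta_true[2] * x + b_true * log(rexp(n))
cens   <- log(runif(n, 20, 400))       # hypothetical log-censoring times
y      <- pmin(y_full, cens)           # observed log-lifetime or log-censoring time
status <- as.numeric(y_full <= cens)   # 1 = lifetime observed, 0 = censored
X <- cbind(1, x)                       # design matrix with intercept column

# Negative log-likelihood; for the standard extreme-value distribution
# log f0(z) = z - exp(z) and log S0(z) = -exp(z)
neg_loglik <- function(par, y, status, X) {
  p <- ncol(X)
  beta <- par[seq_len(p)]
  b <- exp(par[p + 1])
  z <- (y - drop(X %*% beta)) / b
  -sum(status * (z - log(b)) - exp(z))
}

fit <- optim(par = c(mean(y), 0, 0), fn = neg_loglik,
             y = y, status = status, X = X,
             method = "BFGS", hessian = TRUE)

beta_hat <- fit$par[1:2]
b_hat <- exp(fit$par[3])
# Approximate standard errors of (beta, log b) from the inverse Hessian
se <- sqrt(diag(solve(fit$hessian)))
rbind(estimate = c(beta_hat, log_b = fit$par[3]), se = se)
```

The inverse Hessian of the negative log-likelihood plays the role of the inverse observed information matrix, here in the $(\boldsymbol{\beta}, \log b)$ parameterisation.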
Test of hypothesis
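- Let $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \boldsymbol{\beta}_2')'$, where $\boldsymbol{\beta}_1$ is $k \times 1$, and consider $H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_1^0$
- Likelihood ratio test statistic
$$\Lambda = 2\ell(\hat{\boldsymbol{\beta}}_1, \hat{\boldsymbol{\beta}}_2, \hat{b}) - 2\ell(\boldsymbol{\beta}_1^0, \tilde{\boldsymbol{\beta}}_2, \tilde{b})$$
  where $\tilde{\boldsymbol{\beta}}_2$ and $\tilde{b}$ are the MLEs under $H_0$
  - Under $H_0$, $\Lambda \sim \chi^2_{(k)}$ approximately
- Let $\hat{V} = I(\hat{\boldsymbol{\beta}}, \hat{b})^{-1}$ be the estimated variance matrix, and $\hat{V}_{11}$ its block corresponding to $\hat{\boldsymbol{\beta}}_1$
- Wald statistic
$$W = (\hat{\boldsymbol{\beta}}_1 - \boldsymbol{\beta}_1^0)'\, \hat{V}_{11}^{-1}\, (\hat{\boldsymbol{\beta}}_1 - \boldsymbol{\beta}_1^0)$$
  - $\hat{V}_{11}$ is a $k \times k$ matrix and, under $H_0$, $W \sim \chi^2_{(k)}$ approximately
- For a single coefficient ($k = 1$)
  - Null hypothesis $H_0: \beta_j = \beta_j^0$
  - Test statistic $Z = (\hat{\beta}_j - \beta_j^0) / se(\hat{\beta}_j)$, which is approximately $N(0, 1)$ under $H_0$
  - $100(1 - \alpha)\%$ confidence interval for $\beta_j$: $\hat{\beta}_j \pm z_{1 - \alpha/2}\, se(\hat{\beta}_j)$, where $\alpha$ here denotes the significance level
- For a small sample, the LRT statistic can be used to test the hypothesis and to obtain a confidence interval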
Quantiles
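- The $p$th quantile of $Y$ given $\mathbf{x}$
$$y_p(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta} + b\, w_p$$
  where $w_p = S_0^{-1}(1 - p)$ is the $p$th quantile of the standardized distribution
- Estimate and corresponding SE of the $p$th quantile
$$\hat{y}_p(\mathbf{x}) = \mathbf{x}'\hat{\boldsymbol{\beta}} + \hat{b}\, w_p \quad \text{and} \quad se\{\hat{y}_p(\mathbf{x})\} = \left(\mathbf{a}'\hat{V}\mathbf{a}\right)^{1/2}, \quad \mathbf{a} = (\mathbf{x}', w_p)'$$
  where $\hat{V}$ is the estimated variance matrix of $(\hat{\boldsymbol{\beta}}', \hat{b})'$
- $100(1 - \alpha)\%$ confidence interval for $y_p(\mathbf{x})$: $\hat{y}_p(\mathbf{x}) \pm z_{1 - \alpha/2}\, se\{\hat{y}_p(\mathbf{x})\}$; exponentiating the limits gives an interval for the $p$th quantile of the lifetime $T$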
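A Wald-type interval for a log-lifetime quantile needs only the estimates, their variance matrix and the standardized quantile $w_p$. The helper `quantile_ci()` below is a hypothetical function written for this illustration, and the numerical inputs are made up; it assumes the variance matrix is for $(\hat{\boldsymbol{\beta}}', \hat{b})'$ in that order.

```r
# Wald-type confidence interval for y_p(x) = x'beta + b * w_p
quantile_ci <- function(x, beta_hat, b_hat, V, w_p, level = 0.95) {
  a   <- c(x, w_p)                           # gradient of y_p with respect to (beta, b)
  y_p <- sum(x * beta_hat) + b_hat * w_p     # point estimate of the quantile
  se  <- sqrt(drop(t(a) %*% V %*% a))        # delta-method standard error
  z   <- qnorm(1 - (1 - level) / 2)
  c(estimate = y_p, lower = y_p - z * se, upper = y_p + z * se)
}

# Hypothetical inputs, not taken from any fitted model in these notes
beta_hat <- c(4.0, 1.0)                      # intercept and one slope
b_hat    <- 1.1
V        <- diag(c(0.25, 0.16, 0.01))        # variance matrix of (beta0, beta1, b)
w_med    <- log(-log(1 - 0.5))               # median of the standard extreme-value distribution

ci_log <- quantile_ci(x = c(1, 1), beta_hat, b_hat, V, w_p = w_med)
ci_log        # interval for the median log-lifetime
exp(ci_log)   # interval for the median lifetime
```

Note that `vcov()` on a `survreg` fit is expressed in terms of `Log(scale)` rather than $b$, so the gradient entry for the scale would need to be $\hat{b}\, w_p$ instead of $w_p$ when using that matrix directly.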
Survival probability
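- We are interested in obtaining a confidence interval for $S(y_0 \mid \mathbf{x})$, which can be expressed in terms of the parameters of the location-scale distribution as
$$S(y_0 \mid \mathbf{x}) = S_0(\psi), \qquad \psi = \frac{y_0 - \mathbf{x}'\boldsymbol{\beta}}{b}$$
- Estimate and the corresponding SE of $\psi$: $\hat{\psi} = (y_0 - \mathbf{x}'\hat{\boldsymbol{\beta}}) / \hat{b}$, with $se(\hat{\psi})$ obtained from the estimated variance matrix by the delta method
- $100(1 - \alpha)\%$ confidence interval for $\psi$: $(\psi_L, \psi_U) = \hat{\psi} \pm z_{1 - \alpha/2}\, se(\hat{\psi})$
- Wald-type $100(1 - \alpha)\%$ confidence interval for $S(y_0 \mid \mathbf{x})$: $\left(S_0(\psi_U),\ S_0(\psi_L)\right)$, since $S_0$ is a decreasing function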
6.3 Weibull AFT
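Distributional assumption
- $T \mid \mathbf{x} \sim \text{Weibull}(\alpha(\mathbf{x}), \delta)$, equivalently $Y = \log T \mid \mathbf{x}$ follows an extreme-value distribution with location $u(\mathbf{x})$ and scale $b$
Regression model for the parameters
- $u(\mathbf{x}) = \log \alpha(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}$ and $b = 1/\delta$
Regression model for the response
- $Y = \mathbf{x}'\boldsymbol{\beta} + bZ$, where $Z$ follows the standard extreme-value distribution
- Log-likelihood function
$$\ell(\boldsymbol{\beta}, b) = -r \log b + \sum_{i=1}^{n} \left(\delta_i z_i - e^{z_i}\right), \qquad z_i = \frac{y_i - \mathbf{x}_i'\boldsymbol{\beta}}{b}$$
- We can now obtain the score functions, information matrix, and MLEs for $\boldsymbol{\beta}$ and $b$ (according to Section 6.2)
- We have already seen that the Weibull model implies a proportional hazards model
- It is the only parametric model that is both an AFT model and a proportional hazards (PH) model at the same time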
Leukemia survival times
- Data on survival times for 33 leukemia patients are available, where survival times are in weeks from diagnosis
- Data on two covariates are also available
  - White blood cell count (WBC) at diagnosis
  - Binary variable AG indicates a positive (AG=1) or negative (AG=0) test related to white blood cell characteristics
tab6_1
# A tibble: 33 × 5
time wbc AG status lwbc
<dbl> <dbl> <int> <dbl> <dbl>
1 65 2.3 1 1 0.833
2 140 0.75 1 0 -0.288
3 100 4.3 1 1 1.46
4 134 2.6 1 1 0.956
5 16 6 1 1 1.79
6 106 10.5 1 0 2.35
7 121 10 1 1 2.30
8 4 17 1 1 2.83
9 39 5.4 1 1 1.69
10 121 7 1 0 1.95
# ℹ 23 more rows
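- Consider the Weibull AFT model with covariates $x_1$ = AG and $x_2$ = lwbc = log(WBC)
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + bZ \tag{6.8}$$
  where $Z$ follows the standard extreme-value distribution
Fit the Weibull regression model (Equation 6.8) using R
library(survival); library(broom); library(dplyr)
# Weibull AFT model for the lifetime
mod62 <- survreg(Surv(time, status) ~ AG + lwbc,
                 data = tab6_1, dist = "weibull")
# Equivalent extreme-value model for the log-lifetime
mod62E <- survreg(Surv(log(time), status) ~ AG + lwbc,
                  data = tab6_1, dist = "extreme")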
MLEs of model parameters
tidy(mod62, conf.int = TRUE) |>
  mutate(p.value = scales::pvalue(p.value))
# A tibble: 4 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 (Intercept) 3.84 0.534 7.19 <0.001 2.79 4.89
2 AG 1.18 0.427 2.76 0.006 0.340 2.01
3 lwbc -0.366 0.150 -2.45 0.014 -0.660 -0.0731
4 Log(scale) 0.112 0.147 0.765 0.444 NA NA
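Fitted model: $\hat{u}(\mathbf{x}) = 3.84 + 1.18\,\text{AG} - 0.366\,\text{lwbc}$, with $\hat{b} = \exp(0.112) \approx 1.12$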
- Variance matrix of the estimated parameters
(Intercept) AG lwbc Log(scale)
(Intercept) 0.286 -0.130 -0.067 0.003
AG -0.130 0.182 0.016 0.005
lwbc -0.067 0.016 0.022 -0.005
Log(scale) 0.003 0.005 -0.005 0.021
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
$\beta_0$ (Intercept) | 3.841 | 0.534 | 7.188 | <0.001 | 2.794 | 4.889 |
$\beta_1$ (AG) | 1.177 | 0.427 | 2.757 | 0.006 | 0.340 | 2.014 |
$\beta_2$ (lwbc) | -0.366 | 0.150 | -2.449 | 0.014 | -0.660 | -0.073 |
$\log b$ | 0.112 | 0.147 | 0.765 | 0.444 | NA | NA |
AG and WBC have significant effects on leukemia survival times. Positive AG and low WBC count are associated with more prolonged survival
Since $\log b$ is not significantly different from zero, i.e. there is not enough evidence to reject $H_0: b = 1$, an exponential AFT model would be appropriate for analyzing these data
Interpretations
- A specific quantile (say the median) of the lifetime of a patient with a positive AG value (i.e. AG = 1) is $e^{1.177} \approx 3.2$ times that of a patient with a negative AG value (i.e. AG = 0), provided the WBC value remains constant
  - Note this interpretation is true for any quantile (Why?)
- A specific quantile (say the median) of the lifetime of a patient decreases by 30.7 percent with a one-unit increase in log(WBC) [i.e. when the WBC count is multiplied by $e \approx 2.718$], provided the AG value remains constant
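The multipliers quoted in these interpretations are just exponentiated coefficients. A minimal sketch, using the rounded estimates printed in the coefficient table (so the last digit can differ slightly from values computed with unrounded estimates):

```r
# Rounded estimates copied from the fitted Weibull AFT model
est <- c(AG = 1.177, lwbc = -0.366)

# Multiplicative effect on any quantile of the lifetime
time_ratio <- exp(est)

# Percentage change in the quantile per one-unit increase in the covariate
pct_change <- 100 * (time_ratio - 1)

round(time_ratio, 3)
round(pct_change, 1)
```

A one-unit increase in lwbc is a multiplicative change in WBC by a factor of `exp(1)`, not an additive change in the count.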
Fitted values
augment(mod62, type.predict = "response") |>
select(1:4) |>
slice(1:3)
# A tibble: 3 × 4
`Surv(time, status)` AG lwbc .fitted
<Surv> <int> <dbl> <dbl>
1 65 1 0.833 111.
2 140+ 1 -0.288 168.
3 100 1 1.46 88.6
augment(mod62, type.predict = "link") |>
mutate(.fittedE = exp(.fitted)) |>
select(2:4, .fittedE) |>
slice(1:3)
# A tibble: 3 × 4
AG lwbc .fitted .fittedE
<int> <dbl> <dbl> <dbl>
1 1 0.833 4.71 111.
2 1 -0.288 5.12 168.
3 1 1.46 4.48 88.6
- Estimate for a subject with AG = 1 and lwbc = 0.833
# predict(object = mod62, newdata = tibble(AG = 1, lwbc = .833),
#         type = "response")
augment(x = mod62, newdata = tibble(AG = 1, lwbc = .833),
type.predict = "response")
# A tibble: 1 × 4
AG lwbc .fitted .se.fit
<dbl> <dbl> <dbl> <dbl>
1 1 0.833 111. 41.3
LRT
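- Likelihood ratio test for $H_0: \beta_1 = 0$ (no AG effect)
- The corresponding LRT statistic
$$\Lambda = 2\ell(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{b}) - 2\ell(\tilde{\beta}_0, 0, \tilde{\beta}_2, \tilde{b})$$
- Estimates of the model parameters under $H_0$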
# mod62a <- update(mod62, formula = . ~ . - AG)
mod62a <- survreg(Surv(time, status) ~ lwbc,
data = tab6_1, dist = "weibull")
tidy(mod62a)
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.85 0.500 9.71 2.67e-22
2 lwbc -0.500 0.165 -3.03 2.41e- 3
3 Log(scale) 0.222 0.146 1.52 1.28e- 1
LRTa <- anova(mod62a, mod62)
Terms | Resid. Df | -2*LL | Df | Deviance | Pr(>Chi) |
---|---|---|---|---|---|
lwbc | 30 | 271.931 | NA | NA | NA |
AG + lwbc | 29 | 265.013 | 1 | 6.918 | 0.009 |
term | estimate | Wald | LRT |
---|---|---|---|
$\beta_0$ (Intercept) | 3.841 | 7.188 | NA |
$\beta_1$ (AG) | 1.177 | 2.757 | 2.63 |
$\beta_2$ (lwbc) | -0.366 | -2.449 | -2.46 |
$\log b$ | 0.112 | 0.765 | NA |
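The likelihood ratio statistic for AG can be reproduced from the two `-2*LL` values reported by `anova()`. The sketch below copies those two numbers by hand and also computes the signed square root, which appears to be what the LRT column of the comparison table reports for AG (this reading of the column is an assumption).

```r
# -2 log-likelihood values copied from the anova() output
m2ll_reduced <- 271.931   # model with lwbc only
m2ll_full    <- 265.013   # model with AG + lwbc

# Likelihood ratio statistic and its chi-square p-value on 1 degree of freedom
lrt <- m2ll_reduced - m2ll_full
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)

# Signed square root, using the sign of the AG estimate (1.177)
signed_root <- sign(1.177) * sqrt(lrt)

c(LRT = lrt, p_value = p_value, signed_root = signed_root)
```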
Quantiles
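- Consider a subject with covariate values $x_1$ (AG) and $x_2$ (lwbc); the linear predictor is $\hat{u}(\mathbf{x}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$
- Median survival time of the patient with covariate values $x_1$ and $x_2$: $\hat{t}_{0.5}(\mathbf{x}) = \exp\{\hat{u}(\mathbf{x}) + \hat{b} \log(\log 2)\}$
- Homework: Obtain a 95% confidence interval of the median survival time of a patient with covariate values $x_1$ and $x_2$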
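Quantiles of a fitted Weibull AFT model follow directly from the linear predictor and the scale estimate. The helper `weibull_aft_quantile()` is a hypothetical function written for this illustration; the inputs are the rounded fitted linear predictor shown earlier for AG = 1, lwbc = 0.833 and the rounded `Log(scale)` estimate.

```r
# p-th quantile of T when log T = lp + b * Z, with Z standard extreme value
weibull_aft_quantile <- function(lp, b, p) {
  exp(lp + b * log(-log(1 - p)))
}

lp_hat <- 4.71          # fitted linear predictor for AG = 1, lwbc = 0.833 (rounded)
b_hat  <- exp(0.112)    # scale estimate from Log(scale) = 0.112

# Estimated median survival time (weeks)
median_hat <- weibull_aft_quantile(lp_hat, b_hat, p = 0.5)

# The Weibull scale alpha(x) = exp(lp) is the 63.2% quantile, not the median
scale_hat <- weibull_aft_quantile(lp_hat, b_hat, p = 1 - exp(-1))

c(median = median_hat, scale = scale_hat)
```

The value returned for `scale` agrees with the `.fitted` value on the response scale, which indicates that the response-type prediction from `survreg` is $\exp(\mathbf{x}'\hat{\boldsymbol{\beta}})$ rather than a median or a mean.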
Survival probability
- Homework: Obtain the 95% CI for the survival probability $S(t_0 \mid \mathbf{x})$ at a given time $t_0$
6.4 Log-normal AFT
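Distributional assumption
- $T \mid \mathbf{x}$ follows a log-normal distribution, equivalently $Y = \log T \mid \mathbf{x} \sim N(u(\mathbf{x}), b^2)$
Regression model for the parameters
- $u(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}$, and $b$ does not depend on $\mathbf{x}$
Regression model for the response
- $Y = \mathbf{x}'\boldsymbol{\beta} + bZ$, where $Z \sim N(0, 1)$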
Times to pulmonary exacerbation
- Patients with cystic fibrosis are susceptible to an accumulation of mucus in the lungs, which leads to pulmonary exacerbation and deterioration of lung function
- A clinical trial was conducted to investigate the efficacy of the new drug DNase-1
- Subjects were randomly assigned to the new treatment or a placebo
- The time of interest is the time to first exacerbation after randomization; fev (forced expiratory volume at the time of randomization) was also measured
# A tibble: 761 × 13
id trt time fev inst entry.dt end.dt ivstart ivstop time0
<int> <int> <dbl> <dbl> <int> <date> <date> <dbl> <dbl> <dbl>
1 1 1 168 28.8 1 1992-03-20 1992-09-04 NA NA 168
2 2 1 169 64 1 1992-03-24 1992-09-09 NA NA 169
3 3 0 65 67.2 1 1992-03-24 1992-09-08 65 75 168
4 4 1 168 57.6 1 1992-03-26 1992-09-10 NA NA 168
5 5 0 171 57.6 1 1992-03-24 1992-09-11 NA NA 171
6 6 1 166 25.6 1 1992-03-27 1992-09-09 NA NA 166
7 7 0 168 86.4 1 1992-03-27 1992-09-11 NA NA 168
8 8 0 90 32 1 1992-03-28 1992-09-10 90 104 166
9 9 1 169 86.4 2 1992-02-27 1992-08-14 NA NA 169
10 10 0 8 28.8 2 1992-03-06 1992-08-22 8 22 169
# ℹ 751 more rows
# ℹ 3 more variables: status <dbl>, fevm <dbl>, visit <int>
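- Assume survival time $T$ follows a log-normal distribution with scale parameter $\alpha(\mathbf{x})$ and shape parameter $\delta = 1/b$
- Consider the following AFT model for log survival time $Y = \log T$
$$Y = \beta_0 + \beta_1\,\text{trt} + \beta_2\,\text{fevm} + bZ, \qquad Z \sim N(0, 1)$$
- R code for fitting the AFT model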
mod63a <- survreg(Surv(log(time), status) ~ trt + fevm,
dist = "gaussian",
data = tab1_4)
tidy(mod63a)
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 5.09 0.0684 74.4 0
2 trt 0.336 0.0951 3.53 4.19e- 4
3 fevm 0.0159 0.00197 8.09 5.91e-16
4 Log(scale) 0.137 0.0408 3.36 7.84e- 4
AFT model
- For a binary predictor $x$ with coefficient $\beta$, it can be shown that the ratio of the $p$th quantiles of the lifetime is
$$\frac{t_p(x = 1)}{t_p(x = 0)} = e^{\beta}$$
- Treatment increases the time to first pulmonary exacerbation by about 40% ($e^{0.336} \approx 1.40$) compared to the control when fev is fixed
- A one-unit increase in fev results in about a 1.6% increase in lifetime ($e^{0.0159} \approx 1.016$), provided treatment is held constant
6.5 Log-logistic AFT
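Distributional assumptions
- $T \mid \mathbf{x}$ follows a log-logistic distribution with scale $\alpha(\mathbf{x})$ and shape $\delta$, equivalently $Y = \log T \mid \mathbf{x}$ follows a logistic distribution with location $u(\mathbf{x})$ and scale $b$
Regression model for the parameters
- $u(\mathbf{x}) = \log \alpha(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}$ and $b = 1/\delta$
Regression model for the response
- $Y = \mathbf{x}'\boldsymbol{\beta} + bZ$, where $Z$ follows the standard logistic distribution
Lifetime distribution
- The survivor function
$$S(t \mid \mathbf{x}) = \left[1 + \left(\frac{t}{\alpha(\mathbf{x})}\right)^{\delta}\right]^{-1}$$
- The odds of failure at time $t$ for a subject with covariate vector $\mathbf{x}$
$$\frac{1 - S(t \mid \mathbf{x})}{S(t \mid \mathbf{x})} = \left(\frac{t}{\alpha(\mathbf{x})}\right)^{\delta} = t^{1/b} \exp(-\mathbf{x}'\boldsymbol{\beta} / b)$$
- For two subjects with covariate vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, the ratio of the odds of failure is
$$\exp\{-(\mathbf{x}_1 - \mathbf{x}_2)'\boldsymbol{\beta} / b\}$$
  which does not depend on $t$
- A model in which this odds ratio does not depend on $t$ is known as the proportional odds model
- Consider a model $u(x) = \beta_0 + \beta_1 x$ with a binary covariate $x$
- The odds of failure at time $t$ for a subject with $x = 1$ is $\exp(-\beta_1 / b)$ times the odds of failure for a subject with $x = 0$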
Times to pulmonary exacerbation
- R code for fitting the log-logistic AFT model
mod63b <- survreg(Surv(log(time), status) ~ trt + fevm,
dist = "logistic",
data = tab1_4)
tidy(mod63b)
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 5.08 0.0600 84.6 0
2 trt 0.293 0.0861 3.41 6.55e- 4
3 fevm 0.0145 0.00181 8.00 1.20e-15
4 Log(scale) -0.489 0.0466 -10.5 8.08e-26
- Treatment increases the time to first pulmonary exacerbation by about 34% ($e^{0.293} \approx 1.34$) compared to the control when fev is fixed
- A one-unit increase in fev results in a 1.5% increase in lifetime, provided treatment is held constant
- Interpret the treatment effect in terms of odds of failure
  - With $\hat{b} = e^{-0.489} \approx 0.613$, the odds ratio is $\exp(-0.293 / 0.613) \approx 0.62$, so the odds of failure are 38% lower in the treatment group than in the control group, provided the fev value is fixed
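The time ratio and the odds ratio for treatment can both be recovered from the log-logistic fit. The sketch below copies the rounded estimates from the output above; `odds_failure()` is a hypothetical helper that evaluates the odds of failure from the logistic distribution of the log-lifetime, and it is used to confirm that the odds ratio does not change with $t$.

```r
# Rounded estimates copied from the log-logistic AFT fit
beta0    <- 5.078
beta_trt <- 0.293
b_hat    <- exp(-0.489)     # scale estimate from Log(scale) = -0.489

# Effect of treatment on the time scale and on the odds of failure
time_ratio <- exp(beta_trt)
odds_ratio <- exp(-beta_trt / b_hat)

# Odds of failure by time t when log T is logistic with location lp and scale b
odds_failure <- function(t, lp, b) {
  p_fail <- plogis((log(t) - lp) / b)
  p_fail / (1 - p_fail)
}

# Odds ratio (treated vs control) at several times, with fevm held at 0
t_grid <- c(10, 50, 150)
or_by_time <- odds_failure(t_grid, lp = beta0 + beta_trt, b = b_hat) /
  odds_failure(t_grid, lp = beta0, b = b_hat)

round(c(time_ratio = time_ratio, odds_ratio = odds_ratio), 3)
round(or_by_time, 3)
```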
term | log-normal est | log-normal se | log-logistic est | log-logistic se |
---|---|---|---|---|
(Intercept) | 5.093 | 0.068 | 5.078 | 0.060 |
trt | 0.336 | 0.095 | 0.293 | 0.086 |
fevm | 0.016 | 0.002 | 0.014 | 0.002 |
Log(scale) | 0.137 | 0.041 | -0.489 | 0.047 |
Other regression models
- Additive hazards model
6.6 Graphical methods and model assessment
- Graphical methods are helpful in summarizing information and suggesting possible models
- These methods also provide ways to check assumptions concerning the form of a lifetime distribution and its relationship to covariates
- Exploratory analysis of a lifetime distribution given covariates is helpful for selecting an appropriate model for the analysis
- For a single quantitative covariate, a plot of lifetime or log-lifetime against the covariate or a function of it could indicate the nature of the relationship between lifetime and the covariate
- Such a plot is helpful if the proportion of censoring is small; different symbols can be used in the plot for censored and failure times
- When there is more than one quantitative covariate and light censoring, one can consider grouping individuals so that within a group, individuals have similar values of the important covariates
- Let there be $J$ such groups and let $\hat{S}_j(t)$ be the Kaplan-Meier estimate for the $j$th group
AFT model
- If $\mathbf{x}$ is approximately constant for individuals within each group $j$, and if an AFT model is appropriate, the plots of $\hat{S}_j(t)$ against $\log t$ should be roughly parallel in the horizontal direction (under Equation 6.3, the covariates only shift the location of the log-lifetime distribution)
Proportional hazards model
- If $\mathbf{x}$ is approximately constant for individuals within each group $j$, and if a proportional hazards model is appropriate, the plots of $\log[-\log \hat{S}_j(t)]$ against $t$ (or $\log t$) should be roughly parallel in the vertical direction
- If the plots of $\log[-\log \hat{S}_j(t)]$ vs $\log t$ are roughly linear, then Weibull models are suggested
- If, in addition to being linear, the plots are parallel, then Weibull models with a constant shape parameter are suggested; in that case, both AFT and PH models can be considered
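A plot of $\log[-\log \hat{S}_j(t)]$ against $\log t$ by group can be drawn directly from Kaplan-Meier fits. The sketch below uses the `aml` data set that ships with the `survival` package purely as a stand-in (it is not one of the data sets analysed in these notes), with its two-level variable `x` defining the groups.

```r
library(survival)

# Kaplan-Meier estimates by group; `aml` is a stand-in data set from the survival package
km <- survfit(Surv(time, status) ~ x, data = aml)

# fun = "cloglog" plots log(-log S(t)) against t on a log-scaled time axis
plot(km, fun = "cloglog", col = c("black", "red"),
     xlab = "t (log scale)", ylab = "log(-log S(t))")
legend("topleft", legend = names(km$strata),
       col = c("black", "red"), lty = 1, bty = "n")
```

Roughly straight lines would point towards a Weibull model, and roughly parallel lines towards a common shape parameter across groups.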
Statistical analysis of data is an iterative process involving exploration, model fitting, and model assessment