Chapter 7

(AST405) Lifetime data analysis

Author

Md Rasel Biswas

7 Semiparametric Multiplicative Hazards Regression Models

7.1 Methods for continuous multiplicative hazards model

Models in which covariates have a multiplicative effect on the hazard function play an important role in the analysis of lifetime data
The proportional hazards (PH) model is one such model
Depending on whether the baseline hazard function is left arbitrary or specified parametrically, a PH model is either semiparametric or parametric
This section discusses semiparametric PH models, in which the baseline hazard function is left arbitrary

The hazard function is modeled as
$$h(t \mid x) = h_0(t)\exp(x'\beta) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p) \qquad (7.1)$$
- $h (t | x)$ = hazard at time t for a person with covariates $x$
- $h_{0} (t)$ = baseline hazard (unspecified)
- $β = (β_{1}, \dots, β_{p})^{'} \to$ vector of regression coefficients
- Covariate vector $x$ may include time-varying covariates
- No intercept term is included in $x^{'} β$, because a constant would be absorbed into the arbitrary baseline hazard $h_{0}(t)$
Model (Equation 7.1) is known as “Cox’s proportional hazards model” or simply “Cox model”
No distributional assumption about the baseline hazard $h_{0}(t)$ is required for estimating $β$ in Model (Equation 7.1)

The cumulative baseline hazard function is defined as $H_0(t) = \int_0^t h_0(u)\,du$
The baseline survivor function is $S_0(t) = \exp[-H_0(t)]$
The survivor function of $T$ given covariate vector $x$ is $S(t \mid x) = [S_0(t)]^{\exp(x'\beta)}$, since the cumulative hazard given $x$ is $H(t \mid x) = H_0(t)\exp(x'\beta)$
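A minimal numerical check of these relations, written in R. The constant baseline hazard $h_0(t) = 0.1$, the coefficients and the covariate values are hypothetical, chosen only for illustration.

```r
# Hypothetical constant baseline hazard h0(t) = 0.1 (illustration only)
h0 <- function(u) rep(0.1, length(u))

# Cumulative baseline hazard H0(t) = integral of h0(u) over [0, t]
H0 <- function(t) integrate(h0, lower = 0, upper = t)$value

# Baseline survivor function S0(t) = exp(-H0(t))
S0 <- function(t) exp(-H0(t))

beta <- c(0.5, -1)      # hypothetical regression coefficients
x    <- c(1, 2)         # hypothetical covariate vector
eta  <- sum(x * beta)   # linear predictor x'beta

t0 <- 5
S_power  <- S0(t0)^exp(eta)           # S(t|x) = [S0(t)]^exp(x'beta)
S_cumhaz <- exp(-H0(t0) * exp(eta))   # same value via H(t|x) = H0(t) exp(x'beta)
c(S_power = S_power, S_cumhaz = S_cumhaz)
```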

Estimation of model parameters

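Data: $\{(t_i, \delta_i, x_i),\ i = 1, \dots, n\}$, where $t_i$ is the observed time, $\delta_i = 1$ if $t_i$ is an event time and $\delta_i = 0$ if it is censored, and $x_i$ is the covariate vector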
Parameters of interest are $h_{0} (t)$ and $β$

Log-likelihood function (using $f = hS$, so that $f^{\delta_i} S^{1 - \delta_i} = h^{\delta_i} S$)
$$\begin{aligned} \ell(h_0(t), \beta) &= \log \prod_{i=1}^{n} [f(t_i; x_i)]^{\delta_i} [S(t_i; x_i)]^{1 - \delta_i} \\ &= \sum_{i} \{\delta_i \log[h_0(t_i)\exp(x_i'\beta)] + \exp(x_i'\beta)\log S_0(t_i)\} \\ &= \sum_{i} \{\delta_i [\log h_0(t_i) + x_i'\beta] + \exp(x_i'\beta)\log S_0(t_i)\} \end{aligned}$$

Because $h_{0}(t)$ is left arbitrary, the number of unknown parameters exceeds the number of observations, so maximizing this function directly does not yield a unique solution

The full likelihood function is therefore not useful for estimating the parameters of Cox’s proportional hazards model
Several alternative likelihood functions have been proposed for estimating the parameters, of which Cox’s “partial likelihood function” is the most widely used for PH models
The log-partial-likelihood function is defined as
$$\ell_1(\beta) = \log \prod_{i=1}^{n} \left(\frac{\exp(x_i'\beta)}{\sum_{k=1}^{n} Y_k(t_i)\exp(x_k'\beta)}\right)^{\delta_i}$$
- $Y_{k} (t) = I (t_{k} \geq t) \to$ indicates whether the $k$ th subject is still in the risk set at time $t$ or not
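A direct R sketch of $\ell_1(\beta)$, assuming a small hypothetical data set (not from the text) with one binary covariate and no tied times. The function name `log_partial_lik` is made up for illustration; `coxph()` from the survival package serves only as a cross-check, since its `loglik` component reports the log partial likelihood at $\beta = 0$ and at $\hat{\beta}$.

```r
library(survival)

# Log partial likelihood l1(beta); X holds one row of covariates per subject.
# Each event time uses its full risk set (Breslow's form if times were tied).
log_partial_lik <- function(beta, time, status, X) {
  X   <- as.matrix(X)
  eta <- drop(X %*% beta)               # linear predictors x_k' beta
  ll  <- 0
  for (i in which(status == 1)) {       # censored subjects (delta_i = 0) add no factor
    at_risk <- time >= time[i]          # Y_k(t_i) = I(t_k >= t_i)
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

# Small hypothetical data set: one binary covariate, no tied times
time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
dat    <- data.frame(time, status, x)

fit <- coxph(Surv(time, status) ~ x, data = dat, ties = "breslow")
c(at_zero = log_partial_lik(0, time, status, x),
  at_bhat = log_partial_lik(coef(fit), time, status, x))
fit$loglik   # l1 at beta = 0 and at beta-hat, as reported by coxph()
```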

The partial likelihood function can be treated as a regular likelihood function for making statistical inference
For the partial likelihood, the parameter of interest is $\beta$; the maximum partial likelihood estimator $\hat{\beta} = \arg\max_{\beta \in \Theta} \ell_1(\beta)$ is asymptotically normal, just like an MLE
The baseline hazard function is then estimated from the full likelihood with the regression parameters treated as known, i.e. by maximizing $\ell(h_0(t), \hat{\beta})$ over $h_0(t)$
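A sketch of both steps in R on a small hypothetical data set (not from the text): $\hat{\beta}$ is found by maximizing $\ell_1(\beta)$ numerically with `optimize()`, and the baseline hazard is estimated by Breslow's estimator $\hat{H}_0(t) = \sum_{t_i \le t} d_i \big/ \sum_k Y_k(t_i)\exp(x_k\hat{\beta})$, which is what maximizing the full likelihood over an $h_0$ with mass only at the event times gives. The helper names `l1` and `breslow_H0` are hypothetical; `basehaz(fit, centered = FALSE)` from the survival package is used only as a cross-check.

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
dat    <- data.frame(time, status, x)

# Log partial likelihood for a single covariate
l1 <- function(b) {
  sum(sapply(which(status == 1), function(i)
    b * x[i] - log(sum(exp(b * x[time >= time[i]])))))
}

# beta-hat = argmax of l1(beta), by a one-dimensional search
beta_hat <- optimize(l1, interval = c(-5, 5), maximum = TRUE)$maximum

# Breslow estimator: with beta fixed, H0 jumps by d_i / sum_k Y_k(t_i) exp(x_k beta)
# at each distinct event time t_i
breslow_H0 <- function(b) {
  ev   <- sort(unique(time[status == 1]))
  jump <- sapply(ev, function(t) sum(status[time == t]) / sum(exp(b * x[time >= t])))
  data.frame(time = ev, H0 = cumsum(jump))
}

fit <- coxph(Surv(time, status) ~ x, data = dat, ties = "breslow")
c(optimize = beta_hat, coxph = unname(coef(fit)))
breslow_H0(beta_hat)
```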

Obtain the expression of the partial likelihood function for the following censored sample (a `+` marks a censored time)

| time | x |
|------|---|
| 3 | 1 |
| 5 | 0 |
| 8 | 1 |
| 4+ | 1 |
| 10 | 0 |
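One way to work the exercise, ordering the observations by time ($3, 4^+, 5, 8, 10$): the censored time $4^+$ contributes no factor of its own but is in the risk set at $t = 3$, and the factor at $t = 10$ equals 1 because only that subject is still at risk. The four factors correspond to the event times $3, 5, 8, 10$:

$$L_1(\beta) = \frac{e^{\beta}}{3e^{\beta} + 2} \times \frac{1}{e^{\beta} + 2} \times \frac{e^{\beta}}{e^{\beta} + 1} \times \frac{1}{1} = \frac{e^{2\beta}}{(3e^{\beta} + 2)(e^{\beta} + 2)(e^{\beta} + 1)}$$

$$\ell_1(\beta) = 2\beta - \log(3e^{\beta} + 2) - \log(e^{\beta} + 2) - \log(e^{\beta} + 1)$$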

7.2 Comparison of two or more lifetime distributions

Let $S_{j} (t)$ be the survivor function of lifetime $T_{j}$ , $j = 1, 2$
Data available: $\{(t_i, \delta_i, x_i),\ i = 1, \dots, n\}$
- $x_i = I(\text{the } i\text{th subject is from group 1})$, so $x_i = 0$ for group 2
Null hypothesis $H_{0} : S_{1} (t) = S_{2} (t)$

Consider the PH model $h (t | x) = h_{0} (t) \exp (β x) \Rightarrow S (t | x) = [S_{0} (t)]^{\exp (β x)}$
We can obtain $\begin{aligned} S_{2} (t) & = S (t | x = 0) = S_{0} (t) \\ S_{1} (t) & = S (t | x = 1) = [S_{0} (t)]^{\exp (β)} = [S_{2} (t)]^{\exp (β)} \end{aligned}$
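The relation $S_1(t) = [S_2(t)]^{\exp(\beta)}$ can be checked numerically on model-based curves. This R sketch assumes a small hypothetical two-group data set (not from the text) and uses `survfit()` on a `coxph()` fit to get the estimated survivor curves for $x = 1$ and $x = 0$.

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)   # 1 = group 1, 0 = group 2
dat    <- data.frame(time, status, x)

fit <- coxph(Surv(time, status) ~ x, data = dat)

# Model-based survivor curves for x = 1 (group 1) and x = 0 (group 2, the baseline)
sf <- survfit(fit, newdata = data.frame(x = c(1, 0)))
S1 <- sf$surv[, 1]
S2 <- sf$surv[, 2]

# Under the PH model, S1(t) = S2(t)^exp(beta) at every time point
cbind(time = sf$time, S1 = S1, S2_power = S2^exp(unname(coef(fit))))
```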

Under the proportional hazards assumption, $H_{0} : S_{1} (t) = S_{2} (t)$ is equivalent to $H_{0} : β = 0$
Large-sample properties of the maximum partial likelihood estimator $\hat{β}$ can be used to test the null hypothesis

Log-partial-likelihood function
$$\begin{aligned} \ell(\beta) &= \log \prod_{i=1}^{n} \left(\frac{e^{\beta x_i}}{\sum_{k=1}^{n} Y_k(t_i)\, e^{\beta x_k}}\right)^{\delta_i} \\ &= \sum_{i=1}^{n} \left(\delta_i x_i \beta - \delta_i \log \sum_{k=1}^{n} Y_k(t_i)\, e^{\beta x_k}\right) \end{aligned}$$

Score function
$$\begin{aligned} U(\beta) = \frac{\partial \ell(\beta)}{\partial \beta} &= \sum_{i=1}^{n} \left(\delta_i x_i - \frac{\delta_i \sum_{k=1}^{n} Y_k(t_i)\, e^{\beta x_k} x_k}{\sum_{k=1}^{n} Y_k(t_i)\, e^{\beta x_k}}\right) \\ &= \sum_{i=1}^{n} \left(d_{1i} - \frac{d_i n_{1i} e^{\beta}}{n_{1i} e^{\beta} + n_{2i}}\right) \end{aligned}$$
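- $d_{i} = δ_{i} \to$ number of deaths at time $t_{i}$ (0 or 1 when there are no ties)
- $d_{1i} = δ_{i} x_{i} = I(\text{the } i\text{th subject is from group 1 and dies at } t_i) \to$ number of group 1 deaths at time $t_{i}$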
- $n_{1 i} = \sum_{k = 1}^{n} Y_{k} (t_{i}) x_{k} \to$ number of group 1 subjects at risk at time $t_{i}$
- $n_{2 i} = \sum_{k = 1}^{n} Y_{k} (t_{i}) (1 - x_{k}) \to$ number of group 2 subjects at risk at time $t_{i}$

Information function
$$\begin{aligned} I(\beta) = -\frac{\partial U(\beta)}{\partial \beta} &= -\sum_{i=1}^{n} \frac{d_i n_{1i} e^{\beta}\, n_{1i} e^{\beta} - d_i (n_{1i} e^{\beta} + n_{2i})\, n_{1i} e^{\beta}}{(n_{1i} e^{\beta} + n_{2i})^{2}} \\ &= \sum_{i=1}^{n} \frac{d_i n_{1i} n_{2i} e^{\beta}}{(n_{1i} e^{\beta} + n_{2i})^{2}} \end{aligned}$$
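The two-group formulas for $U(\beta)$ and $I(\beta)$ translate directly into R. This is a sketch on a small hypothetical data set with no tied times; `U` and `info` are illustrative names. At the Cox estimate $\hat{\beta}$ the score should be numerically zero, and $1/I(\hat{\beta})$ should match the variance that `vcov()` reports for the `coxph()` fit.

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)   # 1 = group 1, 0 = group 2
dat    <- data.frame(time, status, x)

# Risk-set sizes n_{1i}, n_{2i} at each observed time t_i, and death indicators
n1 <- sapply(time, function(t) sum(time >= t & x == 1))
n2 <- sapply(time, function(t) sum(time >= t & x == 0))
d  <- status        # d_i = delta_i (no tied times)
d1 <- status * x    # d_{1i} = 1 if the death at t_i is from group 1

U    <- function(b) sum(d1 - d * n1 * exp(b) / (n1 * exp(b) + n2))     # score U(beta)
info <- function(b) sum(d * n1 * n2 * exp(b) / (n1 * exp(b) + n2)^2)   # information I(beta)

fit   <- coxph(Surv(time, status) ~ x, data = dat)
b_hat <- unname(coef(fit))
c(U_at_bhat = U(b_hat), inv_info = 1 / info(b_hat), coxph_var = vcov(fit)[1, 1])
```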

A confidence interval for $β$ can be obtained from the pivotal quantity $Z (β) = \frac{U (β)}{[I (β)]^{1 / 2}}$, which is asymptotically standard normal
A $100(1 - α)\%$ confidence interval for $β$ is the set of values of $β$ that satisfy $|Z(β)| \leq z_{1 - α/2}$
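A sketch of the interval obtained by inverting the score statistic, again on hypothetical untied data: since $Z(\beta)$ passes through 0 at $\hat{\beta}$, the 95% limits are the roots of $Z(\beta) = \pm z_{0.975}$, found here with `uniroot()`. The search brackets ($\hat{\beta} \pm 10$) are an assumption that suits this small example.

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
dat    <- data.frame(time, status, x)

n1 <- sapply(time, function(t) sum(time >= t & x == 1))
n2 <- sapply(time, function(t) sum(time >= t & x == 0))
d  <- status
d1 <- status * x

U    <- function(b) sum(d1 - d * n1 * exp(b) / (n1 * exp(b) + n2))
info <- function(b) sum(d * n1 * n2 * exp(b) / (n1 * exp(b) + n2)^2)
Z    <- function(b) U(b) / sqrt(info(b))   # pivotal quantity Z(beta)

fit   <- coxph(Surv(time, status) ~ x, data = dat)
b_hat <- unname(coef(fit))
z     <- qnorm(0.975)                      # alpha = 0.05

# Z(beta) crosses 0 at beta-hat: solve Z = z below it and Z = -z above it
lower <- uniroot(function(b) Z(b) - z, c(b_hat - 10, b_hat), tol = 1e-8)$root
upper <- uniroot(function(b) Z(b) + z, c(b_hat, b_hat + 10), tol = 1e-8)$root
c(lower = lower, beta_hat = b_hat, upper = upper)
```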

Under $H_{0} : β = 0$ $\begin{aligned} U (0) & = \sum_{i = 1}^{n} (d_{1 i} - \frac{d_{i} n_{1 i}}{n_{1 i} + n_{2 i}}) \\ I (0) & = \sum_{i = 1}^{n} \frac{d_{i} n_{1 i} n_{2 i}}{(n_{1 i} + n_{2 i})^{2}} \end{aligned}$
Test statistic $Z = \frac{U (0)}{[I (0)]^{1 / 2}}$, which is approximately $N (0, 1)$ under $H_{0}$
- Testing $H_{0} : β = 0$ with $Z$ does not require the MLE of $β$, because $U$ and $I$ are evaluated at $β = 0$
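Because $U(0)$ and $I(0)$ need only risk-set counts, the test can be computed without fitting any model. A sketch on hypothetical untied data; with no ties, $Z^2$ should agree with the chi-square statistic from `survdiff()` (with tied times `survdiff()` uses a hypergeometric variance, so the two can differ slightly).

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)   # 1 = group 1, 0 = group 2
dat    <- data.frame(time, status, x)

n1 <- sapply(time, function(t) sum(time >= t & x == 1))
n2 <- sapply(time, function(t) sum(time >= t & x == 0))
d  <- status
d1 <- status * x

U0 <- sum(d1 - d * n1 / (n1 + n2))      # U(0): observed minus expected group-1 deaths
I0 <- sum(d * n1 * n2 / (n1 + n2)^2)    # I(0)
Z  <- U0 / sqrt(I0)
p  <- 2 * pnorm(-abs(Z))                # two-sided p-value

lr <- survdiff(Surv(time, status) ~ x, data = dat)
c(Z = Z, Z_squared = Z^2, survdiff_chisq = lr$chisq, p_value = p)
```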

$U (0)$ is the sum, over event times, of the difference between the observed number of deaths from group 1 at time $t_{i}$, $d_{1 i}$, and the corresponding expected number of deaths under $H_{0}$, $d_{i} \times \frac{n_{1 i}}{n_{1 i} + n_{2 i}}$
At time $t_{i}$, $n_{i} = n_{1 i} + n_{2 i}$ subjects are at risk and $d_{i}$ is either 0 or 1 (i.e. there are no ties in the lifetimes); the counts at $t_{i}$ form the following $2 \times 2$ table

| group | event | alive | at risk |
|-------|-------|-------|---------|
| 1 | $d_{1 i}$ | $n_{1 i} - d_{1 i}$ | $n_{1 i}$ |
| 2 | $d_{2 i}$ | $n_{2 i} - d_{2 i}$ | $n_{2 i}$ |
| total | $d_{i}$ | $n_{i} - d_{i}$ | $n_{i}$ |
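The observed-minus-expected reading of $U(0)$ can be tabulated event time by event time. This R sketch, on hypothetical untied data, builds the group-1 entries of each $2 \times 2$ table and compares the totals with the `obs` and `exp` components of `survdiff()`, whose second entries refer to group 1 ($x = 1$) here.

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)   # 1 = group 1, 0 = group 2
dat    <- data.frame(time, status, x)

ev <- which(status == 1)                                # event times only
n1 <- sapply(time[ev], function(t) sum(time >= t & x == 1))
n2 <- sapply(time[ev], function(t) sum(time >= t & x == 0))
d1 <- x[ev]                                             # observed group-1 deaths (0 or 1)
e1 <- n1 / (n1 + n2)                                    # expected: d_i * n_1i / n_i with d_i = 1

data.frame(time = time[ev], n1 = n1, n2 = n2, d1 = d1, e1 = round(e1, 3))
c(observed = sum(d1), expected = sum(e1), O_minus_E = sum(d1) - sum(e1))

lr <- survdiff(Surv(time, status) ~ x, data = dat)
rbind(observed = lr$obs, expected = lr$exp)             # columns: x = 0, x = 1
```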

This score test for comparing two groups under the Cox model is also known as the log-rank test

Example 7.1.1

Data below show remission times (in weeks) for 40 leukemia patients who were randomly assigned to either treatment $A$ or $B$

tab7_1_1

# A tibble: 40 × 3
    time status group
   <dbl>  <dbl> <chr>
 1     1      1 A    
 2     3      1 A    
 3     3      1 A    
 4     6      1 A    
 5     7      1 A    
 6     7      1 A    
 7    10      1 A    
 8    12      1 A    
 9    14      1 A    
10    15      1 A    
# ℹ 30 more rows

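library(survival)  # Surv(), survdiff(), coxph(), survfit(), rhDNase data
library(dplyr)     # %>%, as_tibble(), filter(), mutate(), group_by()
library(broom)     # tidy()

survdiff(Surv(time, status) ~ group, 
         data = tab7_1_1)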

Call:
survdiff(formula = Surv(time, status) ~ group, data = tab7_1_1)

         N Observed Expected (O-E)^2/E (O-E)^2/V
group=A 20       17     21.5     0.951      2.36
group=B 20       20     15.5     1.322      2.36

 Chisq= 2.4  on 1 degrees of freedom, p= 0.1

coxph(Surv(time, status) ~ group, data = tab7_1_1) %>% 
    tidy()

# A tibble: 1 × 5
  term   estimate std.error statistic p.value
  <chr>     <dbl>     <dbl>     <dbl>   <dbl>
1 groupB    0.503     0.332      1.51   0.130

Example 7.2.1

Patients with cystic fibrosis are susceptible to an accumulation of mucus in the lungs, which leads to pulmonary exacerbations and deterioration of lung function
A clinical trial was conducted to investigate the efficacy of the new drug DNase-1
- Subjects were randomly assigned to the new treatment or a placebo
The time of interest is the time to first exacerbation after randomization; FEV (forced expiratory volume) at the time of randomization was also measured

Creating the analysis data from the R object rhDNase (from the survival package)

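tab1_4 <- as_tibble(rhDNase) %>% 
  # keep records with no IV antibiotics, or with IV antibiotics starting after day 0
  filter(is.na(ivstart) | ivstart > 0) %>% 
  mutate(time0 = as.numeric(end.dt - entry.dt),        # length of follow-up (days)
         status = as.numeric(!is.na(ivstart)),          # 1 = IV antibiotics started (exacerbation), 0 = censored
         time = if_else(status == 1, ivstart, time0),   # days to exacerbation or censoring
         fevm = fev - mean(fev)) %>%                    # FEV centered at its sample mean
  group_by(id) %>% 
  mutate(visit = n()) %>%                               # number of records per subject
  ungroup()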

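Cox’s PH model $h(t \mid trt, fevm) = h_0(t)\exp(\beta_1\, trt + \beta_2\, fevm)$, where $trt = 1$ for rhDNase and $0$ for placebo, and $fevm$ is FEV centered at its sample mean
R code for fitting the model: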

mod1 <- coxph(Surv(time, status) ~ trt + fevm, 
              data = tab1_4)

Estimates of regression coefficients

tidy(mod1)

# A tibble: 2 × 5
  term  estimate std.error statistic  p.value
  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 trt    -0.352    0.106       -3.31 9.47e- 4
2 fevm   -0.0188   0.00226     -8.31 9.63e-17

Patients in the treatment group have a lower hazard of first exacerbation than placebo patients (negative coefficient for trt)
As FEV increases, the hazard of first exacerbation decreases (negative coefficient for fevm)
The effects of both treatment and FEV on the hazard of first exacerbation are statistically significant (p < 0.001)

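Estimates, hazard ratios (HR $= \exp(\hat{β})$) and 95% confidence limits for the HR from Cox's PH model

| term | estimate | p.value | HR | 2.5 % | 97.5 % |
|------|----------|---------|----|-------|--------|
| trt | -0.352 | 0.001 | 0.703 | 0.571 | 0.867 |
| fevm | -0.019 | < 0.001 | 0.981 | 0.977 | 0.986 |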

Patients in the treatment group have about $30$% lower hazard of first exacerbation than placebo patients with the same FEV value (HR = 0.703)
For a 1-unit increase in FEV, the hazard of first exacerbation decreases by about $2$% (HR = 0.981), holding the treatment group fixed
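The hazard ratios and 95% limits in the table are $\exp(\hat{\beta})$ and $\exp(\hat{\beta} \pm 1.96\,\widehat{\text{se}})$, and the percentage changes above are $100(\text{HR} - 1)$. This R sketch reproduces them from the rounded estimates and standard errors printed by `tidy(mod1)`; because the inputs are rounded, the last digit can differ slightly (for example 0.866 instead of 0.867 for the upper limit for trt). With the fitted object available, `exp(coef(mod1))` and `exp(confint(mod1))` would give the unrounded values.

```r
# Estimates and standard errors as printed by tidy(mod1) (rounded)
est <- c(trt = -0.352, fevm = -0.0188)
se  <- c(trt = 0.106,  fevm = 0.00226)
z   <- qnorm(0.975)

HR    <- exp(est)              # hazard ratio per 1-unit increase
lower <- exp(est - z * se)     # 95% Wald limits on the HR scale
upper <- exp(est + z * se)
pct_change <- 100 * (HR - 1)   # percentage change in the hazard

round(cbind(HR, lower, upper, pct_change), 3)
```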

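Applied to the fitted model, survfit() estimates the survivor function (by default at the mean covariate values), together with standard errors and pointwise confidence limits; the std.error column refers to the cumulative hazard scale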

tidy(survfit(mod1)) %>% 
    as_tibble()

# A tibble: 161 × 8
    time n.risk n.event n.censor estimate std.error conf.high conf.low
   <dbl>  <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>    <dbl>
 1     1    761       1        0    0.999   0.00138     1        0.996
 2     5    760       3        0    0.994   0.00277     1.00     0.989
 3     6    757       1        0    0.993   0.00311     0.999    0.987
 4     8    756       4        0    0.988   0.00420     0.996    0.979
 5     9    752       3        0    0.983   0.00489     0.993    0.974
 6    11    749       2        0    0.981   0.00530     0.991    0.971
 7    13    747       2        0    0.978   0.00569     0.989    0.967
 8    14    745       2        0    0.975   0.00606     0.987    0.964
 9    15    743       4        0    0.970   0.00675     0.983    0.957
10    16    739       2        0    0.967   0.00708     0.980    0.953
# ℹ 151 more rows
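What `survfit()` computes from a Cox fit can be sketched by hand: with Breslow's cumulative baseline hazard $\hat{H}_0(t)$, the default estimated curve is $\hat{S}(t \mid x) = \exp[-\hat{H}_0(t)\exp(x'\hat{\beta})]$. This R sketch uses a small hypothetical data set (not the rhDNase data) and a hypothetical new subject with $x = 1$ passed through `newdata`, then compares the hand computation with `survfit()`.

```r
library(survival)

time   <- c(3, 5, 6, 8, 10, 12, 13, 15, 17, 20)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)
x      <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
dat    <- data.frame(time, status, x)

fit <- coxph(Surv(time, status) ~ x, data = dat, ties = "breslow")
b   <- unname(coef(fit))

# Breslow estimate of the cumulative baseline hazard H0(t) (x = 0) at the event times
ev <- sort(time[status == 1])
H0 <- cumsum(sapply(ev, function(t) 1 / sum(exp(b * x[time >= t]))))

# S(t | x) = exp(-H0(t) exp(x * beta)) for a hypothetical new subject with x = 1
S_manual <- exp(-H0 * exp(b * 1))

sf <- survfit(fit, newdata = data.frame(x = 1))
S_survfit <- as.numeric(sf$surv)[match(ev, sf$time)]
cbind(time = ev, S_manual = S_manual, S_survfit = S_survfit)
```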