Chapter 4

(AST405) Lifetime data analysis

Author

Md Rasel Biswas

4 Inference procedures for parametric models

Introduction

Likelihood methods for lifetime data were introduced in Chapter 2, which includes the derivation of the likelihood function for different types of censored data
- Maximum likelihood estimator
- Inference about parameters (hypothesis testing and confidence intervals)

Likelihood function (Complete data)

Let $t_{1}, \dots, t_{n}$ be a random sample from a distribution $f (t; θ)$ , where $θ$ is a $p$ -dimensional vector of parameters
Likelihood and log-likelihood function $\begin{aligned} L (θ) & = \prod_{i = 1}^{n} f (t_{i}; θ) \\ ℓ (θ) = \log L (θ) & = \sum_{i = 1}^{n} \log f (t_{i}; θ) \end{aligned}$
Maximum likelihood estimator (MLE) $\begin{array}{r} \hat{θ} = \underset{θ \in Θ}{\arg\max} \; ℓ (θ) \end{array}$

Statistical inference (Complete data)

Large sample property of MLE $\begin{array}{r} \hat{θ} \sim N (θ, [I (θ)]^{- 1}) \end{array}$
- Fisher expected information matrix $\begin{array}{r} I (θ) = E [- \frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{'}}] \end{array}$
- Observed information matrix $\begin{array}{r} I (\hat{θ}) = - \frac{\partial^{2} ℓ (θ)}{\partial θ \partial θ^{'}} |_{θ = \hat{θ}} \end{array}$

Likelihood function (Type I or random censoring)

Data: ${(t_{i}, δ_{i}), i = 1, \dots, n}$
- $t_{i}$ is a sample realization of ${\tilde{T}}_{i} = min (T_{i}, C_{i})$
- lifetime $T_{i}$ follows a distribution with pdf $f (t_{i}; θ)$ and the corresponding survivor function $S (t_{i}; θ)$
- censoring time $C_{i}$ could be random or fixed depending on the censoring mechanism
- $δ_{i} = I (T_{i} \leq C_{i})$ , censoring indicator

Data for type I or random censoring ${(t_{i}, δ_{i}), i = 1, \dots, n}$
Likelihood function $\begin{aligned} L (θ) & = \prod_{i = 1}^{n} [f (t_{i}; θ)]^{δ_{i}} [S (t_{i}; θ)]^{1 - δ_{i}} \\ & = \prod_{i = 1}^{n} [S (t_{i}; θ) h (t_{i}; θ)]^{δ_{i}} [S (t_{i}; θ)]^{1 - δ_{i}} \\ & = \prod_{i = 1}^{n} S (t_{i}; θ) [h (t_{i}; θ)]^{δ_{i}} \end{aligned}$

Likelihood function (Type II censoring)

Let lifetime $T_{i}$ $(i = 1, \dots, n)$ follow a distribution with pdf $f (t_{i}; θ)$ and the corresponding survivor function $S (t_{i}; θ)$
The experiment was terminated after observing $r$ smallest lifetimes
$t_{(1)} ⩽ \dots ⩽ t_{(r)}$
The remaining $(n - r)$ observations are considered as censored at $t_{(r)}$
Likelihood function $\begin{array}{r} L (θ) = [\prod_{i = 1}^{r} f (t_{(i)}; θ)] [S (t_{(r)}; θ)]^{n - r} \end{array}$

Statistical inference (censored sample)

For censored samples, the MLE is still asymptotically normally distributed
The expression for the Fisher expected information matrix $I (θ)$ is complicated for censored data, so the observed information matrix $I (\hat{θ})$ is used for inference with censored samples

4.1 Exponential distribution

The exponential distribution was the first lifetime model for which statistical methodology was extensively developed
Exact tests can be developed for the exponential distribution under certain types of censoring mechanisms
The exponential distribution assumes a constant hazard, which limits its use for analyzing real-life problems

Probability density function $\begin{array}{r} f (t; θ) = (1 / θ) \exp (- t / θ), \quad t ⩾ 0, \; θ > 0 \end{array}$
Hazard function $\begin{array}{r} h (t; θ) = (1 / θ) \end{array}$
Survivor function $\begin{array}{r} S (t; θ) = \exp (- t / θ) \end{array}$
$p$ th quantile $S (t_{p}; θ) = 1 - p \Rightarrow \exp (- t_{p} / θ) = 1 - p \Rightarrow t_{p} = - θ \log (1 - p)$
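The quantile formula can be checked numerically. The following is a minimal sketch; the helper name `exp_quantile` and the value $θ = 44$ are illustrative assumptions. Note that R's `qexp()` is parameterized by the rate $1 / θ$, not by the mean $θ$.

```r
# p-th quantile of Exp(theta), where theta is the mean (scale) parameter
exp_quantile <- function(p, theta) {
  -theta * log(1 - p)
}

theta <- 44                      # illustrative value of the mean lifetime
p <- c(0.10, 0.50, 0.90)

# qexp() uses the rate parameterization, i.e. rate = 1 / theta
cbind(p,
      closed_form = exp_quantile(p, theta),
      qexp        = qexp(p, rate = 1 / theta))
```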

Homework

Estimation and related inference of exponential distribution when the sample has no censored observations

Type I or random censoring

Lifetime $T \sim Exp (θ)$ , $θ > 0$
Data: ${(t_{i}, δ_{i}), i = 1, \dots, n}$
Likelihood function $\begin{aligned} L (θ) & = \prod_{i = 1}^{n} [f (t_{i}; θ)]^{δ_{i}} [S (t_{i}; θ)]^{1 - δ_{i}} \\ = \prod_{i = 1}^{n} S (t_{i}; θ) [h (t_{i}; θ)]^{δ_{i}} \\ = \prod_{i = 1}^{n} \exp (- t_{i} / θ) (1 / θ)^{δ_{i}} \end{aligned}$

Likelihood function $\begin{aligned} L (θ) & = \prod_{i = 1}^{n} \exp (- t_{i} / θ) (1 / θ)^{δ_{i}} \end{aligned}$
Log-likelihood function $\begin{aligned} ℓ (θ) & = \sum_{i = 1}^{n} [- (t_{i} / θ) - δ_{i} \log (θ)] \\ = - \frac{1}{θ} \sum_{i} t_{i} - r \log (θ) \end{aligned}$
- $r = \sum_{i} δ_{i} \to$ the number of failures observed in the sample

Log-likelihood function $\begin{aligned} ℓ (θ) & = - \frac{1}{θ} \sum_{i} t_{i} - r \log (θ) \end{aligned}$
MLE $\frac{\partial ℓ (θ)}{\partial θ} |_{θ = \hat{θ}} = 0 \Rightarrow \frac{1}{{\hat{θ}}^{2}} \sum_{i} t_{i} - \frac{r}{\hat{θ}} = 0$ $\begin{array}{r} \hat{θ} = \sum_{i = 1}^{n} t_{i} / r \end{array}$
- This assumes $r > 0$ ; no finite MLE exists when $r = 0$

Information matrix

$\begin{aligned} I (θ) & = - \frac{\partial^{2} ℓ (θ)}{\partial θ^{2}} \\ = - \frac{\partial}{\partial θ} [\frac{1}{θ^{2}} \sum_{i} t_{i} - \frac{r}{θ}] \\ = \frac{2}{θ^{3}} \sum_{i} t_{i} - \frac{r}{θ^{2}} \end{aligned}$

Replacing the parameter $θ$ by its MLE $\hat{θ} = \sum_{i} t_{i} / r$ , the observed information matrix becomes $I (\hat{θ}) = \frac{r}{{\hat{θ}}^{2}}$
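As a rough check of the closed-form results $\hat{θ} = \sum_{i} t_{i} / r$ and $I (\hat{θ}) = r / {\hat{θ}}^{2}$ , the sketch below compares the observed information with a central-difference approximation of $- ℓ^{''} (\hat{θ})$ . The function names and the small data set are hypothetical and are not taken from the notes.

```r
# MLE and observed information for Exp(theta) under Type I or random censoring
exp_mle <- function(time, status) {
  r <- sum(status)                 # number of observed failures
  stopifnot(r > 0)                 # no finite MLE when r = 0
  theta_hat <- sum(time) / r
  list(theta_hat = theta_hat, r = r, obs_info = r / theta_hat^2)
}

# Log-likelihood l(theta) = -sum(t_i) / theta - r * log(theta)
exp_loglik <- function(theta, time, status) {
  -sum(time) / theta - sum(status) * log(theta)
}

# Hypothetical data: status 1 = failure, 0 = censored
time   <- c(5, 12, 8, 20, 15)
status <- c(1, 0, 1, 1, 0)
fit <- exp_mle(time, status)

# Central-difference approximation to -l''(theta) at the MLE
h <- 1e-3 * fit$theta_hat
num_info <- -(exp_loglik(fit$theta_hat + h, time, status) -
              2 * exp_loglik(fit$theta_hat, time, status) +
              exp_loglik(fit$theta_hat - h, time, status)) / h^2

c(theta_hat = fit$theta_hat, closed_form = fit$obs_info, numerical = num_info)
```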

Confidence interval (Method I)

Using $\hat{θ}$ ’s asymptotic distribution $\begin{array}{r} Z_{1} = \frac{θ - \hat{θ}}{[I (\hat{θ})]^{- 1 / 2}} = \frac{θ - \hat{θ}}{\hat{θ} / \sqrt{r}} \sim N (0, 1) \end{array}$
For a small sample, $Z_{1}$ does not approximate the standard normal distribution very accurately
For a sample with a small number of uncensored observations, $ℓ (θ)$ tends to be asymmetric

Confidence interval (Method II)

Sprott (1980) showed that $ℓ_{1} (ϕ) = ℓ (ϕ^{- 3})$ is closer to symmetric than $ℓ (θ)$ , where $ϕ = θ^{- 1 / 3} \Rightarrow ϕ^{3} = 1 / θ$ $\begin{aligned} ℓ (θ) & = - (1 / θ) \sum_{i} t_{i} - r \log θ \\ ℓ_{1} (ϕ) & = - ϕ^{3} \sum_{i} t_{i} + 3 r \log (ϕ) \end{aligned}$
- MLE $\hat{ϕ} = {\hat{θ}}^{- 1 / 3} = [\sum_{i} t_{i} / r]^{- 1 / 3}$
- Observed information matrix $\begin{array}{r} I_{1} (ϕ) = 6 ϕ \sum_{i} t_{i} + \frac{3 r}{ϕ^{2}} \Rightarrow I_{1} (\hat{ϕ}) = \frac{9 r}{{\hat{ϕ}}^{2}} \end{array}$

Pivotal quantity $Z_{2} = \frac{ϕ - \hat{ϕ}}{[I_{1} (\hat{ϕ})]^{- 1 / 2}} = \frac{ϕ - \hat{ϕ}}{\hat{ϕ} / \sqrt{9 r}} \sim N (0, 1)$
- The standard normal approximation is considerably more accurate for $Z_{2}$ than for $Z_{1}$

Confidence interval (Method III)

Likelihood ratio statistic $Λ (θ) = 2 ℓ (\hat{θ}) - 2 ℓ (θ)$ can also be used to obtain confidence interval for $θ$ , where $\begin{aligned} ℓ (θ) & = - (1 / θ) \sum_{i} t_{i} - r \log (θ) \\ = - r (\hat{θ} / θ) - r \log (θ) \\ ℓ (\hat{θ}) & = - (1 / \hat{θ}) \sum_{i} t_{i} - r \log (\hat{θ}) \\ = - r - r \log (\hat{θ}) \end{aligned}$

LRT statistic $\begin{aligned} Λ (θ) & = 2 ℓ (\hat{θ}) - 2 ℓ (θ) \\ = 2 r [(\hat{θ} / θ) - 1 + \log (θ / \hat{θ})] \end{aligned}$
Two-sided $(1 - α) 100 %$ confidence intervals are obtained as the set of $θ$ values that satisfy $Λ (θ) \leq χ_{(1), 1 - α}^{2}$

Confidence interval of survivor function

A confidence interval for the parameter $θ$ can also be used to obtain confidence intervals for a monotone function of $θ$ , such as $S (t; θ) = \exp (- t / θ)$ or $h (t; θ) = 1 / θ$
Let the $100 (1 - α) %$ confidence interval for $θ$ be $L (Data) ⩽ θ ⩽ U (Data),$ where $Data = \{(t_{i}, δ_{i}), i = 1, \dots, n\}$

$100 (1 - α) %$ confidence interval for $S (t_{0}; θ) = \exp (- t_{0} / θ)$ $\begin{aligned} L (Data) & ⩽ θ ⩽ U (Data) \\ t_{0} / U (Data) & ⩽ (t_{0} / θ) ⩽ t_{0} / L (Data) \\ - t_{0} / L (Data) & ⩽ (- t_{0} / θ) ⩽ - t_{0} / U (Data) \\ \exp (- t_{0} / L (Data)) & ⩽ S (t_{0}; θ) ⩽ \exp (- t_{0} / U (Data)) \end{aligned}$

Confidence interval of hazard function

$100 (1 - α) %$ confidence interval for $h (t_{0}; θ) = 1 / θ$ $(1 / U (Data)) ⩽ h (t_{0}; θ) ⩽ (1 / L (Data))$

Example 4.1.1

Lifetimes (in days) of 10 pieces of equipment

time	status
2	1
72	0
51	1
60	0
33	1
27	1
14	1
24	1
4	1
21	0

Assume lifetimes follow an exponential distribution with parameter $θ$ , i.e. $T_{i} \sim Exp (θ)$

MLE $\hat{θ} = \frac{\sum_{i} t_{i}}{r} = \frac{308}{7} = 44.0$
95% confidence interval of $θ$ (Method I)

$\begin{aligned} \hat{θ} \pm z_{.975} [I (\hat{θ})]^{- 1 / 2} & \Rightarrow \hat{θ} \pm z_{.975} (\hat{θ} / \sqrt{r}) \\ \Rightarrow 44.0 \pm (1.96) (44.0 / \sqrt{7}) \\ \Rightarrow 11.4 ⩽ θ ⩽ 76.6 \end{aligned}$

95% confidence interval of $θ$ (Method II)
- $\hat{ϕ} = {\hat{θ}}^{- 1 / 3} = (44.0)^{- 1 / 3} = 0.283$
- Confidence interval for $ϕ$
  $\begin{aligned} \hat{ϕ} \pm z_{.975} [I_{1} (\hat{ϕ})]^{- 1 / 2} & \Rightarrow \hat{ϕ} \pm z_{.975} (\hat{ϕ} / \sqrt{9 r}) \\ \Rightarrow 0.28 \pm (1.96) (0.28 / \sqrt{63}) \\ \Rightarrow 0.21 ⩽ ϕ ⩽ 0.35 \end{aligned}$
- Confidence interval for $θ$

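$\begin{aligned} 0.21 ⩽ ϕ ⩽ 0.35 & \Rightarrow 0.21 ⩽ θ^{- 1 / 3} ⩽ 0.35 \\ & \Rightarrow (0.35)^{- 3} ⩽ θ ⩽ (0.21)^{- 3} \\ & \Rightarrow 22.69 ⩽ θ ⩽ 103.03 \end{aligned}$
- The final limits are computed from the unrounded limits of $ϕ$ ( $0.2133$ and $0.3532$ ); the rounded values $0.21$ and $0.35$ would give $23.32$ and $107.98$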

Likelihood ratio statistic $Λ (θ) = 2 r [(\hat{θ} / θ) - 1 - \log (\hat{θ} / θ)]$
- 95% CI for $θ$ can be obtained from the set of $θ$ ’s such that $Λ (θ) \leq χ_{(1), .95}^{2} = 3.84$
- 95% CI for $θ$ : $22.8$ to $102.4$

A comparison of 95% CI for $θ$ with the data on lifetimes of 10 pieces of equipment (Example 4.1.1)
method	lower	upper
Normal approx.	11.40	76.60
Sprott	22.69	103.03
LRT	22.80	102.40
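The three intervals in the comparison table can be reproduced with a few lines of R. This is a minimal sketch: the variable names and the root-search brackets passed to `uniroot()` are assumptions chosen for illustration, and the likelihood ratio limits are found numerically rather than read from a plot or grid.

```r
# Data of Example 4.1.1 (status: 1 = failure, 0 = censored)
time   <- c(2, 72, 51, 60, 33, 27, 14, 24, 4, 21)
status <- c(1, 0, 1, 0, 1, 1, 1, 1, 1, 0)

r <- sum(status)
theta_hat <- sum(time) / r
z <- qnorm(0.975)

# Method I: normal approximation on the theta scale
ci_normal <- theta_hat + c(-1, 1) * z * theta_hat / sqrt(r)

# Method II: Sprott's transformation phi = theta^(-1/3)
phi_hat <- theta_hat^(-1 / 3)
ci_phi <- phi_hat + c(-1, 1) * z * phi_hat / sqrt(9 * r)
ci_sprott <- rev(ci_phi^(-3))          # decreasing map, so the limits swap

# Method III: invert the likelihood ratio statistic
lrt <- function(theta) {
  2 * r * (theta_hat / theta - 1 - log(theta_hat / theta))
}
g <- function(theta) lrt(theta) - qchisq(0.95, df = 1)
# Brackets 1 and 500 are assumptions chosen to contain both roots
ci_lrt <- c(uniroot(g, c(1, theta_hat), tol = 1e-8)$root,
            uniroot(g, c(theta_hat, 500), tol = 1e-8)$root)

round(rbind(normal = ci_normal, sprott = ci_sprott, lrt = ci_lrt), 2)
```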

Methods for CI

Normal approximation $W_{1} (θ) = [\frac{(\hat{θ} - θ)}{[I (\hat{θ})]^{- 1 / 2}}]^{2} = (\hat{θ} - θ)^{2} I (\hat{θ})$
Sprott’s method $W_{2} (θ) = [\frac{(\hat{ϕ} - ϕ)}{[I_{1} (\hat{ϕ})]^{- 1 / 2}}]^{2} = ({\hat{θ}}^{- 1 / 3} - θ^{- 1 / 3})^{2} I_{1} ({\hat{θ}}^{- 1 / 3})$
LRT $W_{3} (θ) = 2 ℓ (\hat{θ}) - 2 ℓ (θ) = 2 r [(\hat{θ} / θ) - 1 - \log (\hat{θ} / θ)]$
$W_{j} (θ) \sim χ_{(1)}^{2}$ approximately, $j = 1, 2, 3$ , under $H_{0}$

Comparisons between normal approximations and LRT

Type II censoring plan

Assume lifetime $T \sim Exp (θ)$
Let $t_{(1)} < \dots < t_{(r)}$ be the $r$ smallest lifetimes observed from an experiment with $n (\geq r)$ subjects
The joint distribution of $t_{(1)}, \dots, t_{(r)}$ $\begin{aligned} f_{T} (t_{(1)}, \dots, t_{(r)}) & = \frac{n!}{(n - r)!} [\prod_{i = 1}^{r} (1 / θ) e^{- t_{(i)} / θ}] [e^{- t_{(r)} / θ}]^{n - r} \\ & = \frac{n!}{(n - r)!} (1 / θ)^{r} e^{- \frac{1}{θ} [\sum_{i = 1}^{r} t_{(i)} + (n - r) t_{(r)}]} \\ & = \frac{n!}{(n - r)!} (1 / θ)^{r} e^{- T_{0} / θ} \end{aligned}$
- $T_{0} = \sum_{i = 1}^{r} t_{(i)} + (n - r) t_{(r)}$ is the total observed time on test

The log-likelihood function $\begin{aligned} ℓ (θ) & = - \frac{1}{θ} \sum_{i} t_{(i)} - r \log (θ) - (1 / θ) (n - r) t_{(r)} + Const. \\ = - \frac{1}{θ} [\sum_{i = 1}^{r} t_{(i)} + (n - r) t_{(r)}] - r \log (θ) + Const. \\ = - \frac{T_{0}}{θ} - r \log (θ) + Const. \end{aligned}$

Maximum likelihood estimator $\frac{\partial ℓ (θ)}{\partial θ} |_{θ = \hat{θ}} = 0 \Rightarrow \hat{θ} = (1 / r) [\sum_{i = 1}^{r} t_{(i)} + (n - r) t_{(r)}]$
- Observed information matrix $I (\hat{θ}) = \frac{- \partial^{2} ℓ (θ)}{\partial θ^{2}} |_{θ = \hat{θ}} = \frac{r}{{\hat{θ}}^{2}}$
$(1 - α) 100 %$ CI for $θ$ (by Normal approximation) $\begin{array}{r} \hat{θ} \pm z_{1 - α / 2} [I (\hat{θ})]^{- 1 / 2} \end{array}$

Exact confidence interval

An exact confidence interval for $θ$ can be derived for uncensored and Type II censored samples
Define $\begin{aligned} W_{1} & = n t_{(1)} \\ W_{2} & = (n - 1) (t_{(2)} - t_{(1)}) \\ & ⋮ \\ W_{r} & = (n - r + 1) (t_{(r)} - t_{(r - 1)}) \end{aligned}$
In general $W_{i} = (n - i + 1) (t_{(i)} - t_{(i - 1)}), \quad i = 1, \dots, r$ , with $t_{(0)} = 0$

It can be shown that $\begin{matrix} (4.1) & \sum_{i = 1}^{r} W_{i} = \sum_{i = 1}^{r} t_{(i)} + (n - r) t_{(r)} = T_{0} \end{matrix}$
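Equation 4.1 is easy to verify numerically. The sketch below uses a hypothetical Type II censored sample (the numbers are not from the notes) and computes the normalized spacings $W_{i}$ with $t_{(0)} = 0$ .

```r
# Hypothetical Type II censored sample: r = 4 smallest of n = 7 lifetimes
n <- 7
t_ord <- c(3, 8, 15, 21)               # ordered failure times t_(1) < ... < t_(r)
r <- length(t_ord)

# Normalized spacings W_i = (n - i + 1) * (t_(i) - t_(i-1)), with t_(0) = 0
w <- (n - seq_len(r) + 1) * diff(c(0, t_ord))

# Total time on test T_0 = sum of failure times + (n - r) * t_(r)
t0 <- sum(t_ord) + (n - r) * t_ord[r]

c(sum_w = sum(w), T0 = t0)
```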

The joint distribution of $W_{1} = g_{1} (t_{(1)}, \dots, t_{(r)}), \dots, W_{r} = g_{r} (t_{(1)}, \dots, t_{(r)})$ $\begin{array}{r} f_{W} (w_{1}, \dots, w_{r}) = f_{T} (g_{1}^{- 1} (w_{1}, \dots, w_{r}), \dots, g_{r}^{- 1} (w_{1}, \dots, w_{r})) | J | \end{array}$ where, writing $g_{j}^{- 1}$ for $g_{j}^{- 1} (w_{1}, \dots, w_{r})$ , $\begin{aligned} J & = \frac{\partial (g_{1}^{- 1}, \dots, g_{r}^{- 1})}{\partial (w_{1}, \dots, w_{r})} \\ & = \begin{bmatrix} \frac{\partial g_{1}^{- 1}}{\partial w_{1}} & \frac{\partial g_{2}^{- 1}}{\partial w_{1}} & \dots & \frac{\partial g_{r}^{- 1}}{\partial w_{1}} \\ \frac{\partial g_{1}^{- 1}}{\partial w_{2}} & \frac{\partial g_{2}^{- 1}}{\partial w_{2}} & \dots & \frac{\partial g_{r}^{- 1}}{\partial w_{2}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{\partial g_{1}^{- 1}}{\partial w_{r}} & \frac{\partial g_{2}^{- 1}}{\partial w_{r}} & \dots & \frac{\partial g_{r}^{- 1}}{\partial w_{r}} \end{bmatrix} \end{aligned}$

$\begin{aligned} w_{1} & = g_{1} (t_{(1)}, \dots, t_{(r)}) = n t_{(1)} \Rightarrow t_{(1)} = g_{1}^{- 1} (w_{1}, \dots, w_{r}) = \frac{w_{1}}{n} \\ w_{2} & = g_{2} (t_{(1)}, \dots, t_{(r)}) = (n - 1) (t_{(2)} - t_{(1)}) \\ \Rightarrow t_{(2)} = g_{2}^{- 1} (w_{1}, \dots, w_{r}) = \frac{w_{2}}{n - 1} + \frac{w_{1}}{n} \\ ⋮ \\ w_{r} & = g_{r} (t_{(1)}, \dots, t_{(r)}) = (n - r + 1) (t_{(r)} - t_{(r - 1)}) \\ \Rightarrow t_{(r)} = g_{r}^{- 1} (w_{1}, \dots, w_{r}) = \frac{w_{r}}{n - r + 1} + \dots + \frac{w_{2}}{n - 1} + \frac{w_{1}}{n} \end{aligned}$

Writing $g_{j}^{- 1}$ for $g_{j}^{- 1} (w_{1}, \dots, w_{r})$ , $\begin{aligned} J & = \begin{bmatrix} \frac{\partial g_{1}^{- 1}}{\partial w_{1}} & \frac{\partial g_{2}^{- 1}}{\partial w_{1}} & \dots & \frac{\partial g_{r}^{- 1}}{\partial w_{1}} \\ \frac{\partial g_{1}^{- 1}}{\partial w_{2}} & \frac{\partial g_{2}^{- 1}}{\partial w_{2}} & \dots & \frac{\partial g_{r}^{- 1}}{\partial w_{2}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{\partial g_{1}^{- 1}}{\partial w_{r}} & \frac{\partial g_{2}^{- 1}}{\partial w_{r}} & \dots & \frac{\partial g_{r}^{- 1}}{\partial w_{r}} \end{bmatrix} \\ & = \begin{bmatrix} \frac{1}{n} & \frac{1}{n} & \dots & \frac{1}{n} \\ 0 & \frac{1}{n - 1} & \dots & \frac{1}{n - 1} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & \frac{1}{n - r + 1} \end{bmatrix} \end{aligned}$

We can write $\begin{array}{r} | J | = \frac{1}{n \cdot (n - 1) \dots (n - r + 1)} = \frac{(n - r)!}{n!} \end{array}$

The joint distribution of $W_{1} = g_{1} (t_{(1)}, \dots, t_{(r)}), \dots, W_{r} = g_{r} (t_{(1)}, \dots, t_{(r)})$ $\begin{aligned} f_{W} (w_{1}, \dots, w_{r}) & = f_{T} (g_{1}^{- 1} (w_{1}, \dots, w_{r}), \dots, g_{r}^{- 1} (w_{1}, \dots, w_{r})) | J | \\ & = \frac{n!}{(n - r)!} \frac{1}{θ^{r}} \exp \{- \frac{1}{θ} [\sum_{i = 1}^{r} g_{i}^{- 1} (w_{1}, \dots, w_{r}) + (n - r) g_{r}^{- 1} (w_{1}, \dots, w_{r})]\} \frac{(n - r)!}{n!} \\ & = \frac{1}{θ^{r}} e^{- \sum_{i} w_{i} / θ} \end{aligned}$
- The last step uses Equation 4.1
The joint density factorizes as $\prod_{i = 1}^{r} (1 / θ) e^{- w_{i} / θ}$ , which shows that
- $W_{1}, \dots, W_{r}$ are independent and $W_{i} \sim Exp (θ)$

Since the $W_{i}$ ’s are independent and $W_{i} \sim Exp (θ)$ , $T_{0} = \sum_{i} W_{i} = \sum_{i} t_{(i)} + (n - r) t_{(r)}$ follows a gamma distribution with shape parameter $r$ and scale parameter $θ$
Then using the relationship between gamma and chi-square distribution $\frac{2 T_{0}}{θ} \sim χ_{(2 r)}^{2}$

Review

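Moment generating functions for different distributions $\begin{aligned} M_{X} (t) & = E [e^{t X}] \\ & = \begin{cases} \frac{1}{1 - θ t} & \text{for } X \sim Exp (θ) \\ (\frac{1}{1 - θ t})^{r} & \text{for } X \sim Gamma (r, θ) \\ (\frac{1}{1 - 2 t})^{r / 2} & \text{for } X \sim χ_{(r)}^{2} \end{cases} \end{aligned}$
- If $T_{0} \sim Gamma (r, θ)$ , then $M_{2 T_{0} / θ} (t) = M_{T_{0}} (2 t / θ) = (\frac{1}{1 - 2 t})^{r}$ , which is the MGF of $χ_{(2 r)}^{2}$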

Since $\frac{2 T_{0}}{θ} \sim χ_{(2 r)}^{2}$ , we can write $P (χ_{(2 r), α / 2}^{2} \leq \frac{2 T_{0}}{θ} \leq χ_{(2 r), 1 - α / 2}^{2}) = 1 - α$
- $χ_{(2 r), p}^{2} \to$ $p$ th quantile of $χ_{(2 r)}^{2}$ distribution
$(1 - α) 100 %$ exact confidence interval for $θ$ $\begin{array}{r} \frac{2 T_{0}}{χ_{(2 r), 1 - α / 2}^{2}} \leq θ \leq \frac{2 T_{0}}{χ_{(2 r), α / 2}^{2}} \end{array}$
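The exact interval only needs $T_{0}$ , $r$ and two chi-square quantiles. The following is a minimal sketch; the function name `exp_type2_ci` and the data are hypothetical. It also returns the normal-approximation interval $\hat{θ} \pm z_{1 - α / 2} \hat{θ} / \sqrt{r}$ for comparison.

```r
# Exact and normal-approximation CIs for theta from a Type II censored sample
# t_ord: the r smallest observed lifetimes, n: total number of subjects
exp_type2_ci <- function(t_ord, n, level = 0.95) {
  r <- length(t_ord)
  a <- 1 - level
  t0 <- sum(t_ord) + (n - r) * max(t_ord)      # total time on test T_0
  theta_hat <- t0 / r
  # 2 * T_0 / theta ~ chi-square with 2r degrees of freedom
  exact <- 2 * t0 / qchisq(c(1 - a / 2, a / 2), df = 2 * r)
  approx <- theta_hat + c(-1, 1) * qnorm(1 - a / 2) * theta_hat / sqrt(r)
  list(theta_hat = theta_hat, exact = exact, approx = approx)
}

# Hypothetical data: the 3 smallest of n = 5 lifetimes
res <- exp_type2_ci(t_ord = c(1, 2, 3), n = 5)
res
```

With so few failures the normal-approximation interval can have a negative lower limit, whereas the exact interval is always positive.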

Example 4.1.3

The first 8 observations in a random sample of 12 lifetimes (in hours) from an assumed exponential distribution are $31, 58, 157, 185, 300, 470, 497, 673$
Homework
- Obtain 95% exact and approximate confidence intervals for $θ$

4.1.3 Comparison of distributions

Comparison of two or more lifetime distributions is often of interest in practice
For comparing two or more independent exponential distributions, different methods of hypothesis tests and confidence intervals are available

Likelihood ratio tests

Let $T_{i j}$ be the lifetime corresponding to the $j$ th subject of the $i$ th group $(i = 1, \dots, m; j = 1, \dots, n_{i})$
Assume $T_{i j} \sim Exp (θ_{i})$ and the null hypothesis of interest $H_{0} : θ_{1} = \dots = θ_{m}$
Data: ${(t_{i j}, δ_{i j}), i = 1, \dots, m; j = 1, \dots, n_{i}}$

The likelihood function for $θ = (θ_{1}, \dots, θ_{m})^{'}$ $\begin{aligned} L (θ) & = \prod_{i = 1}^{m} \prod_{j = 1}^{n_{i}} [f (t_{i j}; θ_{i})]^{δ_{i j}} [S (t_{i j}; θ_{i})]^{1 - δ_{i j}} \\ & = \prod_{i = 1}^{m} \prod_{j = 1}^{n_{i}} [\frac{1}{θ_{i}}]^{δ_{i j}} e^{- t_{i j} / θ_{i}} \end{aligned}$
The corresponding log-likelihood function $\begin{array}{r} ℓ (θ) = - \sum_{i} \sum_{j} [(t_{i j} / θ_{i}) + δ_{i j} \log (θ_{i})] \end{array}$

The log-likelihood function $\begin{aligned} ℓ (θ) & = - \sum_{i} \sum_{j} [(t_{i j} / θ_{i}) + δ_{i j} \log (θ_{i})] \\ = - \sum_{i} [(T_{i} / θ_{i}) + r_{i} \log (θ_{i})] \end{aligned}$
- $T_{i} = \sum_{j} t_{i j}$ and $r_{i} = \sum_{j} δ_{i j}$
Maximum likelihood estimator of $θ_{i}$ $(i = 1, \dots, m)$
$\frac{\partial ℓ (θ)}{\partial θ_{i}} |_{θ_{i} = {\hat{θ}}_{i}} = 0 \Rightarrow \frac{T_{i}}{{\hat{θ}}_{i}^{2}} - \frac{r_{i}}{{\hat{θ}}_{i}} = 0 \Rightarrow {\hat{θ}}_{i} = (T_{i} / r_{i})$

Under $H_{0} : θ_{1} = \dots = θ_{m} = θ (say)$ log-likelihood function $ℓ (θ) = - \sum_{i} [(T_{i} / θ) + r_{i} \log (θ)]$
- Under $H_{0}$ , the MLE of $θ$ $\tilde{θ} = \frac{\sum_{i} T_{i}}{\sum_{i} r_{i}}$

Likelihood ratio test statistic $\begin{aligned} Λ & = 2 ℓ (\hat{θ}) - 2 ℓ (\tilde{θ}) \\ = 2 \sum_{i} [- (T_{i} / {\hat{θ}}_{i}) - r_{i} \log ({\hat{θ}}_{i}) + (T_{i} / \tilde{θ}) + r_{i} \log (\tilde{θ})] \\ = 2 \sum_{i} [- r_{i} \log ({\hat{θ}}_{i}) + r_{i} \log (\tilde{θ})] \end{aligned}$
- Under $H_{0}$ , $Λ \sim χ_{(m - 1)}^{2}$ for a large sample size
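The test statistic depends on the data only through the totals $T_{i}$ and the failure counts $r_{i}$ . The sketch below implements $Λ = 2 \sum_{i} r_{i} \log (\tilde{θ} / {\hat{θ}}_{i})$ ; the function name `exp_lrt_equal` and the summary data are hypothetical.

```r
# Likelihood ratio test of H0: theta_1 = ... = theta_m for exponential samples
# total_time[i] = T_i (sum of all observed times), failures[i] = r_i
exp_lrt_equal <- function(total_time, failures) {
  theta_hat   <- total_time / failures              # unrestricted MLEs
  theta_tilde <- sum(total_time) / sum(failures)    # common MLE under H0
  lambda <- 2 * sum(failures * log(theta_tilde / theta_hat))
  df <- length(total_time) - 1
  list(statistic = lambda, df = df,
       p_value = pchisq(lambda, df = df, lower.tail = FALSE))
}

# Hypothetical summary data for m = 3 groups
res <- exp_lrt_equal(total_time = c(400, 650, 900), failures = c(8, 10, 9))
res
```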

Example 4.1.4

Four independent samples of size 10 each had 7 failures
MLE under exponential model ${\hat{θ}}_{1} = 106, {\hat{θ}}_{2} = 80, {\hat{θ}}_{3} = 140, {\hat{θ}}_{4} = 158$
Homework
- Test $H_{0} : θ_{1} = \dots = θ_{4}$

Confidence intervals for $θ_{1} / θ_{2}$

Comparison between two exponential distributions $T_{i} \sim Exp (θ_{i}), i = 1, 2$
It can be shown that the MLE ${\hat{θ}}_{i}$ approximately follows a normal distribution $\begin{aligned} {\hat{θ}}_{i} & \sim N (θ_{i}, θ_{i}^{2} / r_{i}) \\ \log {\hat{θ}}_{i} & \sim N (\log θ_{i}, 1 / r_{i}) \end{aligned}$
Distribution of $\log ({\hat{θ}}_{1} / {\hat{θ}}_{2})$ $\begin{array}{r} \log {\hat{θ}}_{1} - \log {\hat{θ}}_{2} = \log ({\hat{θ}}_{1} / {\hat{θ}}_{2}) \sim N (\log (θ_{1} / θ_{2}), (r_{1}^{- 1} + r_{2}^{- 1})) \end{array}$

Null hypothesis $\begin{array}{r} H_{0} : θ_{1} = θ_{2} \Rightarrow H_{0} : θ_{1} / θ_{2} = 1 \Rightarrow H_{0} : \log θ_{1} = \log θ_{2} \end{array}$
Test statistic $Z = \frac{\log ({\hat{θ}}_{1} / {\hat{θ}}_{2}) - \log (θ_{1} / θ_{2})}{(r_{1}^{- 1} + r_{2}^{- 1})^{1 / 2}} \sim N (0, 1)$

$(1 - α) 100 %$ CI for $\log (θ_{1} / θ_{2})$ $\begin{matrix} (4.2) & \log ({\hat{θ}}_{1} / {\hat{θ}}_{2}) \pm z_{1 - α / 2} (r_{1}^{- 1} + r_{2}^{- 1})^{1 / 2} \end{matrix}$
$(1 - α) 100 %$ CI for $(θ_{1} / θ_{2})$ $({\hat{θ}}_{1} / {\hat{θ}}_{2}) \exp (\pm z_{1 - α / 2} (r_{1}^{- 1} + r_{2}^{- 1})^{1 / 2})$

Confidence intervals can also be found by inverting the likelihood ratio test for a hypothesis of the form $H_{0} : θ_{1} = a θ_{2}$ where $a > 0$
For Type I sample ${(t_{i j}, δ_{i j}), i = 1, 2; j = 1, \dots, n_{i}}$ , the log-likelihood function $ℓ (θ_{1}, θ_{2}) = - \sum_{i = 1}^{2} [(T_{i} / θ_{i}) + r_{i} \log (θ_{i})]$
- MLE ${\hat{θ}}_{1} = (T_{1} / r_{1})$ and ${\hat{θ}}_{2} = (T_{2} / r_{2})$

Under $H_{0} : θ_{1} = a θ_{2}$ , the log-likelihood function $\begin{aligned} ℓ (a θ_{2}, θ_{2}) & = - r_{1} \log (a θ_{2}) - (T_{1} / (a θ_{2})) - r_{2} \log (θ_{2}) - (T_{2} / θ_{2}) \\ = - (r_{1} + r_{2}) \log (θ_{2}) - (T_{1} / (a θ_{2})) - (T_{2} / θ_{2}) - r_{1} \log (a) \end{aligned}$
- MLE under $H_{0}$ ${\tilde{θ}}_{2} = \frac{T_{1} + a T_{2}}{a (r_{1} + r_{2})} \Rightarrow {\tilde{θ}}_{1} = a {\tilde{θ}}_{2} = \frac{T_{1} + a T_{2}}{(r_{1} + r_{2})}$

The likelihood ratio test statistic $Λ (a) = 2 ℓ ({\hat{θ}}_{1}, {\hat{θ}}_{2}) - 2 ℓ ({\tilde{θ}}_{1}, {\tilde{θ}}_{2}),$ where $\begin{aligned} ℓ ({\hat{θ}}_{1}, {\hat{θ}}_{2}) & = - \sum_{i = 1}^{2} [(T_{i} / {\hat{θ}}_{i}) + r_{i} \log ({\hat{θ}}_{i})] \\ = - (r_{1} + r_{2}) - r_{1} \log ({\hat{θ}}_{1}) - r_{2} \log ({\hat{θ}}_{2}) \end{aligned}$

and $\begin{aligned} ℓ ({\tilde{θ}}_{1}, {\tilde{θ}}_{2}) & = - \frac{T_{1}}{{\tilde{θ}}_{1}} - \frac{T_{2}}{{\tilde{θ}}_{2}} - r_{1} \log ({\tilde{θ}}_{1}) - r_{2} \log ({\tilde{θ}}_{2}) \\ = - (r_{1} + r_{2}) - r_{1} \log ({\tilde{θ}}_{1}) - r_{2} \log ({\tilde{θ}}_{2}) \end{aligned}$

The likelihood ratio test statistic $\begin{aligned} Λ (a) & = 2 ℓ ({\hat{θ}}_{1}, {\hat{θ}}_{2}) - 2 ℓ ({\tilde{θ}}_{1}, {\tilde{θ}}_{2}) \\ = 2 r_{1} \log ({\tilde{θ}}_{1} / {\hat{θ}}_{1}) + 2 r_{2} \log ({\tilde{θ}}_{2} / {\hat{θ}}_{2}) \\ = 2 r_{1} \log (a {\tilde{θ}}_{2} / {\hat{θ}}_{1}) + 2 r_{2} \log ({\tilde{θ}}_{2} / {\hat{θ}}_{2}) \end{aligned}$

$(1 - α) 100 %$ confidence interval for $(θ_{1} / θ_{2})$ can be constructed from the values of $a$ that satisfy $\begin{matrix} (4.3) & Λ (a) \leq χ_{(1), 1 - α}^{2} \end{matrix}$
For Type I censored samples, the LRT-based confidence interval for $(θ_{1} / θ_{2})$ (Equation 4.3) is more accurate than the normal-approximation interval (Equation 4.2) when the samples are small

Example 4.1.5

A small clinical trial was conducted to compare the duration of remission achieved by two drugs used in the treatment of leukemia.
Duration of remission is assumed to follow an exponential distribution, and two groups of 20 patients produced the following under a Type I censoring mechanism $r_{1} = 10, T_{1} = 700, r_{2} = 10, T_{2} = 540$
Obtain 95% confidence intervals for $(θ_{1} / θ_{2})$ using the normal approximation and the likelihood ratio statistic

Unrestricted MLEs ${\hat{θ}}_{1} = \frac{T_{1}}{r_{1}} = 70 and {\hat{θ}}_{2} = \frac{T_{2}}{r_{2}} = 54$
95% approximate CI for $\log (θ_{1} / θ_{2})$ $\begin{aligned} \log ({\hat{θ}}_{1} / {\hat{θ}}_{2}) & \pm z_{.975} (r_{1}^{- 1} + r_{2}^{- 1})^{1 / 2} \\ \log (70 / 54) & \pm (1.96) (10^{- 1} + 10^{- 1})^{1 / 2} \\ 0.26 & \pm 0.877 \end{aligned}$

95% approximate CI (normal distribution based) for $(θ_{1} / θ_{2})$ $- 0.617 < \log (θ_{1} / θ_{2}) < 1.136 \Rightarrow 0.54 < (θ_{1} / θ_{2}) < 3.114$

Is there any significant difference between $θ_{1}$ and $θ_{2}$ ?

Likelihood ratio statistic based confidence interval for $(θ_{1} / θ_{2})$ requires estimate of the parameters under the null hypothesis $H_{0} : θ_{1} = a θ_{2}$
Estimates under $H_{0}$ ${\tilde{θ}}_{2} = \frac{T_{1} + a T_{2}}{a (r_{1} + r_{2})} \Rightarrow {\tilde{θ}}_{1} = a {\tilde{θ}}_{2} = \frac{T_{1} + a T_{2}}{(r_{1} + r_{2})}$
Likelihood ratio statistic $\begin{aligned} Λ (a) & = 2 r_{1} \log (a {\tilde{θ}}_{2} / {\hat{θ}}_{1}) + 2 r_{2} \log ({\tilde{θ}}_{2} / {\hat{θ}}_{2}) \end{aligned}$

$a$	$Λ (a)$	$a$	$Λ (a)$
0.525	3.953	3.185	3.911
0.560	3.424	3.150	3.819
0.595	2.958	3.115	3.726
0.630	2.549	3.080	3.633
0.665	2.187	3.045	3.541
0.700	1.869	3.010	3.448
0.735	1.589	2.975	3.356
0.770	1.341	2.940	3.263

The 95% CI for $(θ_{1} / θ_{2})$ is the range of values of $a = (θ_{1} / θ_{2})$ such that $Λ (a) \leq χ_{(1), .95}^{2} = 3.84$ , which from the table is approximately $0.56 < a < 3.15 \Rightarrow 0.56 < (θ_{1} / θ_{2}) < 3.15$

95% confidence interval of $(θ_{1} / θ_{2})$ for Type I censored sample
method	lower	upper
Normal approximation	0.54	3.114
LRT	0.56	3.150
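Both intervals of Example 4.1.5 can be computed directly from the summary statistics. The sketch below is one way to do it; the variable names and the `uniroot()` brackets are assumptions. Because the limits are obtained by solving $Λ (a) = χ_{(1), .95}^{2}$ numerically instead of scanning a grid, they fall between adjacent grid points of the table rather than exactly on them.

```r
# Summary data of Example 4.1.5
r1 <- 10; T1 <- 700
r2 <- 10; T2 <- 540
theta1_hat <- T1 / r1
theta2_hat <- T2 / r2

# Normal approximation on the log scale (Equation 4.2), then exponentiate
se_log <- sqrt(1 / r1 + 1 / r2)
ci_normal <- (theta1_hat / theta2_hat) * exp(c(-1, 1) * qnorm(0.975) * se_log)

# Likelihood ratio statistic for H0: theta_1 = a * theta_2
lrt <- function(a) {
  theta2_tilde <- (T1 + a * T2) / (a * (r1 + r2))   # restricted MLE of theta_2
  2 * r1 * log(a * theta2_tilde / theta1_hat) +
    2 * r2 * log(theta2_tilde / theta2_hat)
}
g <- function(a) lrt(a) - qchisq(0.95, df = 1)
a_hat <- theta1_hat / theta2_hat
# Brackets 0.1 and 10 are assumptions chosen to contain both roots
ci_lrt <- c(uniroot(g, c(0.1, a_hat), tol = 1e-8)$root,
            uniroot(g, c(a_hat, 10), tol = 1e-8)$root)

rbind(normal = ci_normal, lrt = ci_lrt)
```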

Type II censored sample (CI for $θ_{1} / θ_{2}$ )

Lifetime distributions $T_{i} \sim Exp (θ_{i}), \; i = 1, 2$
Type II censored sample for group $i$ , which has $r_{i}$ failures $t_{(i 1)} < \dots < t_{(i r_{i})}$ , with the remaining $(n_{i} - r_{i})$ subjects censored at $t_{(i r_{i})}$

Likelihood function $\begin{aligned} L (θ_{1}, θ_{2}) & = \prod_{i = 1}^{2} (1 / θ_{i})^{r_{i}} e^{- \sum_{j} t_{(i j)} / θ_{i}} e^{- (n_{i} - r_{i}) t_{(i r_{i})} / θ_{i}} \\ & = \prod_{i = 1}^{2} (1 / θ_{i})^{r_{i}} e^{- (1 / θ_{i}) [\sum_{j} t_{(i j)} + (n_{i} - r_{i}) t_{(i r_{i})}]} \\ & = \prod_{i = 1}^{2} (1 / θ_{i})^{r_{i}} e^{- T_{i} / θ_{i}} \end{aligned}$
- $T_{i} = \sum_{j = 1}^{r_{i}} t_{(i j)} + (n_{i} - r_{i}) t_{(i r_{i})}$ is the total time on test for group $i$

MLE ${\hat{θ}}_{i} = \frac{T_{i}}{r_{i}}$
We have already shown that $\frac{2 T_{i}}{θ_{i}} = \frac{2 r_{i} {\hat{θ}}_{i}}{θ_{i}} \sim χ_{(2 r_{i})}^{2}$
Then we can show $\frac{(2 r_{1} {\hat{θ}}_{1} / θ_{1}) / (2 r_{1})}{(2 r_{2} {\hat{θ}}_{2} / θ_{2}) / (2 r_{2})} = \frac{{\hat{θ}}_{1} θ_{2}}{{\hat{θ}}_{2} θ_{1}} \sim F_{(2 r_{1}, 2 r_{2})}$

We can write $P (F_{(2 r_{1}, 2 r_{2}), α / 2} \leq \frac{{\hat{θ}}_{1} θ_{2}}{{\hat{θ}}_{2} θ_{1}} \leq F_{(2 r_{1}, 2 r_{2}), 1 - α / 2}) = 1 - α$
$(1 - α) 100 %$ confidence interval for $(θ_{1} / θ_{2})$ $\begin{array}{r} \frac{({\hat{θ}}_{1} / {\hat{θ}}_{2})}{F_{(2 r_{1}, 2 r_{2}), 1 - α / 2}} \leq (θ_{1} / θ_{2}) \leq \frac{({\hat{θ}}_{1} / {\hat{θ}}_{2})}{F_{(2 r_{1}, 2 r_{2}), α / 2}} \end{array}$
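The $F$ -based interval translates directly into a call to `qf()`. The following is a minimal sketch; the function name `exp_ratio_ci_type2` and the inputs are hypothetical.

```r
# Exact CI for theta_1 / theta_2 from two Type II censored exponential samples
# theta_hat_i = T_i / r_i, where T_i is the total time on test in group i
exp_ratio_ci_type2 <- function(theta1_hat, theta2_hat, r1, r2, level = 0.95) {
  a <- 1 - level
  ratio <- theta1_hat / theta2_hat
  # Upper F quantile gives the lower limit, lower F quantile the upper limit
  f_q <- qf(c(1 - a / 2, a / 2), df1 = 2 * r1, df2 = 2 * r2)
  ratio / f_q
}

# Hypothetical inputs
exp_ratio_ci_type2(theta1_hat = 120, theta2_hat = 80, r1 = 8, r2 = 6)
```

The interval is exact only under Type II censoring; for Type I censored data the $χ^{2}$ result for $2 T_{i} / θ_{i}$ does not hold exactly.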

4.2 Gamma distribution

The pdf of the two-parameter gamma distribution $\begin{array}{r} f (t; α, k) = \frac{1}{α Γ (k)} (\frac{t}{α})^{k - 1} \exp (- t / α), \quad t > 0 \end{array}$
- Scale parameter $α > 0$ and shape parameter $k > 0$

Survivor function $\begin{aligned} S (t; α, k) & = \int_{t}^{\infty} f (u; α, k) d u \\ & = \int_{t}^{\infty} \frac{1}{α Γ (k)} (\frac{u}{α})^{k - 1} \exp (- u / α) d u \\ & = 1 - I (k, t / α) \end{aligned}$
Incomplete gamma function $I (k, x) = \frac{1}{Γ (k)} \int_{0}^{x} u^{k - 1} e^{- u} d u$
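In R the incomplete gamma function $I (k, x)$ is available as `pgamma(x, shape = k)` (with the default scale of 1), so the survivor function needs no explicit integration. The sketch below uses a hypothetical helper `gamma_surv` and illustrative parameter values.

```r
# Survivor function of the gamma distribution with scale alpha and shape k:
# S(t) = 1 - I(k, t / alpha)
gamma_surv <- function(t, alpha, k) {
  pgamma(t, shape = k, scale = alpha, lower.tail = FALSE)
}

t <- c(50, 100, 150)
cbind(t,
      S_direct = gamma_surv(t, alpha = 12, k = 9),     # illustrative values
      S_via_I  = 1 - pgamma(t / 12, shape = 9))        # 1 - I(k, t / alpha)
```

For $k = 1$ the gamma distribution reduces to the exponential distribution, so `gamma_surv(t, alpha, 1)` should agree with `exp(-t / alpha)`.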

Uncensored data

Let $t_{1}, \dots, t_{n}$ be a random sample from $Gamma (α, k)$ , the log-likelihood function $\begin{aligned} ℓ (α, k) & = \sum_{i = 1}^{n} \log f (t_{i}; α, k) \\ & = \sum_{i = 1}^{n} \log [\frac{1}{α Γ (k)} (\frac{t_{i}}{α})^{k - 1} \exp (- t_{i} / α)] \\ & = - n \log Γ (k) - n k \log α + n (k - 1) \log \tilde{t} - n \bar{t} / α \end{aligned}$
- $\tilde{t} = (\prod_{i = 1}^{n} t_{i})^{1 / n}$ and $\bar{t} = (1 / n) \sum_{i} t_{i}$

$ℓ (α, k) = - n \log Γ (k) - n k \log α + n (k - 1) \log \tilde{t} - n \bar{t} / α$

Score functions $\begin{aligned} U_{1} (α, k) & = \frac{\partial ℓ (α, k)}{\partial α} = \frac{- n k}{α} + \frac{n \bar{t}}{α^{2}} \\ U_{2} (α, k) & = \frac{\partial ℓ (α, k)}{\partial k} = - n ψ (k) - n \log α + n \log \tilde{t} \end{aligned}$ where $ψ (k) = Γ^{'} (k) / Γ (k)$ is the digamma function (see Appendix B of the Textbook)

MLE $(\hat{α}, \hat{k})$ is a solution of the system of equations $\begin{aligned} \frac{- n \hat{k}}{\hat{α}} + \frac{n \bar{t}}{{\hat{α}}^{2}} & = 0 \\ - n ψ (\hat{k}) - n \log \hat{α} + n \log \tilde{t} & = 0 \end{aligned}$
- Two equations and two unknowns
- Equations are non-linear in terms of the variables
- No closed form solutions

1. Substitution method

$\begin{matrix} (4.4) & \frac{- n \hat{k}}{\hat{α}} + \frac{n \bar{t}}{{\hat{α}}^{2}} = 0 \Rightarrow \hat{α} = \frac{\bar{t}}{\hat{k}} \end{matrix}$

$\begin{matrix} (4.5) & - n ψ (\hat{k}) - n \log \hat{α} + n \log \tilde{t} = 0 \Rightarrow ψ (\hat{k}) - \log \hat{k} = \log (\tilde{t} / \bar{t}) \end{matrix}$

Solve Equation 4.5 using a suitable numerical root-finding technique (e.g. graphical method, Newton-Raphson method, etc.) to obtain $\hat{k}$
Then $\hat{α}$ can be obtained from Equation 4.4

2. Newton-Raphson method

Score functions $\begin{aligned} U_{1} (α, k) & = \frac{\partial ℓ (α, k)}{\partial α} = \frac{- n k}{α} + \frac{n \bar{t}}{α^{2}} \\ U_{2} (α, k) & = \frac{\partial ℓ (α, k)}{\partial k} = - n ψ (k) - n \log α + n \log \tilde{t} \end{aligned}$

Elements of Hessian matrix $\begin{aligned} H_{11} (α, k) & = \frac{\partial^{2} ℓ (α, k)}{\partial α^{2}} = \frac{n k}{α^{2}} - \frac{2 n \bar{t}}{α^{3}} \\ H_{12} (α, k) & = \frac{\partial^{2} ℓ (α, k)}{\partial α \partial k} = \frac{- n}{α} = H_{21} (α, k) \\ H_{22} (α, k) & = \frac{\partial^{2} ℓ (α, k)}{\partial k^{2}} = - n ψ^{'} (k) \end{aligned}$

Score vector $U (α, k) = [\begin{matrix} U_{1} (α, k) \\ U_{2} (α, k) \end{matrix}]$
Hessian matrix $H (α, k) = [\begin{matrix} H_{11} (α, k) & H_{12} (α, k) \\ H_{21} (α, k) & H_{22} (α, k) \end{matrix}]$
The score vector and Hessian matrix are functions of the parameters and the data

Initial values $θ^{(0)} = (α^{(0)}, k^{(0)})^{'}$ are chosen so that elements of score vector and hessian matrix are finite
Updated estimate $θ^{(1)} = (α^{(1)}, k^{(1)})^{'}$ is obtained as $\begin{matrix} (4.6) & θ^{(1)} = θ^{(0)} - [H (θ^{(0)})]^{- 1} U (θ^{(0)}) \end{matrix}$
The estimate $θ^{(2)}$ can be obtained by using $θ^{(1)}$ as input in Equation 4.6
Repeating the procedure of evaluating the Equation 4.6 using the current estimate, the following sequence of estimates can be obtained ${θ^{(j)}, j = 3, 4, 5, \dots}$

A convergence criterion needs to be defined to obtain the MLE from the sequence of estimates $\{θ^{(j)}, j = 1, 2, 3, \dots\}$

Convergence criteria are usually defined on the basis of two successive values of the parameter estimates
$θ^{(j)} = (\hat{α}, \hat{k})^{'}$ is taken as the MLE if one of the following criteria is satisfied
- $| θ^{(j)} - θ^{(j - 1)} |$ is very small
- $| ℓ (θ^{(j)}) - ℓ (θ^{(j - 1)}) |$ is very small
- $U (θ^{(j)}) \approx 0$

Estimated variance-covariance matrix of the MLE $(\hat{α}, \hat{k})^{'}$ can be obtained from the inverse of the negative of the Hessian matrix evaluated at the MLE $\hat{Var} (\begin{matrix} \hat{α} \\ \hat{k} \end{matrix}) = - [H (\hat{α}, \hat{k})]^{- 1}$

Newton-Raphson method: Pseudo code

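# U() and H() are user-supplied functions returning the score vector
# and the Hessian matrix at a given parameter value
newton_raphson <- function(theta0, U, H, tol = 1e-5, max_iter = 100) {
  for (iter in seq_len(max_iter)) {
    u0 <- U(theta0)                      # score vector at current estimate
    h0 <- H(theta0)                      # Hessian at current estimate
    theta1 <- theta0 - solve(h0, u0)     # update step (Equation 4.6)
    eps <- max(abs(theta1 - theta0))     # largest change in any parameter
    theta0 <- theta1
    if (eps < tol) break                 # stop once the estimates stabilize
  }
  list(theta = theta0, hessian = H(theta0))
}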
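As a concrete illustration, the Newton-Raphson iteration of Equation 4.6 can be run for the gamma model using the score vector and Hessian derived above. This is a sketch under stated assumptions: it uses the rat survival times of Example 4.2.1, and the method-of-moments starting values are a choice made here, not something prescribed by the notes.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)
n <- length(rtime)
t_bar <- mean(rtime)                    # arithmetic mean
log_t_tilde <- mean(log(rtime))         # log of the geometric mean

# Score vector U(alpha, k) and Hessian H(alpha, k); theta = c(alpha, k)
U <- function(theta) {
  alpha <- theta[1]; k <- theta[2]
  c(-n * k / alpha + n * t_bar / alpha^2,
    -n * digamma(k) - n * log(alpha) + n * log_t_tilde)
}
H <- function(theta) {
  alpha <- theta[1]; k <- theta[2]
  matrix(c(n * k / alpha^2 - 2 * n * t_bar / alpha^3, -n / alpha,
           -n / alpha, -n * trigamma(k)), nrow = 2)
}

# Starting values: method-of-moments estimates (an assumption of this sketch)
theta <- c(var(rtime) / t_bar, t_bar^2 / var(rtime))
for (iter in 1:50) {
  step <- solve(H(theta), U(theta))     # solves H %*% step = U
  theta <- theta - step                 # update of Equation 4.6
  if (max(abs(step)) < 1e-8) break
}

theta                                   # c(alpha_hat, k_hat)
solve(-H(theta))                        # estimated variance-covariance matrix
```

Using `solve(H, U)` instead of explicitly inverting the Hessian is the usual numerically safer choice; the result should agree with the substitution method.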

Statistical software provides routines (such as optim() in R) that can optimize the likelihood function to obtain the MLE
- Such routines require the (log-)likelihood to be supplied as a function argument
- Different optimization algorithms, such as Newton-Raphson, Nelder-Mead, simulated annealing, etc. are implemented in optimization routines

Statistical inference

Asymptotic distributions of MLEs, i.e. $\hat{α} \sim N (α, var (\hat{α}))$ and $\hat{k} \sim N (k, var (\hat{k}))$
Since $α > 0$ and $k > 0$ , confidence intervals based on the sampling distributions of $\log \hat{α}$ and $\log \hat{k}$ ensure a positive lower limit of the confidence interval $\log \hat{α} \sim N (\log α, var (\log \hat{α}))$ and $\log \hat{k} \sim N (\log k, var (\log \hat{k}))$
- $var (\log \hat{α}) = (1 / \hat{α})^{2} var (\hat{α})$

$(1 - α) 100 %$ confidence interval for $\log α$

$\log \hat{α} \pm z_{1 - α / 2} SE (\log \hat{α})$

$(1 - α) 100 %$ confidence interval for $α$

$\hat{α} \exp (\pm z_{1 - α / 2} SE (\log \hat{α}))$
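The log-scale interval can be sketched for the rat data of Example 4.2.1 as follows. The MLE is obtained by the substitution method, the variance comes from the inverse of the negative Hessian, and $SE (\log \hat{α}) = SE (\hat{α}) / \hat{α}$ follows from the delta method. Variable names are assumptions of this sketch, and the 95% level is chosen for illustration.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)
n <- length(rtime)
t_bar <- mean(rtime)

# MLE via the substitution method (Equations 4.4 and 4.5)
k_eq <- function(k) digamma(k) - log(k) - (mean(log(rtime)) - log(t_bar))
k_hat <- uniroot(k_eq, lower = 3, upper = 10, tol = 1e-10)$root
alpha_hat <- t_bar / k_hat

# Hessian at the MLE and estimated variance-covariance matrix
H <- matrix(c(n * k_hat / alpha_hat^2 - 2 * n * t_bar / alpha_hat^3,
              -n / alpha_hat,
              -n / alpha_hat,
              -n * trigamma(k_hat)), nrow = 2)
V <- solve(-H)

# Delta method: SE(log alpha_hat) = SE(alpha_hat) / alpha_hat
se_alpha <- sqrt(V[1, 1])
se_log_alpha <- se_alpha / alpha_hat
z <- qnorm(0.975)

ci_alpha_wald <- alpha_hat + c(-1, 1) * z * se_alpha          # direct normal approximation
ci_alpha_log  <- alpha_hat * exp(c(-1, 1) * z * se_log_alpha) # log-scale interval
rbind(wald = ci_alpha_wald, log_scale = ci_alpha_log)
```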

Example 4.2.1

Following are the survival times of 20 male rats that were exposed to a high level of radiation

rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165, 125, 40, 128, 123, 136, 101, 62, 153, 83, 69)
rtime

 [1] 152 152 115 109 137  88  94  77 160 165 125  40 128 123 136 101  62 153  83
[20]  69

Assume the lifetimes follow a gamma distribution with scale parameter $α$ and shape parameter $k$

From the data: $\bar{t} = 113.45$ and $\tilde{t} = 107.07$
The likelihood equations (Equations 4.4 and 4.5) become $\begin{matrix} (4.7) & \begin{aligned} \hat{α} = (\bar{t} / \hat{k}) & = 113.45 / \hat{k} \\ ψ (\hat{k}) - \log \hat{k} - \log (\tilde{t} / \bar{t}) & = 0 \\ ψ (\hat{k}) - \log \hat{k} + 0.058 & = 0 \end{aligned} \end{matrix}$

R function uniroot() can be used to obtain the value of $\hat{k}$ by solving the Equation 4.7

kfun <- function(k0, time) {
  t_bar <- mean(time)
  tgbar <- exp(mean(log(time)))
  return(digamma(k0) - log(k0) - 
           log(tgbar / t_bar))
}
k_hat <- uniroot(kfun, lower = 3, 
                 upper = 10, 
                 time = rtime)

$\hat{k} = 8.799 \Rightarrow \hat{α} = 12.893$

k_hat

$root
[1] 8.799215

$f.root
[1] 8.181174e-10

$iter
[1] 7

$init.it
[1] NA

$estim.prec
[1] 6.103516e-05

Log-likelihood function of gamma distribution

R function to calculate log-likelihood function of a gamma distribution for a given sample

gamma_loglk <- function(par, time) {
   # par = c(scale, shape); returns the sum of log-densities
   sum(
     dgamma(time, scale = par[1], shape = par[2], log = TRUE)
     )
}

par is a vector with the parameter scale as the first element and shape as the second element
time is the observed failure times
gamma_loglk function can be evaluated for any given valid values of par and time

For given values of the parameters, say $(α_{0} = 1, k_{0} = 2)$, the value of the log-likelihood function can be obtained for the rat data

gamma_loglk(par = c(1, 2), time = rtime)

[1] -2175.531

For another set of values $(α_{0} = 100, k_{0} = 80)$, the corresponding value of the log-likelihood function is

gamma_loglk(par = c(100, 80), time = rtime)

[1] -5392.711
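
As a cross-check, the dgamma()-based function can be compared with the closed-form gamma log-likelihood for complete data, $ℓ (α, k) = - n k \log α - n \log Γ (k) + (k - 1) \sum_{i} \log t_{i} - \sum_{i} t_{i} / α$. The sketch below is only illustrative; the name gamma_loglk_closed is hypothetical and not part of the original notes.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

# Log-likelihood based on dgamma(), as defined in the notes
gamma_loglk <- function(par, time) {
  sum(dgamma(time, scale = par[1], shape = par[2], log = TRUE))
}

# Hypothetical helper: the same log-likelihood written out in closed form
gamma_loglk_closed <- function(par, time) {
  alpha <- par[1]
  k <- par[2]
  n <- length(time)
  -n * k * log(alpha) - n * lgamma(k) +
    (k - 1) * sum(log(time)) - sum(time) / alpha
}

llk_dgamma <- gamma_loglk(c(1, 2), rtime)
llk_closed <- gamma_loglk_closed(c(1, 2), rtime)
c(dgamma = llk_dgamma, closed_form = llk_closed)
```

The two values should agree up to floating-point error, which is a quick way to catch a swapped scale/shape argument.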

`optim()` function

R function optim() is a general purpose optimization routine

optim(par, fn,  gr = NULL, method = "Nelder-Mead",
      lower = -Inf, upper = Inf, 
      control = list(), hessian  = FALSE, ...
      )

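par $\to$ initial values for the parameters to be optimized
fn $\to$ a function to be minimized (or maximized, when control = list(fnscale = -1) is used); its first argument must be the vector of parameters
gr $\to$ a function to return the gradient (the score function, for a log-likelihood); if NULL, a finite-difference approximation is used
method $\to$ a method to be used, e.g. “Nelder-Mead”, “BFGS”, “CG”, “L-BFGS-B”, etc.
lower and upper $\to$ lower and upper limits of the parameters to be optimized (used by the method “L-BFGS-B”)
control $\to$ list of arguments (e.g. fnscale, etc.) for controlling the iterations
hessian $\to$ if TRUE, a numerically differentiated Hessian matrix is returned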
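
The following toy example is a minimal sketch of how these arguments fit together. The objective f is a hypothetical function chosen only because its maximum, at (1, 2), is known in advance; it is not part of the rat-data analysis.

```r
# Hypothetical toy objective with a known maximum at (1, 2)
f <- function(x) -(x[1] - 1)^2 - (x[2] - 2)^2

toy_fit <- optim(
  par = c(0, 0),                 # initial values
  fn = f,                        # function to be optimized
  method = "BFGS",               # quasi-Newton method
  control = list(fnscale = -1),  # optim() minimizes by default; -1 switches to maximization
  hessian = TRUE                 # also return the numerical Hessian of f
)

toy_fit$par          # location of the maximum
toy_fit$value        # value of f at the maximum
toy_fit$convergence  # 0 indicates successful completion
toy_fit$hessian      # Hessian of f, negative definite at a maximum
```

Because the returned Hessian is that of fn itself, a variance-covariance matrix for a log-likelihood is obtained from the inverse of its negative.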

Initial values

Initial values of the parameters can be selected from the exploratory plots of log-likelihood function
The log-likelihood function must provide finite values with the initial values of the parameters

As an example, assume the initial values $α_{0} = 12.9$ and $k_{0} = 8.8$ , and the corresponding value of log-likelihood function

gamma_loglk(par = c(12.9, 8.8), time = rtime)

[1] -100.48

For another set of initial values, $α_{0} = 1.0$ and $k_{0} = 1.0$

gamma_loglk(par = c(1.0, 1.0), time = rtime)

[1] -2269

Since both sets of initial values give finite values of the log-likelihood function, either of them can be used as initial values for optimizing the log-likelihood function with the R function optim()
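
One way to explore the log-likelihood surface before calling optim() is to evaluate it on a coarse grid. The sketch below assumes grid ranges chosen by eye (alpha_grid and k_grid are illustrative names and values, not taken from the original notes) and picks the grid point with the largest log-likelihood as a candidate starting value.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

gamma_loglk <- function(par, time) {
  sum(dgamma(time, scale = par[1], shape = par[2], log = TRUE))
}

# Illustrative grid of candidate values (ranges are an assumption)
alpha_grid <- seq(5, 30, by = 1)
k_grid <- seq(2, 20, by = 0.5)

# Log-likelihood at every (alpha, k) combination on the grid
llk <- outer(alpha_grid, k_grid,
             Vectorize(function(a, k) gamma_loglk(c(a, k), rtime)))

# Grid point with the largest log-likelihood
best <- which(llk == max(llk), arr.ind = TRUE)
init <- c(alpha0 = alpha_grid[best[1, 1]], k0 = k_grid[best[1, 2]])
init

# Optional exploratory plot of the surface:
# contour(alpha_grid, k_grid, llk, xlab = "scale", ylab = "shape")
```

Any grid point with a finite log-likelihood would do as a starting value; one close to the maximum usually just reduces the number of iterations.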

Example 4.2.1 using `optim()`

Fit of the gamma distribution to the rat data

gamma_fit <- optim(
    par = c(1, 1),                # initial values (scale, shape)
    fn = gamma_loglk,             # log-likelihood function
    control = list(fnscale = -1), # maximize instead of minimize
    hessian = TRUE,               # also return the Hessian matrix
    time = rtime                  # passed on to gamma_loglk()
    )

List of objects in gamma_fit

names(gamma_fit)

[1] "par"         "value"       "counts"      "convergence" "message"    
[6] "hessian"

Converged? (a value of 0 indicates successful completion)

gamma_fit$convergence

[1] 0

Estimates of the scale and shape parameters

gamma_fit$par

[1] 12.896976  8.795312

$\hat{α} = 12.897$ and $\hat{k} = 8.795$

Estimated variance-covariance matrix

cvar <- solve(-gamma_fit$hessian)
cvar

          [,1]       [,2]
[1,]  16.88167 -10.871356
[2,] -10.87136   7.416138

$SE (\hat{α}) = \sqrt{16.882} = 4.109$
$SE (\hat{k}) = \sqrt{7.416} = 2.723$

$95 %$ CI for $α$ and $k$

Using the sampling distributions of $\hat{α}$ and $\hat{k}$ $\begin{aligned} \hat{α} \pm z_{0.975} SE (\hat{α}) & \Rightarrow 4.84 \leq α \leq 20.95 \\ \hat{k} \pm z_{0.975} SE (\hat{k}) & \Rightarrow 3.46 \leq k \leq 14.13 \end{aligned}$
Using the sampling distributions of $\log \hat{α}$ and $\log \hat{k}$ $\begin{aligned} \hat{α} \exp (\pm z_{0.975} SE (\log \hat{α})) & \Rightarrow 6.91 \leq α \leq 24.08 \\ \hat{k} \exp (\pm z_{0.975} SE (\log \hat{k})) & \Rightarrow 4.79 \leq k \leq 16.14 \end{aligned}$
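
The two sets of intervals can be computed directly from the optim() output. The following is a minimal sketch that refits the model so that it is self-contained; the object names est, se, ci_plain and ci_log are illustrative and do not appear in the original notes.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

gamma_loglk <- function(par, time) {
  sum(dgamma(time, scale = par[1], shape = par[2], log = TRUE))
}

gamma_fit <- optim(par = c(1, 1), fn = gamma_loglk,
                   control = list(fnscale = -1), hessian = TRUE,
                   time = rtime)

est <- gamma_fit$par                         # (alpha_hat, k_hat)
se <- sqrt(diag(solve(-gamma_fit$hessian)))  # standard errors
z <- qnorm(0.975)                            # quantile for a 95% interval

# Interval on the original scale: estimate +/- z * SE
ci_plain <- cbind(lower = est - z * se, upper = est + z * se)

# Interval via the log scale: SE(log est) = SE(est) / est (delta method)
se_log <- se / est
ci_log <- cbind(lower = est * exp(-z * se_log),
                upper = est * exp(z * se_log))

rownames(ci_plain) <- rownames(ci_log) <- c("alpha", "k")
round(ci_plain, 2)
round(ci_log, 2)
```

The log-scale limits are always positive, whereas the limits on the original scale can fall below zero when the standard error is large relative to the estimate.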

Quantiles

$p$th quantile $\begin{aligned} S (t_{p}; α, k) = 1 - p & \Rightarrow \frac{1}{Γ (k)} \int_{t_{p} / α}^{\infty} u^{k - 1} e^{- u} d u = 1 - p \\ & \Rightarrow t_{p} / α = Q (p, k) \\ & \Rightarrow t_{p} = α Q (p, k) \end{aligned}$
- $Q (p, k)$ is the $p$th quantile of the one-parameter gamma distribution with shape $k$ [R function qgamma(p, shape = k, scale = 1)]
The estimate of the median for the rat data is ${\hat{t}}_{.5} = \hat{α} Q (.5, \hat{k})$

12.893951 * qgamma(.5, scale = 1, shape = 8.799089)

[1] 109.1872

qgamma(.5, scale = 12.893951, shape = 8.799089)

[1] 109.1872
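
The same relation gives any other quantile. The sketch below uses the estimates from the calls above and checks that the estimated survivor function at $t_{p}$ equals $1 - p$; the choice of the quartiles and the object names p, t_p and surv are only illustrative.

```r
# Estimates used in the median calculation above
alpha_hat <- 12.893951
k_hat <- 8.799089

# Quartiles: t_p = alpha * Q(p, k)
p <- c(0.25, 0.50, 0.75)
t_p <- alpha_hat * qgamma(p, shape = k_hat, scale = 1)

# Check: the survivor function at t_p should equal 1 - p
surv <- pgamma(t_p, shape = k_hat, scale = alpha_hat, lower.tail = FALSE)

cbind(p, t_p, surv)
```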

Likelihood ratio statistic

Likelihood ratio statistic $Λ (α, k) = 2 ℓ (\hat{α}, \hat{k}) - 2 ℓ (α, k)$
Approximate $(1 - p) 100 %$ confidence region for $(α, k)^{'}$ can be obtained from the set of points $(α, k)$ satisfying $Λ (α, k) \leq χ_{(2), 1 - p}^{2}$

95% confidence region of $(α, k)$ for rat survival data
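
A region of this kind can be traced numerically by evaluating $Λ (α, k)$ on a grid and comparing it with the $χ_{(2)}^{2}$ quantile. The following is a minimal sketch; the helper names lr_stat and in_region and the grid ranges are assumptions made for illustration.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

gamma_loglk <- function(par, time) {
  sum(dgamma(time, scale = par[1], shape = par[2], log = TRUE))
}

# MLEs from the likelihood equations (Equation 4.7)
t_bar <- mean(rtime)
log_ratio <- mean(log(rtime)) - log(t_bar)   # log(geometric mean / mean)
k_hat <- uniroot(function(k) digamma(k) - log(k) - log_ratio,
                 lower = 3, upper = 10, tol = 1e-10)$root
alpha_hat <- t_bar / k_hat
llk_max <- gamma_loglk(c(alpha_hat, k_hat), rtime)

# Likelihood ratio statistic and membership in the confidence region
lr_stat <- function(alpha, k) {
  2 * llk_max - 2 * gamma_loglk(c(alpha, k), rtime)
}
in_region <- function(alpha, k, level = 0.95) {
  lr_stat(alpha, k) <= qchisq(level, df = 2)
}

in_region(12.9, 8.8)  # a point close to the MLE
in_region(1, 1)       # a point far from the MLE

# Statistic on a grid; the region boundary is the contour at the chi-square quantile
alpha_grid <- seq(4, 40, length.out = 100)
k_grid <- seq(2, 22, length.out = 100)
lam <- outer(alpha_grid, k_grid, Vectorize(lr_stat))
# contour(alpha_grid, k_grid, lam, levels = qchisq(0.95, df = 2),
#         xlab = "alpha", ylab = "k")
```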

LRT statistic based CI for $k$

Consider the following null hypothesis $H_{0} : k = k_{0}$
MLEs (unrestricted and under the restriction) $\begin{aligned} (\hat{α}, \hat{k}) & = \arg\max_{(α, k) \in Θ} ℓ (α, k) && \text{under } H_{1} \\ \tilde{α} (k_{0}) & = \arg\max_{α \in Θ_{α}} ℓ (α, k_{0}) && \text{under } H_{0} \end{aligned}$

Likelihood ratio statistic $\begin{array}{r} Λ_{k} (k_{0}) = 2 ℓ (\hat{α}, \hat{k}) - 2 ℓ (\tilde{α} (k_{0}), k_{0}) \end{array}$
- Under $H_{0} : k = k_{0}$ , $Λ_{k} (k_{0})$ approximately follows a $χ_{(1)}^{2}$ distribution
An approximate two-sided $(1 - p) 100 %$ confidence interval for $k$ can be obtained from a set of values of $k_{0}$ satisfying $\begin{array}{r} Λ_{k} (k_{0}) \leq χ_{(1), 1 - p}^{2} \end{array}$

$95 \%$ CI: $4.54 \leq k \leq 15.28$
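
The interval can be found by solving $Λ_{k} (k_{0}) = χ_{(1), 0.95}^{2}$ on each side of $\hat{k}$. The sketch below uses the fact that, for complete data, the restricted MLE is $\tilde{α} (k_{0}) = \bar{t} / k_{0}$ (the first likelihood equation holds for any fixed $k$). The function name lr_k and the search brackets are assumptions for illustration.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

gamma_loglk <- function(par, time) {
  sum(dgamma(time, scale = par[1], shape = par[2], log = TRUE))
}

# Unrestricted MLEs from the likelihood equations
t_bar <- mean(rtime)
log_ratio <- mean(log(rtime)) - log(t_bar)
k_hat <- uniroot(function(k) digamma(k) - log(k) - log_ratio,
                 lower = 3, upper = 10, tol = 1e-10)$root
alpha_hat <- t_bar / k_hat
llk_max <- gamma_loglk(c(alpha_hat, k_hat), rtime)

# LR statistic for H0: k = k0, with alpha replaced by t_bar / k0
lr_k <- function(k0) {
  2 * llk_max - 2 * gamma_loglk(c(t_bar / k0, k0), rtime)
}

# Interval limits: roots of lr_k(k0) - critical value on each side of k_hat
crit <- qchisq(0.95, df = 1)
lower_k <- uniroot(function(k0) lr_k(k0) - crit,
                   lower = 1, upper = k_hat)$root
upper_k <- uniroot(function(k0) lr_k(k0) - crit,
                   lower = k_hat, upper = 40)$root
c(lower = lower_k, upper = upper_k)
```

The limits should be close to the interval reported above; small differences in the second decimal place can arise from the root-finding tolerance or from a grid-based search.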

LRT statistic based CI for $α$

Consider the following null hypothesis $H_{0} : α = α_{0}$
MLEs (unrestricted and under the restriction) $\begin{aligned} (\hat{α}, \hat{k}) & = \arg\max_{(α, k) \in Θ} ℓ (α, k) && \text{under } H_{1} \\ \tilde{k} (α_{0}) & = \arg\max_{k \in Θ_{k}} ℓ (α_{0}, k) && \text{under } H_{0} \end{aligned}$

Likelihood ratio statistic $\begin{array}{r} Λ_{α} (α_{0}) = 2 ℓ (\hat{α}, \hat{k}) - 2 ℓ (α_{0}, \tilde{k} (α_{0})) \end{array}$
- Under $H_{0} : α = α_{0}$ , $Λ_{α} (α_{0})$ approximately follows a $χ_{(1)}^{2}$ distribution
An approximate two-sided $(1 - p) 100 %$ confidence interval for $α$ can be obtained from a set of values of $α_{0}$ satisfying $\begin{array}{r} Λ_{α} (α_{0}) \leq χ_{(1), 1 - p}^{2} \end{array}$

$95 \%$ CI: $7.37 \leq α \leq 25.9$
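
For $α$ the restricted MLE $\tilde{k} (α_{0})$ has no closed form, so the inner maximization can be done numerically. The following is a sketch using optimize() for the inner step and uniroot() for the limits; the function names profile_llk_alpha and lr_alpha and the search intervals are assumptions for illustration.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

gamma_loglk <- function(par, time) {
  sum(dgamma(time, scale = par[1], shape = par[2], log = TRUE))
}

# Unrestricted MLEs from the likelihood equations
t_bar <- mean(rtime)
log_ratio <- mean(log(rtime)) - log(t_bar)
k_hat <- uniroot(function(k) digamma(k) - log(k) - log_ratio,
                 lower = 3, upper = 10, tol = 1e-10)$root
alpha_hat <- t_bar / k_hat
llk_max <- gamma_loglk(c(alpha_hat, k_hat), rtime)

# Log-likelihood maximized over k for a fixed alpha0
profile_llk_alpha <- function(alpha0) {
  optimize(function(k) gamma_loglk(c(alpha0, k), rtime),
           interval = c(0.1, 200), maximum = TRUE)$objective
}

# LR statistic for H0: alpha = alpha0
lr_alpha <- function(alpha0) 2 * llk_max - 2 * profile_llk_alpha(alpha0)

# Interval limits: roots of lr_alpha(alpha0) - critical value
crit <- qchisq(0.95, df = 1)
lower_alpha <- uniroot(function(a) lr_alpha(a) - crit,
                       lower = 2, upper = alpha_hat)$root
upper_alpha <- uniroot(function(a) lr_alpha(a) - crit,
                       lower = alpha_hat, upper = 80)$root
c(lower = lower_alpha, upper = upper_alpha)
```

The same two-step pattern (inner maximization over the nuisance parameter, outer root search) also applies to censored data, where neither restricted MLE has a closed form.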

Summary of $95 %$ CI

For $α$ $\begin{aligned} \text{no-transformation}: & \quad 4.84 \leq α \leq 20.95 \\ \text{log-transformation}: & \quad 6.91 \leq α \leq 24.08 \\ \text{LRT}: & \quad 7.37 \leq α \leq 25.9 \end{aligned}$
For $k$ $\begin{aligned} \text{no-transformation}: & \quad 3.46 \leq k \leq 14.13 \\ \text{log-transformation}: & \quad 4.79 \leq k \leq 16.14 \\ \text{LRT}: & \quad 4.54 \leq k \leq 15.28 \end{aligned}$

Censored

Data $\{(t_{i}, δ_{i}), i = 1, \dots, n\}$ and the corresponding log-likelihood function $\begin{aligned} ℓ (α, k) & = \log \prod_{i = 1}^{n} [\frac{1}{α Γ (k)} (\frac{t_{i}}{α})^{k - 1} e^{- t_{i} / α}]^{δ_{i}} [1 - I (k, t_{i} / α)]^{1 - δ_{i}} \\ & = - r k \log α - r \log Γ (k) + (k - 1) \sum_{i} δ_{i} \log t_{i} \\ & \quad - \sum_{i} δ_{i} t_{i} / α + \sum_{i} (1 - δ_{i}) \log [1 - I (k, t_{i} / α)] \end{aligned}$
- $r = \sum_{i} δ_{i}$ is the number of observed (uncensored) lifetimes and $I (k, x)$ is the incomplete gamma function, i.e. the cdf of the one-parameter gamma distribution with shape $k$

No closed form solutions are available for MLEs $\hat{α}$ and $\hat{k}$
Algebraic expressions of the score function and information matrix for the censored-data log-likelihood function given above are very complicated
Score functions and information matrix can be evaluated numerically
Optimization routines available in statistical software can be used to obtain MLEs $\hat{α}$ and $\hat{k}$ , and the corresponding SEs
- Different optimization algorithms such as Newton-Raphson, Nelder-Mead, etc. are available in such routines

As an example, we are going to use the same rat data that we have used for the analysis of complete data
For the rat data, assume that any time $\geq 150$ weeks is treated as censored, which gives 5 censored observations
- An R object rtimec is created, which has two columns time and status
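
The notes do not show how rtimec is built. One possible construction, assuming it is a data frame with status = 1 for an observed failure and 0 for a censored time, is sketched below.

```r
# Rat survival times from Example 4.2.1
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)

# Assumed construction: times of 150 weeks or more are flagged as censored
rtimec <- data.frame(
  time = rtime,
  status = as.integer(rtime < 150)  # 1 = failure observed, 0 = censored
)

table(rtimec$status)  # number of censored (0) and observed (1) times
```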

rtimec |> t()

       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
time    152  152  115  109  137   88   94   77  160   165   125    40   128
status    0    0    1    1    1    1    1    1    0     0     1     1     1
       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
time     123   136   101    62   153    83    69
status     1     1     1     1     0     1     1

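Gamma log-likelihood function for a censored sample

gamma_loglk <- function(par, time, status = NULL) {
  # par = c(scale, shape); status = 1 for a failure, 0 for a censored time
  # if status is not supplied, all observations are treated as failures
  if (is.null(status)) status <- rep(1, length(time))
  # contribution of observed failures: log f(t)
  llk_f <- sum(status * dgamma(time, scale = par[1], 
                  shape = par[2], log = TRUE))
  # contribution of censored observations: log S(t)
  llk_c <- sum((1 - status) * pgamma(time, scale = par[1], 
                     shape = par[2], lower.tail = FALSE, 
                     log.p = TRUE))
  return(llk_f + llk_c)
}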

Censored sample

R code to obtain the MLEs of the parameters of a gamma distribution from a censored sample

gamma_out_c <- optim(
  par = c(1, 1), fn = gamma_loglk, 
  control = list(fnscale = -1), hessian  = T, 
  time = rtimec$time, status = rtimec$status
)

Maximum likelihood estimates [gamma_out_c$par]: $\hat{α} = 21.3$ and $\hat{k} = 5.79$
Standard errors [sqrt(diag(solve(-gamma_out_c$hessian)))]: $se (\hat{α}) = 8.61$ and $se (\hat{k}) = 2.14$

LRT-based $95 \%$ CI for $α$: $10.7 \leq α \leq 52.48$

LRT-based $95 \%$ CI for $k$: $2.78 \leq k \leq 10.9$

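$95 \%$ CI for $α$ $\begin{aligned} \text{no-transformation}: & \quad 4.44 \leq α \leq 38.17 \\ \text{log-transformation}: & \quad 9.65 \leq α \leq 47.02 \\ \text{LRT}: & \quad 10.7 \leq α \leq 52.48 \end{aligned}$
$95 \%$ CI for $k$ (the first two intervals are computed from the rounded values $\hat{k} = 5.79$ and $se (\hat{k}) = 2.14$) $\begin{aligned} \text{no-transformation}: & \quad 1.60 \leq k \leq 9.98 \\ \text{log-transformation}: & \quad 2.81 \leq k \leq 11.95 \\ \text{LRT}: & \quad 2.78 \leq k \leq 10.9 \end{aligned}$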

Estimate of the median: ${\hat{t}}_{.5} = \hat{α} Q (.5, \hat{k}) = 116.4$ weeks
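
The censored-data results can be reproduced end to end with a short script. The sketch below assumes that rtimec is a data frame with status = 1 for failures and 0 for times of 150 weeks or more; the object names est_c, se_c and median_c are illustrative.

```r
# Rat survival times; times of 150 weeks or more are treated as censored
rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
           125, 40, 128, 123, 136, 101, 62, 153, 83, 69)
rtimec <- data.frame(time = rtime, status = as.integer(rtime < 150))

# Censored-data log-likelihood: log f(t) for failures, log S(t) for censored times
gamma_loglk <- function(par, time, status) {
  sum(status * dgamma(time, scale = par[1], shape = par[2], log = TRUE)) +
    sum((1 - status) * pgamma(time, scale = par[1], shape = par[2],
                              lower.tail = FALSE, log.p = TRUE))
}

gamma_out_c <- optim(par = c(1, 1), fn = gamma_loglk,
                     control = list(fnscale = -1), hessian = TRUE,
                     time = rtimec$time, status = rtimec$status)

est_c <- gamma_out_c$par                         # (alpha_hat, k_hat)
se_c <- sqrt(diag(solve(-gamma_out_c$hessian)))  # standard errors

# Estimated median: alpha_hat * Q(0.5, k_hat)
median_c <- est_c[1] * qgamma(0.5, shape = est_c[2], scale = 1)

round(c(alpha = est_c[1], k = est_c[2]), 2)
round(c(se_alpha = se_c[1], se_k = se_c[2]), 2)
round(median_c, 1)
```

Compared with the complete-data fit, the standard error of $\hat{α}$ is roughly twice as large, reflecting the information lost by censoring five of the twenty lifetimes.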

Acknowledgements

This lecture is adapted from materials created by Mahbub Latif

References

Sprott, David Arthur. 1980. “Maximum Likelihood in Small Samples: Estimation in the Presence of Nuisance Parameters.” Biometrika 67 (3): 515–23.