Chapter 2
(AST405) Lifetime data analysis
2 Observation Schemes, Censoring, and Likelihood
2.1 Introduction
Preliminary discussion of likelihood
Suppose that the probability distribution of potentially observable data in a study is specified up to the parameter vector $\theta$.
- A likelihood function for $\theta$ is a function of $\theta$ and is proportional to the probability of the data that were observed:
$$L(\theta) \propto \Pr(\text{Data}; \theta) \qquad (2.1.1)$$
- Data: the observed data
- $\Pr(\cdot\,; \theta)$: the probability density or mass function from which the data are assumed to arise
- $L(\theta; \text{Data})$ is a more formal notation for the likelihood function
Asymptotic results and large sample methods
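Assume that the data consist of a random sample $y_1, \ldots, y_n$ from a distribution with probability density function $f(y; \theta)$, where $\theta = (\theta_1, \ldots, \theta_k)'$ is a vector of unknown parameters taking values in a set $\Omega$.
The $y_i$'s can be vectors, but for simplicity they are considered scalars.
The likelihood function for $\theta$ is
$$L(\theta) = \prod_{i=1}^{n} f(y_i; \theta)$$
If the $y_i$'s are independent but not identically distributed, then $f(y_i; \theta)$ is replaced by $f_i(y_i; \theta)$ in the definition of the likelihood function.
- Let $\hat\theta$ be a point in $\Omega$ at which $L(\theta)$ is maximized; $\hat\theta$ is a maximum likelihood estimator (m.l.e.) of $\theta$.
In most simple settings $\hat\theta$ exists and is unique.
It is often convenient to work with the log-likelihood function $\ell(\theta) = \log L(\theta)$, which is also maximized at $\hat\theta$.
- The m.l.e. $\hat\theta$ can be found by solving $U(\theta) = 0$, where
$$U(\theta) = \frac{\partial \ell(\theta)}{\partial \theta}$$
is the score (or score function), a score vector of order $k \times 1$.
- The score vector $U(\theta)$ is asymptotically ($k$-variate) normally distributed with mean $0$ and variance-covariance matrix $\mathcal{I}(\theta)$, with $(r, s)$ entry
$$\mathcal{I}_{rs}(\theta) = E\left[-\frac{\partial^2 \ell(\theta)}{\partial\theta_r\,\partial\theta_s}\right]$$
$\mathcal{I}(\theta)$: Fisher or expected information matrix
$I(\hat\theta)$, where $I(\theta) = -\partial^2 \ell(\theta)/\partial\theta\,\partial\theta'$: observed information matrix
- Under mild regularity conditions, $\hat\theta$ is a consistent estimator of $\theta$, and $I(\hat\theta)/n$ is a consistent estimator of $\mathcal{I}(\theta)/n$.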
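The quantities above can be made concrete with a small sketch. It assumes an exponential model with rate $\lambda$, so that $\ell(\lambda) = n\log\lambda - \lambda\sum y_i$; the sample values and the function names are made up for illustration and are not from the lecture.

```python
import math

def log_likelihood(lam, y):
    """Exponential log-likelihood: l(lam) = n*log(lam) - lam*sum(y)."""
    return len(y) * math.log(lam) - lam * sum(y)

def score(lam, y):
    """Score U(lam) = dl/dlam = n/lam - sum(y)."""
    return len(y) / lam - sum(y)

def observed_information(lam, y):
    """Observed information I(lam) = -d^2 l/dlam^2 = n/lam^2."""
    return len(y) / lam ** 2

# Illustrative (made-up) sample, not data from the lecture.
y = [0.8, 1.9, 0.4, 2.6, 1.3]

# Closed-form root of the score equation U(lam) = 0.
lam_hat = len(y) / sum(y)

# Large-sample standard error from the inverse observed information.
se_hat = 1.0 / math.sqrt(observed_information(lam_hat, y))
```

Here the score equation has a closed-form root, so no iteration is needed; `se_hat` is only an asymptotic approximation and may be rough for a sample this small.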
Optimization methods for maximum likelihood
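- The maximum likelihood estimator $\hat\theta$ corresponds to the maximum of the log-likelihood function $\ell(\theta)$, i.e.
$$\hat\theta = \arg\max_{\theta \in \Omega} \ell(\theta)$$
There are different numerical approaches for optimizing the multiparameter log-likelihood function $\ell(\theta)$ for $\theta = (\theta_1, \ldots, \theta_k)'$.
Many likelihood functions have a unique maximum at $\hat\theta$, which is a stationary point satisfying $\partial\ell(\theta)/\partial\theta = 0$.
Numerical approaches involve a starting point $\theta^{(0)}$ and an iterative procedure designed to give a sequence of points $\theta^{(1)}, \theta^{(2)}, \ldots$ converging to $\hat\theta$ (i.e. $\theta^{(j)} \to \hat\theta$ when $j \to \infty$).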
-
Three types of optimization methods are available in statistical software:
- Methods that do not use derivatives (e.g., simplex algorithms such as the Nelder-Mead method)
- Methods that use only first derivatives $U(\theta) = \partial\ell(\theta)/\partial\theta$ (e.g., steepest ascent, quasi-Newton, and conjugate gradient methods)
- Methods that use both first derivatives $U(\theta)$ and second derivatives $H(\theta) = \partial^2\ell(\theta)/\partial\theta\,\partial\theta'$ (e.g., the Newton-Raphson method)
$H(\theta)$ is known as the Hessian matrix, and $-H(\theta)$ evaluated at $\hat\theta$ is the observed information matrix.
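The Newton-Raphson method is commonly used for optimizing $\ell(\theta)$. It is based on the following iteration scheme for the $(j+1)$th step:
$$\theta^{(j+1)} = \theta^{(j)} - H(\theta^{(j)})^{-1}\, U(\theta^{(j)})$$
- $\theta^{(j)}$: value of $\theta$ at the $j$th iteration
- $U(\theta) = \partial\ell(\theta)/\partial\theta$ is the score vector and $H(\theta) = \partial^2\ell(\theta)/\partial\theta\,\partial\theta'$ is the Hessian matrix
- Using a Taylor series expansion, expanding $U(\hat\theta)$ at $\theta^{(j)}$:
$$0 = U(\hat\theta) \approx U(\theta^{(j)}) + H(\theta^{(j)})\,(\hat\theta - \theta^{(j)})$$
- Then
$$\hat\theta \approx \theta^{(j)} - H(\theta^{(j)})^{-1}\, U(\theta^{(j)})$$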
In many situations, finding the maximum likelihood estimator may be challenging, e.g. it may lie on the boundary of the parameter space $\Omega$.
The likelihood function may also possess multiple stationary points; optimization techniques are designed to find local maxima, so they may not converge to the global maximum.
It is important to understand the shape of $\ell(\theta)$ before applying an optimization method.
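The Newton-Raphson iteration described above can be sketched for the scalar case ($k = 1$), where the matrix inverse reduces to a division; with several parameters the division would become a linear solve. The exponential model, the data, the starting value, and the function names are illustrative assumptions. The exponential case has a closed-form answer, which makes it easy to check the iteration.

```python
def newton_raphson(score, hessian, theta0, tol=1e-10, max_iter=50):
    """Scalar Newton-Raphson: theta_new = theta - U(theta) / H(theta)."""
    theta = theta0
    for j in range(max_iter):
        theta_new = theta - score(theta) / hessian(theta)
        if abs(theta_new - theta) < tol:
            return theta_new, j + 1   # estimate and number of iterations used
        theta = theta_new
    raise RuntimeError("Newton-Raphson did not converge")

# Illustrative (made-up) exponential sample with rate parameter lam.
y = [0.8, 1.9, 0.4, 2.6, 1.3]
n, s = len(y), sum(y)

def U(lam):
    return n / lam - s        # first derivative of n*log(lam) - lam*s

def H(lam):
    return -n / lam ** 2      # second derivative (always negative here)

lam_hat, n_iter = newton_raphson(U, H, theta0=0.5)
```

For this model the iteration converges only from starting values in $(0, 2n/\sum y_i)$; a poor start such as `theta0=5.0` jumps to a negative rate, which illustrates why the shape of the log-likelihood matters.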
Example 2.1.1
Suppose that the lifetimes of individuals in some population follow a distribution with density function $f(t)$ and distribution function $F(t)$, and that the lifetimes $t_1, \ldots, t_n$ for a random sample of $n$ individuals are observed.
In the format of Eq. (2.1.1), Data = $(t_1, \ldots, t_n)$ and
$$\Pr(\text{Data}) = \prod_{i=1}^{n} f(t_i)$$
Parametric approach
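Assume that $f(t)$ has a specific parametric form $f(t; \theta)$.
Likelihood function:
$$L(\theta) = \prod_{i=1}^{n} f(t_i; \theta) \qquad (2.1.2)$$
The maximum likelihood estimator $\hat\theta$ can be obtained by maximizing $L(\theta)$ or $\ell(\theta) = \log L(\theta)$, and consequently an estimate $F(t; \hat\theta)$ of $F(t; \theta)$, the distribution function.
For example, if $f(t; \lambda) = \lambda e^{-\lambda t}$ (exponential with rate $\lambda$), then $\hat\lambda = n / \sum_{i=1}^{n} t_i$ and $\hat F(t) = 1 - e^{-\hat\lambda t}$.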
Nonparametric approach
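Assume $F(t)$ is discrete with unspecified probabilities $f(t) = F(t) - F(t-)$ at the jump points $t$.
- The model parameters are $f = \{f(t) : t > 0\}$ and the likelihood function is
$$L(f) = \prod_{i=1}^{n} f(t_i)$$
- Restrictions: $f(t) \ge 0$ and $\sum_t f(t) = 1$
- The maximum likelihood estimators can be obtained by maximizing the corresponding likelihood function:
$$\hat f(t) = \frac{1}{n} \sum_{i=1}^{n} I(t_i = t)$$
- $I(\cdot)$ is an indicator function
- Estimate of the distribution function:
$$\hat F(t) = \sum_{s \le t} \hat f(s) = \frac{1}{n} \sum_{i=1}^{n} I(t_i \le t)$$
- $\hat F(t)$ is the empirical distribution function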
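The two approaches of Example 2.1.1 can be compared side by side in a short sketch. The parametric fit assumes an exponential model with rate $\lambda$, and the nonparametric fit is the empirical distribution function $\hat F(t) = n^{-1}\sum I(t_i \le t)$. The sample and the function names are made up for illustration.

```python
import math

def ecdf(sample):
    """Return the empirical distribution function F_hat(t) = #{t_i <= t} / n."""
    n = len(sample)
    def F_hat(t):
        return sum(1 for ti in sample if ti <= t) / n
    return F_hat

def exponential_cdf_estimate(sample):
    """Return F(t; lam_hat) = 1 - exp(-lam_hat * t) with lam_hat = n / sum(t_i)."""
    lam_hat = len(sample) / sum(sample)
    def F_hat(t):
        return 1.0 - math.exp(-lam_hat * t) if t > 0 else 0.0
    return F_hat

# Illustrative (made-up) lifetimes, not data from the lecture.
t_obs = [0.8, 1.9, 0.4, 2.6, 1.3]

F_nonparametric = ecdf(t_obs)
F_parametric = exponential_cdf_estimate(t_obs)

# Evaluate both estimates on a small grid of time points.
comparison = [(t, F_nonparametric(t), F_parametric(t)) for t in (0.5, 1.0, 2.0, 3.0)]
```

The empirical distribution function is a step function that jumps by $1/n$ at each observed lifetime, while the parametric estimate is smooth and depends on the data only through $\hat\lambda$.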
Likelihood for a truncated sample
-
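Suppose that $t_1, \ldots, t_n$ are not from an unrestricted random sample of individuals, but from a random sample of those with lifetimes of one year or less.
- No information is available for those whose lifetimes are greater than one year.
The likelihood function for this truncated sample, with time measured in years and with $f$ and $F$ the density and distribution functions of the lifetimes, is given by
$$L = \prod_{i=1}^{n} \frac{f(t_i)}{F(1)}$$
rather than (2.1.2)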
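To see how much the truncation term matters, the sketch below assumes exponential lifetimes with rate $\lambda$, time in years, and truncation at one year, so each observation contributes $f(t_i)/F(1)$. The data, the function names, and the use of a ternary search (adequate here because this log-likelihood is unimodal in $\lambda$) are illustrative assumptions.

```python
import math

def truncated_exp_loglik(lam, times, upper=1.0):
    """log of prod f(t_i)/F(upper) for Exp(rate lam) lifetimes seen only if t_i <= upper."""
    n = len(times)
    log_f = n * math.log(lam) - lam * sum(times)
    log_F_upper = math.log(1.0 - math.exp(-lam * upper))
    return log_f - n * log_F_upper

def maximize_unimodal(f, lo, hi, tol=1e-8):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Illustrative (made-up) lifetimes in years, all at most one year.
times = [0.12, 0.35, 0.08, 0.61, 0.27, 0.90, 0.44]

lam_trunc = maximize_unimodal(lambda lam: truncated_exp_loglik(lam, times), 0.01, 20.0)
lam_naive = len(times) / sum(times)   # maximizer of (2.1.2), which ignores truncation
```

Ignoring the truncation overstates the rate, because long lifetimes are missing from the sample by design rather than by chance.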
Likelihood based inferences
Score test
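The score vector $U(\theta) = \partial\ell(\theta)/\partial\theta$ asymptotically follows a $k$-variate normal distribution with mean vector $0$ and variance-covariance matrix $\mathcal{I}(\theta)$, the Fisher information matrix.
Under the null hypothesis $H_0: \theta = \theta_0$,
$$S(\theta_0) = U(\theta_0)'\, \mathcal{I}(\theta_0)^{-1}\, U(\theta_0)$$
asymptotically follows a $\chi^2_{(k)}$ distribution.
The statistic $S(\theta_0)$ can also be used to obtain confidence intervals for $\theta$.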
Wald test
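The m.l.e. $\hat\theta$ asymptotically follows a $k$-dimensional normal distribution with mean $\theta$ and variance-covariance matrix $\mathcal{I}(\theta)^{-1}$, where $\mathcal{I}(\theta)$ is the Fisher information matrix.
In other words, $\hat\theta - \theta$ asymptotically follows a $k$-dimensional normal distribution with mean $0$ and variance-covariance matrix $\mathcal{I}(\theta)^{-1}$.
Under $H_0: \theta = \theta_0$,
$$W(\theta_0) = (\hat\theta - \theta_0)'\, \mathcal{I}(\theta_0)\, (\hat\theta - \theta_0)$$
asymptotically follows $\chi^2_{(k)}$.
Since $I(\hat\theta)/n$, with $I(\hat\theta)$ the observed information matrix, is a consistent estimator of $\mathcal{I}(\theta)/n$, we can replace $\mathcal{I}(\theta_0)$ by $I(\hat\theta)$ in the test statistic.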
Likelihood ratio test
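- Under $H_0: \theta = \theta_0$,
$$\Lambda(\theta_0) = 2\ell(\hat\theta) - 2\ell(\theta_0)$$
asymptotically follows $\chi^2_{(k)}$, where $\ell(\theta)$ is the log-likelihood function and $\hat\theta$ is the m.l.e.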
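The three statistics can be computed directly for a one-parameter model. The sketch below assumes an exponential model with rate $\lambda$ and tests $H_0: \lambda = \lambda_0$ with $k = 1$; the data, the null value, and the function names are illustrative assumptions. The p-values use the identity $\Pr(\chi^2_{(1)} > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def chi2_1_sf(x):
    """Upper tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def exponential_tests(y, lam0):
    """Score, Wald, and likelihood ratio statistics for H0: lam = lam0."""
    n, s = len(y), sum(y)
    lam_hat = n / s

    def loglik(lam):
        return n * math.log(lam) - lam * s

    u0 = n / lam0 - s                        # score evaluated at lam0
    fisher0 = n / lam0 ** 2                  # expected information at lam0
    observed_hat = n / lam_hat ** 2          # observed information at lam_hat

    stats = {
        "score": u0 ** 2 / fisher0,
        "wald": (lam_hat - lam0) ** 2 * observed_hat,
        "lr": 2.0 * (loglik(lam_hat) - loglik(lam0)),
    }
    p_values = {name: chi2_1_sf(value) for name, value in stats.items()}
    return stats, p_values

# Illustrative (made-up) sample and null value.
y = [0.8, 1.9, 0.4, 2.6, 1.3]
stats, p_values = exponential_tests(y, lam0=1.0)
```

For this particular model the score statistic and the Wald statistic based on $I(\hat\lambda)$ reduce to the same expression, $n(1 - \lambda_0/\hat\lambda)^2$, so only the likelihood ratio statistic differs; in general the three agree only asymptotically.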
2.2 Right censoring and maximum likelihood
For right censored data, only the lower bounds on lifetime are available for some individuals
Right censored lifetimes arise for various reasons, such as termination of the study, loss to follow-up, etc.
Contribution to the likelihood function would be different for right censored and complete lifetimes
Construction of the likelihood function could differ for different types of censoring, such as left censoring, interval censoring, etc.
-
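Let the random variables $T_1, \ldots, T_n$ represent the lifetimes of $n$ individuals.
- Let $C_1, \ldots, C_n$ be the corresponding right censoring times.
- For the $i$th individual, we observe $(t_i, \delta_i)$:
  - $t_i$ is a sample realization of $\min(T_i, C_i)$
  - $\delta_i$ is known as the censoring or status indicator, i.e.
$$\delta_i = I(T_i \le C_i) = \begin{cases} 1 & \text{if } T_i \le C_i \text{ (lifetime observed)} \\ 0 & \text{if } T_i > C_i \text{ (right censored)} \end{cases}$$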
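As a small sketch of how the observed data are formed from lifetimes and censoring times, the function below returns the pairs $(t_i, \delta_i)$ with $t_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. The function name and the numbers are made up for illustration; in a real study only the pairs are available, not the underlying $T_i$ for censored individuals.

```python
def observe(lifetimes, censoring_times):
    """Return [(t_i, delta_i)] with t_i = min(T_i, C_i) and delta_i = 1 if T_i <= C_i else 0."""
    return [
        (min(T, C), 1 if T <= C else 0)
        for T, C in zip(lifetimes, censoring_times)
    ]

# Illustrative (made-up) values.
T = [3.2, 7.5, 1.1, 9.0]
C = [5.0, 5.0, 5.0, 9.0]
pairs = observe(T, C)
# pairs == [(3.2, 1), (5.0, 0), (1.1, 1), (9.0, 1)]
```

Note the convention for ties: when $T_i = C_i$ the lifetime is treated as observed ($\delta_i = 1$).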
There are three types of right censoring mechanisms
Type I censoring
Independent random censoring
Type II censoring
Type I censoring
In Type I censoring, the potential censoring time $C_i$ is assumed to be fixed for each individual.
- Type I censoring often arises when a study is conducted over a specified period of time.
- For example, if a life test on electrical insulation specimens is terminated after 180 minutes, then $C_i = 180$ for all $i$.
- Likelihood function for the observed Type I censored sample $\{(t_i, \delta_i),\ i = 1, \ldots, n\}$, where $f(t)$ and $S(t)$ are the density and survivor functions of $T_i$:
$$L = \prod_{i=1}^{n} \Pr(t_i, \delta_i)$$
We can show that
$$\Pr(t_i = C_i,\ \delta_i = 0) = \Pr(T_i > C_i), \qquad \Pr(t_i,\ \delta_i = 1) = f(t_i), \quad t_i \le C_i$$
so that
$$L = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, \Pr(T_i > t_i)^{1 - \delta_i}$$
- If $S(t)$ is continuous at $C_i$, then $\Pr(T_i > C_i) = S(C_i)$ and
$$L = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1 - \delta_i} \qquad (2.2.3)$$
Example 2.2.1
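Suppose that lifetimes $T_1, \ldots, T_n$ are independent and follow an exponential distribution with the p.d.f. $f(t) = \lambda e^{-\lambda t}$, $t \ge 0$.
Let $\{(t_i, \delta_i),\ i = 1, \ldots, n\}$ be a random sample (right censored, Type I) from the exponential distribution.
Obtain the expression of the likelihood function for the given sample.
Given
$$f(t) = \lambda e^{-\lambda t}, \qquad S(t) = e^{-\lambda t}$$
The likelihood function
$$L(\lambda) = \prod_{i=1}^{n} \left(\lambda e^{-\lambda t_i}\right)^{\delta_i} \left(e^{-\lambda t_i}\right)^{1 - \delta_i} = \lambda^{r} \exp\left(-\lambda \sum_{i=1}^{n} t_i\right), \qquad r = \sum_{i=1}^{n} \delta_i$$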
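A minimal sketch of this example in code, assuming the rate parametrization $f(t) = \lambda e^{-\lambda t}$: each pair contributes $\delta_i \log f(t_i) + (1 - \delta_i)\log S(t_i)$ to the log-likelihood, and maximizing gives $\hat\lambda = r / \sum t_i$ with $r = \sum \delta_i$. The life-test times (in minutes, censored at 180) and the function names are made up for illustration.

```python
import math

def type1_exponential_loglik(lam, pairs):
    """log L(lam) = sum_i [delta_i * log f(t_i) + (1 - delta_i) * log S(t_i)]."""
    total = 0.0
    for t, delta in pairs:
        log_f = math.log(lam) - lam * t   # log density of Exp(rate lam)
        log_S = -lam * t                  # log survivor function
        total += delta * log_f + (1 - delta) * log_S
    return total

def type1_exponential_mle(pairs):
    """Closed-form maximizer: number of failures divided by total observed time."""
    r = sum(delta for _, delta in pairs)
    return r / sum(t for t, _ in pairs)

# Illustrative (made-up) sample: three failures, two units censored at 180 minutes.
sample = [(45.0, 1), (180.0, 0), (112.0, 1), (180.0, 0), (73.0, 1)]
lam_hat = type1_exponential_mle(sample)   # 3 failures / 590 minutes of observation
```

Censored units still contribute their full observation time to the denominator, which is why dropping them from the data would bias the estimate upward.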
Independent Random Censoring
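The censoring time $C_i$ is assumed to be a continuous random variable with survivor function $G(t)$ and density function $g(t)$.
The lifetime $T_i$ is also a continuous random variable, with survivor function $S(t)$ and density function $f(t)$.
- Assumptions:
  - $T_i$ and $C_i$ are independent
  - $G(t)$ does not depend on any of the parameters of $S(t)$
- Likelihood function for the observed independent random censored sample $\{(t_i, \delta_i),\ i = 1, \ldots, n\}$
- We can show
$$\Pr(t_i = t,\ \delta_i = 0) = \Pr(C_i = t,\ T_i > C_i) = g(t)\, S(t)$$
$$\Pr(t_i = t,\ \delta_i = 1) = \Pr(T_i = t,\ T_i \le C_i) = f(t)\, G(t)$$
so that
$$L = \prod_{i=1}^{n} \left[f(t_i)\, G(t_i)\right]^{\delta_i} \left[g(t_i)\, S(t_i)\right]^{1 - \delta_i}$$
Since $G(t)$ and $g(t)$ don't involve any parameters of $f(t)$,
$$L \propto \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1 - \delta_i}$$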
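A small simulation can illustrate why the censoring distribution drops out of the likelihood. The sketch assumes exponential lifetimes with rate `lam` and independent exponential censoring times with rate `mu`; these distributions, the seed, the sample size, and the function names are all illustrative assumptions. The censored-data estimate $r/\sum t_i$ uses only the pairs $(t_i, \delta_i)$ and never the value of `mu`.

```python
import random

def simulate_random_censoring(n, lam, mu, seed=405):
    """Draw T ~ Exp(rate lam) and C ~ Exp(rate mu) independently; return (t, delta) pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        T = rng.expovariate(lam)
        C = rng.expovariate(mu)
        pairs.append((min(T, C), 1 if T <= C else 0))
    return pairs

def censored_exponential_mle(pairs):
    """Maximizer of prod f(t_i)^delta_i * S(t_i)^(1 - delta_i) for the exponential model."""
    r = sum(delta for _, delta in pairs)
    return r / sum(t for t, _ in pairs)

def naive_estimate(pairs):
    """Incorrectly treats every observed time as a failure time."""
    return len(pairs) / sum(t for t, _ in pairs)

pairs = simulate_random_censoring(n=20000, lam=0.5, mu=0.3)
lam_hat = censored_exponential_mle(pairs)   # should be close to 0.5
lam_naive = naive_estimate(pairs)           # estimates lam + mu instead of lam
```

The naive estimate is biased because $\min(T_i, C_i)$ is itself exponential with rate `lam + mu`; the censoring indicator is what lets the likelihood separate the two.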
Type II censoring
-
In Type II censoring, a life test starts with $n$ units and stops when $r$ failures have been observed.
So the $r$ smallest lifetimes $t_{(1)} \le \cdots \le t_{(r)}$ in a random sample of $n$ are observed, where $r$ is a specified integer that lies between 1 and $n$.
The remaining $n - r$ units are considered censored at time $t_{(r)}$.
For Type II censoring, the likelihood function is the probability (density) of observing the $r$ smallest lifetimes out of $n$ lifetimes:
$$L = \frac{n!}{(n - r)!} \left\{\prod_{i=1}^{r} f(t_{(i)})\right\} \left[S(t_{(r)})\right]^{n - r}$$
Apart from the constant, this expression has the same form as the one obtained for Type I and independent random censoring, with all the censoring times equal to $t_{(r)}$.
Example 2.2.2
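Let $t_{(1)} \le \cdots \le t_{(r)}$ be a Type II censored random sample of lifetimes $T_1, \ldots, T_n$, where $T_i$ follows an exponential distribution with rate $\lambda$.
- The likelihood function
$$L(\lambda) = \frac{n!}{(n - r)!}\, \lambda^{r} \exp\left\{-\lambda \left[\sum_{i=1}^{r} t_{(i)} + (n - r)\, t_{(r)}\right]\right\}$$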
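For the exponential case, maximizing this likelihood gives $\hat\lambda = r / W$, where $W = \sum_{i=1}^{r} t_{(i)} + (n - r)\,t_{(r)}$ is the total time on test. The sketch below computes the log-likelihood and the estimate; the failure times, the values of $n$ and $r$, and the function names are made up for illustration.

```python
import math

def total_time_on_test(failure_times, n):
    """W = sum of the r observed failure times + (n - r) * largest observed failure time."""
    r = len(failure_times)
    return sum(failure_times) + (n - r) * max(failure_times)

def type2_exponential_loglik(lam, failure_times, n):
    """log L(lam) = log[n!/(n-r)!] + r*log(lam) - lam*W."""
    r = len(failure_times)
    log_const = math.lgamma(n + 1) - math.lgamma(n - r + 1)
    return log_const + r * math.log(lam) - lam * total_time_on_test(failure_times, n)

def type2_exponential_mle(failure_times, n):
    """Closed-form maximizer r / W."""
    return len(failure_times) / total_time_on_test(failure_times, n)

# Illustrative (made-up) life test: n = 8 units, stopped at the r = 4th failure.
failures = [12.0, 25.0, 31.0, 40.0]
lam_hat = type2_exponential_mle(failures, n=8)   # 4 / (108 + 4 * 40)
```

The constant $n!/(n-r)!$ does not depend on $\lambda$, so it affects the value of the log-likelihood but not the location of its maximum.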
A general formulation of right censoring
The censoring process is often not any of the types discussed so far, and may be sufficiently complicated to make modeling it impossible.
For example, a decision to terminate a life test or clinical trial at time $t$, or to withdraw certain individuals, might be based on failure information prior to time $t$.
Fortunately, it can be shown that under rather general conditions the observed likelihood is of the form (2.2.3) and can be used in the normal way to make inferences about the lifetime distribution under study.
Read Section 2.2.2 of the textbook for details.
A Hypothetical Study
A small prospective study was run in which 10 participants were recruited and followed.
The event of interest was the development of myocardial infarction (MI, or heart attack) over a follow-up period of 10 years.
Participants were recruited into the study over a period of two years and were then followed for up to 10 years.
Figure: study in calendar years
- {S2, S3, S5} experienced MI
- {S4, S7} dropped out
- {S10} died from other causes
- {S1, S6, S8, S9} completed the 10-year follow-up without MI
Figure: study in years
Times for the subjects who did not experience MI by the end of the 10-year follow-up, who dropped out, or who died from causes not related to MI are known as censored times.
Times to MI (times to the event of interest) are known as failure times (or survival times).
For survival data, the pair (time, status), $(t_i, \delta_i)$, is considered as the response.
A sample of survival data can also be expressed as follows, where a $+$ sign indicates a censored observation
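The "+" notation can be produced mechanically from (time, status) pairs. The helper below is a hypothetical sketch, and the times in the usage line are placeholders chosen only to show the format; they are not the follow-up times of the ten study participants.

```python
def format_survival(pairs):
    """Render (time, status) pairs, appending '+' to censored times (status == 0)."""
    return ", ".join(
        f"{t:g}" + ("" if status == 1 else "+")
        for t, status in pairs
    )

# Placeholder values for illustration only (status 1 = event, 0 = censored).
example = format_survival([(2.5, 1), (10, 0), (7, 0)])
# example == "2.5, 10+, 7+"
```

Going the other way, a value written with a trailing "+" corresponds to a pair with status 0, and a value without it corresponds to status 1.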
2.3 Other types of incomplete data
Acknowledgements
This lecture is adapted from materials created by Mahbub Latif