Chapter 3
(AST405) Lifetime data analysis
3 Some Nonparametric and Graphical Procedures
3.1 Introduction
Graphs and simple data summaries are important for both description and analysis of data.
They are closely related to nonparametric estimates of distributional characteristics; many graphs are just plots of some estimate.
This chapter introduces nonparametric estimation and procedures for portraying univariate lifetime data.
Tools such as frequency tables and histograms, empirical distribution functions, probability plots, and data density plots are familiar across different branches of statistics.
For lifetime data, the presence of censoring makes it necessary to modify the standard methods.
To illustrate, let us consider one of the most elementary procedures in statistics, the formation of a relative-frequency table.
Suppose we have a complete (i.e., uncensored) sample of $n$ lifetimes from some population. Divide the time axis $[0, \infty)$ into $k+1$ intervals $I_j = [a_{j-1}, a_j)$, $j = 1, \ldots, k+1$, where $0 = a_0 < a_1 < \cdots < a_k < a_{k+1} = \infty$, with $a_k$ being the upper limit on observation. Let $d_j$ be the observed number of lifetimes that lie in $I_j$. A frequency table is just a list of the intervals and their associated frequencies, $d_j$, or relative frequencies, $d_j/n$. A relative-frequency histogram, consisting of rectangles with bases on $[a_{j-1}, a_j)$ and areas $d_j/n$ $(j = 1, \ldots, k)$, is often drawn to portray this.
When data are censored, however, it is generally not possible to form the frequency table, because if a lifetime is censored, we do not know which interval, $I_j$, it lies in. As a result, we cannot determine the $d_j$. Section 3.6 describes how to deal with frequency tables when data are censored; this is referred to as life table methodology.
First, however, we develop methods for ungrouped data.
- Section 3.2 discusses nonparametric estimation of distribution, survivor, or cumulative hazard functions under right censoring.
- This also forms the basis for descriptive and diagnostic plots, which are presented in Section 3.3.
- Sections 3.4 and 3.5 deal with the estimation of hazard functions and with nonparametric estimation from some other types of incomplete data.
3.2 Non-parametric Estimation of a Survivor Function and Quantiles
Recall: Parametric estimation of survivor function
This method assumes a parametric model (e.g., an exponential distribution) for the data: we estimate the parameter first, then form the estimator of the survivor function. For example, if the lifetimes are modelled as exponential with unknown parameter $\theta$, so that $S(t; \theta) = \exp(-t/\theta)$, the estimated survivor function is $\hat S(t) = \exp(-t/\hat\theta)$.
Non-parametric estimation of a survivor function
- As an example, consider a sample of $n$ complete observations $t_1, \ldots, t_n$
- Empirical survivor function (ESF) for a specific value $t \ge 0$ is defined as
$$\hat S(t) = \frac{\text{number of observations} > t}{n}$$
- $\hat S(t)$ is a step function that decreases by $1/n$ at each observed lifetime if all observations are distinct
- Generally, the ESF drops by $d/n$ at $t$ if $d$ lifetimes are equal to $t$
- For a specific value $t$, ESF can also be defined as
$$\hat S(t) = \frac{1}{n} \sum_{i=1}^{n} I(t_i > t)$$
Acute myeloid leukemia (AML)
- AML patients who reached remission after chemotherapy were randomly assigned to one of two treatments
  - maintenance chemotherapy
  - no maintenance chemotherapy (control group)
- Time of interest: length of remission (in weeks)
  - maintained: 13, , 9, , 18, , 31, 23, 34, , 48
  - control: 5, 8, 12, 5, 30, 33, 8, $16^+$, 23, 27, 43, 45 (the superscript $+$ marks a right-censored time)
Does maintenance chemotherapy prolong the time until relapse?
- Estimate the survival function for the following sample of 11 complete observations of the control group ($n = 11$): 5, 8, 12, 5, 30, 33, 8, 23, 27, 43, 45
- Estimates of survival function for
- Find
or
- Find
Sorted lifetimes: 5, 5, 8, 8, 12, 23, 27, 30, 33, 43, 45
- Estimated survivor function

$t$ | number of observations $> t$ | $\hat S(t)$ |
---|---|---|
0 | 11 | 1.000 |
5 | 9 | 0.818 |
8 | 7 | 0.636 |
12 | 6 | 0.545 |
23 | 5 | 0.455 |
27 | 4 | 0.364 |
30 | 3 | 0.273 |
33 | 2 | 0.182 |
43 | 1 | 0.091 |
45 | 0 | 0.000 |
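The table can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, assuming the convention $\hat S(t)$ = proportion of lifetimes strictly greater than $t$, which is the convention the tabulated values follow; the function name `esf` is illustrative only.

```python
# Complete (uncensored) control-group remission times, in weeks
times = [5, 8, 12, 5, 30, 33, 8, 23, 27, 43, 45]

def esf(t, sample):
    """Empirical survivor function: proportion of lifetimes exceeding t."""
    return sum(1 for x in sample if x > t) / len(sample)

# One row per distinct time: t, number of observations > t, ESF
for t in [0] + sorted(set(times)):
    print(t, sum(1 for x in times if x > t), round(esf(t, times), 3))
```

Because the function is evaluated at an arbitrary `t`, it also gives values between observed lifetimes, e.g. `esf(10, times)` equals the value tabulated at 8.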
Nonparametric estimate of survivor function (Empirical Survivor Function - ESF)
Exercise
The following are life times of 21 lung cancer patients receiving control treatment (with no censoring):
Draw the ESF
How would we estimate $S(10)$, the probability that an individual survives to time 10 or later?
Let’s get back to the AML example:
- Sorted unique lifetimes $t_{(j)}$, with the number at risk $n_j$, the number of events $d_j$, and the conditional survival proportion $\hat p_j = 1 - d_j/n_j$
- Relationship between $\hat p_j$ and $\hat S(t)$: the estimate on each interval is the running product of the $\hat p_j$

$t_{(j)}$ | $n_j$ | $d_j$ | $\hat p_j$ | product of $\hat p_j$ | $\hat S(t)$ | interval |
---|---|---|---|---|---|---|
0 | 11 | 0 | 1.000 | 1.000 = | 1.000 | [0, 5) |
5 | 11 | 2 | 0.818 | 1.000 $\times$ 0.818 = | 0.818 | [5, 8) |
8 | 9 | 2 | 0.778 | 1.000 $\times$ 0.818 $\times$ 0.778 = | 0.636 | [8, 12) |
12 | 7 | 1 | 0.857 | ’’ | 0.545 | [12, 23) |
23 | 6 | 1 | 0.833 | ’’ | 0.455 | [23, 27) |
27 | 5 | 1 | 0.800 | ’’ | 0.364 | [27, 30) |
30 | 4 | 1 | 0.750 | ’’ | 0.273 | [30, 33) |
33 | 3 | 1 | 0.667 | ’’ | 0.182 | [33, 43) |
43 | 2 | 1 | 0.500 | ’’ | 0.091 | [43, 45) |
45 | 1 | 1 | 0.000 | ’’ | 0.000 | [45, Inf) |
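Notations:
- Observed times: $t_1, \ldots, t_n$
- Ordered observed unique time points: $t_{(1)} < t_{(2)} < \cdots < t_{(k)}$, with $t_{(0)} = 0$ and $t_{(k+1)} = \infty$
- Intervals: $I_j = [t_{(j)}, t_{(j+1)})$, $j = 0, 1, \ldots, k$
- Intervals are constructed so that each starts at an observed lifetime and ends just before the next observed lifetime
  - E.g. $I_1 = [5, 8)$
  - E.g. $I_2 = [8, 12)$
- Expressing $\hat S(t)$ in terms of $\hat p_j = 1 - d_j/n_j$
$$\hat S(t) = \prod_{j:\, t_{(j)} \le t} \hat p_j = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right)$$
- This method is known as the Kaplan-Meier or Product-limit estimator of the survivor function.
- We saw that this method is equivalent to the ESF approach when there is no censoring.
- But the advantage of the Kaplan-Meier method is that it can handle censored observations too.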
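The equivalence with the ESF can be checked by hand. The worked lines below are a sketch using the uncensored control-group values ($n = 11$); the subscripted notation $t_{(j)}$, $n_j$, $d_j$ is the one assumed in this reconstruction. Without censoring $n_{j+1} = n_j - d_j$, so the product telescopes.

```latex
\hat S(8) = \frac{11 - 2}{11} \times \frac{9 - 2}{9}
          = \frac{9}{11} \times \frac{7}{9}
          = \frac{7}{11} \approx 0.636

% In general, with no censoring (n_{j+1} = n_j - d_j and n_1 = n):
\prod_{j:\, t_{(j)} \le t} \frac{n_j - d_j}{n_j}
   = \frac{n_2}{n_1} \times \frac{n_3}{n_2} \times \cdots
   = \frac{\text{number of observations} > t}{n}
```

With censoring, $n_{j+1}$ can be smaller than $n_j - d_j$, so the cancellation no longer happens and the product form has to be used directly.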
Censored sample
If we had censored data, then?
For the control group of the AML example, now include the censored observation $16^+$
- Censored sample: 5, 8, 12, 5, 30, 33, 8, $16^+$, 23, 27, 43, 45
- Sorted censored sample: 5, 5, 8, 8, 12, $16^+$, 23, 27, 30, 33, 43, 45

$t_{(j)}$ | $n_j$ | $d_j$ | $\hat p_j = 1 - d_j/n_j$ | $\hat S(t_{(j)})$ |
---|---|---|---|---|
5 | 12 | 2 | 0.833 | 0.833 |
8 | 10 | 2 | 0.800 | 0.667 |
12 | 8 | 1 | 0.875 | 0.583 |
16 | 7 | 0 | 1.000 | 0.583 |
23 | 6 | 1 | 0.833 | 0.486 |
27 | 5 | 1 | 0.800 | 0.389 |
30 | 4 | 1 | 0.750 | 0.292 |
33 | 3 | 1 | 0.667 | 0.194 |
43 | 2 | 1 | 0.500 | 0.097 |
45 | 1 | 1 | 0.000 | 0.000 |
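The calculation in the table can be sketched in plain Python as follows. This is an illustrative implementation, not the routine used to produce the notes; the status coding (1 = relapse observed, 0 = censored) and the function name `kaplan_meier` are assumptions made for the example.

```python
# Control-group remission times (weeks); status 1 = relapse observed, 0 = censored
times  = [5, 8, 12, 5, 30, 33, 8, 16, 23, 27, 43, 45]
status = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

def kaplan_meier(times, status):
    """Return rows (t, n_at_risk, deaths, p_hat, S_hat) at each distinct observed time."""
    rows, surv = [], 1.0
    for t in sorted(set(times)):
        # at risk: alive and uncensored just before t
        n = sum(1 for x in times if x >= t)
        # events: uncensored lifetimes equal to t
        d = sum(1 for x, s in zip(times, status) if x == t and s == 1)
        p = 1 - d / n
        surv *= p                      # running product of the p_hat values
        rows.append((t, n, d, round(p, 3), round(surv, 3)))
    return rows

for row in kaplan_meier(times, status):
    print(row)
```

A censored time such as 16 contributes a factor of 1, so the estimate is unchanged there, but the individual leaves the risk set, which is why $n_j$ drops from 7 to 6 at the next event time.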
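Kaplan-Meier estimator
- Let $(t_i, \delta_i)$, $i = 1, \ldots, n$, be a censored random sample of lifetimes
- Suppose that there are $k$ $(k \le n)$ distinct lifetimes $t_{(1)} < t_{(2)} < \cdots < t_{(k)}$ at which deaths (events) occur
- Define for $j = 1, \ldots, k$
  - $t_{(j)}$ = time of the $j$th distinct death
  - $d_j$ = no. of deaths observed at $t_{(j)}$
  - $n_j$ = no. of individuals at risk at time $t_{(j)}$, i.e. number of individuals alive and uncensored just prior to time $t_{(j)}$
- A non-parametric estimator of the survivor function
$$\hat S(t) = \prod_{j:\, t_{(j)} \le t} \frac{n_j - d_j}{n_j}$$
- It is known as the Kaplan-Meier (KM) or Product-limit (PL) estimator of the survivor function (Kaplan and Meier 1958)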
Similarly
The paper was published in the Journal of the American Statistical Association in 1958
Number of citations 66,345 (Google Scholar, 02 November 2024)
Edward L Kaplan (1920–2006)
Paul Meier (1924–2011)
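Kaplan-Meier estimator as an MLE
PL estimator as an MLE
- Assume the lifetimes $T_1, \ldots, T_n$ have a discrete distribution with survivor function $S(t) = P(T > t)$ and hazard function $h(t)$
- Without loss of generality, assume the possible lifetimes are $t = 0, 1, 2, \ldots$
- The general expression of the likelihood function (from Eq. 2.2.12)
$$L = \prod_{i=1}^{n} \prod_{t=0}^{\infty} h(t)^{dN_i(t)} \, [1 - h(t)]^{Y_i(t)[1 - dN_i(t)]}$$
  - $t_i$ = lifetime (or censoring time) of the $i$th individual
  - $Y_i(t) = I(t_i \ge t)$ and $dN_i(t) = I(t_i = t, \delta_i = 1)$
- Since $\sum_i dN_i(t) = d_t$ and $\sum_i Y_i(t) = n_t$, where
  - $d_t$ = number of observed lifetimes equal to $t$, i.e. number of observed deaths at $t$
  - $n_t$ = number of subjects at risk (alive and uncensored) at time $t$
- The parameters of the lifetime distribution: $h = (h(0), h(1), h(2), \ldots)'$
- The likelihood function
$$L(h) = \prod_{t=0}^{\infty} h(t)^{d_t} \, [1 - h(t)]^{n_t - d_t}$$
- The log-likelihood function
$$\ell(h) = \sum_{t=0}^{\infty} \left\{ d_t \log h(t) + (n_t - d_t) \log[1 - h(t)] \right\}$$
- The MLE of $h(t)$: the score function evaluated at $\hat h(t)$ is zero
$$\frac{\partial \ell}{\partial h(t)} \bigg|_{h(t) = \hat h(t)} = \frac{d_t}{\hat h(t)} - \frac{n_t - d_t}{1 - \hat h(t)} = 0$$
- In general
$$\hat h(t) = \frac{d_t}{n_t}, \qquad n_t > 0$$
- The mle of $S(t) = \prod_{s \le t} [1 - h(s)]$
$$\hat S(t) = \prod_{s \le t} [1 - \hat h(s)] = \prod_{s \le t} \left(1 - \frac{d_s}{n_s}\right)$$
- If $d_k < n_k$ at the last death time $t_{(k)}$ (which would happen if the largest observed lifetime is a censored observation) then $\hat S(t_{(k)}) > 0$ and $\hat S(t)$ is undefined beyond the largest observed time
- If $d_k = n_k$ then $\hat S(t_{(k)}) = 0$ and $\hat S(t) = 0$ for all $t > t_{(k)}$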
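Standard error of $\hat S(t)$
- The $t$th diagonal element of the information matrix $I(h)$
$$I_{tt}(h) = -\frac{\partial^2 \ell}{\partial h(t)^2} = \frac{d_t}{h(t)^2} + \frac{n_t - d_t}{[1 - h(t)]^2}$$
- Using the assumption $E(d_t \mid n_t) = n_t h(t)$
$$I_{tt}(h) = \frac{n_t}{h(t)[1 - h(t)]}$$
- Off-diagonal elements of $I(h)$ are zero
- The asymptotic variance of $\hat h(t)$
$$\widehat{Var}[\hat h(t)] = \frac{\hat h(t)[1 - \hat h(t)]}{n_t}$$
Standard error of the PL estimator
- Variance of $\log \hat S(t) = \sum_{s \le t} \log[1 - \hat h(s)]$
$$\widehat{Var}[\log \hat S(t)] = \sum_{s \le t} \widehat{Var}\{\log[1 - \hat h(s)]\}$$
- Using the delta method, $\widehat{Var}\{\log[1 - \hat h(s)]\}$ is obtained from $\widehat{Var}[\hat h(s)]$
$$\widehat{Var}\{\log[1 - \hat h(s)]\} = \frac{\widehat{Var}[\hat h(s)]}{[1 - \hat h(s)]^2} = \frac{\hat h(s)}{n_s [1 - \hat h(s)]} = \frac{d_s}{n_s (n_s - d_s)}$$
- Using the delta method again, now for $\hat S(t) = \exp\{\log \hat S(t)\}$
$$\widehat{Var}[\hat S(t)] = \hat S(t)^2 \, \widehat{Var}[\log \hat S(t)] = \hat S(t)^2 \sum_{s \le t} \frac{d_s}{n_s (n_s - d_s)}$$
- This formula for the variance of the PL estimator is known as Greenwood’s formula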
- Censored sample: 5, 5, 8, 8, 12, $16^+$, 23, 27, 30, 33, 43, 45

$t_{(j)}$ | $n_j$ | $d_j$ | $\hat S(t_{(j)})$ | $d_j / [n_j (n_j - d_j)]$ | cumulative sum | $\widehat{Var}[\hat S(t_{(j)})]$ |
---|---|---|---|---|---|---|
5 | 12 | 2 | 0.833 | 0.017 | 0.017 | 0.012 |
8 | 10 | 2 | 0.667 | 0.025 | 0.042 | 0.019 |
12 | 8 | 1 | 0.583 | 0.018 | 0.060 | 0.020 |
16 | 7 | 0 | 0.583 | 0.000 | 0.060 | 0.020 |
23 | 6 | 1 | 0.486 | 0.033 | 0.093 | 0.022 |
27 | 5 | 1 | 0.389 | 0.050 | 0.143 | 0.022 |
30 | 4 | 1 | 0.292 | 0.083 | 0.226 | 0.019 |
33 | 3 | 1 | 0.194 | 0.167 | 0.393 | 0.015 |
43 | 2 | 1 | 0.097 | 0.500 | 0.893 | 0.008 |
45 | 1 | 1 | 0.000 | Inf | Inf | NaN |
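Greenwood's formula can be accumulated in the same pass that builds the PL estimate. The sketch below is illustrative plain Python working from the $(t, n, d)$ rows of the table; returning `None` for the standard error when $n_j = d_j$ is a choice made for this example, mirroring the `NaN` in the last row.

```python
import math

# (time, number at risk, number of relapses) rows from the table above
rows = [(5, 12, 2), (8, 10, 2), (12, 8, 1), (16, 7, 0), (23, 6, 1),
        (27, 5, 1), (30, 4, 1), (33, 3, 1), (43, 2, 1), (45, 1, 1)]

def greenwood(rows):
    """Return (t, S_hat, se) per row; se is None once n == d (S_hat = 0)."""
    surv, acc, out = 1.0, 0.0, []
    for t, n, d in rows:
        surv *= 1 - d / n
        if n == d:
            # d / (n (n - d)) is undefined here, so no standard error is reported
            out.append((t, surv, None))
            continue
        acc += d / (n * (n - d))                 # cumulative Greenwood sum
        out.append((t, surv, surv * math.sqrt(acc)))
    return out

for t, s, se in greenwood(rows):
    print(t, round(s, 3), None if se is None else round(se, 3))
```

The square roots of the tabulated variances are the standard errors; for example $\sqrt{0.012} \approx 0.108$ at $t = 5$.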
The survival package in R
Call: survfit(formula = Surv(time, status) ~ 1, data = dat)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
5 12 2 0.833 0.108 0.647 1.00
8 10 2 0.667 0.136 0.447 0.99
12 8 1 0.583 0.142 0.362 0.94
23 6 1 0.486 0.148 0.268 0.88
27 5 1 0.389 0.147 0.185 0.82
30 4 1 0.292 0.139 0.115 0.74
33 3 1 0.194 0.122 0.057 0.66
43 2 1 0.097 0.092 0.015 0.62
45 1 1 0.000 NaN NA NA
survminer::ggsurvplot(surv_model, data = dat, surv.median.line = "hv", conf.int = FALSE)
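Nelson-Aalen estimator
Estimator of $H(t)$
- Cumulative hazard function
$$H(t) = \int_0^t h(u)\,du = \int_0^t dH(u)$$
- $dH(u)$ = increment of the cumulative hazard function over $[u, u + du)$
Nelson-Aalen estimator
- The following estimator of the cumulative hazard function is known as the Nelson-Aalen (NA) estimator (Nelson 1969; Aalen 1975)
$$\hat H(t) = \int_0^t \frac{dN_\cdot(u)}{Y_\cdot(u)}, \qquad dN_\cdot(u) = \sum_i dN_i(u), \quad Y_\cdot(u) = \sum_i Y_i(u)$$
- In the notation used for Kaplan-Meier, the NA estimator looks like
$$\hat H(t) = \sum_{j:\, t_{(j)} \le t} \frac{d_j}{n_j}$$
- The variance of $\hat H(t)$ can be obtained from the information matrix derived for the non-parametric likelihood function
$$\widehat{Var}[\hat H(t)] = \sum_{j:\, t_{(j)} \le t} \frac{d_j (n_j - d_j)}{n_j^3}$$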
- Censored sample: 5, 5, 8, 8, 12, $16^+$, 23, 27, 30, 33, 43, 45

$t_{(j)}$ | $n_j$ | $d_j$ | $d_j / n_j$ | $\hat H(t_{(j)})$ | SE of $\hat H(t_{(j)})$ |
---|---|---|---|---|---|
5 | 12 | 2 | 0.167 | 0.167 | 0.108 |
8 | 10 | 2 | 0.200 | 0.367 | 0.166 |
12 | 8 | 1 | 0.125 | 0.492 | 0.203 |
16 | 7 | 0 | 0.000 | 0.492 | 0.203 |
23 | 6 | 1 | 0.167 | 0.658 | 0.254 |
27 | 5 | 1 | 0.200 | 0.858 | 0.310 |
30 | 4 | 1 | 0.250 | 1.108 | 0.379 |
33 | 3 | 1 | 0.333 | 1.442 | 0.466 |
43 | 2 | 1 | 0.500 | 1.942 | 0.585 |
45 | 1 | 1 | 1.000 | 2.942 | 0.585 |
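The Nelson-Aalen columns can be reproduced with a short sketch. The code below is illustrative plain Python; the variance increment $d_j (n_j - d_j) / n_j^3$ is the form that matches the tabulated standard errors, and the function name `nelson_aalen` is hypothetical.

```python
import math

# (time, number at risk, number of relapses) rows for the censored control sample
rows = [(5, 12, 2), (8, 10, 2), (12, 8, 1), (16, 7, 0), (23, 6, 1),
        (27, 5, 1), (30, 4, 1), (33, 3, 1), (43, 2, 1), (45, 1, 1)]

def nelson_aalen(rows):
    """Return (t, H_hat, se) per row, both rounded to three decimals."""
    cum_hazard, var, out = 0.0, 0.0, []
    for t, n, d in rows:
        cum_hazard += d / n                 # hazard increment d_j / n_j
        var += d * (n - d) / n ** 3         # variance increment
        out.append((t, round(cum_hazard, 3), round(math.sqrt(var), 3)))
    return out

for row in nelson_aalen(rows):
    print(row)
```

At the last time point $n_j = d_j = 1$, so the variance increment is zero and the standard error stays at 0.585, as in the table.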
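- Both $\hat S(t)$ and $\hat H(t)$ are nonparametric m.l.e.’s, and are connected by the relationship between the survivor and cumulative hazard functions of a discrete distribution, $\hat S(t) = \prod_{s \le t} [1 - d\hat H(s)]$
- Note $\hat S(t)$ and $\hat H(t)$ are discrete and don’t satisfy the relationship $H(t) = -\log S(t)$, which is true for continuous distributions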
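The size and direction of the discrepancy can be seen from a one-line inequality. The lines below are a sketch using the control-group values at $t = 5$ ($n = 12$, $d = 2$); the index notation is the one assumed in this reconstruction.

```latex
% Since -\log(1 - x) \ge x for 0 \le x < 1:
-\log \hat S(t) = \sum_{j:\, t_{(j)} \le t} -\log\left(1 - \frac{d_j}{n_j}\right)
   \;\ge\; \sum_{j:\, t_{(j)} \le t} \frac{d_j}{n_j} = \hat H(t)

% At t = 5:
\hat H(5) = \frac{2}{12} \approx 0.167,
\qquad
-\log \hat S(5) = -\log\left(\frac{10}{12}\right) \approx 0.182
```

The gap is small while $d_j / n_j$ is small and grows as the risk set shrinks.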
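CIs for survival probabilities
- Nonparametric methods can also be used to construct confidence intervals for different lifetime distribution characteristics, such as
  - Survival probabilities $S(t)$
  - Quantiles $t_p$
- The methods of constructing confidence intervals are based on the following property of MLE
$$\frac{\hat\theta - \theta}{se(\hat\theta)} \overset{a}{\sim} N(0, 1)$$
Plain CI
- The PL estimator $\hat S(t)$ is an MLE of $S(t)$
- Greenwood’s variance estimator
$$\widehat{Var}[\hat S(t)] = \hat S(t)^2 \sum_{j:\, t_{(j)} \le t} \frac{d_j}{n_j (n_j - d_j)}$$
- A pivotal quantity can be defined as
$$Z_1 = \frac{\hat S(t) - S(t)}{se[\hat S(t)]} \overset{a}{\sim} N(0, 1)$$
- The $100(1-\alpha)$% confidence interval for $S(t)$ can be obtained from the following expression
$$P\left(-z_{1-\alpha/2} \le Z_1 \le z_{1-\alpha/2}\right) = 1 - \alpha$$
  - $z_p$ = $p$th quantile of the standard normal distribution, i.e. $P(Z \le z_p) = p$
- $100(1-\alpha)$% confidence interval for $S(t)$
$$\hat S(t) \pm z_{1-\alpha/2} \, se[\hat S(t)]$$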
$t_{(j)}$ | $n_j$ | $d_j$ | $\hat S(t_{(j)})$ | $\widehat{Var}[\hat S(t_{(j)})]$ |
---|---|---|---|---|
5 | 12 | 2 | 0.833 | 0.012 |
8 | 10 | 2 | 0.667 | 0.019 |
12 | 8 | 1 | 0.583 | 0.020 |
16 | 7 | 0 | 0.583 | 0.020 |
23 | 6 | 1 | 0.486 | 0.022 |
27 | 5 | 1 | 0.389 | 0.022 |
30 | 4 | 1 | 0.292 | 0.019 |
33 | 3 | 1 | 0.194 | 0.015 |
43 | 2 | 1 | 0.097 | 0.008 |
45 | 1 | 1 | 0.000 | NaN |
- Find the 95% confidence interval of
- 95% confidence interval of
-based confidence interval-
Limitations
- When the number of uncensored lifetimes is small or when $S(t)$ is close to 0 or 1, the distribution of $Z_1$ may not be well approximated by $N(0, 1)$
- The expression $\hat S(t) \pm z_{1-\alpha/2} \, se[\hat S(t)]$ may contain values outside of the interval $[0, 1]$
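The second limitation is easy to see numerically. The sketch below is illustrative plain Python using three $(\hat S, se)$ pairs read from the survfit output for the control group; the `truncate` option is a common practical fix added for this example, not part of the plain interval itself.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # 0.975 quantile of N(0, 1), about 1.96

def plain_ci(s_hat, se, z=Z, truncate=False):
    """Plain (untransformed) interval S_hat -/+ z * se, optionally clipped to [0, 1]."""
    lower, upper = s_hat - z * se, s_hat + z * se
    if truncate:
        lower, upper = max(lower, 0.0), min(upper, 1.0)
    return lower, upper

# (time, S_hat, se) taken from the survfit output for the control group
for t, s, se in [(5, 0.833, 0.108), (12, 0.583, 0.142), (43, 0.097, 0.092)]:
    print(t, [round(x, 3) for x in plain_ci(s, se)])
```

At $t = 5$ the upper limit exceeds 1 and at $t = 43$ the lower limit is negative, which is what motivates the transformed intervals.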
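CI using transformation
- Consider a function $\psi(t) = g[S(t)]$ of $S(t)$ that takes values on $(-\infty, \infty)$
- Examples of the function: $\psi(t) = \log\{S(t) / [1 - S(t)]\}$ and $\psi(t) = \log[-\log S(t)]$, for $0 < S(t) < 1$
- MLE of $\psi(t)$
$$\hat\psi(t) = g[\hat S(t)], \qquad \hat S(t) = \text{PL estimate of } S(t)$$
- Asymptotic variance of $\hat\psi(t)$ (delta method)
$$\widehat{Var}[\hat\psi(t)] = \{g'[\hat S(t)]\}^2 \, \widehat{Var}[\hat S(t)]$$
- We can define a pivotal quantity based on the sampling distribution of $\hat\psi(t)$
$$Z_2 = \frac{\hat\psi(t) - \psi(t)}{se[\hat\psi(t)]}$$
- Compared to $Z_1$, $Z_2$ is closer to the standard normal distribution
- Confidence intervals based on $Z_2$ perform better than those based on $Z_1$
- Using the distribution of $Z_2$, a confidence interval of $S(t)$ can be obtained in two steps
  - Obtain the $100(1-\alpha)$% CI of $\psi(t)$: $(\psi_L, \psi_U) = \hat\psi(t) \pm z_{1-\alpha/2} \, se[\hat\psi(t)]$
  - Using inverse transformation, obtain the CI of $S(t)$ from that of $\psi(t)$: its end points are $g^{-1}(\psi_L)$ and $g^{-1}(\psi_U)$, where $g^{-1}$ = inverse function of $g$
- Inverse functions
  - Log function: $\psi = \log S \Rightarrow S = e^{\psi}$
  - Logit function: $\psi = \log\{S / (1 - S)\} \Rightarrow S = e^{\psi} / (1 + e^{\psi})$
  - Log-log function: $\psi = \log(-\log S) \Rightarrow S = \exp(-e^{\psi})$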
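The two-step recipe can be sketched for all three transformations at once. The code below is illustrative plain Python; the values $\hat S = 0.583$ and $se = 0.142$ are the control-group estimates at $t = 12$ from the survfit output, and the dictionary `TRANSFORMS` and function `transformed_ci` are hypothetical names for this example.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)

# For each transformation: g, its inverse, and |g'(S)| for the delta method
TRANSFORMS = {
    "log": (math.log, math.exp, lambda s: 1 / s),
    "logit": (lambda s: math.log(s / (1 - s)),
              lambda u: 1 / (1 + math.exp(-u)),
              lambda s: 1 / (s * (1 - s))),
    "log-log": (lambda s: math.log(-math.log(s)),
                lambda u: math.exp(-math.exp(u)),
                lambda s: 1 / (s * abs(math.log(s)))),
}

def transformed_ci(s_hat, se, kind, z=Z):
    """CI for S(t): build the interval for psi = g(S), then map it back with g inverse."""
    g, g_inv, abs_dg = TRANSFORMS[kind]
    psi, se_psi = g(s_hat), abs_dg(s_hat) * se
    ends = (g_inv(psi - z * se_psi), g_inv(psi + z * se_psi))
    return min(ends), max(ends)   # the log-log map is decreasing, so order the limits

for kind in TRANSFORMS:
    print(kind, [round(x, 3) for x in transformed_ci(0.583, 0.142, kind)])
```

The log interval agrees with the limits (0.362, 0.94) printed by survfit at $t = 12$. The log transformation maps onto $(-\infty, 0]$ only, so its upper limit can still exceed 1, whereas the logit and log-log intervals always stay inside $(0, 1)$.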
95% CI of
is
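$t_{(j)}$ | $n_j$ | $d_j$ | $\hat S(t_{(j)})$ | $\widehat{Var}[\hat S(t_{(j)})]$ |
---|---|---|---|---|
8 | 10 | 2 | 0.667 | 0.019 |
12 | 8 | 1 | 0.583 | 0.020 |
16 | 7 | 0 | 0.583 | 0.020 |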
- 95% CI of
is
-
95% CI of
can be obtained as
-
95% CI of
Using the distribution of
Using the distribution of
Homework
-
Obtain the 95% CI of
using the following transformations
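Bootstrap CI
- Nonparametric bootstrap methods can be used to obtain the sampling distributions of the pivotal quantities $Z_1$ and $Z_2$
- The $\alpha/2$- and $(1-\alpha/2)$-quantiles of the sampling distribution constitute a CI of $S(t)$
Steps for obtaining bootstrap CIs
- Observed data $\{(t_i, \delta_i), i = 1, \ldots, n\}$, and $\hat S(t)$ and $se[\hat S(t)]$ are the MLE of $S(t)$ and the corresponding SE
1. Generate a bootstrap sample $\{(t_i^*, \delta_i^*), i = 1, \ldots, n\}$ by sampling with replacement from $\{(t_i, \delta_i), i = 1, \ldots, n\}$
2. Obtain the PL estimate $\hat S^*(t)$ and the corresponding SE $se[\hat S^*(t)]$ from the bootstrap sample
3. Compute the pivotal quantity $Z_1^* = \{\hat S^*(t) - \hat S(t)\} / se[\hat S^*(t)]$
- Repeat the steps 1–3 $B$ times to obtain $z_1^*, \ldots, z_B^*$
- The $p$-quantile of $Z_1$ is estimated by $z^*_{(pB)}$, where $pB$ is an integer and $z^*_{(pB)}$ is the $(pB)$th-smallest value among $z_1^*, \ldots, z_B^*$
- The $100(1-\alpha)$% CI of $S(t)$ is $\left(\hat S(t) - z^*_{(B_2)} \, se[\hat S(t)],\; \hat S(t) - z^*_{(B_1)} \, se[\hat S(t)]\right)$, where $B_1 = B\alpha/2$ and $B_2 = B(1 - \alpha/2)$
  - E.g. For 95% CI, $B_1 = 0.025B$ and $B_2 = 0.975B$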
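The steps above can be sketched as follows for the pivotal quantity $Z_1$. This is a minimal illustration in plain Python, not a definitive implementation: the time point `t0 = 10`, the number of replicates `B`, the fixed `seed`, and the rule of discarding resamples whose standard error is zero are all assumptions of this sketch.

```python
import math
import random

times  = [5, 8, 12, 5, 30, 33, 8, 16, 23, 27, 43, 45]
status = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]   # 1 = relapse, 0 = censored

def km_at(t0, times, status):
    """PL estimate of S(t0) and its Greenwood standard error."""
    surv, acc = 1.0, 0.0
    for t in sorted(set(times)):
        if t > t0:
            break
        n = sum(1 for x in times if x >= t)
        d = sum(1 for x, s in zip(times, status) if x == t and s == 1)
        if d == n:
            return 0.0, 0.0          # S_hat = 0; SE undefined, reported as 0
        surv *= 1 - d / n
        acc += d / (n * (n - d))
    return surv, surv * math.sqrt(acc)

def bootstrap_t_ci(t0, times, status, B=1000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    s_hat, se_hat = km_at(t0, times, status)
    data, z_star = list(zip(times, status)), []
    while len(z_star) < B:
        sample = [rng.choice(data) for _ in data]      # step 1: resample pairs
        s_b, se_b = km_at(t0, *zip(*sample))           # step 2: PL estimate and SE
        if se_b > 0:                                   # assumption: skip degenerate resamples
            z_star.append((s_b - s_hat) / se_b)        # step 3: pivotal quantity
    z_star.sort()
    z_lo = z_star[round(B * alpha / 2) - 1]            # (B * alpha/2)-th smallest
    z_hi = z_star[round(B * (1 - alpha / 2)) - 1]      # (B * (1 - alpha/2))-th smallest
    return s_hat - z_hi * se_hat, s_hat - z_lo * se_hat

print(bootstrap_t_ci(10, times, status))
```

Resampling the pairs $(t_i, \delta_i)$ keeps each time attached to its censoring indicator. The limits are not forced into $[0, 1]$; applying the same steps to $Z_2$ would address that.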
Homework
- Obtain bootstrap confidence interval for
Example: Remission data
The following data are on lengths of remission for two groups (placebo and 6-MP) of leukemia patients
Objective was to examine whether the drug 6-MP is more effective than placebo
6-MP group

time | n.risk | n.event | survival | SE | lower | upper |
---|---|---|---|---|---|---|
6 | 21 | 3 | 0.857 | 0.076 | 0.707 | 1.000 |
7 | 17 | 1 | 0.807 | 0.087 | 0.636 | 0.977 |
10 | 15 | 1 | 0.753 | 0.096 | 0.564 | 0.942 |
13 | 12 | 1 | 0.690 | 0.107 | 0.481 | 0.900 |
16 | 11 | 1 | 0.627 | 0.114 | 0.404 | 0.851 |
22 | 7 | 1 | 0.538 | 0.128 | 0.286 | 0.789 |
23 | 6 | 1 | 0.448 | 0.135 | 0.184 | 0.712 |

Placebo group

time | n.risk | n.event | survival | SE | lower | upper |
---|---|---|---|---|---|---|
1 | 21 | 2 | 0.905 | 0.064 | 0.779 | 1.000 |
2 | 19 | 2 | 0.810 | 0.086 | 0.642 | 0.977 |
3 | 17 | 1 | 0.762 | 0.093 | 0.580 | 0.944 |
4 | 16 | 2 | 0.667 | 0.103 | 0.465 | 0.868 |
5 | 14 | 2 | 0.571 | 0.108 | 0.360 | 0.783 |
8 | 12 | 4 | 0.381 | 0.106 | 0.173 | 0.589 |
11 | 8 | 2 | 0.286 | 0.099 | 0.092 | 0.479 |
12 | 6 | 2 | 0.190 | 0.086 | 0.023 | 0.358 |
15 | 4 | 1 | 0.143 | 0.076 | 0.000 | 0.293 |
17 | 3 | 1 | 0.095 | 0.064 | 0.000 | 0.221 |
22 | 2 | 1 | 0.048 | 0.046 | 0.000 | 0.139 |
23 | 1 | 1 | 0.000 | NaN | NaN | NaN |
CIs for quantiles
- For lifetime distributions, the quantiles $t_p$ are of more interest than the mean of the distribution, e.g., the median $t_{0.5}$ is used as the measure of location for a lifetime distribution
- Median has some advantages over mean as a measure of location for lifetime distributions
  - Median always exists (provided $S(\infty) < 0.5$) and it is easier to estimate when data are censored
Estimates of quantiles
- Nonparametric estimates of $t_p$ can be obtained from the PL estimates $\hat S(t)$
- For a step function $\hat S(t)$, the corresponding inverse function is not uniquely defined
- The estimate $\hat t_p$ could be either an interval of times ($t$’s) or a specific value of time (one of the $t_{(j)}$’s) depending on the point at which the line $\hat S(t) = 1 - p$ intersects the step function
prob | 6-MP | placebo |
---|---|---|
0.25 | 13 | 4 |
0.50 | 23 | 8 |
0.75 | NA | 12 |
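One common convention that reproduces this table is $\hat t_p = \inf\{t : \hat S(t) \le 1 - p\}$. The sketch below applies it to the tabulated PL estimates of the two groups; the convention is an assumption of this example, and it does not handle the special case where $\hat S(t)$ equals $1 - p$ exactly over a whole interval.

```python
# (time, PL estimate) pairs copied from the remission-data tables
six_mp = [(6, 0.857), (7, 0.807), (10, 0.753), (13, 0.690),
          (16, 0.627), (22, 0.538), (23, 0.448)]
placebo = [(1, 0.905), (2, 0.810), (3, 0.762), (4, 0.667), (5, 0.571), (8, 0.381),
           (11, 0.286), (12, 0.190), (15, 0.143), (17, 0.095), (22, 0.048), (23, 0.000)]

def km_quantile(steps, p):
    """Smallest tabulated time with S_hat(t) <= 1 - p; None if the curve never gets that low."""
    for t, s in steps:
        if s <= 1 - p:
            return t
    return None

for p in (0.25, 0.50, 0.75):
    print(p, km_quantile(six_mp, p), km_quantile(placebo, p))
```

The `None` for the 6-MP group at $p = 0.75$ corresponds to the `NA` in the table: the PL estimate never falls to 0.25 in that group.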
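Confidence intervals of quantiles
- A $100(1-\alpha)$% confidence interval of $t_p$ can be obtained by inverting the corresponding confidence interval $(S_L(t), S_U(t))$ of the survivor function, i.e. it consists of the values of $t$ for which $S_L(t) \le 1 - p \le S_U(t)$
- The lower limit $t_{p,L}$ of the confidence interval is defined as
$$t_{p,L} = \inf\{t : S_L(t) \le 1 - p\}$$
- Similarly, the upper limit is
$$t_{p,U} = \sup\{t : S_U(t) \ge 1 - p\}$$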
prob | 6-MP | placebo |
---|---|---|
0.25 | ||
0.50 | NA | |
0.75 | NA |
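- Confidence intervals for second and third quartiles cannot be estimated from this data set for the “6-MP” group because the upper confidence limit of $S(t)$ never falls to 0.5 (or 0.25) during the observed follow-up, so the upper limits of these intervals are not available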
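Standard error of $\hat t_p$
- The expression of $se(\hat t_p)$ can be obtained from the sampling distribution of $\hat S(t_p)$ using the delta method
$$\widehat{Var}(\hat t_p) = \frac{\widehat{Var}[\hat S(\hat t_p)]}{[\hat f(\hat t_p)]^2}$$
- $\widehat{Var}[\hat S(\hat t_p)]$ is obtained using Greenwood’s formula, and $\hat f$ is an estimate of the density function
- Confidence interval of $t_p$
  - $100(1-\alpha)$% CI for $t_p$: $\hat t_p \pm z_{1-\alpha/2} \, se(\hat t_p)$, where $se(\hat t_p) = \sqrt{\widehat{Var}(\hat t_p)}$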
3.3 Descriptive and diagnostic plots
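- Plots of PL $\hat S(t)$ or Nelson-Aalen $\hat H(t)$ estimates can be used to provide a good description of univariate lifetime data
- These estimates are useful to assess the appropriateness of a parametric model
Plots of survivor functions
- Data: $\{(t_i, \delta_i), i = 1, \ldots, n\}$
- $\hat S(t)$ = PL estimate of the survivor function
- Assume lifetimes follow a distribution with survivor function $S(t; \theta)$ and $F(t; \theta)$ is the corresponding distribution function, where $\theta$ is the parameter vector, e.g. $S(t; \theta) = \exp(-t/\theta)$ for the exponential distribution
- Let $\hat\theta$ be an estimate of $\theta$, so that $S(t; \hat\theta)$ is the estimate of the survivor function $S(t; \theta)$, e.g. $S(t; \hat\theta) = \exp(-t/\hat\theta)$ for the exponential distribution
- If the model assumption (i.e. lifetimes follow a distribution with survivor function $S(t; \theta)$) is appropriate then $S(t; \hat\theta)$ should not be very far from $\hat S(t)$
- A comparison between $\hat S(t)$ and $S(t; \hat\theta)$ can be used as a model assessment tool
- A plot of $\hat S(t)$ and $S(t; \hat\theta)$ on the same graph can be used to compare them graphically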
Example 3.3.1: Ball bearing data
17.88 | 41.52 | 48.40 | 54.12 | 68.64 | 84.12 | 105.12 | 128.04 |
28.92 | 42.12 | 51.84 | 55.56 | 68.64 | 93.12 | 105.84 | 173.40 |
33.00 | 45.60 | 51.96 | 67.80 | 68.88 | 98.64 | 127.92 | NA |
- Assume Weibull and log-normal models for analyzing the ball bearing data; we want to assess which of these two models is appropriate for the data
  - Weibull model: $S(t; \alpha, \beta) = \exp\{-(t/\alpha)^{\beta}\}$
  - Log-normal model: $S(t; \mu, \sigma) = 1 - \Phi\{(\log t - \mu)/\sigma\}$
- Estimated survivor functions
  - Weibull model: $S(t; \hat\alpha, \hat\beta)$
  - Log-normal model: $S(t; \hat\mu, \hat\sigma)$
- Figure 3.2 shows that there is good agreement between the nonparametric PL estimates $\hat S(t)$ and both the Weibull and log-normal models
- Either Weibull or log-normal can be assumed to analyze the ball bearing data
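Probability plots
- Let $t_{(1)} < \cdots < t_{(k)}$ be the distinct failure times, and $S(t_{(j)}; \hat\theta)$ and $\hat S(t_{(j)})$ the corresponding parametric and non-parametric estimates of the survivor function
- Probability-probability (P-P) plot is defined as the scatter plot of the points $\left(\hat S(t_{(j)}),\, S(t_{(j)}; \hat\theta)\right)$, $j = 1, \ldots, k$
- If the assumed parametric model $S(t; \theta)$ is appropriate then all the points in the resulting scatter plot should lie around a straight line with slope one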
time | PL | Weibull | Log-normal |
---|---|---|---|
17.88 | 0.978 | 0.960 | 0.992 |
28.92 | 0.935 | 0.894 | 0.934 |
33.00 | 0.891 | 0.862 | 0.895 |
41.52 | 0.848 | 0.787 | 0.792 |
42.12 | 0.804 | 0.781 | 0.784 |
45.60 | 0.761 | 0.747 | 0.737 |
48.40 | 0.717 | 0.718 | 0.698 |
51.84 | 0.674 | 0.682 | 0.651 |
51.96 | 0.630 | 0.681 | 0.649 |
54.12 | 0.587 | 0.658 | 0.620 |
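The model columns of this table can be recomputed from the two survivor functions. The sketch below is illustrative plain Python; the parameter values are assumptions (they are not stated in the text) chosen because they reproduce the tabulated Weibull and log-normal columns to about three decimals, and the PL column is copied from the table rather than recomputed.

```python
import math
from statistics import NormalDist

# Assumed parameter estimates, not given in the text
ALPHA, BETA = 81.88, 2.10     # Weibull scale and shape
MU, SIGMA = 4.15, 0.522       # log-normal location and scale

def surv_weibull(t):
    return math.exp(-((t / ALPHA) ** BETA))

def surv_lognormal(t):
    return 1 - NormalDist().cdf((math.log(t) - MU) / SIGMA)

# (time, PL value) pairs copied from the table above
pl = [(17.88, 0.978), (28.92, 0.935), (33.00, 0.891), (41.52, 0.848), (42.12, 0.804),
      (45.60, 0.761), (48.40, 0.717), (51.84, 0.674), (51.96, 0.630), (54.12, 0.587)]

# P-P coordinates: nonparametric estimate against each parametric estimate
for t, s_pl in pl:
    print(t, s_pl, round(surv_weibull(t), 3), round(surv_lognormal(t), 3))
```

Plotting the second column against the third (or fourth) gives the P-P plot; points close to the line through the origin with slope one indicate agreement between the model and the PL estimate.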
Quantile-Quantile (Q-Q) plot
- Quantile-quantile (Q-Q) plot compares observed quantiles and the corresponding quantiles estimated from the assumed parametric model
- Observed quantiles are the observed distinct failure times $t_{(j)}$ corresponding to the PL estimates $\hat S(t_{(j)})$
- $\tilde t_j = S^{-1}\left(\hat S(t_{(j)}); \hat\theta\right)$ are the quantiles corresponding to $\hat S(t_{(j)})$ obtained from the model
- Q-Q plot is defined as the scatter plot of $t_{(j)}$ versus $\tilde t_j$
Linearization method
- This method is based on linearizing the survivor or distribution function of the assumed parametric model
- There exist functions $g_1$ and $g_2$ such that $g_1[S(t; \theta)]$ is a linear function of $g_2(t)$
- If the parametric model $S(t; \theta)$ is appropriate, the plot of $g_1[\hat S(t)]$ versus $g_2(t)$ should be roughly linear, where $\hat S(t)$ is the PL estimate
- This method does not require estimating the parameter $\theta$
- Exponential distribution, $S(t; \theta) = \exp(-t/\theta)$
  - $\log S(t; \theta) = -t/\theta$ is a linear function of $t$, so a plot of $\log \hat S(t)$ versus $t$ should be linear through the origin if the exponential model is appropriate
  - A graphical estimate of $\theta$ can be obtained when the plot is roughly linear by fitting a straight line through the points
- Weibull distribution, $S(t; \alpha, \beta) = \exp\{-(t/\alpha)^{\beta}\}$
$$\log[-\log S(t; \alpha, \beta)] = \beta \log t - \beta \log \alpha$$
  - A plot of $\log[-\log \hat S(t)]$ versus $\log t$ should be roughly linear if a Weibull model is appropriate
  - When the plot is approximately linear, graphical estimates of $\alpha$ and $\beta$ can be obtained from the intercept and slope of the straight line fitted through the points
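For the ball bearing data the Weibull linearization can be sketched as follows. This is an illustrative plain-Python example: plotting the midpoint of the PL estimate just before and just after each failure time is an assumption of the sketch (it keeps $\log[-\log \hat S]$ finite at the largest observation), and the ordinary least-squares line stands in for a line fitted by eye.

```python
import math

# Ball bearing failure times (complete sample of 23)
bearings = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.40, 51.84, 51.96, 54.12,
            55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64, 105.12, 105.84,
            127.92, 128.04, 173.40]

def weibull_plot_points(sample):
    """Points (log t, log(-log S_mid)) at each distinct failure time."""
    n, pts = len(sample), []
    for t in sorted(set(sample)):
        before = sum(1 for x in sample if x >= t) / n   # estimate just before t
        after = sum(1 for x in sample if x > t) / n     # estimate at t
        s_mid = (before + after) / 2
        pts.append((math.log(t), math.log(-math.log(s_mid))))
    return pts

def fit_line(pts):
    """Ordinary least-squares slope and intercept."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return slope, my - slope * mx

points = weibull_plot_points(bearings)
slope, intercept = fit_line(points)
beta_graph = slope                            # slope estimates beta
alpha_graph = math.exp(-intercept / slope)    # intercept = -beta * log(alpha)
print(round(beta_graph, 2), round(alpha_graph, 1))
```

Such graphical estimates are rough, but they are usually adequate as starting values for maximum likelihood fitting.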
Location-scale family
- In general, the linearization method of assessing the appropriateness of the assumed parametric model can be defined for the location-scale family of models
- Let the transformed lifetime $Y$ have a location-scale distribution, e.g. $Y = \log T$
- Survivor function of $Y$ can be defined as
$$S(y) = S_0\left(\frac{y - u}{b}\right), \qquad -\infty < y < \infty, \quad -\infty < u < \infty, \quad b > 0$$
- It can be shown that $S_0^{-1}[S(y)] = (y - u)/b$ is a linear function of $y$
- A plot of $S_0^{-1}[\hat S(y)]$ versus $y$ should be roughly linear if the assumed family of models is appropriate.
- Expressions of $S_0$ and $S_0^{-1}$ for log-location-scale family of models
- For all these three distributions, $g_2(t) = \log t$
- In general, for distinct failure times $t_{(j)}$, a plot of points $\left(\log t_{(j)},\, S_0^{-1}[\hat S(t_{(j)})]\right)$ can be used to assess whether the assumed location-scale model is appropriate
- Graphical estimates of $u$ and $b$ can be obtained from the lines fitted to the points, where $b^{-1}$ and $u$ are estimated by the slope and the $y$-intercept (the value of $y$ at which the fitted line crosses zero), respectively
- Graphical estimates can be used as initial values of the optimization routines that are used for estimating model parameters
- Since these plots are subject to sampling variation, they are often used for informal model assessment
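For reference, the standard forms of $S_0$ and its inverse are collected below. Taking the three distributions to be the Weibull, log-normal, and log-logistic is an assumption here (the text does not name them); the expressions themselves are the usual ones for the extreme value, normal, and logistic distributions of $Y = \log T$, with $z = (y - u)/b$ and $0 < p < 1$.

```latex
\begin{array}{lll}
\text{Distribution of } T & S_0(z) & S_0^{-1}(p) \\ \hline
\text{Weibull}      & \exp(-e^{z})       & \log(-\log p) \\
\text{Log-normal}   & 1 - \Phi(z)        & \Phi^{-1}(1 - p) \\
\text{Log-logistic} & (1 + e^{z})^{-1}   & \log\{(1 - p)/p\}
\end{array}
```

For the Weibull row, $u = \log \alpha$ and $b = 1/\beta$, which recovers the plot of $\log[-\log \hat S(t)]$ versus $\log t$ with slope $\beta$.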
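Hazard plots
- The plotting procedures described for survivor functions $S(t)$ can also be described for the cumulative hazard function $H(t)$
- Survivor function of the Weibull distribution
$$S(t) = \exp\{-(t/\alpha)^{\beta}\}$$
- Cumulative hazard function of the Weibull distribution
$$H(t) = -\log S(t) = (t/\alpha)^{\beta}, \qquad \log H(t) = \beta \log t - \beta \log \alpha$$
- An alternative to plotting $\log[-\log \hat S(t)]$ versus $\log t$ would be to plot $\log \hat H(t)$ versus $\log t$, which can be used to assess the appropriateness of the Weibull model for the data at hand
  - $\hat S(t)$ = PL estimate
  - $\hat H(t)$ = Nelson-Aalen estimate
- The plots based on the survivor function and the cumulative hazard function could differ slightly, especially for large $t$, because for discrete data $\hat H(t) \ne -\log \hat S(t)$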
Acknowledgements
This lecture is adapted from materials created by Mahbub Latif