Chapter 4

(AST405) Lifetime data analysis

Author

Md Rasel Biswas

4 Inference procedures for parametric models

Introduction

  • Likelihood methods for lifetime data were introduced in Chapter 2, which includes derivation of likelihood function for different types of censored data

    • Maximum likelihood estimator

    • Inference about parameters (hypothesis testing and confidence intervals)

Likelihood function (Complete data)

  • Let t1,,tn be a random sample from a distribution f(t;θ), where θ is a p-dimensional vector of parameters

  • Likelihood and log-likelihood function L(θ)=i=1nf(ti;θ)(θ)=logL(θ)=i=1nlogf(ti;θ)

  • Maximum likelihood estimator (MLE) θ^=argmaxθΘ(θ)

Statistical inference (Complete data)

  • Large sample property of MLE θ^N(θ,[I(θ)]1)

    • Fisher expected information matrix I(θ)=E[2(θ)θθ]

    • Observed information matrix I(θ^)=2(θ)θθ|θ=θ^

Likelihood function (Type I or random censoring)

  • Data: {(ti,δi),i=1,,n}

    • ti is a sample realization of T~i=min(Ti,Ci)

    • lifetime Ti follows a distribution with pdf f(ti;θ) and the corresponding survivor function S(ti;θ)

    • censoring time Ci could be random or fixed depending on the censoring mechanism

    • δi=I(TiCi), censoring indicator


  • Data for type I or random censoring {(ti,δi),i=1,,n}

  • Likelihood function L(θ)=i=1n[f(ti;θ]δi[S(ti;θ)]1δi=i=1n[S(ti;θ)h(ti;θ)]δi[S(ti;θ)]1δi=i=1n[S(ti;θ][h(ti;θ)]δi

Likelihood function (Type II censoring)

  • Let lifetime Ti (i=1,,n) follows a distribution with pdf f(ti;θ) and the corresponding survivor function S(ti;θ)

  • The experiment was terminated after observing r smallest lifetimes
    t(1)t(r)

  • The remaining (nr) observations are considered as censored at t(r)

  • Likelihood function L(θ)=[i=1rf(t(i);θ)][S(t(r);θ)]nr

Statistical inference (censored sample)

  • For censored samples, the result “asymptotic distribution of MLE is normal” is still valid

  • The expression of the Fisher expected information matrix I(θ) is complex for censored data, observed information matrix I(θ^) is used in making inference with censored sample

4.1 Exponential distribution

  • The exponential distribution is the first lifetime model for which statistical methodology were extensively developed

  • Exact tests can be developed for exponential distribution for certain type of censoring mechanism

  • Exponential distribution assume constant hazard and its use is limited for analyzing real life problems


  • Probability density function f(t;θ)=(1/θ)exp(t/θ)t0,θ>0

  • Hazard function h(t;θ)=(1/θ)

  • Survivor function S(t;θ)=exp(t/θ)

  • pth quantile S(tp;θ)=1pexp(tp;θ)=1ptp=θlog(1p)


Homework

  • Estimation and related inference of exponential distribution when the sample has no censored observations

Type I or random censoring

  • Lifetime TExp(θ), θ>0

  • Data: {(ti,δi),i=1,,n}

  • Likelihood function L(θ)=i=1n[f(ti;θ)]δi[S(ti;θ)]1δi=i=1nS(ti;θ)[h(ti;θ)]δi=i=1nexp(ti/θ)(1/θ)δi


  • Likelihood function L(θ)=i=1nexp(ti/θ)(1/θ)δi

  • Log-likelihood function (θ)=i=1n[(ti/θ)δilog(θ)]=1θitirlog(θ)

    • r=iδi the number of failures observed in the sample

  • Log-likelihood function (θ)=1θitirlog(θ)

  • MLE (θ)θ|θ=θ^=01θ^2itirθ^=0 θ^=i=1nti/r

    • Assuming r>0 and no finite MLE exist for r=0

  • Information matrix

I(θ)=2(θ)θ2=θ[1θ2itirθ]=2θ3itirθ2

  • Replacing the parameter θ by its MLE θ^=iti/r, the observed information matrix becomes I(θ^)=rθ^2

Confidence interval (Method I)

  • Using θ^’s asymptotic distribution Z1=θθ^[I(θ^)]1/2=θθ^θ^/rN(0,1)

  • For a small sample, Z1 does not approximate the standard normal distribution very accurately

  • For a sample with a small number of uncensored observations, (θ) tends to be asymmetric

Confidence interval (Method II)

  • Sprott () showed that 1(ϕ)=(ϕ3) is more closer to symmetric compared to (θ), where ϕ=θ1/3ϕ3=1/θ (θ)=(1/θ)itirlogθ1(ϕ)=ϕ3iti+3rlog(ϕ)

    • MLE ϕ^=θ^1/3=[iti/r]1/3

    • Observed information matrix I1(ϕ)=6ϕiti+3rϕ2I1(ϕ^)=9rϕ^2


  • Pivotal quantity Z2=ϕϕ^[I1(ϕ^)]1/2=ϕϕ^ϕ^/9rN(0,1)

    • Approximation of Z2 is quite accurate compared to that of Z1

Confidence interval (Method III)

  • Likelihood ratio statistic Λ(θ)=2(θ^)2(θ) can also be used to obtain confidence interval for θ, where (θ)=(1/θ)itirlog(θ)=r(θ^/θ)rlog(θ)(θ^)=(1/θ^)itirlog(θ^)=rrlog(θ^)

  • LRT statistic Λ(θ)=2(θ^)2(θ)=2r[(θ^/θ)1+log(θ/θ^)]

  • Two-sided (1α)100% confidence intervals are obtained as the set of θ values that satisfy Λ(θ)χ(1),1α2

Confidence interval of survivor function

  • Confidence interval for the parameter θ can also be used to obtain confidence intervals for a monotone function of θ, such as S(t;θ)=exp(t/θ)orh(t;θ)=1/θ

  • Let 100(1α)% confidence interval for θ L(Data)θU(Data), where Data={(ti,δi),i=1,,n}


  • 100(1α)% confidence interval for S(t0;θ)=exp(t0/θ) L(Data)θU(Data)t0/U(Data)(t0/θ)t0/L(Data)t0/L(Data)(t0/θ)t0/U(Data)exp(t0/L(Data))S(t0;θ)exp(t0/U(Data))

Confidence interval of hazard function

  • 100(1α)% confidence interval for h(t0;θ)=1/θ (1/U(Data))h(t0;θ)(1/L(Data))

Example 4.1.1

  • Lifetimes (in days) of 10 pieces of equipment
time status
2 1
72 0
51 1
60 0
33 1
27 1
14 1
24 1
4 1
21 0
  • Assume lifetimes follow exponential distribution with parameter, i.e. TiExp(θ)

  • MLE θ^=itir=3087=44.0

  • 95% confidence interval of θ (Method I)

θ^±z.975[I(θ^)]1/2θ^±z.975(θ^/r)44.0±(1.96)(44.0/7)11.4θ76.6


  • 95% confidence interval of θ (Method II)

    • ϕ^=θ^1/3=(44.0)1/3=0.283

    • Confidence interval for ϕ
      ϕ^±z.975[I1(ϕ^)]1/2ϕ^±z.975(ϕ^/9r)0.28±(1.96)(0.28/63)0.21ϕ0.35

    • Confidence interval for θ

0.21ϕ0.350.21θ1/30.35(0.35)3θ(0.21)322.69θ103.03


  • Likelihood ratio statistic Λ(θ)=2r[(θ^/θ)1log(θ^/θ)]

    • 95% CI for θ can be obtained from the set of θ’s such that Λ(θ)χ(1),.952=3.84

    • 95% CI for θ 22.8to102.4



A comparison of 95% CI for θ with the data on lifetimes of 10 pieces of equipment (Example 4.1.1)
method lower upper
Normal approx. 11.40 76.60
Sprott 22.69 103.03
LRT 22.80 102.40

Methods for CI

  • Normal approximation W1(θ)=[(θ^θ)[I(θ^)]1/2]2=(θ^θ)2I(θ^)

  • Sprott’s method W2(θ)=[(ϕ^ϕ)[I1(ϕ^)]1/2]2=(θ^1/3θ1/3)2I1(θ^1/3)

  • LRT W3(θ)=2(θ^)2(θ)=2r[(θ^/θ)1log(θ^/θ)]

  • Wj(θ)χ(1)2j=1,2,3under H0

Comparisons between normal approximations and LRT

Type II censoring plan

  • Assume lifetime TExp(θ)

  • Let t(1)<<t(r) be the r smallest lifetimes observed from an experiment with n(r) subjects

  • The joint distribution of t(1),,t(r) fT(t(1),,t(r))=n!(nr)![i=1r(1/θ)et(i)/θ][et(r)/θ)]nr=n!(nr)!(1/θ)re1θ[i=1rt(i)+(nr)t(r)]=n!(nr)!(1/θ)reT0/θ


  • The log-likelihood function (θ)=1θit(i)rlog(θ)(1/θ)(nr)t(r)+Const.=1θ[i=1rt(i)+(nr)t(r)]rlog(θ)+Const.=T0θrlog(θ)+Const.

  • Maximum likelihood estimator (θ)θ|θ=θ^=0θ^=(1/r)[i=1rt(i)+(nr)t(r)]

    • Observed information matrix I(θ^)=2(θ)θ2|θ=θ^=rθ^2
  • (1α)100% CI for θ (by Normal approximation) θ^±z1α/2[I(θ^)]1/2

Exact confidence interval

  • Exact confidence interval for θ can be derived for uncensored and Type II censored samples, define W1=nt(1)W2=(n1)(t(2)t(1))Wr=(nr+1)(t(r)t(r1))

  • In general Wi=(ni+1)(t(i)t(i1))i=1,,r


  • It can be shown that (4.1)i=1rWi=i=1rt(i)+(nr)t(r)=T0

  • The joint distribution of W1=g1(t(1),,t(r)),,Wr=gr(t(1),,t(r)) fW(w1,,wr)=fT(g11(w1,,wr),,gr1(w1,,wr))|J| where J=(g11(w1,,wr),,gr1(w1,,wr))(w1,,wr)=[g11(w1,,wr)w1g21(w1,,wr)w1gr1(w1,,wr)w1g11(w1,,wr)w2g21(w1,,wr)w2gr1(w1,,wr)w2g11(w1,,wr)wrg21(w1,,wr)wrgr1(w1,,wr)wr]

w1=g1(t(1),,t(r))=nt(1)t(1)=g11(w1,,wr)=w1nw2=g2(t(1),,t(r))=(n1)(t(2)t(1))t(2)=g21(w1,,wr)=w2n1+w1nwr=gr(t(1),,t(r))=(nr+1)(t(r)t(r1))t(r)=gr1(w1,,wr)=wrnr+1++w2n1+w1n


J=[g11(w1,,wr)w1g21(w1,,wr)w1gr1(w1,,wr)w1g11(w1,,wr)w2g21(w1,,wr)w2gr1(w!,,wr)w2g11(w1,,wr)wrg21(w1,,wr)wrgr1(w1,,wr)wr]=[1n1n1n01n11n1001nr+1]

  • We can write |J|=1n(n1)(nr+1)=(nr)!n!

  • The joint distribution of W1=g1(t(1),,t(r)),,Wr=gr(t(1),,t(r)) fW(w1,,wr)=fT(g11(w1,,wr),,gr1(w1,,wr))|J|=n!(nr)!1θre1θ{igi1(wi)+(nr)gr1(wr)}(nr)!n!=1θreiwi/θ

  • Equation ??? shows that

    • W1,,Wr are independent and WiExp(θ)

  • Since Wi’ are independent and WiExp(θ), so T0=iWi=it(i)+(nr)t(r) follows a gamma distribution with shape parameter r and scale parameter θ

  • Then using the relationship between gamma and chi-square distribution 2T0θχ(2r)2


Review

  • Moment generating functions for different distributions MX(t)=E[etX]={11θtfor XExp(θ)(11θt)rfor XGamma(r,θ)(112t)r/2for Xχ(r)2

  • Since 2T0θχ(2r)2, we can write P(χ(2r),α/222T0θχ(2r),1α/22)=1α

    • χ(2r),p2 pth quantile of χ(2r)2 distribution
  • (1α)100% exact confidence interval for θ 2T0χ(2r),1α/22θ2T0χ(2r),α/22

Example 4.1.3

  • The first 8 observations in a random sample of 12 lifetimes (in hours) from an assumed exponential distribution are 31,58,157,185,300,470,497,673

  • Homework

    • Obtain 95% exact and approximate confidence intervals for θ

4.1.3 Comparison of distributions

  • Comparison of two or more lifetime distributions is often of interest in practice

  • For comparing two or more independent exponential distributions, different methods of hypothesis tests and confidence intervals are available

Likelihood ratio tests

  • Let Tij be the lifetime corresponding to the jth subject of the ith group (i=1,,m;j=1,,ni)

  • Assume TijExp(θi) and the null hypothesis of interest H0:θ1==θm

  • Data: {(tij,δij),i=1,,m;j=1,,ni}


  • The likelihood function for θ=(θ1,,θm) L(θ)=i=1mj=1ni[f(tij;θi)]δij[S(tij;θi]1δij=i=1mj=1ni[1θi]δijetij/θi

  • The corresponding log-likelihood function (θ)=ij[(tij/θi)+δijlog(θi)]


  • The log-likelihood function (θ)=ij[(tij/θi)+δijlog(θi)]=i[(Ti/θi)+rilog(θi)]
    • Ti=jtij and ri=jδij
  • Maximum likelihood estimator of θi (i=1,,m)
    (θ)θi|θi=θ^i=0Tiθ^i2riθ^i=0θ^i=(Ti/ri)

  • Under H0:θ1==θm=θ(say) log-likelihood function (θ)=i[(Ti/θ)+rilog(θ)]

    • Under H0, the MLE of θ θ~=iTiiri

  • Likelihood ratio test statistic Λ=2(θ^)2(θ~)=2i[(Ti/θ^i)rilog(θ^i)+(Ti/θ~)+rilog(θ~)]=2i[rilog(θ^i)+rilog(θ~)]

    • Under H0, Λχ(m1)2 for a large sample size

Example 4.1.4

  • Four independent samples of size 10 each had 7 failures

  • MLE under exponential model θ^1=106,θ^2=80,θ^3=140,θ^4=158

  • Homework

    • Test H0:θ1==θ4

Confidence intervals for θ1/θ2

  • Comparison between two exponential distributions TiExp(θi),i=1,2

  • It can be shown that mle θ^i approximately follows normal distribution θ^iN(θi,θi2/ri)logθ^iN(logθi,1/ri)

  • Distribution of log(θ^1/θ^2) logθ^1logθ^2=log(θ^1/θ^2)N(log(θ1/θ2),(r11+r21))


  • Null hypothesis H0:θ1=θ2H0:θ1/θ2=1H0:logθ1=logθ2

  • Test statistic Z=log(θ^1/θ^2)log(θ1/θ2)(r11+r21)1/2N(0,1)


  • 95% CI for log(θ1/θ2) (4.2)log(θ^1/θ^2)±z1α/2(r11+r21)1/2

  • 95% CI for (θ1/θ2) (θ^1/θ^2)exp(±z1α/2(r11+r21)1/2)


  • Confidence intervals can also be found by inverting the likelihood ratio test for a hypothesis of the form H0:θ1=aθ2 where a>0

  • For Type I sample {(tij,δij),i=1,2;j=1,,ni}, the log-likelihood function (θ1,θ2)=i=12[(Ti/θi)+rilog(θi)]

    • MLE θ^1=(T1/r1) and θ^2=(T2/r2)

  • Under H0:θ1=aθ2, the log-likelihood function (aθ2,θ2)=r1log(aθ2)(T1/(aθ2))r2log(θ2)(T2/θ2)=(r1+r2)log(θ2)(T1/(aθ2))(T2/θ2)r1log(a)

    • MLE under H0 θ~2=T1+aT2a(r1+r2)θ~1=aθ~2=T1+aT2(r1+r2)

  • The likelihood ratio test statistic Λ(a)=2(θ^1,θ^2)2(θ~1,θ~2), where (θ^1,θ^2)=i=12[(Ti/θ^i)+rilog(θ^i)]=(r1+r2)r1log(θ^1)r2log(θ^2)

and (θ~1,θ~2)=T1θ~1T2θ~2r1log(θ~1)r2log(θ~2)=(r1+r2)r1log(θ~1)r2log(θ~2)

  • The likelihood ratio test statistic Λ(a)=2(θ^1,θ^2)2(θ~1,θ~2)=2r1log(θ~1/θ^1)+2r2log(θ~2/θ^2)=2r1log(aθ~2/θ^1)+2r2log(θ~2/θ^2)

  • (1α)100% confidence interval for (θ1/θ2) can be construed from the values a that satisfy (4.3)Λ(a)χ(1),1α2

  • For Type I censored sample, LRT statistics based confidence interval for (θ1/θ2) is more accurate than that of normal approximation for small samples

Example 4.1.5

  • A small clinical trial was conducted to compare the duration of remission achieved by two drugs used in the treatment of leukemia.

  • Duration of remission is assumed to follow an exponential distribution and two groups of 20 patients produced the followings under a Type I censoring mechanism r1=10,T1=700,r2=10,T2=540

  • Obtain 95% approximate and exact CI for (θ1/θ2)


  • Unrestricted MLEs θ^1=T1r1=70andθ^2=T2r2=54

  • 95% approximate CI for log(θ1/θ2) log(θ^1/θ^2)±z.975(r11+r21)1/2log(70/54)±(1.96)(101+101)1/20.26±0.877


  • 95% approximate CI (normal distribution based) for (θ1/θ2) 0.617<log(θ1/θ2)<1.1360.54<(θ1/θ2)<3.114

Is there any significant difference between θ1 and θ2?


  • Likelihood ratio statistic based confidence interval for (θ1/θ2) requires estimate of the parameters under the null hypothesis H0:θ1=aθ2

  • Estimates under H0 θ~2=T1+aT2a(r1+r2)θ~1=aθ~2=T1+aT2(r1+r2)

  • Likelihood ratio statistic Λ(a)=2r1log(aθ~2/θ^1)+2r2log(θ~2/θ^2)


a Λ(a) a Λ(a)
0.525 3.953 3.185 3.911
0.560 3.424 3.150 3.819
0.595 2.958 3.115 3.726
0.630 2.549 3.080 3.633
0.665 2.187 3.045 3.541
0.700 1.869 3.010 3.448
0.735 1.589 2.975 3.356
0.770 1.341 2.940 3.263


  • 95% of (θ1/θ2) is the range of values of a=(θ1/θ2), such that Λ(a)χ(1),.952=3.84 which is 0.56<a<3.150.56<(θ1/θ2)<3.15

95% confidence interval of (θ1/θ2) for Type I censored sample
method lower upper
Normal approximation 0.54 3.114
LRT 0.56 3.150

Type II censored sample (CI for θ1/θ2)

  • Lifetime distributions TiExp(θi)i=1,2

  • Type II censored sample for the group i, which has ri number of failures and (niri) subjects are censored at t(iri) t(i1)<<t(iri)


  • Likelihood function L(θ1,θ2)=i=12(1/θi)riejt(ij)/θie(niri)t(iri)/θi=i=12(1/θi)rie(1/θi)[jt(ij)(niri)t(iri)]=i=12(1/θi)rieTi/θi

  • MLE θ^i=Tiri

  • We have already shown that 2Tiθi=2riθ^iθiχ(2ri)2

  • Then we can show (2r1θ^1/θ1)/(2r1)(2r2θ^2/θ2)/(2r2)=θ^1θ2θ^2θ1F(2r1,2r2)


  • We can write P(F(2r1,2r2),α/2θ^1θ2θ^2θ1F(2r1,2r2),1α/2)=1α

  • (1α)100% confidence interval for (θ1/θ2) (θ^1/θ^2)F(2r1,2r2),1α/2(θ1/θ2)(θ^1/θ^2)F(2r1,2r2),α/2

4.2 Gamma distribution

  • The pdf of two-parameter gamma distribution f(t;α,k)=1αΓk(tα)k1exp(t/α),t>0

    • Scale parameter α>0 and shape parameter k>0

  • Survivor function S(t;α,k)=tf(u;α,k)du=t1αΓk(uα)k1exp(u/α)du=1I(k,t/α)

  • Incomplete gamma function I(k,x)=1Γ(k)0xuk1eudu

Uncensored data

  • Let t1,,tn be a random sample from Gamma(α,k), the log-likelihood function (α,k)=i=1nlogf(ti;α,k)=i=1nlog[1αΓk(tiα)k1exp(ti/α)]=nlogΓknklogα+n(k1)logt~nt¯/α

    • t~=(i=1nti)1/n and t¯=(1/n)iti

(α,k)=nlogΓknklogα+n(k1)logt~nt¯/α

  • Score functions U1(α,k)=(α,k)α=nkα+nt¯α2U2(α,k)=(α,k)k=nψ(k)nlogα+nlogt~ where ψ(k)=Γ(k)/Γ(k) is the digamma function (see Appendix B of the Textbook)

  • MLE (α^,k^) is a solution of the system of linear equations nk^α^+nt¯α^2=0nψ(k^)nlogα^+nlogt~=0

    • Two equations and two unknowns

    • Equations are non-linear in terms of the variables

    • No closed form solutions

1. Substitution method

(4.4)nk^α^+nt¯α^2=0α^=t¯k^

(4.5)nψ(k^)nlogα^+nlogt~=0ψ(k^)logk^=log(t~/t¯)

  • Solve using a suitable optimization technique (e.g. graphical method, Newton-Raphson method, etc.) to obtain k^

  • Then α^ can be obtained from

2. Newton-Raphson method

  • Score functions U1(α,k)=(α,k)α=nkα+nt¯α2U2(α,k)=(α,k)k=nψ(k)nlogα+nlogt~

  • Elements of Hessian matrix H11(α,k)=2(α,k)α2=nkα22nt¯α3H12(α,k)=2(α,k)αk=nα=I21(α,k)H22(α,k)=2(α,k)k2=nψ(k)

  • Score vector U(α,k)=[U1(α,k)U2(α,k)]

  • Hessian matrix H(α,k)=[H11(α,k)H12(α,k)H21(α,k)H22(α,k)]

  • Score vector and information matrix are function of parameters and data


  • Initial values θ(0)=(α(0),k(0)) are chosen so that elements of score vector and hessian matrix are finite

  • Updated estimate θ(1)=(α(1),k(1)) is obtained as (4.6)θ(1)=θ(0)[H(θ(0))]1U(θ(0))

  • The estimate θ(2) can be obtained by using θ(1) as input in

  • Repeating the procedure of evaluating the using the current estimate, the following sequence of estimates can be obtained {θ(j),j=3,4,5,}


  • A convergence criterion needs to be defined to obtain the MLE from the sequence of estimates {θ(j),j=1,2,3,},

  • Convergence criteria are defined on the basis of two successive values of the parameters estimates

  • θ(j)=(α^,k^) is considered as MLE if one of the following criteria is satisfied

    • |θ(j)θ(j1)|is very small

    • |(θ(j))(θ(j1))|is very small

    • U(θj)0


  • Estimated variance-variance matrix of the MLE (α^,k^) can be obtained from the inverse of the negative of the hessian matrix evaluated at the MLE Var^(α^k^)=[H(α^,k^)]1

Newton-Raphson method: Pseudo code

1: theta0 <- `initial value of the parameter`
2: eps <- 1 
3: while (eps > 1e-5) {
4:   u0 <- U(theta0)
5:   h0 <- H(theta0)
6:   theat1 <- theta0 - inv(h0) * u0
7:   eps <- max(abs(theta1 - theta0))
8:   if (eps < 1e-5) break
9:   else theta0 <- theta1
10: }
11: return (list(theta0, h0))

  • Statistical software have routines (such as optim() of R) that can optimize likelihood function to obtain MLE

    • Such routines require providing a “function” of likelihood function as an argument

    • Different optimization algorithms, such as Newton-Raphson, Mead-Nelder, simulated annealing, etc. are implemented in optimization routines

Statistical inference

  • Asymptotic distributions of MLEs, i.e. α^N(α,var(α^))and k^N(k,var(k^))

  • Since α>0 and k>0, confidence intervals based on the sampling distribution of logα^ and logk^ ensure non-negative lower limit of the confidence interval logα^N(logα,var(logα^))and logk^N(logk,var(logk^))

    • var(logα^)=(1/α^)2var(α^)

  • (1α)100% confidence interval for logα

logα^±z1α/2SE(logα^)

  • (1α)100% confidence interval for α

α^exp(±z1α/2SE(logα^))

Example 4.2.1

Following are survival time of 20 male rates that were exposed to a high level of radiation

rtime <- c(152, 152, 115, 109, 137, 88, 94, 77, 160, 165, 125, 40, 128, 123, 136, 101, 62, 153, 83, 69)
rtime
 [1] 152 152 115 109 137  88  94  77 160 165 125  40 128 123 136 101  62 153  83
[20]  69
  • Assume the lifetimes follow a gamma distribution with scale parameter α and shape parameter k

  • From the data: t¯=113.45andt~=107.07

  • Expressions of the score function (4.7)α^=(t¯/k^)=113.45/k^ψ(k^)logk^log(t~/t¯)=0ψ(k^)logk^+0.058=0


kfun <- function(k0, time) {
  t_bar <- mean(time)
  tgbar <- exp(mean(log(time)))
  return(digamma(k0) - log(k0) - 
           log(tgbar / t_bar))
}
k_hat <- uniroot(kfun, lower = 3, 
                 upper = 10, 
                 time = rtime)

k^=8.799α^=12.893

k_hat
$root
[1] 8.799215

$f.root
[1] 8.181174e-10

$iter
[1] 7

$init.it
[1] NA

$estim.prec
[1] 6.103516e-05

Log-likelihood function of gamma distribution

  • R function to calculate log-likelihood function of a gamma distribution for a given sample
gamma_loglk <- function(par, time) {
   sum(
     dgamma(time, scale = par[1], shape = par[2], log = T)
     )
}
  • par is a vector with the parameter scale as the first element and shape as the second element

  • time is the observed failure times

  • gamma_loglk function can be evaluated for any given valid values of par and time


  • For given values of parameters, say (α0=1,k0=2.), the value of log-likelihood function can be obtained for the rat data
gamma_loglk(par = c(1, 2), time = rtime)
[1] -2175.531
  • For another set of values (α0=100,k0=80), the corresponding value of log-likelihood function
gamma_loglk(par = c(100, 80), time = rtime)
[1] -5392.711

optim() function

  • R function optim() is a general purpose optimization routine
optim(par, fn,  gr = NULL, method = "Nelder-Mead",
      lower = -Inf, upper = Inf, 
      control = list(), hessian  = FALSE, ...
      )
  • par initial values for the parameters to be optimized

  • fn a function to be minimized

  • gr a function to return the gradient (score function)

  • method a method to be used, e.g. “Nelder-Mead”, “BFGS”, “CG”, “L-BFGS-B”, etc.

  • lower and upper lower and upper limit of the parameters to be optimized

  • control list of arguments (e.g. fnscale, etc.) for controlling the iterations


Initial values

  • Initial values of the parameters can be selected from the exploratory plots of log-likelihood function

  • The log-likelihood function must provide finite values with the initial values of the parameters


  • As an example, assume the initial values α0=12.9 and k0=8.8, and the corresponding value of log-likelihood function
gamma_loglk(par = c(12.9, 8.8), time = rtime)
[1] -100.48
  • For another set of initial values, α0=1.0 and k0=1.0
gamma_loglk(par = c(1.0, 1.0), time = rtime)
[1] -2269

  • Since both sets of initial values can provide finite values of log-likelihood function, we can use either of these two as an initial value for optimizing the log-likelihood function using the R function optim()

Example 4.2.1 using optim()

  • Fit of the gamma distribution to the rat data
gamma_fit <- optim(
    par = c(1, 1),  # initial values
    fn = gamma_loglk, # function name
    control = list(fnscale = -1), # for maximization
    hessian  = T, # will return hessian matrix
    time = rtime 
    )

  • List of objects in gamma_fit
names(gamma_fit)
[1] "par"         "value"       "counts"      "convergence" "message"    
[6] "hessian"    
  • Converged?
gamma_fit$convergence
[1] 0

  • Estimate of scale and shape parameters
gamma_fit$par
[1] 12.896976  8.795312
  • α^=12.897 and k^=8.795

  • Estimated variance-variance matrix
cvar <- solve(-gamma_fit$hessian)
cvar
          [,1]       [,2]
[1,]  16.88167 -10.871356
[2,] -10.87136   7.416138
  • SE(α^)=16.882=4.109

  • SE(k^)=7.416=2.723


95% CI for α and k

  • Using the sampling distribution of α^ and k^ α^±z1α/2SE(α^)4.84α20.95k^±z1α/2SE(k^)3.46α14.13

  • Using the sampling distribution of logα^ and logk^ α^exp(±z1α/2SE(logα^))6.91α24.08k^exp(±z1α/2SE(k^))4.79k16.14


Quantiles

  • pth quantile S(tp;α,k)=(1p)1Γ(k)tp/αuk1eudu=(1p)(tp/α)=Q(p,k)tp=αQ(p,k)

    • Q(p,k) is the pth quantile function of one-parameter gamma distribution [R function qgamma(p, scale=1, shape)]
  • Estimate of the median for the rat data is t^.5=α^Q(.5,k^)

12.893951 * qgamma(.5, scale = 1, shape = 8.799089)     
[1] 109.1872
qgamma(.5, scale = 12.893951, shape = 8.799089)
[1] 109.1872

Likelihood ratio statistic

  • Likelihood ratio statistic Λ(α,k)=2(α^,k^)2(α,k)

  • Approximate (1p)100% confidence region for (α,k) can be obtained from the set of points (α,k) satisfying Λ(α,k)χ(2),1p2


95% confidence region of (α,k) for rat survival data

LRT statistic based CI for k

  • Consider the following null hypothesis H0:k=k0

  • MLEs (unrestricted and under restriction) (α^,k^)=argmax(α,k)Θ(α,k)under H1α~(k0)=argmaxαΘα(α,k0)under H0


  • Likelihood ratio statistic Λk(k0)=2(α^,k^)2(α~(k0),k0)

    • Under H0:k=k0, Λk(k0) approximately follows a χ(1)2 distribution
  • An approximate two-sided (1p)100% confidence interval for k can be obtained from a set of values of k0 satisfying Λk(k0)χ(1),1p2


95% CI:4.54k15.28

LRT statistic based CI for α

  • Consider the following null hypothesis H0:α=α0

  • MLEs (unrestricted and under restriction) (α^,k^)=argmax(α,k)Θ(α,k)under H1k~(α0)=argmaxkΘk(α0,k)under H0


  • Likelihood ratio statistic Λα(α0)=2(α^,k^)2(α0,k~(α0))

    • Under H0:α=α0, Λα(α0) approximately follows a χ(1)2 distribution
  • An approximate two-sided (1p)100% confidence interval for α can be obtained from a set of values of α0 satisfying Λα(α0)χ(1),1p2


95% CI:7.37α25.9


Summary of 95% CI

  • For α no-transformation:4.84α20.95log-transformation:6.91α24.08LRT:7.37α25.9

  • For k no-transformation:3.46k14.13log-transformation:4.79k16.14LRT:4.54k15.28

Censored

Data {(ti,δi),i=1,,n} and the corresponding log-likelihood function (α,k)=logi=1n[1αΓ(k)(tiα)k1eti/α]δi[1I(k,ti/α)]1δi=klogαrlogΓ(k)+(k1)iδilogtiiδiti/α+i(1δi)log[1I(k,ti/α)]


  • No closed form solutions are available for MLEs α^ and k^

  • Algebraic expressions of score function and information matrix for the log-likelihood function (???) are very complicated

  • Score functions and information matrix can be evaluated numerically

  • Optimization routines available in statistical software can be used to obtain MLEs α^ and k^, and the corresponding SEs

    • Different optimization algorithms such as Newton-Raphson, Nelder-Mead, etc. are available in such routines

  • As an example, we are going to use the same rat data that we have used for the analysis of complete data

  • For the rat data, assume a time 150 weeks is considered as censored, there are 5 censored observations

    • An R object rtimec is created, which has two columns time and status
rtimec |> t()
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
time    152  152  115  109  137   88   94   77  160   165   125    40   128
status    0    0    1    1    1    1    1    1    0     0     1     1     1
       [,14] [,15] [,16] [,17] [,18] [,19] [,20]
time     123   136   101    62   153    83    69
status     1     1     1     1     0     1     1

gamma log-likelihood function for censored sample

gamma_loglk <- function(par, time, status = NULL) {
  #
  if (is.null(status)) status <- rep(1, length(time))
  #
  llk_f <- sum(status * dgamma(time, scale = par[1], 
                  shape = par[2], log = T))
  #
  llk_c <- sum((1 - status) * pgamma(time, scale = par[1], 
                     shape = par[2], lower.tail = F, 
                     log.p = T))
  return(llk_f + llk_c)
}

Censored sample

  • R codes to obtain MLE of parameters of a gamma distribution from a censored sample
gamma_out_c <- optim(
  par = c(1, 1), fn = gamma_loglk, 
  control = list(fnscale = -1), hessian  = T, 
  time = rtimec$time, status = rtimec$status
)

  • Maximum likelihood estimators [gamma_out_c$par] α^=21.3andk^=5.79

  • Standard errors [solve(-gamma_out_c$hessian)] se(α^)=8.61andse(k^)=2.14


95% CI:10.7α52.48


95% CI:2.78k10.9


  • 95% CI for α no-transformation:4.44α38.17log-transformation:9.65α47.02LRT:10.7α52.48

  • 95% CI for k no-transformation:3.46k14.13log-transformation:4.79k16.14LRT:2.78k10.9


  • Estimate of median t^.5=α^Q(.5,k^)=116.4weeks

Acknowledgements

This lecture is adapted from materials created by Mahbub Latif

References

Sprott, David Arthur. 1980. “Maximum Likelihood in Small Samples: Estimation in the Presence of Nuisance Parameters.” Biometrika 67 (3): 515–23.