Semiparametric regression for the mean and rate functions of recurrent events Chihyun Lee April 1, 2013
1 Introduction
In the analysis of recurrent event data, research interest may lie either in the gap times between successive events or in the mean and rate of events. This paper [5] focuses on the latter. Conventionally, recurrent events have often been analysed with the counting process model of Andersen and Gill [1], which assumes a Cox-type [2] intensity function of the following form,¹

$$\lambda_Z(t) = \exp\{\beta_0^T Z(t)\}\,\lambda_0(t). \qquad (1)$$

This model, however, requires an accurate specification of the dependence of recurrent events within a subject. Thus, Pepe and Cai [6], Lawless and Nadeau [3], and Lawless et al. [4] have studied the following mean function model,²

$$E\{dN^*(t) \mid Z(t)\} = d\mu_Z(t) = \exp\{\beta_0^T Z(t)\}\,d\mu_0(t), \qquad (2)$$
which is more versatile than (1): it leaves the dependence structure unspecified and accommodates any counting process. A simple illustration of this point is provided in the paper. The approach works in the same fashion as GEE, where the correlation or dependence structure between outcomes is left unknown or unmeasured, and it is known to provide robust standard error estimates. In this paper, rigorous justification for using model (2) in inference on the regression parameters is provided. In addition, model checking techniques are illustrated, as well as methods for constructing simultaneous confidence bands for $\mu_0(\cdot)$.

¹ All notation follows that used in class unless otherwise specified.
² Let $N^*(t)$ be the number of events that occur in $[0, t]$.
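To see why model (2) can hold when model (1) fails, consider a standard random-effect example. This is a sketch under assumptions that are not taken from this summary and may differ in detail from the paper's own illustration: the covariate $Z$ is time-invariant, $\eta$ is a gamma-distributed random effect independent of $Z$ with mean 1 and variance $\sigma^2$, and $\mathcal{F}_{t-}$ denotes the event history just before $t$.

```latex
\begin{align*}
&\text{Given } (\eta, Z),\ N^*(\cdot) \text{ is a Poisson process with intensity }
  \eta \exp\{\beta_0^T Z\}\,\lambda_0(t). \\[4pt]
E\{dN^*(t) \mid Z\}
  &= E(\eta)\,\exp\{\beta_0^T Z\}\,\lambda_0(t)\,dt
   = \exp\{\beta_0^T Z\}\,d\mu_0(t),
   \qquad \mu_0(t) = \int_0^t \lambda_0(u)\,du, \\[4pt]
E\{dN^*(t) \mid \mathcal{F}_{t-}, Z\}
  &= \frac{1 + \sigma^2 N^*(t-)}{1 + \sigma^2 \exp\{\beta_0^T Z\}\,\mu_0(t)}\,
     \exp\{\beta_0^T Z\}\,\lambda_0(t)\,dt .
\end{align*}
```

The second line is exactly model (2). The third line shows that the intensity depends on the number of earlier events $N^*(t-)$, so it has the form (1) only in the degenerate case $\sigma^2 = 0$.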
2 Inferences on the regression parameters
Inferences on the regression parameters under two different model assumptions are compared in this paper. Under the intensity function assumption (1), the estimator $\hat\beta$ is the solution of the partial likelihood score equation $U(\beta, \tau) = 0$, where

$$U(\beta, \tau) = \sum_{i=1}^n \int_0^\tau \{Z_i(u) - \varepsilon(\beta, u)\}\,dN_i(u),$$

and the asymptotic covariance matrix of $n^{1/2}(\hat\beta - \beta_0)$ is $A^{-1}$, where

$$A \equiv E\left[\int_0^\tau \{Z(t) - \bar\varepsilon(\beta_0, t)\}^{\otimes 2}\, Y(t)\exp\{\beta_0^T Z(t)\}\,d\mu_0(t)\right]$$

and $\bar\varepsilon(\beta_0, t)$ is the limit of $\varepsilon(\beta_0, t)$. This is called the naive method.

For arbitrary counting processes, model (1) cannot be used. The distribution of

$$U(\beta_0, \tau) = \sum_{i=1}^n \int_0^\tau \{Z_i(u) - \varepsilon(\beta_0, u)\}\,dM_i(u), \qquad M_i(t) = N_i(t) - \int_0^t Y_i(u)\exp\{\beta_0^T Z_i(u)\}\,d\mu_0(u),$$

should instead be derived through model (2). The limiting distribution of $n^{-1/2} U(\beta_0, t)$ has mean zero and covariance function

$$\Sigma(s, t) = E\left[\int_0^s \{Z_1(u) - \bar\varepsilon(\beta_0, u)\}\,dM_1(u) \int_0^t \{Z_1(\nu) - \bar\varepsilon(\beta_0, \nu)\}^T\,dM_1(\nu)\right].$$

It is shown that $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean zero and the sandwich covariance matrix $\Gamma \equiv A^{-1}\Sigma A^{-1}$, where $\Sigma = \Sigma(\tau, \tau)$. The matrix $\Gamma$ can be estimated by $\hat\Gamma \equiv \hat A^{-1}\hat\Sigma\hat A^{-1}$, which is referred to as the robust estimator.
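The naive and robust variance estimators can be compared in a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's implementation: covariates are time-invariant, $\varepsilon(\beta, t)$ is taken to be the usual risk-set weighted covariate average, $\mu_0$ is estimated by a Breslow-type step function, and the function names `simulate` and `fit_rate_model` as well as the gamma-frailty data generator are hypothetical.

```python
import numpy as np

def simulate(n, beta, frailty_var, rng):
    """Hypothetical generator: Poisson processes sharing a gamma frailty (mean 1)."""
    Z = rng.binomial(1, 0.5, size=(n, 1)).astype(float)
    C = rng.uniform(1.0, 3.0, size=n)                        # follow-up times
    eta = rng.gamma(1.0 / frailty_var, frailty_var, size=n)  # mean 1, var frailty_var
    counts = rng.poisson(eta * np.exp(Z @ beta) * C)         # baseline rate 1
    events = [rng.uniform(0.0, C[i], size=counts[i]) for i in range(n)]
    return Z, C, events

def fit_rate_model(Z, C, events, n_iter=50, tol=1e-10):
    """Return beta_hat and the naive and robust estimates of Var(beta_hat)."""
    n, p = Z.shape
    owner = np.concatenate([np.full(len(e), i) for i, e in enumerate(events)])
    times = np.concatenate([np.asarray(e, dtype=float) for e in events])
    at_risk = (C[None, :] >= times[:, None]).astype(float)   # Y_j(t_k), shape (m, n)

    def pieces(beta):
        w = at_risk * np.exp(Z @ beta)[None, :]               # (m, n)
        s0 = w.sum(axis=1)
        zbar = (w @ Z) / s0[:, None]                          # eps(beta, t_k), (m, p)
        s2 = np.einsum("kj,ja,jb->kab", w, Z, Z) / s0[:, None, None]
        info = (s2 - zbar[:, :, None] * zbar[:, None, :]).sum(axis=0)
        score = (Z[owner] - zbar).sum(axis=0)                 # U(beta, tau)
        return w, s0, zbar, info, score

    beta = np.zeros(p)
    for _ in range(n_iter):                                   # Newton-Raphson on the score
        _, _, _, info, score = pieces(beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    w, s0, zbar, info, _ = pieces(beta)
    a = w / s0[:, None]                 # Y_i(t_k) exp(beta'Z_i) times the jump of mu0_hat
    resid = np.zeros((n, p))
    np.add.at(resid, owner, Z[owner] - zbar)                  # integral of (Z_i - eps) dN_i
    resid -= Z * a.sum(axis=0)[:, None] - a.T @ zbar          # subtract compensator part
    naive = np.linalg.inv(info)                               # A_hat^{-1} / n
    robust = naive @ (resid.T @ resid) @ naive                # sandwich / n
    return beta, naive, robust
```

For example, `fit_rate_model(*simulate(400, np.array([0.5]), 1.0, np.random.default_rng(0)))` returns the estimate together with both variance matrices. Because the frailty induces within-subject dependence, the robust variance would be expected to exceed the naive one in such data.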
3 Inferences on the mean function
Let $\mu_0(t)$ be estimated by $\hat\mu_Q(t)$, which is derived by incorporating a random weight function $Q$ into the score function. Define $V(t) \equiv n^{1/2}\{\hat\mu_Q(t) - \mu_0(t)\}$. The convergence of $V(t)$ to a zero-mean Gaussian process establishes the asymptotic normality of $\hat\mu_Q(t)$, and together with the derived consistent variance estimator this enables the construction of pointwise confidence intervals. Because $\mu_0(t)$ is non-negative, the paper applies a log transformation in this procedure.

In the construction of simultaneous confidence bands, the distribution of the supremum of $V(t)$ needs to be evaluated. However, this distribution cannot be expressed analytically. Thus, an approximation $\tilde V(t)$ of $V(t)$ is used to generate a number of realizations. With these realizations, we can find the value $c_{\alpha/2}$ that satisfies

$$\Pr\left\{\sup_t \left|\tilde V(t)/\hat\xi^{1/2}(t)\right| < c_{\alpha/2}\right\} = 1 - \alpha,$$

where $\hat\xi^{1/2}$ is the standard error estimator. The confidence band for $\mu_0(t)$ can then be established using the same log transformation as for the pointwise confidence interval.
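The simulation step for the band can be sketched as follows. This is a minimal illustration under assumptions that go beyond the summary above: realizations of $\tilde V(t)$ are generated by multiplying estimated per-subject influence-function values by independent standard normal variables, and those values are supplied by the caller as the hypothetical array `psi`. The function name `simultaneous_band` is also hypothetical.

```python
import numpy as np

def simultaneous_band(mu_hat, psi, alpha=0.05, n_sim=2000, rng=None):
    """Log-transformed simultaneous band for mu0 on a time grid.

    mu_hat : (T,) positive estimates of mu0 on the grid.
    psi    : (n, T) assumed influence-function values, so that V(t) is
             approximated by n**-0.5 * sum_i psi[i, t] * G_i with G_i ~ N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = psi.shape[0]
    xi_hat = (psi ** 2).mean(axis=0)                  # variance estimate of V(t)
    G = rng.standard_normal((n_sim, n))               # one row of multipliers per realization
    V_tilde = (G @ psi) / np.sqrt(n)                  # (n_sim, T) realizations of V-tilde
    sup_stat = np.abs(V_tilde / np.sqrt(xi_hat)).max(axis=1)
    c = np.quantile(sup_stat, 1.0 - alpha)            # Pr{sup |V~ / xi^1/2| < c} = 1 - alpha
    half_width = c * np.sqrt(xi_hat / n) / mu_hat     # half-width on the log scale
    return c, mu_hat * np.exp(-half_width), mu_hat * np.exp(half_width)
```

Replacing `c` with the standard normal quantile gives the corresponding pointwise interval. Because the band is built on the log scale, its lower limit stays positive.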
4 Model checking techniques
Because recurrent event data are correlated within subjects, the differences between the observed and predicted numbers of events, $\hat M_i(t)$, are not martingales, so the martingale central limit theorem cannot be applied directly. Instead, certain statistics based on these residuals are defined, and their null distributions are derived with the aim of detecting unusual patterns. The paper presents model checking methods for four aspects of the model: the functional forms of the covariates, the exponential link function of model (2), the proportional rates or means assumption with respect to the jth covariate component, and the overall fit of the model. For all four procedures, realizations are generated from the derived null distribution of the corresponding statistic. The actually observed value can then be compared with the generated values to judge how extreme it is. This is regarded as a formal way of checking model adequacy, because a p-value can be obtained as the probability, under the null distribution, of a value at least as extreme as the one observed.
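The final comparison step can be sketched in a few lines. The function below is a hypothetical helper, not the paper's procedure: it assumes the observed residual-based process and its simulated null realizations are already available on a common grid, and it uses the supremum of the absolute value as the summary statistic.

```python
import numpy as np

def sup_test_pvalue(observed, simulated):
    """Share of null realizations whose supremum is at least the observed one.

    observed  : (T,) observed process on a grid.
    simulated : (n_sim, T) realizations generated under the null distribution.
    """
    sup_obs = np.max(np.abs(observed))
    sup_sim = np.max(np.abs(simulated), axis=1)
    return float(np.mean(sup_sim >= sup_obs))
```

A small p-value indicates that the observed process is more extreme than most of the simulated ones, which suggests that the corresponding model assumption is inadequate.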
5 Conclusion
If the dependence structure is complex or unknown, the intensity function model (1) may not be applicable, since the dependence among recurrent events cannot be specified correctly. The robust rate and mean model (2), however, can still be used, and the proposed method remains valid whether or not the dependence structure is correctly modelled. Simulation studies conducted in the paper show that the robust variance estimator under (2) is more accurate than the naive variance estimator, which underestimates the variance. The coverage probabilities based on the robust estimator are also more reasonable, being close to 0.95. The robust estimator is applied to real data from the CGD study, and the results are consistent with the simulation results. The underlying asymptotic theory of the proposed methods is rigorously established in the appendix of the paper.
References
[1] Andersen, P. K. and Gill, R. D. (1982) Cox's regression model for counting processes: a large sample study. Annals of Statistics 10, 1100-1120.
[2] Cox, D. R. (1972) Regression models and life-tables (with discussion). Journal of the Royal Statistical Society: Series B 34, 187-220.
[3] Lawless, J. F. and Nadeau, C. (1995) Some simple robust methods for the analysis of recurrent events. Technometrics 37, 158-168.
[4] Lawless, J. F., Nadeau, C. and Cook, R. J. (1997) Analysis of mean and rate functions for recurrent events. In Proc. 1st Seattle Symp. Biostatistics: Survival Analysis (eds D. Y. Lin and T. R. Fleming), pp. 37-49. New York: Springer.
[5] Lin, D. Y., Wei, L. J., Yang, I. and Ying, Z. (2000) Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B 62, 711-730.
[6] Pepe, M. S. and Cai, J. (1993) Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. Journal of the American Statistical Association 88, 811-820.