Sensitivity analysis of censored output through polynomial, logistic, and tobit regression: theory and case study

Jack Kleijnen 1, Antonie Vonk Noordegraaf 2, Mirjam Nielen 2 1 Tilburg 2 Wageningen

WSC, Arlington (Virginia), 9-12 December 2001 CentER, 24 January 2002

Overview
• What is censored output?
• How to do sensitivity analysis (SA)?
• Solutions: three regression meta-models
  - Polynomial
  - Logistic (logit)
  - Tobit
• Case study: animal disease (IBR) in Holland
• Main result: tobit gives right signs for effects

25-jan-02

Kleijnen: Logit; WSC 2001 & CentER 2002


1. Introduction
• Sensitivity analysis (SA), guided by design of experiments (DOE)
• DOE analyses I/O data through ANOVA (polynomial regression)
• Problem: censored output. Examples:
  - Waiting time (non-negative)
  - ‘Rare event’ simulation with finite time horizon (time to occurrence of event)
• Statistical problem: OLS gives biased estimators
• Solution: logit and tobit regression
  - Origin: econometrics, etc. (Tobin 1958)
  - No applications in simulation!
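The bias that censoring induces in OLS can be illustrated with a minimal sketch. The four data points, the intercept of 800 and the factor effect of 400 are hypothetical values chosen for illustration; only the censoring point of 1000 comes from the case study. The sketch shows one form of bias (an attenuated effect), not the wrong signs reported later for the case study.

```python
def ols_slope(xs, ws):
    """Closed-form OLS slope for one regressor plus an intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    w_bar = sum(ws) / n
    sxy = sum((x - x_bar) * (w - w_bar) for x, w in zip(xs, ws))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

CENSOR_AT = 1000  # simulation run stops after 1000 weeks (case study)

# Hypothetical latent outputs: intercept 800, factor effect 400, noise of +/-50
xs = [0, 0, 1, 1]
latent = [750, 850, 1150, 1250]
observed = [min(w, CENSOR_AT) for w in latent]  # what the simulation reports

slope_latent = ols_slope(xs, latent)      # effect estimated from uncensored data
slope_observed = ols_slope(xs, observed)  # effect estimated from censored data
print(slope_latent, slope_observed)
```

With these numbers the slope estimated from the censored outputs is half the slope estimated from the latent outputs, because both runs at x = 1 are cut off at 1000.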


2. Classic DOE & ANOVA
• k ‘factors’ combined into n scenarios
  - Case study: k = 31, n = 2^(31 - 25) = 64
• Number of runs per scenario (replicates): m
• Simulation I/O is (Z, w) with Z an N×k matrix; z(i) repeated m times; N = n m
• Polynomial regression: y(i) = x(i)’β + e(i)
  - y(i): predictor; β: main effects, interactions, etc.
  - X an N×q matrix; q = 1 + k + k(k - 1)/2 etc.
• If e(i) ∼ NIID(0, σ²), then OLS is BLUE and ML; significance test for β: Student t
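A small sketch of how one row x(i) of X could be built from a scenario z(i), and of the count q = 1 + k + k(k - 1)/2 for a model with main effects and two-factor interactions. The function names are hypothetical; the values k = 31 and n = 2^(31 - 25) are taken from the case study.

```python
from itertools import combinations

def regression_row(z):
    """Return x(i): intercept, k main effects, k(k-1)/2 two-factor interactions."""
    main = list(z)
    interactions = [a * b for a, b in combinations(z, 2)]
    return [1] + main + interactions

def number_of_effects(k):
    """q = 1 + k + k(k - 1)/2 for a second-order model without quadratic terms."""
    return 1 + k + k * (k - 1) // 2

k = 31
n = 2 ** (31 - 25)          # number of scenarios in the case study
q = number_of_effects(k)    # number of columns of X for the full model
print(n, q, len(regression_row([0] * k)))
```

For k = 31 the full model has more effects than the design has scenarios, so not all of them can be estimated from these scenarios at once.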


2. Classic DOE (continued)
- Two estimators for var[e(i)]:
  - Pooled, if m > 1 replications
  - MSE, if n > q and no lack of fit
- CRN (common random numbers): correlations violate white noise
- Cross-validation of metamodel:
  1. Delete I/O data for scenario i
  2. Re-estimate factor effects
  3. Predict output for scenario i
  4. Repeat for all i (= 1, …, n)
  5. Scatter plot: see next slide
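The cross-validation steps above can be sketched as follows. This is a simplified illustration, not the procedure used in the case study: it assumes a single factor, one output per scenario (for example the average over replicates), and a first-order polynomial; the data are hypothetical.

```python
import math

def fit_line(xs, ws):
    """OLS fit of w = intercept + slope * x."""
    n = len(xs)
    x_bar, w_bar = sum(xs) / n, sum(ws) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (w - w_bar) for x, w in zip(xs, ws)) / sxx
    return w_bar - slope * x_bar, slope

def cross_validate(xs, ws):
    """Leave out scenario i, re-estimate, predict scenario i; repeat for all i."""
    predictions = []
    for i in range(len(xs)):
        xs_rest = xs[:i] + xs[i + 1:]          # step 1: delete scenario i
        ws_rest = ws[:i] + ws[i + 1:]
        intercept, slope = fit_line(xs_rest, ws_rest)  # step 2: re-estimate
        predictions.append(intercept + slope * xs[i])  # step 3: predict
    return predictions

def pearson(a, b):
    """Correlation between simulated and predicted outputs (the scatter plot)."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

xs = [0, 1, 2, 3, 4]                 # hypothetical factor values
ws = [1.0, 2.9, 5.2, 7.1, 8.8]       # hypothetical simulated outputs
predicted = cross_validate(xs, ws)
rho_hat = pearson(ws, predicted)
print(predicted, rho_hat)
```

A correlation close to 1 between simulated and cross-validated outputs suggests an adequate metamodel, although the case study shows that this check alone can miss the bias caused by censoring.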


2. Classic DOE (continued)
Scatter plot (simulated vs predicted output): ρ close to 1?

Fig. 2. Scatter plot of metamodel prediction and simulation realisation, based on cross-validation

3. Logit regression
• ‘Rare event’ literature: focus on variance reduction techniques (VRT)
• Now: identify the important factors
• Case study: w is ‘time to event occurrence’
  - w* = 1 if w = 1000 (censored)
  - w* = 0 if w < 1000 (not censored)
• Logit predictor of P(w* = 1) = E(w*): y = exp(β’x) / {1 + exp(β’x)}, so 0 < y < 1
• Estimation through ML; see SPSS
• Asymptotically normal, with cov(β̂) from the ‘Hessian’
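A minimal sketch of the logit predictor y = exp(β’x)/{1 + exp(β’x)} and of ML estimation by Newton-Raphson. It assumes a single 0/1 factor plus an intercept and uses hypothetical data; the slides refer to SPSS for the actual estimation, and `fit_logit_two_param` is an illustrative name, not an SPSS routine.

```python
import math

def logit_predictor(beta, x):
    """y = exp(beta'x) / (1 + exp(beta'x)), written in the equivalent 1/(1+exp(-eta)) form."""
    eta = sum(b * xj for b, xj in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logit_two_param(xs, ws, iterations=25):
    """Newton-Raphson ML for intercept b0 and effect b1; returns (beta, cov)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, w in zip(xs, ws):
            p = logit_predictor((b0, b1), (1.0, x))
            g0 += w - p                 # gradient of the log-likelihood
            g1 += (w - p) * x
            v = p * (1.0 - p)           # negative Hessian (information matrix)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: beta += H^{-1} g
        b1 += (-h01 * g0 + h00 * g1) / det
    # Inverse information from the last iteration: asymptotic cov of the estimates
    cov = [[h11 / det, -h01 / det], [-h01 / det, h00 / det]]
    return (b0, b1), cov

# Hypothetical data: w* = 1 means the run was censored at 1000 weeks
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ws = [1, 0, 0, 0, 1, 1, 1, 0]
beta_hat, cov = fit_logit_two_param(xs, ws)
print(beta_hat, cov)
```

The square roots of the diagonal of `cov` give the standard errors that a Wald test would use. The sketch does not guard against perfectly separable data, for which the ML estimates do not exist.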


3. Logit (continued)
Building the model (see SPSS and references):
1. Start per factor x(j): univariate logit
   - Include x(j) if p < 0.25 using Wald’s statistic
2. Backwards elimination in multivariate logit
3. Add interactions if their p < 0.05
4. Check fit of logit metamodel:
   a. Nagelkerke’s (1991) R² (close to 1?)
   b. Fraction of scenarios classified correctly: if ŷ > 0.50, then predict a ‘censored’ scenario; check whether w = 1000
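The two fit checks in step 4 can be sketched as follows. The Nagelkerke R² is computed from the log-likelihoods of the intercept-only model and the fitted model; the function names and the four example scenarios are hypothetical.

```python
import math

def log_likelihood(probs, outcomes):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(math.log(p) if w == 1 else math.log(1.0 - p)
               for p, w in zip(probs, outcomes))

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 rescaled by its maximum, so that a perfect fit gives 1."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / upper_bound

def fraction_classified_correctly(probs, ws, censor_at=1000, cutoff=0.50):
    """Predict 'censored' if y_hat > cutoff; compare with whether w == censor_at."""
    hits = sum((p > cutoff) == (w == censor_at) for p, w in zip(probs, ws))
    return hits / len(ws)

# Hypothetical predictions and simulated outputs for four scenarios
y_hat = [0.9, 0.8, 0.3, 0.6]
w = [1000, 1000, 400, 700]
fraction = fraction_classified_correctly(y_hat, w)
print(fraction)
```

In this example the last scenario is predicted as censored although the event occurred at week 700, so three of the four scenarios are classified correctly.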


4. Tobit regression
• Logit: much info lost (binary transformation)
• Tobit: latent variable w*
  - w = w* if w* < 1000
  - w = 1000 if w* ≥ 1000
• Tobit predictor of w*: y* = β’x + e (see ANOVA with y, not y*)
• Estimation through ML; see LIMDEP 7.0
• Building the meta-model: see ANOVA (stepwise selection of main effects; interactions)
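A sketch of the log-likelihood that ML estimation would maximise for a tobit model censored from above at 1000, assuming normally distributed e with standard deviation σ. Uncensored runs contribute the normal density of w around β’x; censored runs contribute the probability that w* is at least 1000. The function names are hypothetical and this is not the LIMDEP implementation.

```python
import math

def normal_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def tobit_loglik(beta, sigma, rows, ws, censor_at=1000):
    """Log-likelihood of right-censored outputs ws given regressor rows."""
    total = 0.0
    for x, w in zip(rows, ws):
        mean = sum(b * xj for b, xj in zip(beta, x))   # beta'x
        if w >= censor_at:
            # Censored run: only know that the latent w* reached the limit
            total += math.log(normal_sf((censor_at - mean) / sigma))
        else:
            # Uncensored run: ordinary normal regression contribution
            total += normal_logpdf((w - mean) / sigma) - math.log(sigma)
    return total

# Hypothetical example: intercept-only model with one censored run
print(tobit_loglik([900.0], 100.0, [[1.0], [1.0]], [850, 1000]))
```

Maximising this function over β and σ (for example with a general-purpose optimiser) would give the tobit estimates. The sketch does not protect against numerical underflow when a censored run has a predicted mean far below the censoring point.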


5. Case study on IBR
• Simulate IBR outbreaks & control, per week
• k = 31 factors, standardized as 0 and 1
• DOE: n = 2^(31 - 25) = 64 (resolution 4)
• m = 2 replicates
• Total time on five PCs @ 533 MHz: two weeks!
• Output w: # of weeks to reach 5% prevalence, but terminate at 1000 weeks (censoring)
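The generators of the 2^(31 - 25) design are not given on the slides. As a much smaller illustration of the same idea, the sketch below builds a 2^(4 - 1) resolution-4 design with the textbook generator D = ABC, and then recodes the levels to 0 and 1 as in the case study. The function names are hypothetical.

```python
from itertools import product

def fractional_factorial_2_4_1():
    """2^(4-1) design in -1/+1 coding: full factorial in A, B, C and D = A*B*C."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b * c))
    return runs

def to_zero_one(run):
    """Recode levels -1/+1 to the 0/1 standardisation used in the case study."""
    return tuple((level + 1) // 2 for level in run)

design = fractional_factorial_2_4_1()
scenarios = [to_zero_one(run) for run in design]
for scenario in scenarios:
    print(scenario)
```

In a resolution-4 design no main effect is confounded with another main effect or with a two-factor interaction, although two-factor interactions are confounded with each other.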


5.1 Case study: polynomial
Building the polynomial meta-model:
1. Start with first-order polynomial
2. Backwards elimination: 11 of the 31 factors remain
3. Add interactions for remaining factors: 3 interactions added
4. R² = 0.82, but two wrong signs!
ρ̂ = 0.97, but 23 of 64 simulated outputs are censored whereas ŷ > 1000
Explanation: biased OLS estimators


5.2 Case study: logit
• Only 6 significant main effects (was 11)
• No interactions (was 3)
• Nagelkerke’s R² = 0.81 (was 0.82)
• Fraction of 64 scenarios classified correctly (as being censored): 92%


5.3 Case study: tobit
Comparison with polynomial:
• Only 9 significant main effects (was 11)
• No wrong signs!
• Same 3 interactions
• R² = 0.83 (was 0.82)
• Fraction of 64 scenarios classified correctly: 89%


Conclusions & future research
• Problem: censored simulation output
  - Case study: 23 of 64 scenarios are censored
• Polynomial: high R² or ρ; yet wrong signs
  - Cause: biased estimator
• Logit: binary outcome (does event occur?)
  - Case study: 92% correctly classified
  - Other applications: rare event simulation
• Tobit: tailored to censoring!
  - Case study: no wrong signs; 89% correctly classified


Future research
• Rare event simulation & SA via logit models
• Waiting times with censored output (not < 0): tobit
• Non-normality & ML (bias)
• Heteroscedasticity & ML
• Two estimators of variance: pooled, MSR
• CRN
• Alternatives: GLM
• Model building strategy
