Empirical calibration of confidence intervals

Martijn J. Schuemie, Marc A. Suchard

2017-06-29

Contents

1 Introduction
2 Negative controls
3 Positive controls
  3.1 Plot control effect sizes
4 Systematic error model
  4.1 Fitting the systematic error model
  4.2 Evaluating the calibration
5 Confidence interval calibration
  5.1 Calibrating the confidence interval
6 References

1 Introduction

Observational studies are prone to bias, but unfortunately this bias is often brushed aside with a single comment in the discussion of a paper and is never quantified. Elsewhere we have proposed using negative controls (exposure-outcome pairs where the true relative risk is believed to be one) to produce an empirical bias distribution, and subsequently calibrate p-values. Here we extend this approach to calibrated confidence intervals (CIs), which requires the use of positive controls. Since real positive controls are problematic, we have generated synthetic positive controls by injecting additional (simulated) outcomes on top of negative controls.

In this document we use an example study to illustrate how CIs can be calibrated using the EmpiricalCalibration R package. In the example, we have replicated a previously published new-user cohort study comparing dabigatran to warfarin for the risk of GI bleeding. The study does not apply any adjustment for confounding. The results from this study are available in the package, and can be loaded using the data() command:

library(EmpiricalCalibration)
data(southworthReplication)

The estimate for the outcome of interest can in this dataset be identified because trueLogRr is NA:

outcomeOfInterest <- southworthReplication[is.na(southworthReplication$trueLogRr), ]
outcomeOfInterest


##   outcomeName trueLogRr      logRr    seLogRr
## 1     GiBleed        NA -0.3575234 0.06668903

computeTraditionalCi(outcomeOfInterest$logRr, outcomeOfInterest$seLogRr)

##        rr        lb        ub
## 0.6994063 0.6964876 0.7023372

Here we see that the effect estimate for GI bleed for dabigatran versus warfarin is 0.7, with a very tight CI.

2 Negative controls

Negative controls are drug-outcome pairs where we believe the drug does not cause (or prevent) the outcome. In other words, we believe the true effect size to be a relative risk of 1. We would prefer our negative controls to have some resemblance to our hypothesis of interest (in our example dabigatran vs warfarin and GI bleed), and we therefore typically pick negative controls where the outcome is the same (exposure controls), or the exposure is the same (outcome controls). In this example, we have opted for outcome controls, and have identified a set of outcomes not believed to be caused by either dabigatran or warfarin. We have executed exactly the same analysis for these outcomes, resulting in a set of effect size estimates, one per negative control:

negatives <- southworthReplication[southworthReplication$trueLogRr == 0 &
                                     !is.na(southworthReplication$trueLogRr), ]
head(negatives)

##                         outcomeName trueLogRr       logRr    seLogRr
## 4                         Neck pain         0 -0.10015508 0.05005106
## 6                Curvature of spine         0  0.01854604 0.08024194
## 10             Dislocation of joint         0 -0.14348342 0.11084576
## 14               Peripheral vertigo         0 -0.03627483 0.08998388
## 21                Effusion of joint         0 -0.20650211 0.06302084
## 25 Urinary tract infectious disease         0 -0.25308826 0.03858693
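As a brief aside, these same negative-control estimates are what drive the p-value calibration mentioned in the introduction. A minimal sketch using the package's fitNull() and calibrateP() functions:

# Fit the empirical null distribution on the negative-control estimates:
null <- fitNull(negatives$logRr, negatives$seLogRr)

# Calibrated p-value for the outcome of interest, accounting for the
# systematic error captured by the empirical null:
calibrateP(null, outcomeOfInterest$logRr, outcomeOfInterest$seLogRr)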

3 Positive controls

Positive controls are drug-outcome pairs where we believe the drug does cause (or prevent) the outcome. In this case we have generated synthetic positive controls with various true effect sizes. Similar to the negative controls, we have executed exactly the same analysis for these outcomes, resulting in a set of effect size estimates, one per positive control:

data(southworthReplication)
positives <- southworthReplication[southworthReplication$trueLogRr != 0 &
                                     !is.na(southworthReplication$trueLogRr), ]
head(positives)

##          outcomeName trueLogRr     logRr    seLogRr
## 2          Neck pain 1.3862944 1.1466961 0.03324708
## 3          Neck pain 0.6931472 0.5539283 0.03959980
## 5          Neck pain 0.4054651 0.2888025 0.04344616
## 7 Curvature of spine 1.3862944 1.3519970 0.05327228
## 8 Curvature of spine 0.6931472 0.6918903 0.06357177
## 9 Curvature of spine 0.4054651 0.4272736 0.06932738
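The trueLogRr values are true effect sizes on the log scale. Exponentiating the distinct values confirms that the injected effects correspond to relative risks of 1.5, 2, and 4:

# Distinct true effect sizes used for the synthetic positive controls:
exp(sort(unique(positives$trueLogRr)))

## [1] 1.5 2.0 4.0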

3.1 Plot control effect sizes

We can start by creating a forest plot of our controls:

controls <- southworthReplication[!is.na(southworthReplication$trueLogRr), ]
plotTrueAndObserved(controls$logRr, controls$seLogRr, controls$trueLogRr)

[Figure: true and observed effect sizes for all controls; x-axis: effect size on a log scale from 0.25 to 10]

The thick black lines indicate the true effect size, and the blue and orange dots and lines represent the effect size estimate and 95% confidence intervals of the controls. We see that many controls have a confidence interval that does not include the true effect size (orange lines), certainly more than the expected 5%. This indicates the analysis has systematic error.
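To put a number on this impression, we can tally the controls whose uncalibrated CI excludes the truth. A quick sketch, assuming the normal approximation used for the CIs in the plot:

# Fraction of controls where the true log RR falls outside the
# traditional 95% confidence interval:
lb <- controls$logRr - qnorm(0.975) * controls$seLogRr
ub <- controls$logRr + qnorm(0.975) * controls$seLogRr
mean(controls$trueLogRr < lb | controls$trueLogRr > ub)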

4 Systematic error model

4.1 Fitting the systematic error model

We can use the controls to estimate a model for the systematic error:

model <- fitSystematicErrorModel(controls$logRr, controls$seLogRr, controls$trueLogRr)
model

##  meanIntercept      meanSlope logSdIntercept     logSdSlope
##    -0.06655103     0.93329424    -1.67417066     0.01575879
## attr(,"CovarianceMatrix")
##               [,1]          [,2]          [,3]          [,4]
## [1,]  5.645229e-04 -0.0005483258  9.364782e-05 -0.0001833231
## [2,] -5.483258e-04  0.0009044631 -1.934024e-04  0.0003556884
## [3,]  9.364782e-05 -0.0001934024  9.131157e-03 -0.0087454417
## [4,] -1.833231e-04  0.0003556884 -8.745442e-03  0.0137776815
## attr(,"LB95CI")
## [1] -0.1131192  0.8743497 -1.8614591 -0.2142986
## attr(,"UB95CI")
## [1] -0.01998289  0.99223878 -1.48688221  0.24581617
## attr(,"class")
## [1] "systematicErrorModel"

We see that the intercept of the mean is slightly lower than 0, indicating the analysis is negatively biased when the null is true. We can also visualize the model:

plotErrorModel(controls$logRr, controls$seLogRr, controls$trueLogRr)

[Figure: systematic error mean (plus and minus one SD) plotted against the true effect size; both axes on a log scale from 0.25 to 10]

Here we see how systematic error changes as a function of the true effect size. As shown, the model assumes the mean and standard deviation (SD) of the error distribution are linearly related to the true effect size. For this plot, the mean and SD are also estimated at each true effect size independently, allowing us to evaluate whether the linearity assumption is far off.
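The fitted coefficients can be turned into a predicted distribution of observed estimates at any true effect size. A minimal sketch, assuming the parameterization suggested by the coefficient names (mean of the observed log RR linear in the true log RR, log SD linear in the true log RR); indexing the model by coefficient name is an assumption based on its printed form:

# Implied distribution of observed log RRs at a true relative risk of 2,
# illustrating the linearity assumption (not a package function):
trueLogRr <- log(2)
predictedMean <- model["meanIntercept"] + model["meanSlope"] * trueLogRr
predictedSd <- exp(model["logSdIntercept"] + model["logSdSlope"] * trueLogRr)
predictedMean
predictedSd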

4.2 Evaluating the calibration

To evaluate whether our estimation of the systematic error distribution is a good one, we can test whether the calibrated CI is truly calibrated, meaning the fraction of controls where truth is inside the calibrated CI is approximately the specified width of the CI. We also evaluate ‘centeredness’: whether truth outside of the CI is equally distributed above and below the CI:

plotCiCoverage(controls$logRr,
               controls$seLogRr,
               controls$trueLogRr,
               crossValidationGroup = controls$outcomeName)

## Fitting error models within leave-one-out cross-validation

[Figure: fraction of controls where the truth lies above, within, or below the confidence interval, as a function of the width of the confidence interval; panels show true effect sizes 1, 1.5, 2, and 4 for uncalibrated (top row) and calibrated (bottom row) CIs]

The data shown here is produced using a leave-one-out cross-validation. It is important to remember that for this dataset positive controls were generated based on negative controls and will therefore be correlated. To perform an unbiased evaluation the ‘one’ in the leave-one-out should therefore be the group of a negative control and its derived positive controls, and this is achieved by specifying the crossValidationGroup argument. Thus, in this evaluation for each negative control and the positive controls derived from that negative control, we fit systematic error models using all other controls, and compute calibrated CIs for the left-out controls across a wide range of widths. In the graph the dashed lines indicate a perfectly calibrated study. We see that the calibrated CIs are much closer to the ideal than the uncalibrated CIs.
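In outline, the grouped leave-one-out coverage at the default 95% width could be computed as below. This is a sketch of the idea, not the package's exact procedure:

# For each group (a negative control plus its derived positive controls),
# fit the systematic error model on all other controls, then check whether
# the calibrated CIs of the left-out estimates cover the truth:
groups <- as.character(unique(controls$outcomeName))
covered <- unlist(lapply(groups, function(group) {
  leftOut <- controls$outcomeName == group
  m <- fitSystematicErrorModel(controls$logRr[!leftOut],
                               controls$seLogRr[!leftOut],
                               controls$trueLogRr[!leftOut])
  ci <- calibrateConfidenceInterval(controls$logRr[leftOut],
                                    controls$seLogRr[leftOut],
                                    model = m)
  controls$trueLogRr[leftOut] >= ci$logLb95Rr &
    controls$trueLogRr[leftOut] <= ci$logUb95Rr
}))
mean(covered)  # should be close to 0.95 if the CIs are well calibrated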

5 Confidence interval calibration

5.1 Calibrating the confidence interval

We can now use the estimated systematic error model to compute the calibrated CI for our outcome of interest:

ci <- calibrateConfidenceInterval(outcomeOfInterest$logRr, outcomeOfInterest$seLogRr, model)
ci

##        logRr  logLb95Rr logUb95Rr   seLogRr
## 1 -0.3117815 -0.7254078 0.1067071 0.2122781


exp(ci[, 1:3])

##       logRr logLb95Rr logUb95Rr
## 1 0.7321415 0.4841271  1.112608

In this case, even though the point estimate has not changed much, the calibrated CI is much wider than the uncalibrated one, to take into account the residual error due to the lack of any adjustment for confounding.
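The same model can produce calibrated intervals at other widths. For example, a 90% CI, assuming the ciWidth argument of calibrateConfidenceInterval() (which defaults to 0.95):

# Calibrated 90% confidence interval on the relative-risk scale:
exp(calibrateConfidenceInterval(outcomeOfInterest$logRr,
                                outcomeOfInterest$seLogRr,
                                model,
                                ciWidth = 0.90)[, 1:3])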

6 References

citation("EmpiricalCalibration")

## To cite EmpiricalCalibration in publications use:
##
##   Schuemie MJ, Ryan PB, DuMouchel W, Suchard MA and Madigan D (2014).
##   "Interpreting observational studies: why empirical calibration is
##   needed to correct p-values." _Statistics in Medicine_, *33*(2),
##   pp. 209-218.
##   <URL: http://onlinelibrary.wiley.com/doi/10.1002/sim.5925/abstract>
##
## A BibTeX entry for LaTeX users is
##
##   @Article{,
##     author = {M. J. Schuemie and P. B. Ryan and W. DuMouchel and M. A. Suchard and D. Madigan},
##     title = {Interpreting observational studies: why empirical calibration is needed to correct p-values},
##     journal = {Statistics in Medicine},
##     volume = {33},
##     number = {2},
##     pages = {209-218},
##     year = {2014},
##     url = {http://onlinelibrary.wiley.com/doi/10.1002/sim.5925/abstract},
##   }

