Design of Environmental Monitoring Networks Resilient to Facility-Wide False Detection Rates

NUNES, L. M., ALMEIDA, S. & MONTEIRO, J. P. Universidade do Algarve, Geosciences Center, Campus de Gambelas, 8005-139 Faro, Portugal. e-mail: [email protected]
CUNHA, M. C. Universidade de Coimbra, Dep. Engenharia Civil, Coimbra.
RIBEIRO, L. Instituto Superior Técnico, Geosciences Center, Lisboa.
Abstract We propose in this article a new method for designing environmental monitoring networks in which the facility-wide false positive/negative rates (FWFR) are considered in the design. FWFR result from the need to use statistical tests to evaluate whether given threshold values have been surpassed. The overall chance of a false rate over an entire site is cumulative, i.e., it increases with the number of statistical tests performed (number of locations monitored and number of chemical parameters tested). Even if each statistical comparison has a small associated probability (false rate), the probability that at least one of a large number of comparisons is significant by chance alone is near certainty. In the proposed formulation the objective function depends on the estimated FWFR. The latter are computed by modelling solute dispersion in groundwater and by imposing, a posteriori, a fixed FWFR: this is easily done by changing the necessary number of estimated values so that they correspond to an exceedance (or non-exceedance) where previously they did not. The monitoring networks thus obtained are less prone to FWFR. Key words: facility-wide false rate; optimisation; groundwater
INTRODUCTION Facility-wide false rates (FWFR) result from multiple hypothesis testing on many indicators and locations. FWFR may result from falsely indicating that an area is contaminated when it is not (facility-wide false positives, FWFP), or that the area is not contaminated when it is (facility-wide false negatives, FWFN). The basic strategy in hypothesis testing consists of defining a null hypothesis (H0), established to be nullified or refuted, in order to support an alternative hypothesis (H1). Assuming the null hypothesis is true, the significance level (α) of a test is the maximum probability that the observed statistic would be observed and still be considered consistent with chance variation (consistent with the null hypothesis). Consequently, if the null hypothesis is true, the significance level is the probability that it will be rejected in error (a decision known as a Type I error). In the context of groundwater monitoring, where the null hypothesis is usually one of "no contamination", the test will indicate contamination when there is none. This type of error, pointing out contamination
when there is none, brings unnecessary costs, as a result of extending the monitoring network or prolonging treatments that are not needed. On the other hand, a Type II error occurs when one fails to reject a null hypothesis that is false, so that actual contamination goes undetected. This can cause serious environmental and human-health problems. Assuming independence among all samples, if the probability of a false positive result for a single comparison is α, the probability that at least one of the comparisons is significant by chance alone is

FWFPR = 1 − Pr(all samples okay)
      = 1 − [Pr(one sample okay)]^tests          (1)
      = 1 − (1 − α)^tests
If the number of tests is high, the chance of generating a false positive (indicating contamination when there really is none) is also high. Since a key problem in groundwater monitoring involves multiple comparisons, resulting from measurements at many monitoring wells, each analysed for multiple constituents, a conclusion of contamination is near certainty, regardless of whether contamination is actually present. To remedy this problem, USEPA (1992) proposes doing enough initial testing of background and facility leachate and waste samples to determine the specific parameters present at levels substantially greater than background (see also Davis & McNichols (1994a,b) and Gibbons (1994)). US EPA guidance encourages controlling the FWFPR over both water constituents and wells (USEPA, 1992). This is, however, a way of controlling false detections once the monitoring network is installed. In this article we concentrate instead on the problem of designing monitoring networks when FWFR (positive and negative) are considered in the optimisation of the design, i.e., where to locate the monitoring points in order to minimise the impact of false detections.
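Equation (1) is straightforward to evaluate; the sketch below illustrates how quickly the facility-wide rate grows with the number of tests (the well and constituent counts in the example are hypothetical, not taken from the article):

```python
def fwfpr(alpha: float, tests: int) -> float:
    """Facility-wide false positive rate, FWFPR = 1 - (1 - alpha)**tests,
    assuming the `tests` comparisons are independent."""
    return 1.0 - (1.0 - alpha) ** tests

# E.g., 20 wells, each analysed for 10 constituents, at alpha = 0.05:
print(fwfpr(0.05, 20 * 10))  # close to 1: a false detection is near-certain
```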
PROPOSED METHODOLOGY The methodology has the following steps: i) generation of a concentration field given the available information, which may be the result of estimation (e.g., by geostatistics) or of solving the advection-dispersion-reaction equation for solute transport, thus obtaining prior information; ii) estimation of the impact of imposing a known FWFR, given by a set of statistics, on conditional estimations made by geostatistical interpolation, thus generating posterior FWFR fields; iii) choice of the monitoring design.
Obtaining prior information Obtaining the solute concentration fields is the objective of any groundwater monitoring network, so it may seem awkward to start a monitoring optimisation method with what is ultimately the end result. However, it is possible to generate initial guesses for the concentration distribution if one has some prior data, such as a previous sparse sampling campaign, and/or an appropriate flow and transport model. Hence, the estimated and/or simulated concentration fields are what is designated here as the prior information. Consider the spatial area under study as the continuous
domain, D, and a set of N locations that discretise the domain. Solute concentration is a random variable, determined at locations x with coordinates (j,k): C(x_j,k).
Generating posterior FWFR fields The facility-wide false rates (positive and negative) discussed here are not the statistical consequence of a hypothesis test; they result from analysing the impact of artificially changing the "known" concentrations (obtained as prior information), C(x_j,k), to values much higher (or lower) than the prior. For instance, to obtain a FWFPR of α = 0.1 it suffices to change a number i of prior locations such that i = 0.1N. To obtain the spatial distribution of the FWFPR, all possible combinations of i in N locations are necessary. The method is based on the calculation of the errors introduced by these changes in the estimated (interpolated) fields. The algorithm is easily implemented, though. For instance, in the case of the FWFPR:

Set a cutoff value, cutoff
For each α value (i/N) do
  For all combinations of i in N (locations with coordinates a,b)
    If C(x_a,b) < cutoff, set C(x_a,b) = cutoff
    Estimate C(x_j,k) (e.g., by geostatistics): C*(x_j,k)
    Compute the estimation error ε(x_j,k) = C*(x_j,k) − C(x_j,k)
  Compute the average and maximum values of ε_FP(x_j,k): ε_i^maxFP(x_jk) and ε_i^averageFP(x_jk)
A similar algorithm, with the If statement changed to "If C(x_a,b) > cutoff, set C(x_a,b) = cutoff", is used for the facility-wide false negative rate (FWFNR), yielding ε_FN(x_j,k), ε_i^maxFN(x_jk) and ε_i^averageFN(x_jk). ε_i^maxFP(x_jk) is the facility-wide maximum false positive error for a given FWFR, and corresponds to the combinations that generated the highest errors; ε_i^averageFP(x_jk) is the facility-wide average false positive error for a given FWFR and location; ε_i^maxFN(x_jk) is the facility-wide maximum false negative error for a given FWFR and location; ε_i^averageFN(x_jk) is the facility-wide average false negative error for a given FWFR and location. It is now possible to compute the global averaged facility-wide false positive error, π_FP(x_jk), and the global averaged facility-wide false negative error, π_FN(x_jk). The optimisation criterion is the global averaged facility-wide false error, π(x_jk), resulting from adding the previous two:
π_FP(x_jk) = Σ_{i=1..p} α_i ε_i^maxFP(x_jk) ε_i^averageFP(x_jk)    (2)

π_FN(x_jk) = Σ_{i=1..p} α_i ε_i^maxFN(x_jk) ε_i^averageFN(x_jk)    (3)

π(x_jk) = π_FP(x_jk) + π_FN(x_jk)    (4)
Here p represents the number of false rates actually used. The criterion includes a local weighting of the FWFR, represented by the last two terms in equations (2) and (3), and a global weighting, represented by α_i. The local weighting balances the relative contribution of false positives and false negatives specific to each location; the global weighting reflects the posterior density function (pdf) of the estimations after introducing false detects. If this pdf could be reasonably known, it would be possible to estimate other local statistics, such as the variance, allowing much more robust network designs. Unfortunately, obtaining the pdf is a very time-consuming task,
impractical for most network dimensions, as it requires studying all combinations of i in N, which becomes a very large number even for a modest number of monitoring locations.

Obtaining the monitoring design After the criterion π(x_jk) has been calculated, the subset of the required dimension (number of monitoring locations in the design, n) is obtained by selecting only the best n locations. This subset is the optimal design given the criterion.
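The criterion of equations (2)-(4), the combinatorial cost, and the selection step can be sketched together. The data layout (per-rate lists of per-location dicts) and the reading that "best" means the largest π (the locations where false detections would hurt most) are our assumptions:

```python
import math

def fwf_criterion(alphas, eps_max_fp, eps_avg_fp, eps_max_fn, eps_avg_fn):
    """pi(x) = sum_i alpha_i * eps_i^maxFP(x) * eps_i^averageFP(x)
             + sum_i alpha_i * eps_i^maxFN(x) * eps_i^averageFN(x)
    (equations 2-4). Each eps_* argument is a list, one entry per false
    rate i, of per-location dicts."""
    pi = {}
    for g in eps_max_fp[0]:
        pi_fp = sum(a * m[g] * v[g] for a, m, v in zip(alphas, eps_max_fp, eps_avg_fp))
        pi_fn = sum(a * m[g] * v[g] for a, m, v in zip(alphas, eps_max_fn, eps_avg_fn))
        pi[g] = pi_fp + pi_fn
    return pi

def best_n(pi, n):
    """Keep the n locations with the largest criterion value."""
    return sorted(pi, key=pi.get, reverse=True)[:n]

# The combinatorial cost of the posterior step: all i-in-N combinations.
# For N = 47 and i = 5 this already exceeds 1.5 million combinations.
print(math.comb(47, 5))
```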
HYPOTHETICAL CASE-STUDY The methodology was tested on a hypothetical case-study consisting of a continuous source of a conservative chemical species, located in a very small area, which contaminates a porous unconfined aquifer. The period between the leakage and the setting of the monitoring network is 10 000 days. The modelling domain is, in this simplified example, a rectangle of 1400 m x 700 m (E-W x N-S), discretised in a square mesh with 10 m spacing and a single layer (2D conditions). The upper and bottom limits of the porous medium are horizontal and the depth of the aquifer is 20 m. Flow boundary conditions of the first kind (Dirichlet type) prescribe the head at 19 m on the West boundary and at 12 m on the East boundary. Recharge is 0.0005 m3 m-2 d-1, equally distributed over the domain. No pumping is considered. Given that modelling conditions do not change during the modelling period, steady-state conditions are used. This case-study is a very simple one and could easily be solved with simple analytical models, such as that of Ogata & Banks (1961); however, the numerical solution keeps the approach more general, as it can easily accommodate changes in the assumptions (e.g., heterogeneity of the K and dispersivity fields, and reactive transport) without changing the model formulation. The hydraulic conductivity is constant throughout the field and equal to 10.0 m d-1. The effective porosity is considered constant and equal to 0.3. The dispersivity coefficient is isotropic and equal to 10.0 m. The amount of the chemical species entering at the top of the water table at the source location varies randomly (in both location and concentration) between 0.0 g/m2.d and 400.0 g/m2.d, modelled as contaminated recharge over the source area. No retardation or degradation is considered. The molecular diffusion coefficient is considered irrelevant given the groundwater flow velocity.
At the onset of the simulation, concentrations of the chemical species inside the domain are zero. Groundwater potentials in the domain are simulated with MODFLOW (McDonald & Harbaugh, 1988), which solves the partial differential equations describing the three-dimensional movement of groundwater with constant density and temperature in a porous medium. Concentrations of the conservative solute are simulated with the MT3D code (Zheng, 1990). The original set of concentration data was obtained by taking only 47 values out of the modelled domain. This reduction in the size of the prior-information set was necessary to accelerate the kriging computations needed to calculate ε(x_j,k), which require solving a kriging system for each of the combinations of i in N locations. Here i = 1,...,5 and N = 47 possible locations. Note that even for such small values, the number of combinations surpasses 1.5 million when i = 5. The values of α (in %) studied here are, then, α = {2.13; 4.26; 6.38; 8.51; 10.64}. Kriging considered the
following parameters: spherical variogram (nugget = 10 (mg/l)2; sill = 12000 (mg/l)2; range E-W = 740 m; range N-S = 234 m); grid size = 50 m; block discretisation = 16 points. Table 3 shows some statistics for the prior information field with N = 47.
Fig. 1 Modelling domain and solute concentration distribution (mg/l)
The cutoff value for false positives was set at 550.0 mg/l, and that for false negatives at 0.0 mg/l.
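For reference, the spherical variogram model with the parameters reported above can be sketched as follows. We treat the reported sill as the total sill and ignore the E-W/N-S anisotropy, which would rescale the lag directionally before the call; both are our assumptions:

```python
def spherical_variogram(h, nugget=10.0, sill=12000.0, rng=740.0):
    """Spherical variogram gamma(h) in (mg/l)^2; `rng` is the E-W range (m).
    gamma(0) = 0; nugget jump at h > 0; flattens to the sill at h >= rng."""
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
```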
RESULTS AND DISCUSSION Figure 2 a) and c) shows the fields obtained by averaging the estimation errors over all combinations, when i = 5, for false positives and false negatives, respectively. Figure 2 b) and d) shows the coordinates of the locations where a false positive/negative will have the highest impact on the estimated values; the resulting estimation errors are also shown. Table 1 shows statistics for these fields, as well as for all other i values.
Fig. 2 Resulting ε(x_j,k): a) ε_i=5^averageFP(x_jk); b) ε_i=5^maxFP(x_jk); c) ε_i=5^averageFN(x_jk); d) ε_i=5^maxFN(x_jk)
Table 1. Statistics for εFP(xj,k) and εFN(xj,k)
εiaverage εimax
i=1 α=2.13 28.039
i=2 α=4.26 37.548
FP i=3 α=6.38 47.093
i=4 α=8.51 56.675
i=5 i=1 α=10.64 α=2.13 66.296 19.227
i=2 α=4.26 19.893
FN i=3 α=6.38 20.562
33.780
48.683
63.544
78.351
92.929
28.310
32.204
24.070
i=4 α=8.51 21.235 35.907
i=5 α=10.64 21.910 39.551
As the false rates increase from α=2.13 to α=10.64, the values of ε_i^average and ε_i^max also increase, but at different rates. The rates of increase also differ between false positives and false negatives: they are higher for the former, but the quotient between the rates of increase is higher for the latter. This indicates a relation between the cutoff value and the values of ε_i, and that the lower cutoff value has a much higher impact on the estimation errors, probably because the domain is dominated by contaminated water. Under the US EPA recommendation (USEPA, 1992) of keeping the false positive rate under 5%, only two out of the 47 locations should be false detects. The question is: which locations are most problematic in terms of their impact on the decision to proceed to more stringent control, or to remediation? The spatial distribution of the estimation errors ε_i^average and ε_i^max (Figure 2) shows some interesting features: i) ε_i^averageFP has the highest values in central areas of the domain, though not where concentrations are highest; ii) ε_i^maxFP has the highest values in peripheral areas of the domain, where concentrations are lowest; iii) ε_i^averageFN has the highest values in central areas of the domain, in some cases where concentrations are highest; iv) ε_i^maxFN has the highest values in central areas of the domain, where concentrations are highest. Hence, for false positives, the value of the criterion is balanced between concentrating locations in the most contaminated areas and in the periphery of the plume; for false negatives, the tendency is to locate monitoring in the centre of the plume. This is an expected result, as false positives have more impact in locations where there should be no contamination, and false negatives where there is contamination. The method proposed here helps to quantify the importance of each location. The results of the proposed criterion were compared with those of regular and random network designs.
Figure 3 shows the resulting optimised monitoring networks for n = {12; 20; 28}, for the studied designs. Table 2 shows statistics for these designs: the absolute error between the solute concentration field interpolated with the design, C*(x_j,k), and that of the prior information, C(x_j,k), i.e., C*(x_j,k) − C(x_j,k), and the relative error, [C*(x_j,k) − C(x_j,k)]/C(x_j,k). The former indicates the true magnitude of the difference between the estimated and the true value, giving an indication of the order of magnitude of the errors. The latter, by normalising by the true value, gives a measure of the precision of the estimation: lower values indicate more precise estimations, even if associated with higher absolute errors. All calculations were made using the same variogram model as before and a domain discretised in a 50 m x 50 m grid (386 cells). Statistics for the designs are summarised in Table 2; these should be compared with those of Table 3.
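The two error measures used in Table 2 can be written as a short sketch. How the article handles cells where the true concentration is zero in the relative error is not stated; here we simply skip them, and we keep the article's signed definition of the "absolute" error:

```python
def field_errors(estimated, true):
    """Mean signed error (C* - C) and mean relative error (C* - C)/C
    over two flattened concentration fields of equal length."""
    abs_err = sum(e - t for e, t in zip(estimated, true)) / len(true)
    rel = [(e - t) / t for e, t in zip(estimated, true) if t != 0.0]
    rel_err = sum(rel) / len(rel) if rel else float("nan")
    return abs_err, rel_err
```

Note that near-zero true concentrations inflate the relative error enormously, which is consistent with the very large relative-error magnitudes reported in Table 2.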
Table 2 Statistics for the studied monitoring network designs. Errors calculated for N=386.

                 Regular design                  Random design                   Criterion
                 n=12      n=20      n=28       n=12      n=20      n=28       n=12      n=20      n=28
mean             112.685   114.155   75.238     71.020    72.158    62.948     98.193    75.380    69.161
median           118.887   123.576   54.939     59.954    49.958    25.548     83.515    49.615    38.325
maximum          224.444   254.276   254.468    249.220   223.604   252.697    256.917   254.409   254.237
minimum          0.0000    0.0000    0.0000     0.0000    0.172     0.0000     3.703     0.0000    0.0000
variance         2656.546  3752.836  4764.896   3478.283  4087.981  5133.821   3998.220  4948.436  4978.134
absolute error   46.615    49.478    9.927      5.263     7.204     -2.029     32.773    10.102    4.210
relative error   6.51×10^9 1.40×10^10 6.82×10^9 2.72×10^9 3.33×10^9 4.10×10^8  3.69×10^9 2.01×10^9 8.59×10^8

Table 3 Statistics for the prior information fields (N=47 and N=386)

N      Mean     Median   Maximum   Minimum   Variance
386    65.673   3.106    536.750   0.000     11942.414
47     53.022   1.338    264.500   0.000     6842.887
Fig. 3 Monitoring network designs: a) n=12; b) n=20; c) n=28. ●: regular design; ○: random design; ×: optimised design.
Results show that the random design performs very well compared with the other designs when no false detections are considered (see the figures underlined in Table 2). In particular, the average and the median concentrations are best estimated (though highly overestimated). As a consequence, the relative and absolute estimation errors are lower for this design. The optimised design was best at estimating the maxima and the variance. The regular design performed worst. These results indicate that if monitoring could be false-detection free, a random design would outperform the regular network and be slightly better than the optimised design. However, monitoring always has some probability of false detections, as a consequence of improper well construction, sampling errors, cross-contamination, contamination during transport, analytical errors, and computing errors (during data handling). Under such conditions the optimised design is, by construction, the best design.
CONCLUSIONS We propose in this article a new method for designing environmental monitoring networks in which the facility-wide false positive/negative rates (FWFR) are considered in the design. Facility-wide false rates result from multiple hypothesis testing on many indicators and locations, and may result from falsely indicating that an area is contaminated when it is not (facility-wide false positives), or that the area is not contaminated when it is (facility-wide false negatives). The methodology has the following steps: i) generation of a concentration field given the available information; ii) estimation of the impact of imposing a known FWFR; iii) choice of the monitoring design. The methodology was tested with a hypothetical case-study, consisting of a continuous source of a conservative chemical species, located in a very small area, which contaminates a porous unconfined aquifer. The values of α studied were α = {2.13; 4.26; 6.38; 8.51; 10.64}. Under the US EPA recommendation (USEPA, 1992) of keeping the false positive rate under 5%, only two locations should be false detects. The question is which locations are most problematic in terms of their impact on the decision to proceed to more stringent control, or to remediation. The method proposed here helps to quantify the importance of each location and to design the monitoring network. The results of the proposed criterion were compared with those of regular and random network designs. Results indicate that if monitoring could be false-detection free, a random design would outperform the regular network and be slightly better than the optimised design. However, monitoring always has some probability of false detections. Under such conditions the optimised design is, by construction, the best design.

Acknowledgements This work was funded by Fundação para a Ciência e a Tecnologia under grant POCI/ECM/56263/2004.
REFERENCES
Davis, C. B. & McNichols, R. J. (1994a) Ground water monitoring statistics update, I: Progress since 1988. Ground Water Monit. Rem. 14(4), 148-158.
Davis, C. B. & McNichols, R. J. (1994b) Ground water monitoring statistics update, II: Nonparametric prediction limits. Ground Water Monit. Rem. 14(4), 159-175.
Gibbons, R. D. (1994) Statistical Methods for Groundwater Monitoring. John Wiley and Sons, Inc., New York, USA.
McDonald, M. G. & Harbaugh, A. W. (1988) A Modular Three-Dimensional Finite-Difference Ground-Water Flow Model. Book 6, Chapt. A1, U.S. Geol. Surv. Techn. Water-Resour. Inv.
Ogata, A. & Banks, R. B. (1961) A solution of the differential equation of longitudinal dispersion in porous media. U.S. Geol. Surv. Professional Paper 411-A, A1-A7.
USEPA (1992) Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. United States Environmental Protection Agency.
Zheng, C. (1990) MT3D: A Modular Three-Dimensional Transport Model for Simulation of Advection, Dispersion and Chemical Reactions of Contaminants in Ground Water Systems. S. S. Papadopulos & Associates, Rockville, Maryland.