S TATISTICAL P ROPERTIES OF G ENERALIZED D ISCREPANCIES AND R ELATED Q UANTITIES Christine Choirat1 and Raffaello Seri2 1

2

Centre de Recherche Viabilit´e, Jeux, Contrˆole Universit´e Paris Dauphine 75775 Paris CEDEX 16, France (email: [email protected]) CNR-IMATI via Bassini 15 20131 Milano (e-mail: [email protected])

A BSTRACT. When testing that a sample of n points in the unit hypercube [0, 1]d comes from a uniform distribution, the Kolmogorov-Smirnov and the Cram´er-von Mises statistics are simple and well-known procedures. To encompass these measures of uniformity, Hickernell (1996, 1998) introduced the so-called generalized L p −discrepancies. These discrepancies can be used in numerical integration by Monte Carlo and quasi-Monte Carlo methods, design of experiments, uniformity testing and goodness of fit tests. The aim of this paper is to derive the strong and weak asymptotic properties of these statistics.

1

I NTRODUCTION

Hickernell (1996, 1998) has introduced the generalized L p −discrepancies D p (Pn ) based on the sample of n points Pn in [0, 1]d : they extend the Kolmogorov-Smirnov and Cram´er-von Mises statistics and allow for measuring the degree of nonuniformity of a sample and the efficiency of numerical integration procedures. The links of these figures of merit with goodness-of-fit statistics and with optimal design of experiments have been pointed out by Hickernell (1999). Liang et al. (2000) have started an investigation of the asymptotic properties of D2 (Pn ) and have derived two alternative statistics, strictly linked to this one, that can be used to test statistically the efficiency of numerical integration procedures. Recently, other authors have investigated similar discrepancies (Leeb, 2001, Hoogland and Kleiss, 1996, Hoogland et al., 1997, van Hameren et al., 1997). This paper is a short exposition of the results derived in Choirat and Seri (2003), where complete results and references to the literature can be found. We define some notation that will be used in the following. For any index set u ⊂ {1, . . . , d}, we denote by |u| its cardinality, by [0, 1]u the |u| −dimensional unit

hypercube and by xu a |u| −dimensional vector containing the elements of x indexed by the elements of u.

2

A SYMPTOTIC R ESULTS FOR L p −D ISCREPANCIES

In the case 1 ≤ p ≤ +∞ (see equation (3.8b) in Hickernell, 1998) the generalized discrepancies are defined by the equation: "

¯ ( ¯ 1 ¯ |u| ¯β · ∏ µ0 (x j ) − ∑ n z∈P [0,1]u ¯ j∈u

#1/p )¯ h i ¯p ¯ 0 D p (Pn ) = ∑ ∏ µ (x j ) + x j − 1{x j >z j } ¯¯ dxu , u6=∅ n j∈u (1) where u is a subset of the set {1, . . . , d}, βnis an arbitrary given positive constant and o R µ (·) is an arbitrary function satisfying µ ∈ f : ddxf ∈ L ∞ ([0, 1]) and 01 f (x) dx = 0 . In the general case of equation (1), the choices: ¡ ¢ β = 2, µ (x) = − 21 ³x2 − x + 16 , ¯ ¯ ¯ ¯ 1´ 1 ¯ 1 ¯2 ¯ 1¯ µ (x) = − 2 x − 2 − x − 2 + 6 , β = 1, Z

2

µ (x) = 61 − x2 ,

β = 1,

yield respectively the symmetric, the centered and the star discrepancy. In particular, the star L ∞ −discrepancy coincide with the Kolmogorov-Smirnov statistic and the star L 2 −discrepancy is the Cram´er-von Mises statistic (see Hickernell, 1999). It can be shown that, when n → +∞, the discrepancy D p (Pn ) converges to 0 if and only if the sample√Pn is uniformly distributed. Moreover, under the hypothesis of uniform distribution, nD p (Pn ) converges in distribution to a well-defined random variable: this means that the average-case error of a Monte Carlo integration procedure decreases as √1n . A worst-case error is given by a Law of the Iterated Logarithm ³p ´ (LIL) for the discrepancy D p (Pn ), stating that D p (Pn ) = O ln ln n/n P − as. The asymptotic distribution of D p (Pn ) can be obtained as a function of a stochastic integral with respect to a pinned Brownian Sheet (a multivariate generalization of the Brownian Bridge). This result is quite difficult to use and therefore, in the following, we will investigate how it can be specialized in the case p = 2.

3

A SYMPTOTIC R ESULTS FOR L 2 −D ISCREPANCIES

Now, we turn to the discrepancy D2 (Pn ) and we generalize some of the previous results. In Choirat and Seri (2003), we show that the statistic proposed by Liang et al.

(2000) can be written as the V −statistic [D2 (Pn )]2 =

1 n ∑ h (xk , x` ) n2 k,`=1

(2)

where h (xk , x` ) = f (xk , x` ) − g1 (xk ) − g1 (x` ) + M d is called the kernel and: · ¸¶ d µ ¯¢ ¡ ¢ ¡ ¢ 1 ¡¯ ¡ ¢ ¡ ¢ 2 ¯ ¯ f (xk , x` ) = ∏ M + β µ xk j + µ x` j + B2 xk j − x` j + B1 xk j B1 x` j , 2 j=1 Z 1 µ ¶2 d ¡ ¡ ¢¢ dµ 2 2 dx, g1 (xk ) = ∏ M + β µ xk j , M = 1+β dx 0 j=1 1 B1 (x) = x − , 2

1 B2 (x) = x2 − x + . 6

The possibility of writing a statistic for testing uniformity in the form of a V −statistic is a quite standard fact. Therefore, the results obtained in this Section are general enough and can be applied with minor modifications to other discrepancies, such as the diaphony and the weighted spectral test of Hellekalek and Niederreiter (1998), the Fourier discrepancy of Hoogland and Kleiss (1996), the discrepancies of Grabner et al. (1999) and van Hameren et al. (1997) and the class of chi-square tests, often used to test for randomness and presented by L’Ecuyer and Hellekalek (1998). The results of the previous Section imply that n · [D2 (Pn )]2 converges to a welldefined random variable. In this Section, we expose the theoretical results pertaining to the asymptotic distribution of the proposed test statistic. Under the null hypothesis of uniformity, from equation (2), it can be proved that the asymptotic distribution of the V −statistic is given by a weighted infinite mixture of χ2 distributions and is a particular instance of what is called Gaussian Chaos. Indeed, the statistic [D2 (Pn )]2 has the asymptotic distribution: D

n · [D2 (Pn )]2 −→



∑ λ j Z 2j ,

j=1

where (Z j ) are independent standard normal random variables and (λ j ) are the eigenR values of the integral operator A m (xk ) = [0,1]d h (xk , x` ) m (x` ) dx` . Moreover, the following Berry-Ess´een bound holds with C dependent on some features of the kernel h: ° ( )° ° n ° o ∞ C ° ° 2 °P n · [D2 (Pn )] ≤ y − P ∑ λ j Z 2j ≤ y ° ≤ . ° ° n j=1 ∞

As an example, we recall, from Hickernell (1998, equations 3.9b and 5.1c), that the star L 2 −discrepancy D2 (Pn ) coincides with the Cram´er-von Mises statistic. When d = 1, the asymptotic distribution is: D

n · [D2 (Pn )]2 −→





j=1

Z 2j j2 π2

.

We have drawn 10, 000 times a sample Pn of size n (for n ∈ {25, 50, 100, 200, 400}) of uniform independent random variables on [0, 1]d . We have calculated n · [D2 (Pn )]2 for each of the 10, 000 samples for the star discrepancy with d = 1. Then, we have represented in Figure 1 the density (as a histogram and a kernel estimator) and the Q − Q plot with respect to a Gaussian random variable with the same mean and the same variance. It is clear that the density is remarkably stable as reflected by the Berry-Ess´een bound in n1 , and very far from normality. 2 5

1

n = 25 0 0

0.00

0.25

0.50

0.75

1.00

1.25

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

2 5

1

n = 50

0 0

0.00

0.25

0.50

0.75

1.00

1.25 2

5

1

n = 100 0 0

0.00

0.25

0.50

0.75

1.00

1.25 2

5

1

n = 200 0 0

0.00

0.25

0.50

0.75

1.00

1.25 2

5

1

n = 400

0 0

0.00

0.25

0.50

Density

0.75

1.00

1.25

Q−Q Plot

Figure 1. Star n · [D2 (Pn )]2 for d = 1

o √ n Under the alternative hypothesis of nonuniformity, n· [D2 (Pn )]2 − E∗ [D2 (Pn )]2 ∗ converges to a normal random variable whose variance can be calculated ³ ´ (E is the 1 mean under the alternative); the Berry-Ess´een bound is of order O √n . The simulated example of Figure 2 shows that [D2 (Pn )]2 approaches a normal distribution when

the points are not uniformly distributed on [0, 1]d . For each graphic, we have drawn 10, 000 samples Pn of size n of Beta(2, 2) independent random variables on [0, 1]. The convergence towards a Gaussian random variable is evident, but slower than the convergence towards a second order Gaussian Chaos under the null: this is reflected by the Berry-Ess´een bound.

0.50

10

0.25

n = 25

0 −2

0

2

4

6

0.50

−2.5

0.0

2.5

5.0

7.5

10.0

12.5

0.0

2.5

5.0

7.5

10.0

12.5

0.0

2.5

5.0

7.5

10.0

12.5

10

0.25

n = 50

0 −2

0

2

4

6

0.50

n = 100

0.25 0 −2

0

2

4

6

0.50

n = 200

−2.5 10

0.25 0 −2

0

2

4

6

0.50

n = 400

−2.5 10

−2

0

2

4

6

10

0.25 0 −2

0

2

4

6

−2.5

0.0

Density

2.5

5.0

7.5

10.0

12.5

Q−Q Plot

Figure 2. Convergence of [D2 (Pn )]2 towards a normal random variable

It is interesting to remark that under a set of limiting hypotheses converging to uniD

formity (but in the alternative), the asymptotic distribution has the form n·[D2 (Pn )]2 −→ 2 ∑∞j=1 λ j (Z j + a j ) , where (Z j ) are independent standard normal random variables and (a j ) is a series of constants determined by the Pitman drift. When d converges to infinity with n at a certain rate, the statistic converges to a normal distribution. Under some quite complicated conditions on the kernel h, we have: n3/2 ·{[D2 (Pn )]2 −E[D2 (Pn )]2 } D √ −→ N (0, 1) . 2 2 2 E(h(X1 ,X1 )) −(Eh(X1 ,X1 )) +2(n−1)E(h(X1 ,X2 ))

A Berry-Ess´een bound can be obtained, but it is very long and the rate of decrease to 0 depends on the interplay between d and n. However, it is interesting to remark that this upper bound can be explicitly calculated and used to assess the distance between the two distributions.

4

P RACTICAL A SPECTS OF T ESTING

We propose also some techniques for the approximation of the finite sample and of the asymptotic distribution of the statistics n · [D2 (Pn )]2 . A first technique is a numerical procedure whose convergence rate is calculated in the form of a Berry-Ess´een bound. The rationale is as follows: we replace the integral operator A through a finite dimensional one (say AN ) yielding approximation to the first N eigenvalues of A , while the others are assumed to be 0. We investigate the error of this numerical procedure through a Berry-Ess´een bound; then, the probability values can be calculated using simulation or numerical inversion of the characteristic function. The finite sample distribution of the statistic [D2 (Pn )]2 can also be approximated through Monte Carlo sampling or through the bootstrap. However, in this case, the usual bootstrap does not work and particular techniques have to be used.

R EFERENCES CHOIRAT, C., SERI, R. (2003): Statistical properties of generalized discrepancies and related quantities, Working Paper. GRABNER, P.J., LIARDET, P., TICHY, R.F. (1999): Average case analysis of numerical integration. In :W. Haußmann, K. Jetter and M. Reimer (Eds): Advances in Multivariate Approximation, Wiley-VCH, Math. Res. 107, Berlin, 185–200. HELLEKALEK, P., NIEDERREITER, H. (1998): The weighted spectral test: Diaphony. ACM Transactions on Modeling and Computer Simulation, 8, 43–60. HICKERNELL, F.J. (1996): Quadrature error bounds and applications to lattice rules. SIAM J. Numer. Anal., 33, 1995–2016. HICKERNELL, F.J. (1998): A generalized discrepancy and quadrature error bound. Math. Comp., 67, 299–322. HICKERNELL, F.J. (1999): Goodness-of-fit statistics, discrepancies and robust designs. Statist. Probab. Lett., 44, 73–78. HOOGLAND, J., JAMES, F., KLEISS, R. (1997): Quasi-Monte Carlo, discrepancies and error estimates. In: Niederreiter, H. (ed.) et al.: Monte Carlo and quasi-Monte Carlo methods 1996, Lect. Notes Stat., Springer-Verlag, 127, 266–276. HOOGLAND, J., KLEISS, R. (1996): Discrepancy-based error estimates for quasi-Monte Carlo. I: General formalism. Comput. Phys. Comm., 98, 111–127. L’ECUYER, P., HELLEKALEK, P. (1998): Random number generators: selection criteria and testing. In P. Hellekalek and G. Larcher (Eds): Random and Quasi-Random Point Sets, Springer Verlag, Berlin, 223-265. LEEB, H. (2001): Asymptotic properties of the spectral test, diaphony, and related quantities. Math. Comp., 71, 297–309. LIANG, J.-J., FANG, K.-T., HICKERNELL, F.J., LI, R. (2000): Testing multivariate uniformity and its applications. Math. Comp., 70, 337–355. VAN HAMEREN, A., KLEISS, R., HOOGLAND, J. (1997): Gaussian limits for discrepancies. Comput. Phys. Comm., 107, 1–20.

statistical properties of generalized discrepancies ...

vestigation of the asymptotic properties of D2 (Pn) and have derived two alternative statistics, strictly linked to this one, that can be used to test statistically the efficiency of numerical integration procedures. Recently, other authors have investigated similar discrepancies (Leeb, 2001, Hoogland and Kleiss, 1996, Hoogland et ...

103KB Sizes 2 Downloads 210 Views

Recommend Documents

Statistical Properties of Generalized Discrepancies
Here, an inequality of Kiefer and. Wolfowitz (see [KW58]) (indeed a multivariate extension of the Dvoretzky-Kiefer-. Wolfowitz or DKW inequality) implies the existence of positive constants c and c such that: (5.4). P. {. D∗. ∞ (Pn,u) ≥ m. √

Statistical Properties of Generalized Discrepancies and ...
Liang et al. ([11]) have started an investigation of the asymp- totic properties of D2 (Pn) and have derived two alternative statistics, strictly linked to this one, that can be used to test statisti- cally the efficiency of numerical integration pro

Generalized discrepancies on the sphere
2 Department of Quantitative Methods, School of Economics and Business Management, Universidad de Navarra, Pamplona,. Spain. Generalized discrepancies ...

Qualitative properties of generalized principal ...
[3] H. Berestycki, L. Nirenberg, S.R.S. Varadhan, The principal eigenvalue and maxi- mum principle for second-order elliptic operators in general domains, ...

On the Network Properties of Generalized Hypercubes
Due to advances in technology, it is now feasible to build large-scale parallel ... low degree, regularity, small diameter, large bisection width and high fault .... For any pair (u, v) and (u , v ) there is an auto- morphism σ of .... even me or m1

A statistical examination of the evolution and properties ...
Sep 22, 2009 - FreeBSD ports. ○ 12,010 ports. ○ 6,556 written in C. ○ 1,109,697 files. ○ 447,612 in C. AnsiC. C++. Shell. Perl. Java. PHP. Python. Lisp.

Statistical Properties of the Warped Discrete Cosine ...
of the TIMIT database (Dialect Region: North Midland) to form the train and test ... SNR (dB). Vowel Class Separability in Babble noise. WDCTC. MFCC. Figure 1: ...

Statistical Verification of Probabilistic Properties with ...
probabilistic model checking, PRISM [17], relies on iterative methods to verify properties with unbounded until. Each iteration involves a matrix–vector multi- plication, which in the worst case is O(n2), but often O(n) (for sparse models), where n

Statistical Properties and Memory of Excursions in ...
The correlation method applied to the return intervals shows a weak correlation for both groups with changes as the threshold q increases. The DFA analysis.

Statistical Properties and Memory of Excursions in ...
statistics of return intervals between long excursions above ... The DFA analysis confirms the presence ... behavior (shuffled records) is observed in original data.

Statistical Properties and Memory of Excursions in ...
cial indexes [13], x-ray solar flares [14], climate records. [15], or even long heartbeat ..... invariance in the nonstationarity of human heat rate. Phys. Rev Lett 2001 ...

A Generalized Links and Text Properties Based Forum ...
Digital Forensics Lab. Cryptography ... information on the software used to create them. Cai et al. .... 1) Signature Generation and Signature Based Clustering:.

Generalized image models and their application as statistical models ...
Jul 20, 2004 - exploit the statistical model to aid in the analysis of new images and .... classically employed for the prediction of the internal state xПtч of a ...

Exploring Systematic Discrepancies in DFT ... - ACS Publications
May 5, 2017 - agreement with prior literature findings after accounting for different .... of 35Cl |CQ| and ηQ for glycine hydrochloride, calculated with CRYSTAL software using the neutron structure ...... CrystEngComm 2016, 18, 6213−6232.

Exploring Statistical Properties for Semantic Annotation ...
kth most common word. Proof by ... tures,” J. Machine Learning Research, vol. 3, 2003. [6] David M. Blei, ... Statistical Learning in Computer Vision, 2004.

Properties of Water
electron presence. Electron density model of H2O. 1. How many hydrogen atoms are in a molecule of water? 2. How many oxygen atoms are in a molecule of ...

properties
Type. Property Sites. Address. Zip. Code. Location. East or West. Site. Acres. Main Cross Streets. Status. Price. Bldg. (GSF). Year. Built. 1 Building. Brady School.

Unsupervised Learning of Generalized Gamma ...
model (GΓMM) to implement an effective statistical analysis of .... models in fitting SAR image data histograms for most cases. [5]. ..... the greatest for large u.

2003_C_c_bmc_7-Mechanical Properties of Concrete_Reinforced ...
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. 2003_C_c_bmc_7-Mechanical Properties of Concrete_Reinforced with AR-Glass Fibers.pdf. 2003_C_c_bmc_7-Mechani

properties of materials
ENERGY-solar energy-solar cells,solar cooker-flat plate collector for power ... circuits-active,Reactive and Apparent power,Power factor,Resonance in R-L-C.

2001_C_a_pacrim_fly_ash_Mechanical Properties of Concrete with ...
2001_C_a_pacrim_fly_ash_Mechanical Properties of Concrete with Flyash.pdf. 2001_C_a_pacrim_fly_ash_Mechanical Properties of Concrete with Flyash.pdf.