Statistical Properties of Generalized Discrepancies and Related Quantities

Christine Choirat Universit`a degli Studi dell’Insubria [email protected]

Abstract When testing that a sample of n points in the unit hypercube [0, 1]d comes from a uniform distribution, the Kolmogorov-Smirnov and the Cram´er-von Mises statistics are simple and well-known procedures. To encompass these measures of uniformity, Hickernell (1996, 1998) introduced the so-called generalized Lp −discrepancies. These discrepancies can be used in numerical integration by Monte Carlo and quasiMonte Carlo methods, design of experiments, uniformity testing and goodness of fit tests. The aim of this paper is to derive the strong and weak asymptotic properties of these statistics. Keywords: Discrepancies, Statistics.

1

Introduction

Hickernell ([4], [5]) has introduced the generalized Lp −discrepancies Dp (Pn ) based on the sample of n points Pn in [0, 1]d : they extend the Kolmogorov-Smirnov and Cram´ervon Mises statistics and allow for measuring the degree of nonuniformity of a sample and the efficiency of numerical integration procedures. The links of these figures of merit with goodness-of-fit statistics and with optimal design of experiments have been pointed out by Hickernell ([6]). Liang et al. ([11])

Raffaello Seri Universit` a degli Studi dell’Insubria [email protected]

have started an investigation of the asymptotic properties of D2 (Pn ) and have derived two alternative statistics, strictly linked to this one, that can be used to test statistically the efficiency of numerical integration procedures. Recently, other authors have investigated similar discrepancies ([10], [8], [7], [12]). However, the asymptotic properties of generalized Lp −discrepancies are still largely unknown. We fill this gap in the literature, providing formulas for the asymptotic distribution of these statistics in various cases of interest. In the special case p = 2, we are able to exploit the structure of the statistic in order to get finer results, such as rates of convergence and computational approximations. The results so obtained are very general and can be applied with minor modifications to other discrepancies, such as the diaphony and the weighted spectral test of [3], the Fourier discrepancy of [8], the discrepancies of [2] and [12] and the class of chi-square tests, often used to test for randomness and presented in [9]. We define some notation that will be used in the following. For any index set u ⊂ {1, . . . , d}, we denote by |u| its cardinality, by [0, 1]u the |u| −dimensional unit hypercube and by xu a |u| −dimensional vector containing the elements of x indexed by the elements of u.

2

Asymptotic Results for Lp −Discrepancies

In the case 1 ≤ p ≤ +∞ (see equation (3.8b) in [5]) the generalized discrepancies are de-

fined by the equation: ¯   ¯ XZ ¯ |u| Y 0 1 ¯β · Dp (Pn ) =  µ (xj ) − u ¯  n j∈u u6=∅ [0,1] ¯ ¯p 1/p i¯¯ X Yh · µ0 (xj ) + xj − 1{xj >zj } ¯¯ dxu  ¯

3

Asymptotic Results for L2 −Discrepancies

Now, we turn to the discrepancy D2 (Pn ) and we generalize some of the previous results. In [1], we show that the statistic proposed in [11] can be written as the V −statistic

z∈Pn j∈u

(1) where u is a subset of the set {1, . . . , d}, β is an arbitrary given positive constant and µ satisfying µ n (·) is an arbitrary function o ∈ R1 df ∞ f : dx ∈ L ([0, 1]) and 0 f (x) dx = 0 . In the general case of equation (1), the choices: ¡ ¢ µ (x) = − 12 ³x2 − x + 16 , β = 2, ¯ ¯ ¯ ¯ 1´ 1 ¯ 1 ¯2 1¯ ¯ µ (x) = − 2 x − 2 − x − 2 + 6 , β = 1, µ (x) =

1 6



x2 2 ,

β = 1,

yield respectively the symmetric, the centered and the star discrepancy. In particular, the star L∞ −discrepancy coincide with the Kolmogorov-Smirnov statistic and the star L2 −discrepancy is the Cram´er-von Mises statistic (see [6]). It can be shown that, when n → +∞, the discrepancy Dp (Pn ) converges P−as to 0 if and only if the sample Pn is uniformly distributed. Moreover, under the hypothesis of √ uniform distribution, nDp (Pn ) converges in distribution to a well-defined random variable: this means that the average-case error of a Monte Carlo integration procedure decreases as √1n . A worst-case error is given by a Law of the Iterated Logarithm (LIL) for the discrepancy D´ p (Pn ), stating that Dp (Pn ) = ³p ln ln n/n P−as. The asymptotic distriO bution of Dp (Pn ) can be obtained as a function of a stochastic integral with respect to a pinned Brownian Sheet (a multivariate generalization of the Brownian Bridge). This result is quite difficult to use and therefore, in the following, we will investigate how it can be specialized in the case p = 2. On the other hand, under the alternative hypothesis of nonuniformity, the asymptotic distribution is a normal random variable.

n 1 X h (xk , x` ) n2

[D2 (Pn )]2 =

(2)

k,`=1

where h (xk , x` ) = f (xk , x` ) − g1 (xk ) − g1 (x` ) + M d is called the kernel and: d Y ¡ f (xk , x` ) = M + β 2 [µ (xkj ) + µ (x`j ) j=1

¸¶ 1 + B2 (|xkj − x`j |) + B1 (xkj ) B1 (x`j ) , 2 g1 (xk ) =

d Y ¡ ¢ M + β 2 µ (xkj ) , j=1

M =1+β

Z 2 0



dµ dx

¶2 dx,

1 B2 (x) = x2 − x + . 6

1 B1 (x) = x − , 2

The results of the previous Section imply that n · [D2 (Pn )]2 converges to a well-defined random variable. In this Section, we expose the theoretical results pertaining to the asymptotic distribution of the proposed test statistic. Under the null hypothesis of uniformity, from equation (2), it can be proved that the asymptotic distribution of the V −statistic is given by a weighted infinite mixture of χ2 distributions and is a particular instance of what is called Gaussian Chaos. Indeed, the statistic [D2 (Pn )]2 has the asymptotic distribution: D

n · [D2 (Pn )]2 −→

∞ X

λj Zj2 ,

j=1

where (Zj ) are independent standard normal random variables and (λj ) are the eigenvalues of the integral operator Am (xk ) =

R

h (xk , x` ) m (x` ) dx` . Moreover, the following Berry-Ess´een bound holds with C dependent on some features of the kernel h: °  ° ° ∞ X ° ° © 2 ° C ª 2 °P nD2 (Pn ) ≤ y − P °≤ . λ Z ≤ y j j °  ° n ° ° j=1 [0,1]d

As an example, we recall, from [5], equations 3.9b and 5.1c, that the star L2 −discrepancy D2 (Pn ) coincides with the Cram´er-von Mises statistic. When d = 1, the asymptotic distribution is: ∞ X Zj2 n · [D2 (Pn )] −→ . j 2π2 2

variance can be calculated (E∗ is the mean under the alternative); ³ ´ the Berry-Ess´een bound is of order O √1n . The simulated example of Figure 2 shows that [D2 (Pn )]2 approaches a normal distribution when the points are not uniformly distributed on [0, 1]d . For each graphic, we have drawn 10, 000 samples Pn of size n of Beta(2, 2) independent random variables on [0, 1]. The convergence towards a Gaussian random variable is evident, but slower than the convergence towards a second order Gaussian Chaos under the null: this is reflected by the Berry-Ess´een bound.

D

0.50

j=1

n = 25

10

0.25 0 −2

In order to illustrate this result, we have drawn 10, 000 samples Pn of size n (for n ∈ {25, 50, 100, 200, 400}) of uniform independent random variables on [0, 1]d . We have calculated n·[D2 (Pn )]2 for each of the 10, 000 samples for the star discrepancy with d = 1. Then, we have represented in Figure 1 the density (as a histogram and a kernel estimator) and the Q−Q plot with respect to a Gaussian random variable with the same mean and the same variance. It is clear that the density is remarkably stable as reflected by the Berry-Ess´een bound in n1 , and very far from normality. 2 5

0 0.00

0.25

0.50

0.75

1.00

1.25

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

0.0

0.5

1.0

1.5

2.0

2 5

1

n = 50

0 0

0.00

0.25

0.50

0.75

1.00

1.25 2

5

1

n = 100 0 0

0.00

0.25

0.50

0.75

1.00

1.25 2

5

1

n = 200 0 0

0.00

0.25

0.50

0.75

1.00

1.25 2

5

1

n = 400

0 0

0.00

0.25

0.50

Density

0.75

1.00

1.25

2

4

6

0.50

n = 50

5.0

7.5

10.0

12.5

0

2

4

6

−2.5

0.0

2.5

5.0

7.5

10.0

12.5

0.0

2.5

5.0

7.5

10.0

12.5

10

0.25 0 −2

0

2

4

6

0.50

−2.5 10

0.25 0 −2

0

2

4

6

0.50

n = 400

2.5

0 −2

n = 200

0.0

0.25

0.50

n = 100

−2.5 10

−2

0

2

4

6

10

0.25 0 −2

0

2

4

6

−2.5

0.0

2.5

5.0

7.5

10.0

12.5

Q−Q Plot

Density

Figure 2: Convergence of [D2 (Pn )]2 towards a normal random variable It is interesting to remark that under a set of limiting hypotheses converging to uniformity (but in the alternative), the asymptotic D

1

n = 25 0

0

distribution has the form n · [D2 (Pn )]2 −→ P ∞ 2 j=1 λj (Zj + aj ) , where (Zj ) are independent standard normal random variables and (aj ) is a series of constants determined by the Pitman drift. When d converges to infinity with n at a certain rate, the statistic converges to a normal distribution. Under some quite complicated conditions on the kernel h, we have:

Q−Q Plot

Figure 1: Star n · [D2 (Pn )]2 for d = 1 and varying n Under the alternative hypothesis of nonunin o √ formity, n · [D2 (Pn )]2 − E∗ [D2 (Pn )]2 converges to a normal random variable whose



n3/2 ·{[D2 (Pn )]2 −E[D2 (Pn )]2 }

E(h(X1 ,X1 ))2 −(Eh(X1 ,X1 ))2 +2(n−1)E(h(X1 ,X2 ))2 D

−→ N (0, 1) . A Berry-Ess´een bound can be obtained, but it is very complex and the rate of decrease to 0 depends on the interplay between d and n. However, it is interesting to remark that this

upper bound can be explicitly calculated and used to assess the distance between the two distributions.

4

Practical Aspects of Testing

We propose also some techniques for the approximation of the finite sample and of the asymptotic distribution of the statistics n · [D2 (Pn )]2 . A first technique is a numerical procedure whose rationale is as follows: we replace the integral operator A through a finite dimensional one (say AN ) yielding approximation to the first N eigenvalues of A, while the others are assumed to be 0. We investigate the error of this numerical procedure through a Berry-Ess´een bound; then, the probability values can be calculated using simulation or numerical inversion of the characteristic function. The finite sample distribution of the statistic [D2 (Pn )]2 can also be approximated through Monte Carlo sampling or through the bootstrap. However, in this case, the usual bootstrap does not work and particular techniques have to be used. Acknowledgements This paper is a short exposition of the results derived in [1], where complete results and references to the literature can be found.

References [1] CHOIRAT, C., SERI, R. (2003): Statistical properties of generalized discrepancies and related quantities, Working Paper. [2] GRABNER, P.J., LIARDET, P., TICHY, R.F. (1999): Average case analysis of numerical integration. In :W. Haußmann, K. Jetter and M. Reimer (Eds): Advances in Multivariate Approximation, Wiley-VCH, Math. Res. 107, Berlin, 185–200. [3] HELLEKALEK, P., NIEDERREITER, H. (1998): The weighted spectral test: Diaphony. ACM Transactions on Modeling and Computer Simulation, 8, 43–60.

[4] HICKERNELL, F.J. (1996): Quadrature error bounds and applications to lattice rules. SIAM J. Numer. Anal., 33, 1995– 2016. [5] HICKERNELL, F.J. (1998): A generalized discrepancy and quadrature error bound. Math. Comp., 67, 299–322. [6] HICKERNELL, F.J. (1999): Goodnessof-fit statistics, discrepancies and robust designs. Statist. Probab. Lett., 44, 73–78. [7] HOOGLAND, J., JAMES, F., KLEISS, R. (1997): Quasi-Monte Carlo, discrepancies and error estimates. In: Niederreiter, H. (ed.) et al.: Monte Carlo and quasi-Monte Carlo methods 1996, Lect. Notes Stat., Springer-Verlag, 127, 266– 276. [8] HOOGLAND, J., KLEISS, R. (1996): Discrepancy-based error estimates for quasi-Monte Carlo. I: General formalism. Comput. Phys. Comm., 98, 111–127. [9] L’ECUYER, P., HELLEKALEK, P. (1998): Random number generators: selection criteria and testing. In P. Hellekalek and G. Larcher (Eds): Random and Quasi-Random Point Sets, Springer Verlag, Berlin, 223-265. [10] LEEB, H. (2001): Asymptotic properties of the spectral test, diaphony, and related quantities. Math. Comp., 71, 297–309. [11] LIANG, J.-J., FANG, K.-T., HICKERNELL, F.J., LI, R. (2000): Testing multivariate uniformity and its applications. Math. Comp., 70, 337–355. [12] VAN HAMEREN, A., KLEISS, R., HOOGLAND, J. (1997): Gaussian limits for discrepancies. Comput. Phys. Comm., 107, 1–20.

Statistical Properties of Generalized Discrepancies and ...

Liang et al. ([11]) have started an investigation of the asymp- totic properties of D2 (Pn) and have derived two alternative statistics, strictly linked to this one, that can be used to test statisti- cally the efficiency of numerical integration procedures. Recently, other authors have in- vestigated similar discrepancies ([10], [8], [7],.

157KB Sizes 4 Downloads 186 Views

Recommend Documents

Statistical Properties of Generalized Discrepancies
Here, an inequality of Kiefer and. Wolfowitz (see [KW58]) (indeed a multivariate extension of the Dvoretzky-Kiefer-. Wolfowitz or DKW inequality) implies the existence of positive constants c and c such that: (5.4). P. {. D∗. ∞ (Pn,u) ≥ m. √

statistical properties of generalized discrepancies ...
vestigation of the asymptotic properties of D2 (Pn) and have derived two alternative statistics, strictly linked to this one, that can be used to test statistically the efficiency of numerical integration procedures. Recently, other authors have inve

Generalized discrepancies on the sphere
2 Department of Quantitative Methods, School of Economics and Business Management, Universidad de Navarra, Pamplona,. Spain. Generalized discrepancies ...

Qualitative properties of generalized principal ...
[3] H. Berestycki, L. Nirenberg, S.R.S. Varadhan, The principal eigenvalue and maxi- mum principle for second-order elliptic operators in general domains, ...

A statistical examination of the evolution and properties ...
Sep 22, 2009 - FreeBSD ports. ○ 12,010 ports. ○ 6,556 written in C. ○ 1,109,697 files. ○ 447,612 in C. AnsiC. C++. Shell. Perl. Java. PHP. Python. Lisp.

Statistical Properties and Memory of Excursions in ...
The correlation method applied to the return intervals shows a weak correlation for both groups with changes as the threshold q increases. The DFA analysis.

Statistical Properties and Memory of Excursions in ...
statistics of return intervals between long excursions above ... The DFA analysis confirms the presence ... behavior (shuffled records) is observed in original data.

Statistical Properties and Memory of Excursions in ...
cial indexes [13], x-ray solar flares [14], climate records. [15], or even long heartbeat ..... invariance in the nonstationarity of human heat rate. Phys. Rev Lett 2001 ...

A Generalized Links and Text Properties Based Forum ...
Digital Forensics Lab. Cryptography ... information on the software used to create them. Cai et al. .... 1) Signature Generation and Signature Based Clustering:.

On the Network Properties of Generalized Hypercubes
Due to advances in technology, it is now feasible to build large-scale parallel ... low degree, regularity, small diameter, large bisection width and high fault .... For any pair (u, v) and (u , v ) there is an auto- morphism σ of .... even me or m1

Generalized image models and their application as statistical models ...
Jul 20, 2004 - exploit the statistical model to aid in the analysis of new images and .... classically employed for the prediction of the internal state xПtч of a ...

Statistical Properties of the Warped Discrete Cosine ...
of the TIMIT database (Dialect Region: North Midland) to form the train and test ... SNR (dB). Vowel Class Separability in Babble noise. WDCTC. MFCC. Figure 1: ...

Statistical Verification of Probabilistic Properties with ...
probabilistic model checking, PRISM [17], relies on iterative methods to verify properties with unbounded until. Each iteration involves a matrix–vector multi- plication, which in the worst case is O(n2), but often O(n) (for sparse models), where n

Exploring Systematic Discrepancies in DFT ... - ACS Publications
May 5, 2017 - agreement with prior literature findings after accounting for different .... of 35Cl |CQ| and ηQ for glycine hydrochloride, calculated with CRYSTAL software using the neutron structure ...... CrystEngComm 2016, 18, 6213−6232.

STUDY OF MECHANICAL AND ELECTRICAL PROPERTIES OF ...
STUDY OF MECHANICAL AND ELECTRICAL PROPERTIES OF VINYLESTER NANOCOMPOSITES.pdf. STUDY OF MECHANICAL AND ELECTRICAL ...

Generalized and Doubly Generalized LDPC Codes ...
The developed analytical tool is then exploited to design capacity ... error floor than capacity approaching LDPC and GLDPC codes, at the cost of increased.

Exploring Statistical Properties for Semantic Annotation ...
kth most common word. Proof by ... tures,” J. Machine Learning Research, vol. 3, 2003. [6] David M. Blei, ... Statistical Learning in Computer Vision, 2004.

Dynamical and Correlation Properties of the Internet
Dec 17, 2001 - 2International School for Advanced Studies SISSA/ISAS, via Beirut 4, 34014 Trieste, Italy. 3The Abdus ... analysis performed so far has revealed that the Internet ex- ... the NLANR project has been collecting data since Novem-.

Synthesis and physicochemical properties of merocyanine ... - Arkivoc
Mar 30, 2017 - dyes find wide use in many areas of human activity: optoelectronics, photovoltaics, biology, and medicine. 2,15,16. Thermophotoresistors ...

PROJECT ON PROPERTIES AND APPLICATION OF PARABOLA ...
PROJECT ON PROPERTIES AND APPLICATION OF PARABOLA AND ELLIPSE.pdf. PROJECT ON PROPERTIES AND APPLICATION OF PARABOLA AND ...

Structural, magnetic, and electric properties of La0.7Sr0.3MnO3 ...
Sep 23, 2008 - ricate high quality films and heterostructures of these mate- rials. Since the ... Curie temperature of about 360 K above room temperature and since it is known to ... tion data the substrate contribution was subtracted. Table I.

Synthesis and antimitotic properties of orthosubstituted ... - Arkivoc
Jun 20, 2017 - 60 human cancer cell lines with mean GI50, TGI and LC50 values of 3.39, ... products 7–9 was reduced by formation of byproducts 5 and 6.