The Bernstein Copula and its Applications to Modelling and Approximations of Multivariate Distributions∗

Alessio Sancetta, Department of Applied Economics and Trinity College, University of Cambridge. e-mail: [email protected], Tel +44-1223-335272

Stephen Satchell, Faculty of Economics and Politics and Trinity College, University of Cambridge. e-mail: [email protected], Tel +44-1223-338409

October 16, 2006

∗ We would like to thank Mark Salmon for interesting us in the copula function, and Peter Phillips, an associate editor, and the referees for many valuable comments. All remaining errors are our sole responsibility.

Corresponding Author: Stephen Satchell, Trinity College, Cambridge CB2 1TQ. Tel +44-1223-338409.

We define the Bernstein copula and study its statistical properties in terms of both distributions and densities. We also develop a theory of approximation for multivariate distributions in terms of Bernstein copulae. Rates of consistency when the Bernstein copula density is estimated empirically are given. In order of magnitude, this estimator has variance equal to the square root of the variance of common nonparametric estimators, e.g. kernel smoothers, but its bias is of the same order as that of a histogram estimator.


1 Introduction

The task of modelling multivariate distributions has always been a challenging one. For this reason, it is very common to use elliptic distributions due to the fact that these are simple to characterize. However, in many cases of interest in economics, it is found that this simple characterization contrasts with empirical evidence. A typical example is returns distributions in financial economics and the complex range of dependence that they exhibit. Among many references, the reader should look at Embrechts et al. (1999). When dealing with vectors of random variables, the copula function becomes a very useful object because it allows us to model the dependence between the variables separately from their marginals. In practice we may have a clear idea of what the marginals are, while knowledge of their joint dependence could be limited. Therefore, being able to use a two step procedure may be advantageous. Further, computer estimation can be speeded up. Though this may result in a loss of efficiency, corrections for the estimators have been proposed (e.g. White, 1994). For example, in the first step we may model either unconditional marginal distributions, or conditional distributions as in the case of GARCH models. Then, cross dependence of the marginal distributions could be modelled in the second step. The same approach is valid in the case of stationary time series when we want to model their joint distribution given their unconditional marginals (see Darsow et al., 1992, when the series is Markovian). While the copula function is a fairly new concept in econometrics, there has been a growing interest in financial econometrics; for example, Hu (2002), Bouyé et al. (2001), Patton (2001), Rockinger and Jondeau (2001), and Longin and Solnik (2000). The last authors deal with multivariate extreme value theory in the context of financial assets and use a dependence function equivalent to the copula function. 
The copula has been considered elsewhere to study extreme values (see Joe, 1997, for a list of extreme value copulae). We introduce a family of copulae defined in terms of Bernstein polynomials. We call a copula from this family a Bernstein copula (BC). After defining a new class of copulae, we show how the Bernstein copula can be used as an approximation to known and unknown copulae. When used as an approximation for known copulae, we call the approximation the approximate Bernstein copula (ABC) representation. This new representation leads to a general approach in estimation


as well as simplification of many operations whenever a parametric copula is given. Moreover, for parametric copulae, the Bernstein representation is useful in studying properties of the copula function itself. When the same approach is used to approximate unknown copulae, we call this the empirical Bernstein copula (EBC). There are many possible representations of continuous functions in terms of polynomials, e.g. Hermite polynomials (the Edgeworth expansion) and Padé approximations (the extended rational polynomials in Phillips, 1982, 1983). However, none of these polynomial representations share the same properties of Bernstein polynomials in the context of the copula function. Importantly, Bernstein polynomials are closed under differentiation; for simple restrictions on the coefficients they always lead to a proper copula function; and when used as empirical estimators, they have lower variance than commonly used nonparametric estimators. The plan for the paper is as follows. Section 2 introduces the Bernstein copula and derives some of its mathematical and statistical properties. In Section 3 we introduce the important extension of the Bernstein representation of given copulae. Issues related to making the polynomial representation operational are considered in Section 4, where, as mentioned above, it is shown that empirical estimation of the Bernstein copula density has remarkably good properties in terms of its asymptotic variance. Moreover, a short simulation is provided in order to highlight some of the theoretical results in a finite sample. Other estimation procedures are possible, and we discuss some, but by no means all of the issues, in Section 5.

2 The Bernstein Copula

We first recall Sklar's representation for multivariate distributions. Let $H$ be a $k$-dimensional distribution function with one dimensional marginals $F_1, \dots, F_k$; then there exists a function $C$ from the unit $k$-cube to the unit interval such that
$$H(x_1, \dots, x_k) = C\left(F_1(x_1), \dots, F_k(x_k)\right);$$
$C$ is referred to as the $k$-copula. If each $F_j$ is continuous, the copula is unique. For more details see Sklar (1973). We list some properties of the copula function $C$:

(1) $C$ is nondecreasing in all its arguments;

(2) $C$ satisfies the Fréchet bounds
$$\max\left(0, u_1 + \dots + u_k - (k-1)\right) \le C(u_1, \dots, u_k) \le \min(u_1, \dots, u_k),$$
which implies that $C$ is grounded, i.e. $C(u_1, \dots, u_k) = 0$ if $u_j = 0$ for at least one $j$, and $C(1, \dots, 1, u_j, 1, \dots, 1) = u_j$, $\forall j$;

(3) $\prod_{j=1}^{k} u_j$ is a copula for independent random variables, which is called the product copula;

(4) $C$ is Lipschitz with constant one, i.e.
$$\left|C(x_1, \dots, x_k) - C(y_1, \dots, y_k)\right| \le \sum_{j=1}^{k} |x_j - y_j|.$$

Properties (1) and (2) are necessary and sufficient for the definition of the copula function.

Let $\alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right)$ be a real valued constant indexed by $(v_1, \dots, v_k)$, such that $0 \le v_j \le m_j \in \mathbb{N}$. We could also write $\alpha_{v_1, \dots, v_k}$, $0 \le v_j \le m_j$, $\forall j$. Let
$$P_{v_j, m_j}(u_j) \equiv \binom{m_j}{v_j} u_j^{v_j} (1 - u_j)^{m_j - v_j}. \quad (1)$$

Define the following map $C_B : [0,1]^k \to [0,1]$, where
$$C_B(u_1, \dots, u_k) = \sum_{v_1=0}^{m_1} \cdots \sum_{v_k=0}^{m_k} \alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) P_{v_1, m_1}(u_1) \cdots P_{v_k, m_k}(u_k). \quad (2)$$
Under specific conditions on $\alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right)$, (2) can be shown to be a copula. In particular, (2)

is defined in terms of $k$-dimensional Bernstein polynomials. We state the following.

Definition 1. A Bernstein polynomial approximation to $f \in C_{[0,1]^k}$ ($C_{[0,1]^k}$ is the space of continuous bounded functions on $[0,1]^k$) is obtained by applying a linear operator $B_m^k$ to $f \in C_{[0,1]^k}$ such that
$$(B_m^k f)(x) \equiv \sum_{v_1=0}^{m_1} \cdots \sum_{v_k=0}^{m_k} f\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) P_{v_1, m_1}(x_1) \cdots P_{v_k, m_k}(x_k), \quad (3)$$
or in the more general singular integral representation via the Stieltjes integral
$$(B_m^k f)(x) = \int_0^1 \cdots \int_0^1 f(t_1, \dots, t_k)\, d_{t_1} K_{m_1}(x_1, t_1) \cdots d_{t_k} K_{m_k}(x_k, t_k), \quad (4)$$
for the kernel
$$K_m(x, t) \equiv \sum_{v \le mt} \binom{m}{v} x^v (1-x)^{m-v}, \qquad K_m(x, 0) \equiv 0,$$
which is constant for $\frac{v}{m} \le t < \frac{v+1}{m}$ and has jumps of $\binom{m}{v} x^v (1-x)^{m-v}$ at points $t = \frac{v}{m}$.

The singular integral representation establishes some clear parallels to kernel density estimation in statistics. One important property of multivariate Bernstein polynomials is given next (e.g. DeVore and Lorentz, 1993, p. 10).

Lemma 1. Let $C_{[0,1]^k}$ be the space of bounded continuous functions on the $k$-dimensional hypercube $[0,1]^k$. Then, the set of Bernstein polynomials defined in (3) is dense in $C_{[0,1]^k}$.

Further, the derivatives of a Bernstein polynomial approximation converge to the corresponding derivatives of the approximand whenever the latter is differentiable (see, e.g., (12) and (13) below). This concept and its usefulness will become clear in Section 2.2. Throughout the paper, we reserve the symbols $C_B$, $C_n$ and $C$ for the Bernstein copula, the empirical copula based on $n$ observations, and a general copula, respectively, and use $c_B$ for the Bernstein copula density; definitions will be given in due course. The letters $m$ and $n$ will only be used to denote the order of the polynomial and the sample size, respectively.
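The operator in (3) is straightforward to evaluate directly. As a minimal illustration (the function names and the restriction to $k = 2$ are ours, not part of the paper), the following sketch samples a function on the grid $\{v/m\}$ and forms the bivariate Bernstein approximation:

```python
import math

def bernstein_basis(v, m, u):
    """P_{v,m}(u) = C(m,v) u^v (1-u)^(m-v), the Bernstein basis in (1)."""
    return math.comb(m, v) * u**v * (1.0 - u) ** (m - v)

def bernstein_operator_2d(f, m, u1, u2):
    """Evaluate (B_m^2 f)(u1, u2) as in (3): the 2-d Bernstein approximation
    of f, sampling f on the (m+1) x (m+1) grid {v/m}."""
    total = 0.0
    for v1 in range(m + 1):
        for v2 in range(m + 1):
            total += (f(v1 / m, v2 / m)
                      * bernstein_basis(v1, m, u1)
                      * bernstein_basis(v2, m, u2))
    return total

# The product copula u1*u2 is linear in each argument separately, so the
# Bernstein operator reproduces it exactly: the value below is 0.3*0.7 = 0.21.
approx = bernstein_operator_2d(lambda x, y: x * y, 10, 0.3, 0.7)
```

The example also illustrates a shape-preserving feature used repeatedly below: the operator is positive and reproduces coordinatewise-linear functions exactly.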

2.1 The Definition of the Bernstein Copula

In order to restrict (2) to be a copula function, we need to impose some conditions on $\alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right)$.

Theorem 1. $C_B(u_1, \dots, u_k)$ is a copula function if
$$\sum_{l_1=0}^{1} \cdots \sum_{l_k=0}^{1} (-1)^{l_1 + \dots + l_k}\, \alpha\left(\frac{v_1 + l_1}{m_1}, \dots, \frac{v_k + l_k}{m_k}\right) \ge 0, \quad (5)$$
letting $0 \le v_j \le m_j - 1$, $j = 1, \dots, k$, and
$$\max\left(0, \frac{v_1}{m_1} + \dots + \frac{v_k}{m_k} - (k-1)\right) \le \alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) \le \min\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right), \quad (6)$$
in particular
$$\lim_{v_j \to 0} \alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) = 0, \quad \forall j = 1, \dots, k, \quad (7)$$
and
$$\alpha\left(1, \dots, 1, \frac{v_j}{m_j}, 1, \dots, 1\right) = \frac{v_j}{m_j}, \quad \forall j = 1, \dots, k. \quad (8)$$

As a result of (7) in Theorem 1, we could take $1 \le v_j \le m_j$, $\forall j$, in (2). Theorem 1 gives sufficient conditions which are necessary only if $m \to \infty$. Therefore, we can define the object to be studied in this paper.
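The conditions of Theorem 1 involve only finitely many inequalities, so they are easy to verify numerically for a given coefficient grid. A hedged sketch for $k = 2$ (function names and the tolerance handling are our own implementation choices):

```python
def product_alpha(m1, m2):
    """Coefficients alpha(v1/m1, v2/m2) = (v1/m1)*(v2/m2) of the product copula."""
    return [[(v1 / m1) * (v2 / m2) for v2 in range(m2 + 1)]
            for v1 in range(m1 + 1)]

def is_bernstein_copula_coeffs(alpha, m1, m2, tol=1e-12):
    """Check Theorem 1's conditions for k = 2: the rectangle inequality (5)
    and the boundary conditions (7)-(8). `alpha` is an (m1+1) x (m2+1) grid."""
    for v1 in range(m1):
        for v2 in range(m2):
            # condition (5): second-order "rectangle" difference must be >= 0
            d = (alpha[v1][v2] - alpha[v1 + 1][v2]
                 - alpha[v1][v2 + 1] + alpha[v1 + 1][v2 + 1])
            if d < -tol:
                return False
    # conditions (7)-(8): grounded and uniform margins on the grid
    for v1 in range(m1 + 1):
        if abs(alpha[v1][0]) > tol or abs(alpha[v1][m2] - v1 / m1) > tol:
            return False
    for v2 in range(m2 + 1):
        if abs(alpha[0][v2]) > tol or abs(alpha[m1][v2] - v2 / m2) > tol:
            return False
    return True
```

For the product copula coefficients every rectangle difference equals $1/(m_1 m_2) > 0$, so the check passes; a constant grid fails the boundary conditions.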

Definition 2. If $\alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right)$ satisfies (5) and (6), we call $C_B(u_1, \dots, u_k)$ in (2) the Bernstein copula.

From Theorem 1 we see that a Bernstein copula is a genuine copula, and by Lemma 1, we know that the Bernstein copula may serve as an approximation to a given arbitrary parametric copula. By the properties of Bernstein polynomials, the coefficients of the Bernstein copula have a direct interpretation as the values of some arbitrary approximated copula at the grid points. In this approach, we use the terms ABC or EBC (see Sections 3 and 4). The Bernstein copula belongs to the family of polynomial copulae. Polynomial copulae are special cases of copulae with polynomial sections in one or more variables (details on copulae with polynomial sections can be found in Nelsen, 1998, ch. 3). In some cases, it is useful to consider the following representation of the Bernstein copula as the sum of the product copula and a perturbation term,
$$C_B(u_1, \dots, u_k) = u_1 \cdots u_k + \sum_{v_1=0}^{m_1} \cdots \sum_{v_k=0}^{m_k} \gamma\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) P_{v_1, m_1}(u_1) \cdots P_{v_k, m_k}(u_k) \quad (9)$$
$$= \sum_{v_1=0}^{m_1} \cdots \sum_{v_k=0}^{m_k} \alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) P_{v_1, m_1}(u_1) \cdots P_{v_k, m_k}(u_k),$$
where
$$\gamma\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) \equiv \alpha\left(\frac{v_1}{m_1}, \dots, \frac{v_k}{m_k}\right) - \frac{v_1}{m_1} \cdots \frac{v_k}{m_k}. \quad (10)$$
The equality follows from the fact that
$$u_1 \cdots u_k = \sum_{v_1=0}^{m_1} \cdots \sum_{v_k=0}^{m_k} \left(\frac{v_1}{m_1} \cdots \frac{v_k}{m_k}\right) P_{v_1, m_1}(u_1) \cdots P_{v_k, m_k}(u_k).$$

2.2 The Bernstein Density

Any Bernstein copula has a copula density; this is because the Bernstein copula is absolutely continuous. Define $\Delta_{1, \dots, k}$ as the $k$-dimensional forward difference operator, i.e.
$$\Delta_{1, \dots, k}\, \alpha\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) \equiv \sum_{l_1=0}^{1} \cdots \sum_{l_k=0}^{1} (-1)^{k + l_1 + \dots + l_k}\, \alpha\left(\frac{v_1 + l_1}{m}, \dots, \frac{v_k + l_k}{m}\right), \quad (11)$$
where for ease of notation we set $m_j = m$, $\forall j$. By the properties of Bernstein polynomials, the Bernstein copula density has the following appealing structure:
$$c_B(u_1, \dots, u_k) = m^k \sum_{v_1=0}^{m-1} \cdots \sum_{v_k=0}^{m-1} \Delta_{1, \dots, k}\, \alpha\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) P_{v_1, m-1}(u_1) \cdots P_{v_k, m-1}(u_k),$$
where $c_B = \frac{\partial^k C_B}{\partial u_1 \cdots \partial u_k}$, and the expression is obtained by direct differentiation of (2) with respect to each variable and rearranging; see Lorentz (1953) for the univariate case. Differentiating, a term in the summation is lost, and the coefficients of the polynomial are written in difference form, which is directly linked to the $k$-dimensional rectangle inequality in (5), i.e. the copula density is always positive. For convenience, we use the following definition for the Bernstein copula density,
$$c_B(u_1, \dots, u_k) = \sum_{v_1=0}^{m} \cdots \sum_{v_k=0}^{m} \beta\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) \prod_{j=1}^{k} \binom{m}{v_j} u_j^{v_j} (1 - u_j)^{m - v_j}, \quad (12)$$
where $\beta\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right)$ is defined accordingly, i.e.
$$\beta\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) \equiv (m+1)^k\, \Delta_{1, \dots, k}\, \alpha\left(\frac{v_1}{m+1}, \dots, \frac{v_k}{m+1}\right). \quad (13)$$
Notice that $\Delta \alpha\left(\frac{v}{m}\right) \equiv \alpha\left(\frac{v}{m} + \frac{1}{m}\right) - \alpha\left(\frac{v}{m}\right)$, so that
$$\lim_{m \to \infty} m\, \Delta \alpha\left(\frac{v}{m}\right) = (\partial/\partial v)\, \alpha\left(\frac{v}{m}\right)$$
(when the derivative exists), and similarly in higher dimensions, i.e.
$$\lim_{m \to \infty} \beta\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) = \left(\partial^k / \partial v_1 \cdots \partial v_k\right) \alpha\left(\frac{v_1}{m+1}, \dots, \frac{v_k}{m+1}\right)$$
(when this exists).
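For $k = 2$, the density (12) can be evaluated directly from a coefficient function $\alpha$ via the differences in (13). The sketch below is ours (note that for $k = 2$ the sign $(-1)^{k + l_1 + l_2}$ in (11) reduces to $(-1)^{l_1 + l_2}$):

```python
import math

def bernstein_copula_density_2d(alpha, m, u1, u2):
    """Evaluate c_B(u1, u2) in (12)-(13) for k = 2.
    `alpha(x, y)` must be defined on the grid {v/(m+1): v = 0..m+1}."""
    def beta(v1, v2):
        # (13): (m+1)^2 times the 2-d forward difference of alpha;
        # for k = 2 the sign is (-1)^(l1 + l2)
        d = sum((-1) ** (l1 + l2)
                * alpha((v1 + l1) / (m + 1), (v2 + l2) / (m + 1))
                for l1 in (0, 1) for l2 in (0, 1))
        return (m + 1) ** 2 * d

    dens = 0.0
    for v1 in range(m + 1):
        for v2 in range(m + 1):
            dens += (beta(v1, v2)
                     * math.comb(m, v1) * u1**v1 * (1 - u1) ** (m - v1)
                     * math.comb(m, v2) * u2**v2 * (1 - u2) ** (m - v2))
    return dens

# For the product copula alpha(x, y) = x*y, every beta coefficient equals 1,
# so the density is identically 1.
d = bernstein_copula_density_2d(lambda x, y: x * y, 8, 0.25, 0.6)
```

The independence case gives a quick sanity check: all rectangle differences of $xy$ on a mesh of width $1/(m+1)$ equal $(m+1)^{-2}$, so $\beta \equiv 1$ and the polynomial sums to one.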

2.3 Spearman's Rho and the Moment Generating Function of the Bernstein Copula

We consider the moments of the Bernstein copula. Many operations find a convenient representation in terms of hypergeometric functions. For an introduction to hypergeometric functions for economists and many of the symbols we use, the reader is referred to Abadir (1999). The copula is
$$C_B(u_1, \dots, u_k) = \sum_{v_1=0}^{m} \cdots \sum_{v_k=0}^{m} \alpha\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) \prod_{j=1}^{k} \binom{m}{v_j} u_j^{v_j} (1 - u_j)^{m - v_j},$$
and its bivariate marginal distribution, say for $u_1$ and $u_2$, is
$$C_B(u_1, u_2, 1, \dots, 1) = \sum_{v_1=0}^{m} \sum_{v_2=0}^{m} \alpha\left(\frac{v_1}{m}, \frac{v_2}{m}, 1, \dots, 1\right) \prod_{j=1,2} \binom{m}{v_j} u_j^{v_j} (1 - u_j)^{m - v_j}.$$

We now calculate Spearman's rho ($\rho_S$). Using well known properties of the uniform distribution on $[0,1]$, namely that it has mean $\frac{1}{2}$ and variance $\frac{1}{12}$,
$$\rho_S = 12\, \mathrm{cov}(u, v) = 12\, E(uv) - 3,$$
where
$$E(uv) = \int \int uv\, dC(u, v) = \int \int \left(1 - u - v + C(u, v)\right) du\, dv,$$

using integration by parts. It should be noted that $\rho_S$ is independent of the definition of the marginals, whereas Pearson's correlation coefficient (i.e. conventional correlation) does depend upon the marginal distributions; see Schweizer and Wolff (1981) for further discussion. Random variables which have zero covariance could have non-zero $\rho_S$. The use of $\rho_S$ in financial economics could be advocated on the basis of the documented non-linearities and its simple estimation. For the Bernstein copula, $\rho_S$ can be derived as follows:
$$\rho_S = 12 \int_0^1 \int_0^1 \left[1 - u_1 - u_2 + C_B(u_1, u_2, 1, \dots, 1)\right] du_1\, du_2 - 3$$
$$= 12 \sum_{v_1=0}^{m} \sum_{v_2=0}^{m} \gamma\left(\frac{v_1}{m}, \frac{v_2}{m}, 1, \dots, 1\right) \prod_{j=1,2} \binom{m}{v_j} \int_0^1 u_j^{v_j} (1 - u_j)^{m - v_j}\, du_j$$
$$= 12 \sum_{v_1=0}^{m} \sum_{v_2=0}^{m} \gamma\left(\frac{v_1}{m}, \frac{v_2}{m}, 1, \dots, 1\right) \prod_{j=1,2} \binom{m}{v_j} B(v_j + 1, m + 1 - v_j),$$
where $\gamma$ was defined in (10) and $B(a, b)$ is the beta function. The first equality follows by writing the Bernstein copula as the sum of the product copula and the perturbation term. Notice that
$$12 \int_0^1 \int_0^1 \left(1 - u_1 - u_2 + u_1 u_2\right) du_1\, du_2 = 3.$$
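The beta-function formula above is a finite sum, so $\rho_S$ is computable exactly for any coefficient grid. A sketch for the bivariate case (our naming; $\gamma$ is formed from $\alpha$ as in (10)):

```python
import math

def beta_fn(a, b):
    """Euler beta function B(a, b), via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def spearman_rho_bernstein(alpha, m):
    """Spearman's rho of a bivariate Bernstein copula:
    rho_S = 12 * sum_{v1,v2} gamma(v1/m, v2/m)
                 * prod_j C(m, vj) * B(vj + 1, m + 1 - vj),
    where gamma(x, y) = alpha(x, y) - x*y as in (10)."""
    rho = 0.0
    for v1 in range(m + 1):
        for v2 in range(m + 1):
            gamma = alpha(v1 / m, v2 / m) - (v1 / m) * (v2 / m)
            w = (math.comb(m, v1) * beta_fn(v1 + 1, m + 1 - v1)
                 * math.comb(m, v2) * beta_fn(v2 + 1, m + 1 - v2))
            rho += gamma * w
    return 12 * rho

# Since C(m, v) * B(v + 1, m + 1 - v) = 1/(m + 1), the comonotone coefficients
# alpha = min give rho_S = (m - 1)/(m + 1), which approaches 1 as m grows.
r = spearman_rho_bernstein(min, 40)
```

The closed-form weight $1/(m+1)$ per coordinate makes the sum a simple grid average of $\gamma$, which is why no numerical integration is needed.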

Even when the Bernstein copula is used as an approximation (see the next Section), the above Spearman's rho can be used as an approximation to the true Spearman's rho of any copula. If enough terms of our proposed Bernstein approximation are included, Spearman's rho can be found to any degree of accuracy without the need for evaluating complicated integrals.

We derive the moment generating function (mgf) of the density in (12). We do it for the one variable case, then extend it to the $k$-dimensional case:
$$M_u(t) = \int_0^1 \exp\{tu\}\, c(u)\, du = \sum_{v=0}^{m} \beta\left(\frac{v}{m}\right) \binom{m}{v} \int_0^1 \exp\{tu\}\, u^v (1 - u)^{m - v}\, du,$$
where $\beta\left(\frac{v}{m}\right)$ is given by (13) for $k = 1$. Before proceeding any further, we notice the following (see Marichev, 1983, p. 87),
$$_1F_1(a; c; z)\, B(a, c - a) = \int_0^1 \exp\{z\tau\}\, \tau^{a-1} (1 - \tau)^{c - a - 1}\, d\tau, \qquad \mathrm{Re}\, c > \mathrm{Re}\, a > 0,$$
where $_1F_1(a; c; z)$ is Kummer's confluent hypergeometric function. For $a \equiv v + 1$, $c \equiv m + 2$, and $z \equiv t$, this implies
$$\int_0^1 \exp\{tu\}\, u^v (1 - u)^{m - v}\, du = {}_1F_1(v + 1; m + 2; t)\, B(v + 1, m - v + 1).$$
Therefore,
$$M_u(t) = \sum_{v=0}^{m} \beta\left(\frac{v}{m}\right) \binom{m}{v}\, {}_1F_1(v + 1; m + 2; t)\, B(v + 1, m - v + 1).$$
To obtain the moment generating function for the $k$-dimensional Bernstein copula density, we replace the univariate result in the multivariate definition. Defining $M_c(t)$ as the mgf of (12), for $t$ a $k \times 1$ vector,
$$M_c(t) = \int_0^1 \cdots \int_0^1 \exp\{t \bullet u\} \sum_{v_1=0}^{m} \cdots \sum_{v_k=0}^{m} \beta\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) \prod_{j=1}^{k} \binom{m}{v_j} u_j^{v_j} (1 - u_j)^{m - v_j}\, du_1 \cdots du_k$$
$$= \sum_{v_1=0}^{m} \cdots \sum_{v_k=0}^{m} \beta\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) \prod_{j=1}^{k} \binom{m}{v_j}\, {}_1F_1(v_j + 1; m + 2; t_j)\, B(v_j + 1, m - v_j + 1),$$

where $\bullet$ denotes the inner product. These results can be used to further investigate the properties of the Bernstein copula and its approximations. Deriving results for the joint moments of the Bernstein copula is straightforward by virtue of its incomplete beta function representation. The joint moments are important for studying the scale-free dependence properties of the variables.
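Since $_1F_1$ is entire, its power series can be summed directly, so the univariate mgf formula is easy to implement without special-function libraries (summing Kummer's series is our implementation choice, not the paper's):

```python
from math import comb, exp, lgamma

def hyp1f1(a, c, z, terms=80):
    """Kummer's confluent hypergeometric 1F1(a; c; z), summed as a power
    series; adequate for the moderate arguments used here."""
    term, total = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) / (c + n) * z / (n + 1)
        total += term
    return total

def beta_fn(a, b):
    """Euler beta function via log-gamma."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def mgf_bernstein_density_1d(beta_coef, m, t):
    """M_u(t) = sum_v beta(v/m) C(m, v) 1F1(v+1; m+2; t) B(v+1, m-v+1)."""
    return sum(beta_coef(v / m) * comb(m, v)
               * hyp1f1(v + 1, m + 2, t) * beta_fn(v + 1, m - v + 1)
               for v in range(m + 1))

# Sanity check: beta == 1 is the uniform density, whose mgf is (e^t - 1)/t.
val = mgf_bernstein_density_1d(lambda x: 1.0, 10, 0.5)
```

The uniform case is a useful check because the binomial terms then telescope back to $\int_0^1 e^{tu}\,du$.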

3 ABC: Bernstein Representation of Arbitrary Copulae

While the Bernstein copula should be regarded as a copula in its own right, it is particularly suited to problems where a parametric copula is available but in a very complicated form. In this case, the Bernstein copula can be used in place of the original copula. We call this the approximate Bernstein copula (ABC) representation. By the approximation properties of Bernstein polynomials, the coefficients are simple to find. One may object that Bernstein polynomials have a slower rate of convergence as compared to other polynomial approximations (see Theorem 2, below). However, they have the best rate of convergence within the class of all operators with the same shape preserving property (see Berens and DeVore, 1980). To give an example of the viability of the Bernstein approximation and its range of dependence, we approximate the Kimeldorf and Sampson copula (see e.g. Joe, 1997, p. 141), which is equal to
$$C(u, v) \equiv \left(u^{-\theta} + v^{-\theta} - 1\right)^{-\frac{1}{\theta}}, \quad \theta \ge 0. \quad (14)$$
Notice that this copula has unbounded density. Figure I shows the 3-dimensional graph of the Kimeldorf and Sampson copula density.

[Figure I, about here]

In Table I, we report the values of Spearman's rho as a function of the dependence parameter $\theta$ in the approximation for $m = 10, 30, 50, 100, 200, 300$, and the corresponding ones for the Kimeldorf and Sampson copula (KS). Because of computational difficulties, we do not calculate the approximation when the dependence parameter achieves its limiting value ($\infty$). Figure II shows the contour plot of the two copulae when $\theta = 1.06$ and $m = 30$. These seem to be indistinguishable.

[Figure II, about here]


All differences are due to polynomials being fairly slow in adjusting at turning points. Improvements can be achieved by increasing the order of the polynomial, keeping all computations manageable and straightforward. Integral evaluation for the computation of Spearman’s rho for the Kimeldorf and Sampson copula can only be done numerically.
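The ABC construction itself is one line of algebra: set the coefficients to the target copula evaluated at the grid, $\alpha(v_1/m, v_2/m) = C(v_1/m, v_2/m)$, and plug into (2). A minimal sketch for the KS copula (our naming; the point and $m$ are illustrative only):

```python
import math

def ks_copula(u, v, theta):
    """Kimeldorf and Sampson (Clayton-type) copula (14); grounded at 0."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def abc_2d(copula, m, u1, u2):
    """ABC representation: the Bernstein polynomial (2) with coefficients
    alpha(v1/m, v2/m) = C(v1/m, v2/m) taken from the target copula."""
    def basis(v, x):
        return math.comb(m, v) * x**v * (1 - x) ** (m - v)
    return sum(copula(v1 / m, v2 / m) * basis(v1, u1) * basis(v2, u2)
               for v1 in range(m + 1) for v2 in range(m + 1))

theta = 1.06
exact = ks_copula(0.5, 0.5, theta)                              # ~0.3367
approx = abc_2d(lambda x, y: ks_copula(x, y, theta), 30, 0.5, 0.5)
```

At $m = 30$ the pointwise error in the middle of the square is already small, consistent with the indistinguishable contour plots reported above.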

[Table I, about here]

To lend some rigour to this numerical example, we state the following.

Theorem 2. Let $f \in C_{[0,1]^k}$ and let $\frac{\partial f}{\partial x_j}$ be Lipschitz $\forall j$; then
$$\left|(B_m^k f)(x_1, \dots, x_k) - f(x_1, \dots, x_k)\right| \le M \sum_{j=1}^{k} \frac{x_j (1 - x_j)}{2m},$$
where $B_m^k$ is the $k$-dimensional Bernstein operator and $M$ is a constant.

Proof. See the Appendix.

This rate of approximation can be improved using linear combinations of Bernstein polynomials (see Butzer, 1953). Let $f^{(2l)}$ (i.e. the $2l$-th derivative of $f$) be Lipschitz of order $\gamma$; then Butzer (1953) shows that one can construct a linear combination of 1-dimensional Bernstein polynomials of order $m$ such that the error is $O\left(m^{-l-\gamma}\right)$, compared to $O\left(m^{-2l-\gamma}\right)$ for the best approximating polynomials of order $m$ (Jackson, 1930, p. 18). However, the coefficients of Bernstein polynomials have clear interpretations both in the case of the copula and the copula density. Further, all the derivatives of Bernstein polynomials converge to the true derivatives of the approximated function when the latter is differentiable. Other optimal properties of Bernstein polynomials in connection with the copula function and joint continuity of Markov operators are discussed in Li et al. (1998) and Kulpa (1999). A limitation of the Bernstein copula is that it cannot be used to model extreme tail behaviour defined in terms of the coefficient of tail dependence.¹ This happens because convergence under the sup norm is not sufficient to ensure that the Bernstein copula and its approximand converge to an arbitrary limit at the same speed. Nevertheless, the Bernstein copula can capture increasing dependence as we move to the tails. In fact, the Kimeldorf and Sampson copula in (14) exhibits lower tail dependence, and the Bernstein copula can approximate arbitrarily well the tail behaviour of this copula inside the cube.


4 EBC: An Estimation Procedure

There are several possible estimation procedures which can be employed to make the Bernstein copula operational. However, the Bernstein copula might also be used as an estimator for unknown copulae. In this case, we would use the Bernstein copula density as an estimator, where the coefficients of the copula are given by the empirical copula, and the order of the polynomial would be related to the smoothing properties of the estimator. We call this the EBC density.

4.1 The EBC Density

Let $C_n\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right)$ be the empirical copula at $\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right)$, i.e.
$$C_n\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) = \frac{1}{n} \sum_{s=1}^{n} I\left\{\bigcap_{j=1}^{k} \left[u_{js} \le t_{v_j}\right]\right\}, \quad (15)$$
where $I\{A\}$ is the indicator of the set $A$ and $t_{v_j} \equiv \frac{v_j}{m}$. By construction, the empirical copula is a valid distribution function. However, it has marginals which are uniform only asymptotically, i.e.
$$\lim_{n \to \infty} \frac{1}{n} \sum_{s=1}^{n} I\{u_{js} \le u_j\} = u_j,$$
for $j = 1, \dots, k$. Therefore it is a valid copula only asymptotically.² The EBC, say $\tilde{C}_B(u)$, is defined as the usual Bernstein copula where
$$\alpha\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right) = C_n\left(\frac{v_1}{m}, \dots, \frac{v_k}{m}\right).$$
Differentiating, it is easy to see that the coefficients of the polynomial are equivalent to a $k$-dimensional histogram estimator (see Scott, 1992, for details of the histogram estimator),
$$\tilde{c}_B = \frac{m^k}{n} \sum_{v_1=0}^{m-1} \cdots \sum_{v_k=0}^{m-1} \triangle_{1, \dots, k}\left(\sum_{s=1}^{n} I\left\{\bigcap_{j=1}^{k} \left[u_{js} \le t_{v_j}\right]\right\}\right) \prod_{j=1}^{k} \binom{m-1}{v_j} u_j^{v_j} (1 - u_j)^{m-1-v_j}, \quad (16)$$
where we use $\tilde{c}_B$ to stress that it is a particular estimator, and $\triangle_{1, \dots, k}$ is the $k$-dimensional difference operator, i.e.
$$\triangle_{1, \dots, k}\, I\left\{\bigcap_{j=1}^{k} \left[u_{js} \le t_{v_j}\right]\right\} = \sum_{l_1=0}^{1} \cdots \sum_{l_k=0}^{1} (-1)^{l_1 + \dots + l_k}\, I\left\{\bigcap_{j=1}^{k} \left[u_{js} \le t_{v_j} + \frac{l_j}{m}\right]\right\}.$$

The optimal choice of $m$ depends on the topology we use. We choose $m$ to minimize the mean square error of the density, i.e. $\min_m \|\tilde{c}_B - c\|_2^2$, where $\|\cdot\|_2$ is the $L_2$ norm under the true probability measure and $c$ is the true copula density. Just increasing $m$ will reduce the bias but increase the variance of $\tilde{c}_B$.
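Operationally, the EBC density is a histogram on the $m^{-1}$ mesh smoothed by the Bernstein operator of order $m - 1$. A sketch for $k = 2$ (our naming; the sample is assumed to already have uniform $[0,1]$ marginals, as in the simulation below):

```python
import math
import random

def ebc_density_2d(sample, m, u1, u2):
    """EBC density (16) for k = 2: bivariate histogram bin masses from the
    empirical copula, weighted by Bernstein basis polynomials of order m - 1.
    `sample` is a list of (x, y) pairs with uniform [0,1] marginals."""
    n = len(sample)

    def C_n(t1, t2):
        # empirical copula (15)
        return sum(1 for x, y in sample if x <= t1 and y <= t2) / n

    def basis(v, x):
        return math.comb(m - 1, v) * x**v * (1 - x) ** (m - 1 - v)

    dens = 0.0
    for v1 in range(m):
        for v2 in range(m):
            # rectangle difference of C_n: the mass of histogram bin (v1, v2)
            mass = (C_n((v1 + 1) / m, (v2 + 1) / m)
                    - C_n(v1 / m, (v2 + 1) / m)
                    - C_n((v1 + 1) / m, v2 / m)
                    + C_n(v1 / m, v2 / m))
            dens += m * m * mass * basis(v1, u1) * basis(v2, u2)
    return dens

# Independent uniforms have copula density 1; the estimate should be near 1.
rng = random.Random(1)
sample = [(rng.random(), rng.random()) for _ in range(2000)]
d = ebc_density_2d(sample, 4, 0.5, 0.5)
```

Note that each Bernstein coefficient is exactly $m^2$ times a histogram bin mass, which is the equivalence claimed below (16).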

4.2 Consistency in MSE of the EBC Density

We want to choose (16) such that it is optimal under the $L_2$ norm. The asymptotic properties of this estimator under the $L_2$ norm have been studied in detail in Sancetta (2003) under the following condition.

Condition 1. $u_1, \dots, u_n$ is a sequence of independent, strictly stationary uniform $[0,1]^k$ random vectors with copula $C(u)$ and copula density $c(u)$ which has finite first derivatives everywhere in the $k$-cube.

Remark. The independence condition is not required, but we use it in order to shorten the proof as much as possible. From the proof of Theorem 3 it can be seen that the results are still valid under appropriate mixing conditions by use of known coupling results.

We use $\lesssim$ to indicate less than or equal up to a multiplicative absolute constant, e.g. $a \lesssim b$ implies $\exists C$, $0 < C < \infty$, such that $a \le Cb$. (This notation is commonly used in the empirical process literature, e.g. van der Vaart and Wellner, 2000.)

Theorem 3. Let $\tilde{c}_B$ be the $k$-dimensional Bernstein copula density. Under Condition 1,

i. $\mathrm{Bias}(\tilde{c}_B) \lesssim m^{-1}$;

ii. Let $\lambda_j \equiv [u_j (1 - u_j)]^{\frac{1}{2}}$;

(a.) for $u_j \in (0,1)$, $\forall j$,
$$\mathrm{var}(\tilde{c}_B) \lesssim \left(n \prod_{j=1}^{k} \lambda_j\right)^{-1} m^{\frac{k}{2}} \left(1 + m^{-1}\right);$$

(b.) for $u_j = 0, 1$, $\forall j$,
$$\mathrm{var}(\tilde{c}_B) = \frac{m^k}{n}\, c(u) + O\left(\frac{m^{k-1}}{n}\right);$$

iii. $\tilde{c}_B(u) \to c(u)$ in mean square error:

(a.) for $u_j \in (0,1)$, $\forall j$, if $\frac{m^{k/2}}{n} \to 0$ as $m, n \to \infty$;

(b.) for $u_j = 0, 1$, $\forall j$, if $\frac{m^k}{n} \to 0$ as $m, n \to \infty$;

iv. The optimal order of the polynomial in a mean square error sense is

(a.) $m \lesssim n^{\frac{2}{k+4}}$ if $u_j \in (0,1)$, $\forall j$;

(b.) $m \lesssim n^{\frac{1}{k+2}}$ if $u_j = 0, 1$, $\forall j$;

v. If $m - k \ge 2$, then $\tilde{c}_B(u)$ and $\tilde{C}_B(u)$ are Donsker in $(0,1)$, i.e. $z_B(u) \equiv m^{-\frac{k}{2}} \sqrt{n}\, [\tilde{c}_B(u) - E\tilde{c}_B(u)]$ and $\tilde{Z}_B(u) \equiv \sqrt{n}\, [\tilde{C}_B(u) - E\tilde{C}_B(u)]$ converge to a zero mean Gaussian process with continuous sample paths and covariance function $E[z_B(u_1) z_B(u_2)]$ and $E[\tilde{Z}_B(u_1) \tilde{Z}_B(u_2)]$, respectively.

The proof of Theorem 3 is given in the next subsection. The weak limit implies that convergence is uniform: if a class of functions is Donsker, then it is Glivenko-Cantelli, i.e. convergence holds uniformly over the class (e.g. van der Vaart and Wellner, 2000, p. 82). Therefore, the results of Theorem 3 hold uniformly.

For comparison purposes, let $h \equiv m^{-1}$ be the smoothing factor in the usual sense. The bias is of the same order as the one for the histogram estimator. In this respect, kernel smoothers would lead to a bias not higher than $O(m^{-2})$, as opposed to $O(m^{-1})$ in our case. Notice that it is not possible to reduce the bias to $O(m^{-2})$ by shifting the histogram, i.e. using frequency polygons (e.g. Scott, 1992, ch. 4, for details on frequency polygons). In this case the first term in the expansion would vanish, but other terms of the same order would not. Further, the $O\left(\frac{k}{m}\right)$ rate of approximation of $k$-dimensional Bernstein polynomials imposes a lower bound on the bias unless linear combinations of Bernstein polynomials are used (see the discussion at the end of Section 3). While the bias is of the same order as the histogram estimator, the variance is of smaller order (except at the edges of the hypercube): $\mathrm{var}(\tilde{c}_B) = O\left(m^{\frac{k}{2}}\right)$ instead of $O(m^k)$, as is the case for the histogram and kernel estimators. On the other hand, for $u_j = 0, 1$ for all $j$'s, the variance is of the same order as for common nonparametric estimators. The case $u_j = 0, 1$ for only some $j$ is not included because the result is just a mixture of the two extreme cases: the variance goes down by a factor that is $O\left(m^{\frac{1}{2}}\right)$ for all the coordinates inside the $k$-hypercube, while for the coordinates on the boundaries the contribution to the variance is $O(m)$.

As $m$ and $n$ go to infinity, it follows that this estimator has a rate of consistency $\frac{m^{k/2}}{n} \to 0$ inside the hypercube, versus $\frac{m^k}{n} \to 0$ for other common nonparametric estimators, e.g. a Gaussian kernel. Inside the hypercube, the optimal order of smoothing is $m = O\left(n^{\frac{2}{k+4}}\right)$ in a mean square error sense, versus $m = O\left(n^{\frac{1}{k+2}}\right)$ for the histogram and $m = O\left(n^{\frac{1}{k+4}}\right)$ for a first order kernel. This implies that Bernstein polynomials require very little smoothing (i.e. a large order of polynomial, which implies a less smooth graph). This is due to the fact that Bernstein polynomials are fairly slow to adjust, as already mentioned in Section 3.

4.3 Simulation

In this subsection, we use some short simulations to investigate the finite sample properties of the EBC density. We choose the KS copula in (14) as the true one. In particular, we let $\theta = 0.6$. This corresponds to $\rho_S = 0.34$. Figure III shows the plot of the copula density of the KS copula with dependence parameter $\theta = 0.6$. The KS copula is singular at the origin. As a condition in Theorem 3 we used the fact that the copula density is non-singular. Therefore, comparing this copula density with the EBC will be of interest for several reasons.

[Figure III, about here]

We notice that this copula density exhibits lower tail dependence. This means specifically that the random variables become more dependent on the left side of their support. This appears to be an important property when modelling joint financial returns: asset returns are more dependent when they are negative (Silvapulle and Granger, 2000; Fortin and Kuzmics, 2002, for the case of daily financial returns). By the properties of the copula function we are not concerned with the marginal distributions: these could be either known, or estimated using a parametric specification or the empirical distribution function. Assuming the marginals are correctly specified, then, by definition, the copula is the joint distribution of uniform $[0,1]^k$ marginals. For example, if we were to model daily financial returns parametrically, the modified Weibull distribution or the double gamma could be good choices (e.g. Knight et al., 1995; Laherrere and Sornette, 1998). Therefore, the copula allows us to divide the estimation problem into two steps, which are independent. In this simulation, we only consider the performance of our estimator, so that the marginals are uniform. This corresponds to the case of marginals being known. Carrying out a simulation of the whole joint distribution (i.e. copula and marginals) might be of interest as well. However, in

this case, the error in the parametric estimation of the marginals would confound the error in the estimation of the copula. On the other hand, the purpose of this simulation is to provide a brief direct illustration of the small sample properties of our estimator and their link to the theoretical asymptotic results of Theorem 3. The realism of the simulation needs to be discussed. We have arbitrary marginals that could reflect skewness and kurtosis, and a copula that has, as mentioned above, lower tail dependence in line with current empirical research on financial returns. It is the latter we are focusing on in our simulation. We generate 100 samples of $n = 500$ observations from the KS copula with $\theta = 0.6$; the value of $n$ corresponds to the number of our daily observations (see Li et al., 1996, for details on simulating data from multivariate distributions and copulae). We consider estimation of the Bernstein copula density for $m = 4, 6, \dots, 40$. This boils down to finding the histogram estimator with bin width equal to $m^{-1}$, and then applying the Bernstein operator to it. We plot the estimated copula for $m = 12$ in Figure IV.

[Figure IV, about here]

In order to study the global performance of our estimator, we should look at the integrated mean square error (IMSE), i.e.
$$\int_{[0,1]^2} E\left[\left(c(u,v) - \tilde{c}_B(u,v)\right)^2\right] du\, dv = \int_{[0,1]^2} \left[\mathrm{Bias}\left(\tilde{c}_B(u,v)\right)\right]^2 du\, dv + \int_{[0,1]^2} \mathrm{var}\left(\tilde{c}_B(u,v)\right) du\, dv.$$

For each $m$, we use our 100 samples of 500 observations from the KS copula to compute an approximate bias squared and variance, say $\mathrm{Bias}(\tilde{c}_B(u,v))^2$ and $\mathrm{var}(\tilde{c}_B(u,v))$. Both are obtained by evaluating $\tilde{c}_B(u,v)$ at 10,000 uniform $[0,1]^2$ pseudo random points (for each $m$, we use the same points). Therefore, we approximate the above integrals by averaging over these 10,000 points, i.e.
$$\int_{[0,1]^2} \left[\mathrm{Bias}\left(\tilde{c}_B(u,v)\right)\right]^2 du\, dv \simeq \sum_{j=1}^{10{,}000} \frac{\mathrm{Bias}\left(\tilde{c}_B(u_j, v_j)\right)^2}{10{,}000},$$
$$\int_{[0,1]^2} \mathrm{var}\left(\tilde{c}_B(u,v)\right) du\, dv \simeq \sum_{j=1}^{10{,}000} \frac{\mathrm{var}\left(\tilde{c}_B(u_j, v_j)\right)}{10{,}000},$$
where $\{(u_j, v_j)\}_{j=1}^{10{,}000}$ is our uniform $[0,1]^2$ sample of points at which both the KS copula density and the EBC density are evaluated. We summarize our results in Table II.³ We also report the

same results when the copula is estimated using the 2-dimensional histogram estimator. Since the EBC density is obtained from the histogram, it is natural to make comparisons with the 2-dimensional histogram.

[TABLE II, about here]

The main result of Table II is that applying the Bernstein operator to the 2-dimensional histogram estimator allows us to decrease the size of the mesh, and consequently the bias, while keeping the uncertainty (i.e. the variance) low. Further, the results for the EBC do not seem to be particularly sensitive to $m$, as opposed to the histogram. The results agree with the asymptotic results in Theorem 3 in the sense that the variance grows linearly in $m$, as opposed to the histogram estimator, whose variance appears to be quadratic in $m$. The biases also agree with Theorem 3 (the EBC and the histogram have similar biases), though not so accurately in small samples. In Table III we report the results from regressing the log of the integrated variance of the two estimators (the EBC and the histogram) on the log of $m$. We use $m = 4, 5, \dots, 41$ with the corresponding estimated variances (i.e. 38 observations). The slopes of the regressions support the conclusions of Theorem 3. For the sake of conciseness, we do not report results from the regression of the biases, as the coefficients of $\log m$ are similar for the two estimators (as implied by Theorem 3), but very sensitive to the choice of the range of $m$ due to the influence of higher order terms.

[TABLE III, about here]

4.4 Proof of Theorem 3

The proof of part ii. of Theorem 3 is based on the normal approximation to the binomial distribution, e.g. see Stuart and Ord (1994, pp. 138-140). In particular, let

\[
P_{v,m}(u) \equiv \binom{m}{v} u^v (1-u)^{m-v}
\]

and

\[
P_{v,m}(v) \equiv \left(2\pi u(1-u)m\right)^{-\frac{1}{2}} \exp\left\{-\frac{m\left(\frac{v}{m}-u\right)^2}{2u(1-u)}\right\}; \tag{17}
\]

then

\[
\sum_{v=0}^{m} f\left(\frac{v}{m}\right) P_{v,m}(u) \simeq \int_{-\infty}^{\infty} f\left(\frac{v}{m}\right) P_{v,m}(v)\,dv.
\]

A formal proof may be given through the Edgeworth expansion for z = (v/m − u) in order to prove that the error is uniform. Taking squares of the two distributions (i.e. the binomial and the Gaussian),

\[
\sum_{v=0}^{m} f\left(\frac{v}{m}\right) \left(P_{v,m}(u)\right)^2 \simeq \int_{-\infty}^{\infty} f\left(\frac{v}{m}\right) \left(P_{v,m}(v)\right)^2\,dv, \tag{18}
\]

where again the error holds uniformly. With this remark, we can prove Theorem 3.

Notation. We use u_1, ..., u_n to indicate n copies of k-dimensional uniform random vectors in [0,1]^k. On the other hand, u = (u_1, ..., u_k) denotes a fixed, but arbitrary, k-dimensional vector. Following Abadir and Magnus (2000), we define D_i c(u) ≡ ∂c(u_s)/∂u_i |_{u_s = u}.
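The approximation leading to (17)-(18) can be illustrated numerically. The sketch below is our own, with an arbitrary smooth test function; it compares the binomial sum on the left of the first display with a Riemann-sum version of the Gaussian integral on the right:

```python
import math

def binom_pmf(v, m, u):
    """Binomial weight P_{v,m}(u)."""
    return math.comb(m, v) * u**v * (1 - u) ** (m - v)

def normal_density(v, m, u):
    """Gaussian approximation in (17): N(mu, m u (1-u)) evaluated at v."""
    var = m * u * (1 - u)
    return math.exp(-(v - m * u) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

m, u = 400, 0.3
f = lambda x: x**2  # an arbitrary smooth test function of v/m

lhs = sum(f(v / m) * binom_pmf(v, m, u) for v in range(m + 1))
# Riemann sum for the integral of f(v/m) against the Gaussian density
step = 0.1
rhs = sum(f(v / m) * normal_density(v, m, u) * step
          for v in (i * step for i in range(-1000, 10 * m + 1000)))
```

Both sides equal u^2 + u(1 − u)/m for this quadratic test function (the binomial exactly, the Gaussian up to discretization error), so `lhs` and `rhs` should agree to high accuracy.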

Proof of Theorem 3. Bias(c̃_B) ≡ E(c̃_B) − c(u), which we can rewrite as

\[
E\int c_n(t)\,K_m(u,t)\,dt - c(u).
\]

We shall proceed as in the previous proof. Clearly,

\[
E\int c_n(t)\,K_m(u,t)\,dt - c(u) \le E\int \left[c_n(t)-c(t)\right] K_m(u,t)\,dt + \left(\int c(t)\,K_m(u,t)\,dt - c(u)\right)
\le E\left(\int \left[c_n(t)-c(t)\right] K_m(u,t)\,dt\right) + O\left(\frac{k}{m}\right),
\]

where the second inequality follows from the same argument as in the previous proof. Since the Bernstein operator is bounded, by Fubini's theorem it is sufficient to consider E[c_n(t) − c(t)], i.e. the bias of a histogram estimator. Therefore (e.g. Scott, 1992, p. 81, or just use (22) below, where no Taylor expansion is required),

\[
E\left[c_n(t) - c(t)\right] = O\left(m^{-1}\right).
\]

Therefore, Bias(c̃_B) = O(1/m).

For the variance, notice that the probability of one observation falling inside a subset of the hypercube is equal to the probability of a success in a Bernoulli trial. We know that the probability of n successes, where n is the sample size, is given by a binomial distribution. By the variance of n independent Bernoulli trials,

\[
\mathrm{var}(\tilde c_B) = \frac{m^{2k}}{n^2}\sum_{s=1}^{n}\sum_{v_1=0}^{m-1}\cdots\sum_{v_k=0}^{m-1}\left(p_{s,v_1\cdots v_k}-\left(p_{s,v_1\cdots v_k}\right)^2\right)\prod_{j=1}^{k}\left(P_{v_j,m-1}(u_j)\right)^2, \tag{19}
\]

where

\[
p_{s,v_1\cdots v_k} \equiv \int_{t_{v_k}}^{t_{v_k}+\frac{1}{m}}\cdots\int_{t_{v_1}}^{t_{v_1}+\frac{1}{m}} c(u_s)\,du_s^1\cdots du_s^k.
\]

Consider the following simple identity for any differentiable function f:

\[
\int_{t}^{t+\frac{1}{m}} f(u)\,du = \frac{f(t)}{m} - \int_{t}^{t+\frac{1}{m}}\left(u-t-\frac{1}{m}\right)df(u), \tag{20}
\]

where the left hand side can be recovered by simple integration of the second term on the right hand side. By direct application of (20) we have

\[
p_{s,v_1\cdots v_k} = \int_{t_{v_k}}^{t_{v_k}+\frac{1}{m}}\cdots\int_{t_{v_2}}^{t_{v_2}+\frac{1}{m}}\left\{\frac{c(t_{v_1},u_s^2,...,u_s^k)}{m} - \int_{t_{v_1}}^{t_{v_1}+\frac{1}{m}}\left(u_s^1-t_{v_1}-\frac{1}{m}\right)D_1 c(u_s^1,u_s^2,...,u_s^k)\,du_s^1\right\}du_s^2\cdots du_s^k. \tag{21}
\]
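The integration-by-parts identity (20) that drives this step can be checked numerically; the helper below is our own illustration, using a midpoint rule for both integrals:

```python
import math

def check_identity(f, fprime, t, m, steps=100_000):
    """Midpoint-rule check of identity (20):
    int_t^{t+1/m} f(u) du  =  f(t)/m - int_t^{t+1/m} (u - t - 1/m) f'(u) du."""
    h = (1.0 / m) / steps
    mids = [t + (i + 0.5) * h for i in range(steps)]
    lhs = sum(f(u) * h for u in mids)
    rhs = f(t) / m - sum((u - t - 1.0 / m) * fprime(u) * h for u in mids)
    return lhs, rhs

# Example with f = sin on [0.2, 0.2 + 1/5]; the two sides should coincide
# up to the quadrature error.
lhs, rhs = check_identity(math.sin, math.cos, t=0.2, m=5)
```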

Now, by Condition 1, D_1 c(u_s^1, u_s^2, ..., u_s^k) ≤ M for some M < ∞. Therefore, the last term in (21) is bounded by

\[
M\int_{t_{v_1}}^{t_{v_1}+\frac{1}{m}}\left(t_{v_1}+\frac{1}{m}-u_s^1\right)du_s^1 = O\left(m^{-2}\right).
\]

Substituting in (21),

\[
p_{s,v_1\cdots v_k} = \int_{t_{v_k}}^{t_{v_k}+\frac{1}{m}}\cdots\int_{t_{v_2}}^{t_{v_2}+\frac{1}{m}}\left(\frac{c(t_{v_1},u_s^2,...,u_s^k)}{m}+O\left(m^{-2}\right)\right)du_s^2\cdots du_s^k.
\]

Applying (20) repeatedly, we have

\[
p_{s,v_1\cdots v_k} = \frac{c(t_{v_1},...,t_{v_k})}{m^k} + O\left(m^{-(k+1)}\right). \tag{22}
\]

Since p_{s,v_1···v_k} ≤ 1, it follows that (p_{s,v_1···v_k})^2 = o(p_{s,v_1···v_k}). Therefore, substituting (22) in (19), we have that

\[
\mathrm{var}(\tilde c_B) = \frac{m^{2k}}{n}\sum_{v_1=0}^{m-1}\cdots\sum_{v_k=0}^{m-1}\left(\frac{c(t_{v_1},...,t_{v_k})}{m^k}+O\left(m^{-(k+1)}\right)\right)\prod_{j=1}^{k}\left(P_{v_j,m-1}(u_j)\right)^2. \tag{23}
\]

We use (18) to approximate (23). By Condition 1, c(t_{v_1},...,t_{v_k}) is bounded, say by M < ∞. Recall that t_{v_j} ≡ v_j/m, j = 1,...,k. Consequently, we need to solve the following type of integral:

\[
\Gamma_j = \int_{\mathbb R} c\left(\frac{v_1}{m},...,\frac{v_k}{m}\right)\frac{\exp\left\{-\frac{m-1}{u_j(1-u_j)}\left(\frac{v_j}{m-1}-u_j\right)^2\right\}}{2\pi(m-1)u_j(1-u_j)}\,dv_j
\le M\int_{\mathbb R}\frac{\exp\left\{-\frac{m-1}{u_j(1-u_j)}\left(\frac{v_j}{m-1}-u_j\right)^2\right\}}{2\pi(m-1)u_j(1-u_j)}\,dv_j.
\]

Simply make the following change of variable: x_j = \sqrt{(m-1)/(u_j(1-u_j))}\,(v_j/(m-1) - u_j), with Jacobian \sqrt{(m-1)u_j(1-u_j)}. Then

\[
\Gamma_j \le M\int_{\mathbb R}\frac{\exp\{-x_j^2\}}{2\pi\sqrt{(m-1)u_j(1-u_j)}}\,dx_j.
\]

Therefore, \Gamma_j = O(m^{-1/2}). This shows that integration leads to a drop in asymptotic magnitude equal to m^{-1/2} for each dimension. Let \lambda_j \equiv [u_j(1-u_j)]^{1/2}; then

\[
\mathrm{var}(\tilde c_B) \lesssim \frac{m^{2k}}{n}\left(\frac{M}{m^k}+m^{-(k+1)}\right)\left[4\pi(m-1)\right]^{-\frac{k}{2}}\Big(\prod_{j=1}^{k}\lambda_j\Big)^{-1}
\lesssim \Big(n\prod_{j=1}^{k}\lambda_j\Big)^{-1} m^{\frac{k}{2}}\left(1+m^{-1}\right).
\]

At the edges of the hypercube, i.e. u_j = 0, 1, (P_{v_j,m-1}(u))^2 = P_{v_j,m-1}(u), so that

\[
\mathrm{var}(\tilde c_B) = \frac{m^{2k}}{n}\left(\frac{c(u)}{m^k}+O\left(m^{-(1+k)}\right)\right) = \frac{m^k c(u)}{n}+O\left(\frac{m^{k-1}}{n}\right).
\]

The mean square error (MSE) convergence simply follows by considering the leading terms of the squared bias and the variance in the two distinct cases: MSE = Bias(c̃_B)^2 + var(c̃_B). The optimal order of the polynomial follows by minimization of the asymptotic MSE with respect to m.
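The m^{-1/2} gain obtained by squaring the binomial weights, i.e. \sum_v (P_{v,m-1}(u))^2 ≈ [4\pi(m-1)u(1-u)]^{-1/2}, can be verified directly. The sketch below is our own illustrative code:

```python
import math

def binom_pmf(v, m, u):
    """Binomial weight P_{v,m}(u)."""
    return math.comb(m, v) * u**v * (1 - u) ** (m - v)

m, u = 401, 0.3  # the Bernstein weights below have order m - 1 = 400
sum_sq = sum(binom_pmf(v, m - 1, u) ** 2 for v in range(m))
# Gaussian prediction: 1 / (2 sqrt(pi (m-1) u (1-u)))
predicted = 1.0 / (2.0 * math.sqrt(math.pi * (m - 1) * u * (1 - u)))
```

For m − 1 = 400 the two quantities should agree to within a fraction of a percent, confirming that the squared weights sum to O(m^{-1/2}) rather than O(1).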

Inside the hypercube we have

\[
\frac{\partial}{\partial m}\,MSE \lesssim \frac{\partial}{\partial m}\left(m^{-2}+\frac{m^{\frac{k}{2}}}{n}\right) \lesssim -m^{-3}+\frac{k\,m^{\frac{k}{2}-1}}{2n} = 0,
\]

which implies m^{(k+4)/2} = O(n). Similarly, for the density at the boundaries of the k-cube,

\[
\frac{\partial}{\partial m}\,MSE \lesssim \frac{\partial}{\partial m}\left(m^{-2}+\frac{m^{k}}{n}\right) \lesssim -m^{-3}+\frac{k\,m^{k-1}}{2n} = 0,
\]

which implies m^{k+2} = O(n).

The finite dimensional distributions of the EBC density converge to a normal distribution. This follows from the fact that it is the sum of bounded random variables, together with Condition 1 (weaker conditions than iid are clearly sufficient for the central limit theorem). But the Bernstein copula density has m − 1 bounded derivatives (recall that Bernstein polynomials are closed under differentiation) and any Bernstein polynomial is Lipschitz. By Theorem 2.7.1 in van der Vaart and Wellner (2000, p. 155), the class of functions satisfying the properties just mentioned has finite ε-bracketing numbers of order \exp\{\varepsilon^{-k/(m-1)}\}. It follows that their entropy integral with bracketing is finite. This is enough to show (see Ossiander, 1987, for the iid case, or Pollard, 2001, for generalizations) that the Bernstein copula density converges to a Gaussian process with continuous sample paths. The m^{-k/2}\sqrt{n} standardization is required for the leading term in the variance expansion to be independent of n, i.e. m → ∞ as n → ∞ (as usual for nonparametric estimators). The same condition applies to the copula because it is m times differentiable, with the same properties as the density. Clearly, the simple root-n standardization is employed in this case (integration absorbs the smoothing parameter).

From the proof it is clear that what drives the variance down is the fact that approximating the square of P_{v,m-1}(u) leads to a normal approximation times an extra term that is O(m^{-1/2}).

In order to provide more intuition on this result and on the difference between the edges of the box and the points inside it, we offer the following heuristic explanation. Bernstein polynomials average the information about the function throughout its support; recall the singular integral representation in (4). On the other hand, the result at the corners of the hypercube is clear: the approximation at these points is exact and is not influenced by the behaviour of the function over the rest of its domain, i.e. it is exactly local, so that we just recover the properties of the histogram estimator.
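The optimal orders just derived are easy to evaluate in practice. The small helper below is ours, purely illustrative, and simply returns the rate implied by the first-order conditions:

```python
def optimal_order(n, k, boundary=False):
    """Bernstein order implied by the first-order conditions for the MSE:
    m ~ n^{2/(k+4)} inside the hypercube, m ~ n^{1/(k+2)} at its edges."""
    exponent = 1.0 / (k + 2) if boundary else 2.0 / (k + 4)
    return n ** exponent

# k = 2 (a bivariate copula), n = 1000 observations:
interior = optimal_order(1000, 2)                 # n^(1/3)
edge = optimal_order(1000, 2, boundary=True)      # n^(1/4)
```

For a bivariate copula with n = 1000, this suggests an order around m ≈ 10 inside the square, broadly consistent with the region where the IMSE in Table II flattens out.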

5 Conclusion and Some Further Extensions

We studied a new object in multivariate analysis called the Bernstein copula. Furthermore, we showed that, subject to regularity, any copula can be represented (approximated) by some Bernstein copula. This copula representation should allow us to take advantage of the properties of the copula function whenever multivariate normality is not a good assumption. We made the procedure operational by providing an empirical estimation procedure with its rates of consistency and an interesting result for the variance. The result for the variance may provide a partial solution to the curse of dimensionality, particularly in semiparametric estimation, i.e. when the marginals are known.

The study of the Bernstein copula led us to consider many topics all at once, and it is clear that a lot has been left out of this paper. We did not discuss joint continuity of the Bernstein copula under the *-product defined by Darsow et al. (1992) (see Kulpa, 1997, and Li et al., 1998). The *-product is a powerful tool that allows us to define Markov processes and general time series dependence concepts. The Bernstein copula is closed under this operation.

The consistency results for the EBC have been derived under the condition of iid observations. While the more general case of stationary random variables could be dealt with (under suitable mixing conditions), for the sake of conciseness we refrained from doing so.

We restricted m to be the same for all coordinates. This is not necessary, but the notation would have been cumbersome. What seems more relevant in practice is the actual choice of m (possibly different in each coordinate) in empirical work. We did not discuss this issue, since it is a problem common to nonparametric estimators similar to the EBC. As shown in our example, one may consider some metric and minimize it in terms of m. The results in Table II show that the error is not particularly sensitive to the choice of m, as opposed to the histogram estimator. In practice, we may use cross validation or some modified version of it (the true estimator would be approximated using the jackknife). For more details on this and related limitations of this approach, the reader is referred to Scott (1992).

We could have compared our estimator with the Genest and Rivest (1993) nonparametric procedure for selecting the best bivariate Archimedean copulae (e.g. Joe, 1997, p. 86, for details on Archimedean copulae). These authors derive an estimator, say K_n(v), for a function K(v), which is closely related to the generator of an Archimedean copula. Therefore, their approach could be used to estimate empirically the generator of a bivariate Archimedean copula and from that obtain an estimator for a bivariate Archimedean copula. However, in the words of these authors, their approach "often proves more convenient in application to use it as a tool for identifying the parametric family of Archimedean copulas that provides the best possible fit to the data"; further, "one may be tempted to determine directly from K_n(v) the Archimedean copula [...]" (p. 1035). "Though this would be formally possible, whenever v − K_n(v^−) < 0 for all 0 < v < 1, it generally will be theoretically more meaningful - as well as computationally more convenient - to use K_n as a tool to help identify the parametric family of Archimedean copulas that provides the best possible fit to the data" (p. 1038). Indeed, we experienced computational difficulties estimating an Archimedean copula empirically using this approach. While the KS copula used as the true model in our simulation belongs to the class of bivariate Archimedean copulae, our estimator is not restricted to this class, has wider application, and is easier to compute.

Finally, alternative estimation procedures have not been considered in detail. While the paper provided a promising result for the variance of the empirical estimator, this is one among many others that could be studied. For example, one could look at the following estimation problems:

\[
\max\left[P_n \ln c_B - \lambda_n \int \left(D^{\alpha} c_B\right)^2\right], \quad\text{or}
\]
\[
\min\left[P_n \left(C_n - C_B\right)^2 + \lambda_n \int \left(D^{\alpha} c_B\right)^2\right],
\]

where P_n is the empirical measure, C_n is the empirical copula, λ_n is a smoothing parameter going to zero as n → ∞, the unqualified integral is a Lebesgue integral, D^α is the differential operator of order α, i.e. D^1 c_B = \sum_{j=1}^{k} \partial c_B/\partial u_j, and c_B = \partial^k C_B/(\partial u_1 \cdots \partial u_k). However, unlike the case of the EBC

density, these estimators will not automatically lead to a copula unless constraints are imposed.

Nevertheless, under suitable constraints, it seems plausible that the study of these estimators may lead to analogous results by virtue of the kernel representation of the Bernstein copula. Some of the issues left out of this paper are the subject of current research.

Notes

1. Lower and upper tail dependence are respectively defined as

\[
\lambda_L = \lim_{u\to 0} \Pr\left(u_1 < u \mid u_2 < u\right) \quad\text{and}\quad \lambda_U = \lim_{u\to 1} \Pr\left(u_1 > u \mid u_2 > u\right),
\]

where λ_L and λ_U are between zero and one. No tail dependence corresponds to these probabilities being exactly zero.

2. Clearly, we could easily impose the boundary condition C_n(1, ..., 1, u_j, 1, ..., 1) = u_j, 1 ≤ j ≤ k. However, this would not ensure that C_n is monotonically increasing in a finite sample.

3. The results in Table II should be considered purely illustrative. A larger number of Monte Carlo simulations would be required in order to achieve more accurate convergence. Unfortunately, our calculations were labour-intensive, and a larger number of simulations would require more investment in programming in order to obtain results within a reasonable time. Another problem is caused by the use of Monte Carlo integration in place of exact integration over the unit square. Here, more advanced approaches could have been used, such as quasi-random numbers, in order to bound the error by a quantity O((\ln n)^k / n) (for k = 2 dimensions in our case) instead of the usual O(1/\sqrt{n}) for the Monte Carlo integration approach (e.g. Spanier and Maize, 1994, for further details).
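For note 1, the tail-dependence coefficients can be estimated empirically by evaluating the conditional probability at a small threshold u. The sketch below is our own illustration (the comonotone case is the extreme with λ_L = 1; for independent uniforms the conditional probability equals u and vanishes in the limit):

```python
import random

def lower_tail_dep(pairs, u):
    """Empirical Pr(u1 < u | u2 < u) at a fixed small threshold u; the lower
    tail-dependence coefficient lambda_L is its limit as u -> 0."""
    below2 = [(a, b) for a, b in pairs if b < u]
    if not below2:
        return 0.0
    return sum(1 for a, b in below2 if a < u) / len(below2)

rng = random.Random(0)
# Comonotone pairs (u1 = u2): the conditional probability is 1 for every u.
comonotone = [(x, x) for x in (rng.random() for _ in range(5000))]
# Independent pairs: Pr(u1 < u | u2 < u) = u.
independent = [(rng.random(), rng.random()) for _ in range(5000)]
```

Here `lower_tail_dep(comonotone, 0.05)` returns 1.0, while for the independent sample the estimate is close to the threshold 0.05 itself.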

References

[1] Abadir, K.M. (1999) An Introduction to Hypergeometric Functions for Economists. Econometric Reviews 18, 287-330.
[2] Andrews, D.W.K. (1987) Consistency in Nonlinear Econometric Models: A Generic Uniform Law of Large Numbers. Econometrica 55, 1465-1471.
[3] Berens, H. and R. DeVore (1980) A Characterization of Bernstein Polynomials. In E.W. Cheney (ed.), Approximation Theory III, 213-219. New York: Academic Press.
[4] Bouyé, E., N. Gaussel and M. Salmon (2001) Investigating Dynamic Dependence Using Copulae. FERC Working Paper.
[5] Butzer, P.L. (1953a) On Two Dimensional Bernstein Polynomials. Canadian Journal of Mathematics 5, 107-113.
[6] Butzer, P.L. (1953b) Linear Combinations of Bernstein Polynomials. Canadian Journal of Mathematics 5, 559-567.
[7] Darsow, W.F., B. Nguyen and E.T. Olsen (1992) Copulas and Markov Processes. Illinois Journal of Mathematics 36, 600-642.
[8] DeVore, R.A. and G.G. Lorentz (1993) Constructive Approximation. Berlin: Springer.
[9] Embrechts, P., A. McNeil and D. Straumann (1999) Correlation: Pitfalls and Alternatives. Risk 5, 69-71.
[10] Fortin, I. and C.A. Kuzmics (2002) Tail-Dependence in Stock-Return Pairs. International Journal of Intelligent Systems in Accounting, Finance and Management 11, 89-107.
[11] Genest, C. and L.P. Rivest (1993) Statistical Inference Procedures for Bivariate Archimedean Copulas. Journal of the American Statistical Association 88, 1034-1043.
[12] Hu, L. (2002) Dependence Patterns Across Financial Markets: Methods and Evidence. Preprint, Department of Economics, Yale University.
[13] Joe, H. (1997) Multivariate Models and Dependence Concepts. London: Chapman and Hall.
[14] Kulpa, T. (1999) On Approximation of Copulas. International Journal of Mathematics and Mathematical Sciences 22, 259-269.
[15] Laherrère, J. and D. Sornette (1998) Stretched Exponential Distributions in Nature and Economy: "Fat Tails" with Characteristic Scales. The European Physical Journal B 2, 525-539.
[16] Li, H., M. Scarsini and M. Shaked (1996) Linkages: A Tool for the Construction of Multivariate Distributions with Given Nonoverlapping Multivariate Marginals. Journal of Multivariate Analysis 56, 20-41.
[17] Li, X., P. Mikusiński and M.D. Taylor (1998) Strong Approximation of Copulas. Journal of Mathematical Analysis and Applications 225, 608-623.
[18] Longin, F. and B. Solnik (2001) Extreme Correlation of International Equity Markets. Journal of Finance 56, 651-678.
[19] Lorentz, G.G. (1953) Bernstein Polynomials. Toronto: University of Toronto Press.
[20] Marichev, O.I. (1983) Handbook of Integral Transforms of Higher Transcendental Functions. Chichester, England: Ellis Horwood.
[21] Nelsen, R.B. (1998) An Introduction to Copulas. Lecture Notes in Statistics 139. New York: Springer-Verlag.
[22] Ossiander, M. (1987) A Central Limit Theorem under Metric Entropy with L2 Bracketing. The Annals of Probability 15, 897-919.
[23] Patton, A.J. (2001) Modelling Time-Varying Exchange Rate Dependence Using the Conditional Copula. Discussion Paper, Department of Economics, UCSD.
[24] Phillips, P.C.B. (1982) Best Uniform and Modified Padé Approximations of Probability Densities in Econometrics. In W. Hildenbrand (ed.), Advances in Econometrics, 123-167. Cambridge: Cambridge University Press.
[25] Phillips, P.C.B. (1983) ERA's: A New Approach to Small Sample Theory. Econometrica 51, 1505-1525.
[26] Pollard, D. (2001) General Bracketing Inequalities. Preprint, Department of Statistics, Yale University.
[27] Rockinger, M. and E. Jondeau (2001) Conditional Dependency of Financial Series: An Application of Copulas. Preprint.
[28] Sancetta, A. (2003) Nonparametric Estimation of Multivariate Distributions with Given Marginals: L2 Theory. Cambridge Working Papers in Economics 0320.
[29] Sancetta, A. and S. Satchell (2001a) Bernstein Approximations to Copula Function and Portfolio Optimization. DAE Working Paper, University of Cambridge.
[30] Schweizer, B. and E.F. Wolff (1981) On Nonparametric Measures of Dependence for Random Variables. The Annals of Statistics 9, 879-885.
[31] Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice and Visualization. New York: John Wiley and Sons.
[32] Silvapulle, P. and C. Granger (2001) Large Returns, Conditional Correlation and Portfolio Diversification: A Value at Risk Approach. Quantitative Finance 1, 542-551.
[33] Sklar, A. (1973) Random Variables, Joint Distribution Functions, and Copulas. Kybernetika 9, 449-460.
[34] Spanier, J. and E.H. Maize (1994) Quasi-Random Methods for Estimating Integrals Using Relatively Small Samples. SIAM Review 36, 18-44.
[35] Stuart, A. and J.K. Ord (1994) Kendall's Advanced Theory of Statistics. London: Edward Arnold.
[36] van der Vaart, A. and J.A. Wellner (2000) Weak Convergence and Empirical Processes. Springer Series in Statistics. New York: Springer.

A Proofs

Proof of Theorem 1. Consider the Bernstein copula C_B(u_1, ..., u_k) as an approximation to a copula C(u_1, ..., u_k), i.e.

\[
C\left(\frac{v_1}{m},...,\frac{v_k}{m}\right) = \alpha\left(\frac{v_1}{m},...,\frac{v_k}{m}\right).
\]

Then (5) and (6) are sufficient for C(v_1/m, ..., v_k/m) to be a copula. Since Bernstein polynomials do not interpolate exactly, (5) and (6) are not necessary for C_B(u_1, ..., u_k) to be a copula.

Proof of Theorem 2.

\[
(B_m^k f)(x) - f(x) = \sum_{v_1=0}^{m_1}\cdots\sum_{v_k=0}^{m_k} P_{v_1,m_1}(x_1)\cdots P_{v_k,m_k}(x_k)\left[f\left(\frac{v_1}{m_1},...,\frac{v_k}{m_k}\right)-f(x_1,...,x_k)\right]
= \sum_{v_1=0}^{m_1}\cdots\sum_{v_k=0}^{m_k} P_{v_1,m_1}(x_1)\cdots P_{v_k,m_k}(x_k)\int_{(x_1,...,x_k)}^{\left(\frac{v_1}{m_1},...,\frac{v_k}{m_k}\right)}\nabla f\cdot dr,
\]

where \nabla f \equiv [f'_1(s_1,...,s_k), ..., f'_k(s_1,...,s_k)], f'_j(s_1,...,s_k) \equiv \partial f(s_1,...,s_k)/\partial s_j, and r is a vector valued function that defines the path between the end points of the integral. By definition, \nabla f is a conservative vector field, so the path of integration is irrelevant. The above line integral can be split into k integrals along any paths parallel to the axes and perpendicular to each other. For example, we can write

\[
\int_{(x_1,...,x_k)}^{\left(\frac{v_1}{m_1},...,\frac{v_k}{m_k}\right)}\nabla f\cdot dr = \int_{x_1}^{\frac{v_1}{m_1}} f'_1(s_1,x_2,x_3,...,x_k)\,ds_1 + ... + \int_{x_k}^{\frac{v_k}{m_k}} f'_k\left(\frac{v_1}{m_1},\frac{v_2}{m_2},...,\frac{v_{k-1}}{m_{k-1}},s_k\right)ds_k.
\]

Now, considering the j-th term,

\[
\int_{x_j}^{\frac{v_j}{m_j}} f'_j\left(\frac{v_1}{m_1},\frac{v_2}{m_2},...,s_j,...,x_k\right)ds_j = f'_j\left(\frac{v_1}{m_1},...,x_j,x_{j+1},...,x_k\right)\left(\frac{v_j}{m_j}-x_j\right) - \int_{x_j}^{\frac{v_j}{m_j}}\left(s_j-\frac{v_j}{m_j}\right)df'_j\left(\frac{v_1}{m_1},\frac{v_2}{m_2},...,s_j,...,x_k\right). \tag{24}
\]

From here the crude result of the Theorem can be obtained by assuming that f'_j ∈ Lip_{M_j} 1, i.e. f'_j satisfies the Lipschitz condition with constant M_j and exponent 1:

\[
\left|f'_j(s_1,...,s_j+h_j,...,s_k)-f'_j(s_1,...,s_j,...,s_k)\right| \le M_j\,|h_j|.
\]

It follows that the last integral in (24) does not exceed

\[
M_j\int_{x_j}^{\frac{v_j}{m_j}}(s_j-x_j)\,ds_j = \frac{1}{2}M_j\left(\frac{v_j}{m_j}-x_j\right)^2.
\]

Therefore,

\[
\left|(B_m^k f)(x)-f(x)\right| \le \sum_{v_1=0}^{m_1}\cdots\sum_{v_k=0}^{m_k} P_{v_1,m_1}(x_1)\cdots P_{v_k,m_k}(x_k)\,\frac{1}{2}\sum_{j=1}^{k}M_j\left(\frac{v_j}{m_j}-x_j\right)^2
= \frac{1}{2}\left[M_1\frac{x_1(1-x_1)}{m_1}+...+M_k\frac{x_k(1-x_k)}{m_k}\right]
\]

for any x, where the first term on the right hand side of (24) is exactly zero when the Bernstein operator is applied to (v_j/m_j − x_j).

Table I. Spearman's rho for different values of the dependence parameter θ.

θ           0    .14   .31   .51   .76   1.06  1.51  2.14  3.19  5.56  ∞
ρS(KS)      0    .1    .2    .3    .4    .5    .6    .7    .8    .9    1
ρS(B10)     0    .08   .16   .24   .32   .4    .48   .57   .65   .73   *
ρS(B30)     0    .09   .19   .28   .37   .46   .56   .65   .75   .84   *
ρS(B50)     0    .09   .19   .29   .38   .48   .58   .67   .77   .86   *
ρS(B100)    0    .1    .2    .29   .39   .49   .59   .69   .78   .88   *
ρS(B200)    0    .1    .2    .3    .39   .49   .59   .69   .79   .89   *
ρS(B300)    0    .1    .2    .3    .4    .49   .6    .7    .8    .89   *

Table II. IMSE for Simulation.

                EBC                               Histogram
m     Int. Bias²  Int. Var   IMSE      Int. Bias²  Int. Var   IMSE
2     0.307       0.002      0.308     0.300       0.007      0.306
4     0.242       0.005      0.247     0.235       0.031      0.266
6     0.207       0.009      0.216     0.205       0.070      0.275
8     0.186       0.012      0.198     0.187       0.126      0.313
10    0.169       0.015      0.185     0.171       0.196      0.367
12    0.157       0.019      0.176     0.159       0.294      0.453
14    0.147       0.023      0.169     0.148       0.390      0.538
16    0.138       0.026      0.164     0.143       0.512      0.656
18    0.130       0.030      0.159     0.137       0.640      0.777
20    0.121       0.033      0.154     0.128       0.792      0.920
22    0.114       0.037      0.150     0.127       0.969      1.095
24    0.109       0.040      0.149     0.114       1.151      1.265
26    0.104       0.043      0.147     0.112       1.327      1.439
28    0.098       0.047      0.145     0.105       1.547      1.653
30    0.093       0.051      0.145     0.103       1.787      1.890
32    0.088       0.055      0.143     0.103       2.030      2.133
34    0.082       0.058      0.141     0.102       2.291      2.393
36    0.079       0.062      0.141     0.099       2.586      2.685
38    0.075       0.066      0.140     0.097       2.853      2.950
40    0.072       0.069      0.141     0.090       3.171      3.260

Figures are rounded to 3 decimal places.

Table III. Regression of log(Int. Var) on log(m).

                 EBC Estimator             Histogram Estimator
                 Value      Std. Error     Value      Std. Error
(Intercept)      -6.715     0.0154         -6.238     0.0083
log(m)           1.1003     0.0051         2.0054     0.0027
Multiple R-Squared: 0.9992 (EBC), 0.9999 (Histogram)

Figure I. KS Copula Density, θ = 1.06.

Figure II. Bernstein copula approximation (ABC) for the KS copula, θ = 1.06, m = 30.

Figure III. KS Copula Density, θ = .6.

Figure IV. EBC Density, m = 12.
