Robust estimation of skewness and kurtosis in distributions with infinite higher moments
SUPPLEMENTARY APPENDIX
Matteo Bonato
Credit Suisse∗
September 2010
1 Sample skewness and kurtosis for heavy-tailed distributions
Consider a process $\{y_t\}_{t=1,\dots,T}$ and assume the $y_t$'s are independent and identically distributed (i.i.d.) with cumulative distribution function $F$. No assumption is made on the existence of the moments of the data generating process (d.g.p.) originating the process. If the moments up to the fourth exist, the conventional coefficients of skewness and kurtosis for $y_t$ are given by

$$SK_1 = E\left[\left(\frac{y_t - \mu}{\sigma}\right)^{3}\right], \qquad KR_1 = E\left[\left(\frac{y_t - \mu}{\sigma}\right)^{4}\right] - 3,$$

where $\mu = E(y_t)$ and $\sigma^2 = E(y_t - \mu)^2$, and the expectation $E$ is taken with respect to $F$. If the sample comes from a distribution which does not possess the second moment, then it does not possess any higher moment either, and thus skewness and kurtosis are not defined. For any unknown d.g.p., however, the indexes of skewness and kurtosis can always be computed from the sample averages

$$\widehat{SK}_1 = T^{-1}\sum_{t=1}^{T}\left(\frac{y_t - \hat{\mu}}{\hat{\sigma}}\right)^{3}, \qquad \widehat{KR}_1 = T^{-1}\sum_{t=1}^{T}\left(\frac{y_t - \hat{\mu}}{\hat{\sigma}}\right)^{4} - 3,$$

where $\hat{\mu} = T^{-1}\sum_{t=1}^{T} y_t$ and $\hat{\sigma}^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat{\mu})^2$. In the presence of one or more outliers the values of these sample measures can be arbitrarily large, because the deviations are raised to the third and fourth power. A sensible interpretation of large values of these indexes is therefore not straightforward: these measures cannot discriminate between a heavy-tailed d.g.p. and the mere presence of a single outlier.

To go deeper into this issue I present a simple but effective example. Assume we are given a time series of returns. The first thing a researcher or practitioner would probably do is compute the descriptive statistics of the sample: mean, variance, skewness and kurtosis. As already mentioned, stock market returns tend to display negative skewness and excess kurtosis. To account for these facts I simulate series of returns from a GARCH(1,1) model (Bollerslev, 1986) with different distributional assumptions for the residuals. It is well known that the simple GARCH with Gaussian innovations fails to capture the fat tails of returns and, given that the normal distribution is symmetric, it does not account for possible asymmetry in the distribution either. To overcome these problems, fat-tailed and possibly skewed distributions have been adopted; see the handbooks of Rachev (2003) and Rachev et al. (2005) for a more detailed exposition.

Amongst the possible specifications of fat-tailed distributions I focus here on two which are widely used in the literature: the symmetric stable Paretian and the Student's t. The choice of symmetric distributions is motivated by the findings in Kim and White (2004) on the effect of extreme observations on the indexes of skewness and kurtosis. These indexes are computed as sample averages and are therefore not robust to the presence of outliers. This implies that a sample coming from a symmetric but fat-tailed distribution is likely to be (mistakenly) considered asymmetric according to the sample skewness, which, due to the presence of extreme observations, turns out not to be close to zero.

∗ E-mail address: [email protected]. The views expressed herein are those of the authors and not necessarily those of the Swiss National Bank, which does not accept any responsibility for the contents and opinions expressed in this paper.
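The sensitivity of the sample indexes to a single observation can be illustrated with a minimal sketch. It implements the sample averages defined above with the T^{-1} normalisation; the Gaussian sample, the seed and the outlier value of 25 are illustrative assumptions, not part of the original experiment.

```python
import math
import random

def sample_skew_kurt(y):
    """Return (SK1_hat, KR1_hat): sample skewness and sample excess kurtosis."""
    T = len(y)
    mu = sum(y) / T
    sd = math.sqrt(sum((v - mu) ** 2 for v in y) / T)  # T^{-1} normalisation
    sk = sum(((v - mu) / sd) ** 3 for v in y) / T
    kr = sum(((v - mu) / sd) ** 4 for v in y) / T - 3.0
    return sk, kr

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Hypothetical contamination: replace the last draw with one large value.
contaminated = clean[:-1] + [25.0]

sk_clean, kr_clean = sample_skew_kurt(clean)
sk_out, kr_out = sample_skew_kurt(contaminated)
print("clean sample:       ", sk_clean, kr_clean)
print("with one outlier:   ", sk_out, kr_out)
```

With 999 Gaussian draws and a single contaminating point, both indexes move far from their Gaussian values, which is the ambiguity described in the text: the statistics alone do not reveal whether the tails or one observation are responsible.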
Figure 1: Sample paths of simulated Stable-GARCH processes with parameters ω = 0.1, a = 0.02 and b = 0.9 and stability index α = 1.7 (top left), 1.8 (middle left) and 1.9 (bottom left), and of simulated Student's t GARCH processes with parameters ω = 0.1, a = 0.02 and b = 0.9 and degrees of freedom ν equal to 2 (top right), 3 (middle right) and 4 (bottom right). [Six time-series panels; horizontal axis from 0 to 1000.]
The Student's t was first adopted, along with the GARCH structure, in Bollerslev (1987). Its ability to capture heaviness in the tails makes it the natural candidate to replace the Gaussian distribution for the residuals of GARCH models. However, this distribution is an ad hoc choice not supported by any theoretical result (such as the central limit theorem for the Gaussian case). Also, the Student's t distribution is not closed under summation, which can make theoretical results cumbersome to derive.

A family of distributions able to capture heavy tails and supported by a strong theoretical result, a generalization of the central limit theorem, is the family of stable Paretian distributions. The theory of univariate stable distributions was essentially developed in the 1920s and 1930s by Paul Lévy and Aleksandr Yakovlevich Khinchin. More recently, it was the object of a monograph by Zolotarev (1986). This class of distributions nests two special cases: the normal and the Cauchy. In finance, stable distributions were first used in the pioneering work of Mandelbrot (1963) and Fama (1965) and extended to the GARCH context by Mittnik et al. (2000). As, for α < 2, they do not possess a finite variance (and thus no finite higher moments), the Stable-GARCH(r, s) assumes the form of the power GARCH of Ding et al. (1993). It reads

$$y_t = \mu + \sigma_t \epsilon_t, \qquad \sigma_t^{\delta} = \omega + \sum_{i=1}^{r} a_i\,|y_{t-i} - \mu|^{\delta} + \sum_{j=1}^{s} b_j\,\sigma_{t-j}^{\delta}, \tag{1}$$

where $\epsilon_t \overset{i.i.d.}{\sim} S_\alpha(0, 1)$, and $S_\alpha(0, 1)$ denotes the standard symmetric stable Paretian distribution.

I simulated a sample from a GARCH process with either stable or Student's t residuals. For the stable innovations, the stability index α takes the values 1.9, 1.8 and 1.7. The parameters of the Stable-GARCH equation are ω = 0.1, a = 0.02 and b = 0.9. For the Student's t case, the degrees of freedom ν take the values 2, 3 and 4. The parameters of the GARCH equation are the same as in the Stable-GARCH case. The coefficient δ is set equal to 1. Figure 1 shows the sample paths of the simulated series. The plots display clusters of volatility and extreme events. All of them, except for the Student's t with 2 degrees of freedom (top right panel), are compatible with a sample of returns on a stock.

        Stable1.7  Stable1.8  Stable1.9         t2         t3         t4
SK1       -2.7812     1.7523     2.5172     2.1041     2.8665     2.7177
KR1       43.7023    51.8331    51.6145    65.5710    37.1610    38.4444
SK1′       0.0135    -1.5334    -0.6026    -0.7150     1.2074     0.3493
KR1′       5.8810    16.9920    13.9244    43.9753    17.7475     6.9720

Table 1: Sample skewness and sample kurtosis before (SK1 and KR1) and after (SK1′ and KR1′) the removal of the outliers for the simulated GARCH(1,1) processes with different distributions for the error term.
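The simulation design described above can be sketched as follows for the (1,1) case with δ = 1. This is a minimal sketch under stated assumptions rather than the code used for the paper: the symmetric stable draws use the Chambers-Mallows-Stuck transform, the Student's t draws use the normal over root chi-square construction, and the start value and the `burn` burn-in length are hypothetical choices.

```python
import math
import random

def sym_stable(alpha, rng):
    """One draw from the standard symmetric stable law S_alpha(0, 1), alpha != 1,
    using the Chambers-Mallows-Stuck transform."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def student_t(nu, rng):
    """One draw from a Student's t with nu degrees of freedom."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu d.o.f.
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)

def simulate_power_garch(T, draw, omega=0.1, a=0.02, b=0.9, mu=0.0, burn=500):
    """Simulate y_t = mu + sigma_t * eps_t with
    sigma_t = omega + a * |y_{t-1} - mu| + b * sigma_{t-1}  (delta = 1).
    `draw` is a zero-argument function returning one innovation eps_t."""
    sigma = omega / (1.0 - a - b)  # assumed start value, washed out by the burn-in
    y_prev = mu
    path = []
    for t in range(T + burn):
        sigma = omega + a * abs(y_prev - mu) + b * sigma
        y_prev = mu + sigma * draw()
        if t >= burn:
            path.append(y_prev)
    return path

rng = random.Random(42)  # fixed seed, chosen only for reproducibility
stable_path = simulate_power_garch(1000, lambda: sym_stable(1.7, rng))
t_path = simulate_power_garch(1000, lambda: student_t(3, rng))
print(max(abs(v) for v in stable_path), max(abs(v) for v in t_path))
```

Passing the innovation generator as a function keeps the recursion identical across the six cases, so only the distribution of the innovations changes between runs.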
Table 1 displays, in the first two rows, the sample skewness index SK1 and the sample kurtosis index KR1. The third and fourth rows show the sample skewness and kurtosis once the outlier has been removed from the sample, where I considered as the outlier the largest observation in absolute value.1

I first focus on the sample skewness. The first surprising result is that for all the simulated samples (coming from symmetric distributions), the skewness is not close to zero. It indicates negative asymmetry for one series and positive asymmetry for the other five. The explanation of this result follows directly from the findings in Kim and White (2004). I simulated symmetric heavy-tailed distributions. The index of skewness is not robust to the presence of outliers, in our case extreme observations located in the tails of the distribution. Thus, judged by this measure alone, the generated samples appear heavily skewed. Turning to the index of kurtosis, it is clearly far from the value expected under a normal distribution. This is again attributable to the fat tails of the simulated distributions.

The second part of Table 1 reports the same statistics after one single extreme observation was removed. The result is again unexpected. Four of the six values of the skewness have changed sign and still indicate the presence of asymmetry in the distribution. The other two values of the skewness move toward zero, but at first sight they might still suggest that the samples are skewed. Kurtosis decreases substantially after the outlier removal. The most notable change occurs for the simulated stable series with tail index α = 1.7, with a drop from 43.70 to 5.88.

This simple exercise confirms, first, that the indexes of skewness and kurtosis are not robust to the presence of extreme events. Secondly, skewness is an uninformative measure when applied to a sample coming from a fat-tailed distribution, as it cannot discriminate between symmetric and asymmetric distributions. These limitations of the indexes of skewness and kurtosis, together with the fact that financial returns are not likely to be normally distributed, call for the development of more robust measures. The object of this paper is therefore to test the behavior of the alternative measures of skewness and kurtosis introduced in the literature when applied to fat-tailed distributions that, in the extreme case of stable Paretian distributions, do not possess any finite integer moment except for the first.
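The before-and-after comparison of Table 1 can be sketched as below. The sketch removes the single largest observation in absolute value and recomputes both indexes; for brevity it uses i.i.d. Student's t draws with 3 degrees of freedom instead of the GARCH paths, which is a simplifying assumption, so the numbers it prints are not those of Table 1.

```python
import math
import random

def sample_skew_kurt(y):
    """Return (sample skewness, sample excess kurtosis) with T^{-1} normalisation."""
    T = len(y)
    mu = sum(y) / T
    sd = math.sqrt(sum((v - mu) ** 2 for v in y) / T)
    sk = sum(((v - mu) / sd) ** 3 for v in y) / T
    kr = sum(((v - mu) / sd) ** 4 for v in y) / T - 3.0
    return sk, kr

def drop_largest_abs(y):
    """Return a copy of y without its largest observation in absolute value."""
    k = max(range(len(y)), key=lambda i: abs(y[i]))
    return y[:k] + y[k + 1:]

rng = random.Random(7)  # fixed seed, chosen only for reproducibility
# i.i.d. Student's t(3) draws: standard normal over root of chi-square(3) / 3.
sample = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
          for _ in range(1000)]

before = sample_skew_kurt(sample)
after = sample_skew_kurt(drop_largest_abs(sample))
print("before removal:", before)
print("after removal: ", after)
```

Rerunning with different seeds gives a feel for how much the sign and size of the skewness index depend on one observation when the underlying distribution is symmetric.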
2 Monte Carlo Simulations
In this section I conduct Monte Carlo simulations designed to investigate the behavior of the classical measures of skewness and kurtosis when the population comes from a distribution which does not possess a finite second, third or fourth moment and thus does not admit finite theoretical skewness and kurtosis. The same analysis is repeated using the alternative measures proposed in Kim and White (2004). The goal is to investigate the robustness of these alternative measures when applied to very fat-tailed distributions.

Following the discussion presented in the previous sections, I simulated samples coming from symmetric stable Paretian and Student's t distributions. The stability index α was chosen to be 1.9, 1.8 and 1.7; the location parameter µ and the scale parameter σ are set to 0 and 1, respectively. The degrees of freedom ν of the Student's t take the values 2, 3 and 4. In the case of the simulated stable samples, the second and higher moments are not finite for the chosen values of α. For the simulated Student's t, the values ν = 2, 3 and 4 rule out the existence of finite variance, skewness and kurtosis, respectively. For sample sizes T = 50, 250, 500, 1000, 2500 and 5000 I generate observations $y_t$, t = 1, ..., T, using these six distributions and calculate the various measures of skewness and kurtosis presented in the previous sections. Each experiment is repeated 1,000 times. The true values of the skewness and kurtosis measures for the various distributions are reported in Table 2. If these statistics are consistent, their values from the simulation experiment should collapse around the true values as the sample size T → ∞.

1 From a statistical point of view, none of them can be categorized as outliers, as they are compatible with the d.g.p. It would be more appropriate to consider them as extreme observations rather than outliers.

        S1.9   S1.8   S1.7     t2     t3     t4
SK1        -      -      -      -      -      0
SK2        0      0      0      0      0      0
SK3        0      0      0      0      0      0
SK4        0      0      0      0      0      0
KR1        -      -      -      -      -      -
KR2     0.02   0.04   0.07   0.28   0.17   0.12
KR3      0.3   0.65   1.04   1.75   0.91   0.61
KR4     0.15   0.38   0.70   2.36   1.25   0.83

Table 2: Values of the measures of skewness and kurtosis of the various simulated distributions. A dash marks an entry for which no value is reported.

Figures 2-4 report the box-plots of the sample skewness and kurtosis measures of the simulated series. For each measure there are six windows, one per distribution, and each window displays six box-plots for the different sample sizes, which are reported on the vertical axes. Each box represents the lower quartile, median and upper quartile values. The whiskers are lines extending from each end of the box, and their length is chosen to equal the length of the corresponding box. Observations beyond the ends of the whiskers are also reported, to give an idea of the values these measures can take in extreme cases.

The performance of SK1 (Figure 2, left column) is, as expected, poor: the estimator does not behave consistently. The median is centered at zero, as the simulated distributions are symmetric. However, as the sample size increases, more extreme observations are present in the sample and the values of the skewness do not collapse to the expected value of zero. On the contrary, the dispersion around zero increases, as clearly shown by the box-plots.
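Two of the Table 2 entries can be checked by hand for the Student's t with 2 degrees of freedom, whose quantile function has a closed form. The sketch below assumes that KR2 and KR4 are the octile-based (Moors) and quantile-ratio (Crow and Siddiqui) measures as centred in Kim and White (2004), with constants 1.23 and 2.91; these definitions are not restated in this appendix, so they are labelled here as assumptions.

```python
import math

def t2_quantile(p):
    """Quantile function of the Student's t with 2 degrees of freedom,
    obtained by inverting F(x) = 1/2 + x / (2 * sqrt(2 + x^2))."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))

def kr2_moors(q):
    """Assumed KR2: ((E7 - E5) + (E3 - E1)) / (E6 - E2) - 1.23, E_i the i-th octile."""
    e = [q(i / 8.0) for i in range(1, 8)]  # e[0] = E1, ..., e[6] = E7
    return ((e[6] - e[4]) + (e[2] - e[0])) / (e[5] - e[1]) - 1.23

def kr4_crow_siddiqui(q):
    """Assumed KR4: (q(0.975) - q(0.025)) / (q(0.75) - q(0.25)) - 2.91."""
    return (q(0.975) - q(0.025)) / (q(0.75) - q(0.25)) - 2.91

kr2 = kr2_moors(t2_quantile)
kr4 = kr4_crow_siddiqui(t2_quantile)
print("KR2 for t2:", kr2)
print("KR4 for t2:", kr4)
```

Under these assumptions the two values come out close to the t2 entries of Table 2 (0.28 and 2.36), with small differences attributable to rounding of the centring constants.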
The pattern of growing dispersion in SK1 is less clear for the Student's t with ν = 4, as it has thinner tails than all the other distributions. Note also how the observations beyond the whiskers take values far away from the center as T increases. A preliminary analysis of these skewness values would lead the researcher to an unambiguous rejection of the null hypothesis of symmetry in the distribution.

The distributions of SK2, SK3 and SK4 (Figures 2 and 3) are stable and collapse around the true value of zero as the sample size increases. Notice, however, that in very small samples (T = 50) these measures are not centered at zero but are visibly downward biased (SK2) or upward biased (SK3 and SK4).

The performance of KR1 (Figure 3) is strikingly poor. As the sample size increases, the dispersion of the distribution increases, it is heavily skewed to the right and the median is always above the reference value of 3. Also, the number of observations lying beyond the right end of the whisker increases steadily, with values of the kurtosis that exceed 4,000. These results are to some extent expected, as none of the simulated distributions possesses a finite fourth moment and thus the index of kurtosis loses its meaning.

The alternative measures of kurtosis, which do not rely on the existence of finite moments, do not display any erratic behavior when applied to the simulated series. KR2 and KR4 (Figures 3 and 4) do not present problems except in small samples, where the skewness of the sampling distribution is evident. KR3 (Figure 4) is centered around the true values for all the simulated distributions but displays a small finite-sample bias when the number of observations is less than 1,000.

The simulation exercise presented above confirms once again that the standard measures of skewness and kurtosis are uninformative indicators for a distribution with fat tails. On the contrary, the alternative measures provide a reliable tool to assess the characteristics of a distribution.
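The contrast between the moment-based and the quantile-based skewness can be reproduced in a scaled-down experiment. The sketch below is not the paper's code: it assumes SK2 is the quartile-based (Bowley) measure, uses only the Student's t with 2 degrees of freedom, and cuts the design to three sample sizes and 200 replications (`sizes`, `reps` and `seed` are illustrative parameters). The dispersion of each estimator is summarised by the interquartile range of its simulated values.

```python
import math
import random
import statistics

def student_t(nu, rng):
    """One draw from a Student's t with nu degrees of freedom."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)

def sk1(y):
    """Moment-based sample skewness with T^{-1} normalisation."""
    T = len(y)
    mu = sum(y) / T
    sd = math.sqrt(sum((v - mu) ** 2 for v in y) / T)
    return sum(((v - mu) / sd) ** 3 for v in y) / T

def sk2(y):
    """Assumed SK2: (Q3 + Q1 - 2 * Q2) / (Q3 - Q1), from sample quartiles."""
    q1, q2, q3 = statistics.quantiles(y, n=4)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def iqr(values):
    """Interquartile range of a list of simulated statistics."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def dispersion_by_sample_size(nu=2, sizes=(50, 250, 1000), reps=200, seed=0):
    """Map each sample size T to (IQR of SK1, IQR of SK2) over `reps` replications."""
    rng = random.Random(seed)
    out = {}
    for T in sizes:
        s1, s2 = [], []
        for _ in range(reps):
            y = [student_t(nu, rng) for _ in range(T)]
            s1.append(sk1(y))
            s2.append(sk2(y))
        out[T] = (iqr(s1), iqr(s2))
    return out

result = dispersion_by_sample_size()
for T, (d1, d2) in result.items():
    print(f"T={T:5d}  IQR(SK1)={d1:8.3f}  IQR(SK2)={d2:8.3f}")
```

The interquartile range is used instead of the standard deviation because the simulated values of SK1 are themselves heavy tailed, so a moment-based summary of their spread would be unstable.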
Figure 2: Sampling distribution of SK1 (left), SK2 (middle) and SK3 (right) for simulated stable Paretian r.v.s with α = 1.7, 1.8 and 1.9 and Student's t r.v.s with degrees of freedom ν = 2, 3 and 4. [Box-plots; one row of panels per distribution; vertical axis: sample size (50, 250, 500, 1000, 2500, 5000); horizontal axis: Values.]
Figure 3: Sampling distribution of SK4 (left), KR1 (middle) and KR2 (right) for simulated stable Paretian r.v.s with α = 1.7, 1.8 and 1.9 and Student's t r.v.s with degrees of freedom ν = 2, 3 and 4. [Box-plots; one row of panels per distribution; vertical axis: sample size (50, 250, 500, 1000, 2500, 5000); horizontal axis: Values.]
Figure 4: Sampling distribution of KR3 (rows 1-2) and KR4 (rows 3-4) for simulated stable Paretian r.v.s (rows 1 and 3) with α = 1.7, 1.8 and 1.9 and Student's t r.v.s (rows 2 and 4) with degrees of freedom ν = 2, 3 and 4. [Box-plots; one panel per distribution; vertical axis: sample size (50, 250, 500, 1000, 2500, 5000); horizontal axis: Values.]
References

Bollerslev, T. (1986): “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327.

Bollerslev, T. (1987): “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return,” The Review of Economics and Statistics, 69, 542–547.

Ding, Z., C. J. Granger, and R. Engle (1993): “A long memory property of stock market returns and a new model,” Journal of Empirical Finance, 1, 83–106.

Fama, E. (1965): “The behavior of stock market prices,” Journal of Business, 38, 34–105.

Kim, T. and H. White (2004): “On more robust estimation of skewness and kurtosis,” Finance Research Letters, 1, 56–73.

Mandelbrot, B. (1963): “The variation of certain speculative prices,” Journal of Business, 36, 394–419.

Mittnik, S., M. Paolella, and S. Rachev (2000): “Diagnosing and treating the fat tails in financial returns data,” Journal of Empirical Finance, 7, 389–416.

Rachev, S. (2003): Handbook of Heavy Tailed Distributions in Finance, S.T. Rachev.

Rachev, S., C. Menn, and F. Fabozzi (2005): Fat-Tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing, Wiley.

Zolotarev, V. (1986): One-dimensional stable distributions, Translations of Mathematical Monographs, vol. 65. American Mathematical Society.