JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 28: 353–371 (2013) Published online 23 November 2011 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jae.1274

REVERSE REGRESSIONS AND LONG-HORIZON FORECASTING

MIN WEI (a) AND JONATHAN H. WRIGHT (b)*

(a) Division of Monetary Affairs, Federal Reserve Board, Washington, DC, USA
(b) Department of Economics, Johns Hopkins University, Baltimore, MD, USA

SUMMARY

Long-horizon predictive regressions in finance pose formidable econometric problems when estimated using available sample sizes. Hodrick in 1992 proposed a remedy that is based on running a reverse regression of short-horizon returns on the long-run mean of the predictor. Unfortunately, this only allows the null of no predictability to be tested, and assumes stationary regressors. In this paper, we revisit long-horizon forecasting from reverse regressions, and argue that reverse regression methods avoid serious size distortions in long-horizon predictive regressions, even when there is some predictability and/or near unit roots. Meanwhile, the reverse regression methodology has the practical advantage of being easily applicable when there are many predictors. We apply these methods to forecasting excess bond returns using the term structure of forward rates, and find that there is indeed some return forecastability. However, confidence intervals for the coefficients of the predictive regressions are about twice as wide as those obtained with the conventional approach to inference. We also include an application to forecasting excess stock returns. Copyright © 2011 John Wiley & Sons, Ltd.

Received 15 July 2010; Revised 27 July 2011

Supporting information can be found in the online version of this article.

1. INTRODUCTION

Asset returns are widely thought to be somewhat forecastable, and perhaps more so at long than at short horizons. But inference in long-horizon predictive regressions is well known to be complicated by severe econometric problems in empirically relevant sample sizes. The problems arise because the predictors that are used are variables like the dividend yield or term spread that are highly persistent, while the regressand is an overlapping sum of short-term returns. This creates something akin to a spurious regression. This is compounded by the feedback effect, or absence of strict exogeneity: a shock to returns will in turn affect future values of the predictors. As a result, conventional t-statistics have rejection rates that are well above their nominal levels. The vast literature on the problems with long-horizon predictive regressions includes work such as Goetzmann and Jorion (1993), Elliott and Stock (1994), Stambaugh (1999), Valkanov (2003), Campbell and Yogo (2006) and Rossi (2007).

Hodrick (1992) proposed an approach to test the null hypothesis that a certain predictor does not help forecast long-horizon returns. His idea was, instead of regressing the cumulative h-period returns onto the predictor at the start of the holding period, to regress the one-period return onto the sum of the predictors over the previous h periods. Under stationarity, in population, the coefficient in the regression of h-period returns onto the predictor is equal to zero if and only if the coefficient in the regression of one-period returns onto the sum of the predictors over the previous h periods is equal to zero. However, while the first regression has a persistent error term, the second regression does not. Intuitively, this might mean that the size distortions of a test based on the second regression are smaller, because a persistent error term makes it difficult to estimate the variance of the ordinary least squares (OLS)

* Correspondence to: Jonathan H. Wright, Department of Economics, Johns Hopkins University, 3400 N Charles St, Baltimore, MD 21218, USA. E-mail: [email protected]


regression estimator precisely. Hodrick finds that this is indeed the case. Hodrick also considered a standard error for the original long-horizon predictive regression that uses the same reverse regression logic. The reverse regression approach to inference has become fairly widely used. For example, in their study of the long-horizon predictability of stock returns, Ang and Bekaert (2007) rely mainly on reverse-regression-based standard errors. The methodology proposed by Hodrick (1992), however, has two limitations. First, its justification relies on stationarity. Secondly, it is only valid for testing the null of no predictability. Many researchers believe that there is some time series predictability in asset returns, even after controlling for econometric problems (see, for example, Campbell, 2000), and would like to test other hypotheses about the slope coefficient in a long-horizon predictive regression and, in particular, would like to form a confidence interval for this coefficient. This paper revisits the use of reverse regressions in long-horizon asset return prediction, making a number of contributions. First, in the case with stationary regressors, we propose a methodology related to the reverse regression, and show that it can be used more widely for inference on the slope coefficient in a long-horizon regression, and not just to test that it is equal to zero. Second, we derive the asymptotic distribution of the various reverse regression test statistics if the predictors are highly persistent, modeled as having roots that are local to unity. In these results, we allow for some predictability in returns. Although the standard reverse regression does not give the correct size asymptotically in this case, we show that the size distortions are modest, provided that the degree of predictability is not too great. 
This contrasts with results for the usual forward regression Wald statistics, where asymptotic size distortions with near unit roots are enormous (Valkanov, 2003). Third, we assess the properties of the various reverse regression procedures in Monte Carlo simulations with highly persistent predictors and some predictability. We find that the reverse regression procedures give confidence intervals with good coverage properties. In contrast, the standard forward regression with Newey–West standard errors has effective coverage that is well below the nominal level.

Finally, we consider two empirical applications of our proposed methodology. We first apply the reverse regression to the prediction of excess bond returns, considering the regressions of Fama and Bliss (1987) and Cochrane and Piazzesi (2005). We find that the confidence intervals for the slope coefficients are roughly twice as wide as we would obtain from the usual long-horizon regression with Newey–West heteroskedasticity- and autocorrelation-robust standard errors. Using the reverse regression does not eliminate the clear empirical evidence for the predictability of bond returns using the term structure of forward rates. But the p-values testing joint significance of all the slope coefficients go from eye-popping values around $10^{-6}$ in the ordinary regression to around 0.01 in the reverse regressions. We also re-examine the predictability of excess stock returns and find that the confidence intervals for the slope coefficients are wide and include zero at all but the shortest horizons.

The reverse regression approach to inference applies regardless of whether there is a single predictor or multiple predictors. That is an advantage of this approach to inference relative to some others that have been proposed, such as the methods of Valkanov (2003), Campbell and Yogo (2006), Torous et al. (2004) and Rossi (2007) that are feasible only for a scalar predictor. The plan for the remainder of the paper is as follows.
Section 2 considers the case with stationary predictors and describes long-horizon regressions, reverse regressions, and the proposed extension to the reverse regression methodology that allows us to test any hypothesis on the long-horizon slope coefficient, not just the null that it is equal to zero. Section 3 derives the limiting distribution of reverse regression Wald tests if the predictors have roots that are local to unity. Section 4 contains Monte Carlo simulations. Section 5 uses the reverse regression methodology to re-examine the forecasting of excess bond returns. Section 6 contains an application to forecasting excess stock returns. Section 7 concludes.


2. FORWARD AND REVERSE REGRESSIONS

Let $r_{t+1}$ denote the continuously compounded return on any asset from $t$ to $t+1$ and let $r^{(h)}_{t+h} = r_{t+1} + r_{t+2} + \dots + r_{t+h}$ denote the h-period return. Let $x_t$ be some $p \times 1$ vector of predictors. Assume that $y_t = (r_t, x_t')'$ is covariance stationary and that $A(L)y_t = \varepsilon_t$, where $A(L)$ is a lag polynomial with all roots outside the unit circle and $\varepsilon_t$ is a martingale difference sequence with $2+\delta$ finite moments for some $\delta > 0$. Consider the standard long-horizon predictive regression:

$$ r^{(h)}_{t+h} = \alpha^{(h)} + x_t'\beta^{(h)} + \varepsilon^{(h)}_{t+h} \tag{1} $$

Let $\hat\beta^{(h)}$ denote the OLS estimator of this regression. Researchers commonly estimate equation (1), using Newey–West or Hansen–Hodrick standard errors (Newey and West, 1987; Hansen and Hodrick, 1980), to control for serial correlation in the errors. Alternative standard errors that are robust to heteroskedasticity and serial correlation in equation (1) are given by Hodrick standard errors 1B (Hodrick, 1992). This involves estimating the variance of $(\alpha^{(h)}, \beta^{(h)\prime})'$ in the forward regression (equation (1)) as

$$ W = \left(\sum \tilde x_t \tilde x_t'\right)^{-1} \sum w_{t+1} w_{t+1}' \left(\sum \tilde x_t \tilde x_t'\right)^{-1}, $$

where $w_{t+1} = (r_{t+1} - \bar r)\sum_{i=1}^{h} \tilde x_{t-i}$, $\tilde x_t = (1, x_t')'$ and $\bar r$ is the sample mean of returns. Hodrick standard errors 1B are valid if and only if $\beta^{(h)} = 0$, because it is in this case alone that the sample variance of $w_{t+1}$ is a consistent estimate of the zero-frequency spectral density of $x_t \varepsilon^{(h)}_{t+h}$. The Wald statistic testing the hypothesis that $\beta^{(h)} = 0$ using Hodrick standard errors 1B is $\hat\beta^{(h)\prime} W_{22}^{-1} \hat\beta^{(h)}$, where $W$ is partitioned conformably with $(\alpha^{(h)}, \beta^{(h)\prime})'$ as $\begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}$. This has a $\chi^2(p)$ asymptotic distribution under the null.

Consider also the reverse regression of the one-period return on the h-period sum of the regressor:

$$ r_{t+1} = \mu^{(h)} + x_t^{(h)\prime}\gamma^{(h)} + u_{t+1} \tag{2} $$

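where $x_t^{(h)} = x_t + x_{t-1} + \dots + x_{t-h+1}$. The coefficients in the forward and reverse regressions are related as follows:

$$ \begin{aligned} \beta^{(h)} &= V_{xx}^{-1}\operatorname{cov}\left(r^{(h)}_{t+h}, x_t\right) = V_{xx}^{-1}\sum_{j=1}^{h}\operatorname{cov}\left(r_{t+j}, x_t\right) = V_{xx}^{-1}\sum_{j=1}^{h}\operatorname{cov}\left(r_{t+1}, x_{t+1-j}\right) \\ &= V_{xx}^{-1}\operatorname{cov}\left(r_{t+1}, x_t^{(h)}\right) = V_{xx}^{-1}V_{xx}^{(h)}V_{xx}^{(h)-1}\operatorname{cov}\left(r_{t+1}, x_t^{(h)}\right) = V_{xx}^{-1}V_{xx}^{(h)}\gamma^{(h)} \end{aligned} \tag{3} $$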

where $V_{xx}$ and $V_{xx}^{(h)}$ are the variance–covariance matrices of $x_t$ and $x_t^{(h)}$, respectively, and the last equality on the first line uses the assumption of covariance stationarity. A consequence of this is that $\beta^{(h)} = 0$ if and only if $\gamma^{(h)} = 0$. However, inference in the reverse regression is less prone to size distortions. Consequently, Hodrick (1992) also proposed testing the hypothesis that $\beta^{(h)} = 0$ by testing the implication that $\gamma^{(h)} = 0$ in the reverse regression (equation (2)). This can be implemented by the Wald statistic:

$$ F_1 = (T-h)\,\hat\gamma^{(h)\prime}\,\hat V_{xx}^{(h)}\left[\operatorname{var}\left(u_{t+1}x_t^{(h)}\right)\right]^{-1}\hat V_{xx}^{(h)}\,\hat\gamma^{(h)} \tag{4} $$

using a heteroskedasticity-robust estimate of the variance of $u_{t+1}x_t^{(h)}$, where $\hat V_{xx}^{(h)}$ is the sample variance of $x_t^{(h)}$. This statistic also has a $\chi^2(p)$ asymptotic distribution under the null. Note that Hodrick proposed the reverse regression in addition to his standard errors 1B, where the latter are alternative standard errors for the forward regression. Both apply only as tests of the hypothesis of no predictability.
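As a rough illustration of equations (1) to (4), the sketch below computes the forward slope, the reverse regression slope, the forward slope implied by equation (3), and the Wald statistic of equation (4) for a scalar predictor. The data-generating process, the function names and the parameter values are all hypothetical choices made for this example; the variance of $u_{t+1}x_t^{(h)}$ is estimated here with a simple White-type average, which is one possible heteroskedasticity-robust choice rather than necessarily the one used in the paper.

```python
import numpy as np

def simulate_pair(T, alpha, phi, rho, seed=0):
    """Hypothetical DGP: r[t] = alpha*x[t-1] + e_r[t], x[t] = phi*x[t-1] + e_x[t]."""
    rng = np.random.default_rng(seed)
    e_r = rng.standard_normal(T)
    e_x = rho * e_r + np.sqrt(1.0 - rho**2) * rng.standard_normal(T)
    r, x = np.zeros(T), np.zeros(T)
    for t in range(1, T):
        r[t] = alpha * x[t - 1] + e_r[t]
        x[t] = phi * x[t - 1] + e_x[t]
    return r, x

def forward_slope(r, x, h):
    """OLS slope of r[t+1] + ... + r[t+h] on x[t], as in equation (1)."""
    T = len(r)
    cs = np.concatenate(([0.0], np.cumsum(r)))
    y = cs[h + 1:T + 1] - cs[1:T - h + 1]       # overlapping h-period returns
    z = x[:T - h] - x[:T - h].mean()
    return float(z @ (y - y.mean()) / (z @ z))

def reverse_regression(r, x, h):
    """Slope of equation (2), forward slope implied by (3), and F1 of (4)."""
    T = len(r)
    n = T - h
    cs = np.concatenate(([0.0], np.cumsum(x)))
    xs = cs[h:T] - cs[:n]                        # x[t] + ... + x[t-h+1]
    y = r[h:T]                                   # one-period return r[t+1]
    xd, yd = xs - xs.mean(), y - y.mean()
    v_h = xd @ xd / n                            # sample variance of x^(h)
    gamma = (xd @ yd / n) / v_h
    u = yd - gamma * xd                          # reverse regression residuals
    s = np.mean((u * xd) ** 2)                   # White-type variance of u * x^(h)
    f1 = n * gamma * v_h / s * v_h * gamma
    beta_implied = v_h / np.var(x) * gamma       # V_xx^{-1} V_xx^(h) gamma
    return float(gamma), float(beta_implied), float(f1)

r, x = simulate_pair(T=2000, alpha=0.5, phi=0.9, rho=-0.5)
print(forward_slope(r, x, h=12), reverse_regression(r, x, h=12))
```

In a stationary design such as this one, the implied slope and the directly estimated forward slope should be close, since they are built from nearly the same cross-products and differ mainly through end-of-sample terms.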


2.1. Testing the Hypothesis of Some Predictability via Reverse Regressions

However, the evidence for some predictability in asset returns at long horizons is quite strong, and we are perhaps more interested in testing other hypotheses about $\beta^{(h)}$, or forming a confidence set for it. The first contribution of this paper is to propose a method for inference on $\beta^{(h)}$ in a long-horizon regression that is based on the reverse regression logic, but which goes beyond just testing the null that $\beta^{(h)} = 0$. Like the work of Hodrick (1992), its formal justification relies on covariance stationarity.

The idea is that from equation (3), under covariance-stationarity $\beta^{(h)} = V_{xx}^{-1}V_{xx}^{(h)}\gamma^{(h)}$, and so inference about $\gamma^{(h)}$ from the reverse regression can be used for inference on $\beta^{(h)}$, taking account of the distribution of the $x_t$s. Since $\gamma^{(h)} = V_{xx}^{(h)-1}\operatorname{cov}\left(r_{t+1}, x_t^{(h)}\right)$, we only need to adjust the numerator of the reverse regression, as

$$ \beta^{(h)} = V_{xx}^{-1}\operatorname{cov}\left(r_{t+1}, x_t^{(h)}\right) \tag{5} $$

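We now describe concretely how to use equation (5) for inference on $\beta^{(h)}$. Let $\theta_1 = \operatorname{cov}\left(r_{t+1}, x_t^{(h)}\right)$ and $\theta_2 = V_{xx}$. Also let

$$ \hat\theta_1 = (T-h)^{-1}\sum_{t=h}^{T-1}(r_{t+1}-\bar r)\left(x_t^{(h)} - \bar x^{(h)}\right) = (T-h)^{-1}\sum_{t=1}^{T-h}(r_{t+h}-\bar r)\left(x_{t+h-1}^{(h)} - \bar x^{(h)}\right) $$

and

$$ \hat\theta_2 = (T-h)^{-1}\sum_{t=1}^{T-h}(x_t-\bar x)(x_t-\bar x)' $$

be the sample counterparts, where $\bar r = T^{-1}\sum_{t=1}^{T} r_t$, $\bar x = (T-h)^{-1}\sum_{t=1}^{T-h}x_t$ and $\bar x^{(h)} = (T-h)^{-1}\sum_{t=h}^{T-1}x_t^{(h)}$. We have $\beta^{(h)} = \theta_2^{-1}\theta_1$ and assume that

$$ T^{1/2}\left(\hat\theta - \theta\right) \to_d N(0, V) $$

where $\theta = \left(\theta_1', \operatorname{vech}(\theta_2)'\right)'$, $\hat\theta = \left(\hat\theta_1', \operatorname{vech}(\hat\theta_2)'\right)'$ and $V$ is $2\pi$ times the spectral density at frequency zero of

$$ \begin{pmatrix} (r_{t+h}-\bar r)\left(x_{t+h-1}^{(h)} - \bar x^{(h)}\right) \\ \operatorname{vech}\left((x_t-\bar x)(x_t-\bar x)'\right) \end{pmatrix}, $$

which can be partitioned conformably as $V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$.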

We can then use the delta method for inference on $\beta^{(h)}$, using the fact that $\beta^{(h)}$ is a nonlinear function of $\theta$ that is itself root-T consistently estimable and asymptotically normal. Concretely, consider the estimator

$$ \tilde\beta^{(h)} = \hat\theta_2^{-1}\hat\theta_1 \tag{6} $$

It follows that

$$ T^{1/2}\left(\tilde\beta^{(h)} - \beta^{(h)}\right) \to_d N\left(0,\ \frac{\partial\beta^{(h)}}{\partial\theta'}\,V\,\frac{\partial\beta^{(h)\prime}}{\partial\theta}\right) \tag{7} $$

where $\dfrac{\partial\beta^{(h)}}{\partial\theta'} = \left[\theta_2^{-1},\ -\left(\theta_1'\theta_2^{-1}\otimes\theta_2^{-1}\right)D_p\right]$, and $D_p$ denotes the duplication matrix, allowing conventional Wald tests to be conducted.

Implementation of the proposed confidence intervals requires choosing a specific estimator of $V$. We use a Newey–West estimator with lag length equal to $h$. We refer to this method as a 'reverse regression' estimate even though it does not require explicit estimation of equation (2), because it is based on assessing the covariance between one-period returns and the h-period sum of the predictor. We henceforth call this the reverse regression delta method approach to inference.
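The steps in equations (5) to (7) can be sketched for the scalar case ($p = 1$, so that $D_p = 1$ and the gradient reduces to $[1/\theta_2,\ -\theta_1/\theta_2^2]$). The function names, the simulated data and the use of $T-h$ rather than $T$ in the final scaling are assumptions made for this illustration; the Bartlett weights are the usual Newey–West ones with lag length $h$.

```python
import numpy as np

def newey_west(z, lags):
    """Bartlett-kernel (Newey-West) long-run covariance of the columns of z."""
    z = z - z.mean(axis=0)
    n = z.shape[0]
    V = z.T @ z / n
    for j in range(1, lags + 1):
        G = z[j:].T @ z[:-j] / n                 # j-th sample autocovariance
        V += (1.0 - j / (lags + 1.0)) * (G + G.T)
    return V

def reverse_delta_ci(r, x, h, crit=1.96):
    """Scalar-predictor estimate (6) with a delta-method interval based on (7)."""
    T = len(r)
    n = T - h
    cs = np.concatenate(([0.0], np.cumsum(x)))
    xs = cs[h:T] - cs[:n]                        # h-period sums of the predictor
    a = (r[h:T] - r.mean()) * (xs - xs.mean())   # summands of theta1_hat
    x0 = x[:n]
    b = (x0 - x0.mean()) ** 2                    # summands of theta2_hat
    th1, th2 = a.mean(), b.mean()
    beta = th1 / th2                             # equation (6)
    V = newey_west(np.column_stack([a, b]), lags=h)
    grad = np.array([1.0 / th2, -th1 / th2**2])  # gradient of theta1/theta2
    se = float(np.sqrt(grad @ V @ grad / n))
    return float(beta), se, (float(beta - crit * se), float(beta + crit * se))

# Hypothetical data: persistent predictor with some predictability.
rng = np.random.default_rng(0)
T = 1000
x, r = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
    r[t] = 0.5 * x[t - 1] + rng.standard_normal()
print(reverse_delta_ci(r, x, h=12))
```

The pairing of the two summand series follows the definition of $V$ above: the product involving $r_{t+h}$ and $x^{(h)}_{t+h-1}$ is stacked with the squared deviation of $x_t$ for the same $t$.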


3. REVERSE REGRESSIONS WITH NEAR UNIT ROOTS AND SOME PREDICTABILITY

The second contribution of the paper is to derive the asymptotic distributions of reverse regression tests when there is some predictability and the predictors are so persistent that it is suitable to model them as having roots local to unity. For this purpose, we consider the model

$$ r_{t+1} = \alpha_r + x_t'\beta + \varepsilon_{r,t+1} \tag{8} $$

$$ (I - \Phi L)x_t = \alpha_x + B(L)\varepsilon_{x,t} \tag{9} $$

with $t = 1, 2, \dots, T$, where the following assumptions are made.

Assumption 1. $\left(\varepsilon_{r,t}, \varepsilon_{x,t}'\right)'$ is a martingale difference sequence with $2+\delta$ finite moments for some $\delta > 0$, where $\rho$ is $2\pi$ times the zero-frequency cross-spectral density between $\varepsilon_{x,t}$ and $\varepsilon_{r,t}$ (the long-run correlation between $\varepsilon_{x,t}$ and $\varepsilon_{r,t}$).

Assumption 2. B(L) is a 1-summable matrix lag polynomial with all roots outside the unit circle.

Assumption 3. $\Phi = I + T^{-1}C$. The matrix $C = \operatorname{diag}(c_1, c_2, \dots, c_p)$ is a fixed diagonal matrix where $c_i \le 0\ \forall i$. We write the matrix as $c$ in the case of a scalar predictor ($p = 1$). Having a diagonal matrix $C$ guarantees that while each of the regressors has a near-unit root, there is no linear combination of them that is integrated of a different order.

Assumption 4. $\alpha_r = \alpha_x = 0$ (an assumption that involves no loss of generality).

The near-I(1) parameterization in Assumption 3 is not designed to be a literal description of the data-generating process, as we do not believe that predictors become more persistent as the sample size increases. Rather it is a well-known device that is designed to provide a good approximation to the small-sample behavior of estimators and test statistics when time series are highly persistent (Phillips, 1987; Stock, 1991, 1996).

In this section, we derive the limiting distributions of reverse regression Wald tests under this near-I(1) parameterization. It should be emphasized that our objective in doing this is not to provide alternative usable critical values. This is because the distributions will depend on the parameter $C$, which is in practice not known, and not consistently estimable.¹ Rather, what we are considering is how severe the size distortions will be when the researcher simply uses the incorrect conventional $\chi^2$ critical values. We consider the h-period predictive regression

$$ r^{(h)}_{t+h} = \alpha^{(h)} + x_t'\beta^{(h)} + \varepsilon^{(h)}_{t+h} \tag{10} $$

but also let the horizon be an increasing function of the sample size to represent the idea that the forecast horizon is non-negligible relative to the sample size (as in Richardson and Stock, 1989, and Stock, 1996). Thus $h$ is equal to $[\lambda T]$, where $\lambda$ is the forecast horizon as a fraction of the sample size $T$, and $[\cdot]$ denotes the integer part. The set-up of local-to-unit roots and a horizon that is an increasing function of the sample size was also considered by Valkanov (2003) and Rossi (2005, 2007). Under these assumptions,² it is well known from Phillips (1987) that

$$ \begin{aligned} T^{-1/2}\sum_{t=1}^{[Ts]}\varepsilon_{r,t} &\Rightarrow \sigma W(s) \\ T^{-1/2}x_{[Ts]} &\Rightarrow \Omega^{1/2}J_C(s) \\ T^{-1/2}\left(x_{[Ts]} - \bar x\right) &\Rightarrow \Omega^{1/2}J_C^{\mu}(s) \\ T^{-3/2}x^{(h)}_{[Ts]} &\Rightarrow \Omega^{1/2}\bar J_C(s) \\ T^{-3/2}\left(x^{(h)}_{[Ts]} - \bar x^{(h)}\right) &\Rightarrow \Omega^{1/2}\bar J_C^{\mu}(s) \end{aligned} $$

where $s \in [0, 1]$, $\sigma^2 = \operatorname{var}(\varepsilon_{r,t})$, $\Omega = B(1)\operatorname{var}(\varepsilon_{x,t})B(1)'$, $J_C(s)$ denotes an Ornstein–Uhlenbeck process defined by $dJ_C(s) = CJ_C(s)\,ds + dV(s)$,

$$ J_C^{\mu}(s) = J_C(s) - \frac{1}{1-\lambda}\int_0^{1-\lambda}J_C(\omega)\,d\omega, \qquad \bar J_C(s) = \int_{s-\lambda}^{s}J_C(\omega)\,d\omega \ \ (\text{defined for } s \ge \lambda), \qquad \bar J_C^{\mu}(s) = \bar J_C(s) - \frac{1}{1-\lambda}\int_{\lambda}^{1}\bar J_C(\omega)\,d\omega, $$

and $V(s)$ and $W(s)$ are $p \times 1$ and scalar standard Brownian motions with correlation $\rho$.

¹ Some papers get around this problem by forming a confidence set for $C$ and then appealing to the Bonferroni inequality (Campbell and Yogo, 2006; Torous et al., 2004; Rossi, 2007). But these papers all consider just one predictor because if there are multiple predictors the number of nuisance parameters in $C$ makes this strategy impractical.

Following Rossi (2007), we have the following results:

$$ \beta^{(h)} = \sum_{i=0}^{h-1}\Phi^{i}\beta = (I-\Phi)^{-1}\left(I - \Phi^{h}\right)\beta = \left[TC^{-1}\left(\exp(C\lambda) - I\right) + O(1)\right]\beta $$

$$ \varepsilon^{(h)}_{t+h} = \sum_{i=1}^{h}\varepsilon_{r,t+i} + \left[\sum_{i=1}^{h-1}\left(\sum_{k=0}^{i-1}\Phi^{k}\right)B(1)\varepsilon_{x,t+h-i}\right]'\beta + o_p\left(T^{1/2}\right) $$

where $\exp(C\lambda)$ denotes a diagonal matrix where each diagonal element is the exponential of the product of the ith diagonal element of $C$ and $\lambda$.

Having completed the set-up, first consider the Wald statistic testing the hypothesis of no predictability ($\gamma^{(h)} = 0$) in the reverse regression (equation (4)). Theorem 1 provides the limiting distribution of this test statistic with near unit roots, under the null hypothesis of no predictability.

Theorem 1. Under the null $\beta^{(h)} = \gamma^{(h)} = 0$, in the limit as $T$ goes to infinity:

$$ F_1 \Rightarrow \left(\int_{\lambda}^{1}\bar J_C^{\mu}(s)\,dW(s)\right)'\left(\int_{\lambda}^{1}\bar J_C^{\mu}(s)\bar J_C^{\mu}(s)'\,ds\right)^{-1}\left(\int_{\lambda}^{1}\bar J_C^{\mu}(s)\,dW(s)\right) $$
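The limiting distribution in Theorem 1 can be approximated numerically by replacing the continuous processes with discrete ones. The sketch below does this for a scalar predictor using an Euler recursion for the Ornstein–Uhlenbeck process and a rolling sum for the h-period aggregate. The function names, the number of replications and the exact discretization (for example, pairing each rolling sum with the next increment of $W$) are assumptions of this illustration and need not coincide with the authors' simulation design.

```python
import numpy as np

def simulate_f1_limit(c, rho, lam, n_steps=1000, n_rep=2000, seed=0):
    """Draws from a discrete approximation to the Theorem 1 limit (p = 1)."""
    rng = np.random.default_rng(seed)
    m = max(1, int(round(lam * n_steps)))        # window length for the h-period sum
    dW = rng.standard_normal((n_rep, n_steps)) / np.sqrt(n_steps)
    dZ = rng.standard_normal((n_rep, n_steps)) / np.sqrt(n_steps)
    dV = rho * dW + np.sqrt(1.0 - rho**2) * dZ   # corr(dV, dW) = rho
    J = np.empty_like(dV)
    prev = np.zeros(n_rep)
    for k in range(n_steps):                     # Euler step of dJ = c*J ds + dV
        prev = (1.0 + c / n_steps) * prev + dV[:, k]
        J[:, k] = prev
    cs = np.concatenate((np.zeros((n_rep, 1)), np.cumsum(J, axis=1)), axis=1)
    Jbar = (cs[:, m:n_steps] - cs[:, :n_steps - m]) / n_steps  # integral of J over a window of length lam
    Jbar_mu = Jbar - Jbar.mean(axis=1, keepdims=True)          # demeaned over [lam, 1]
    num = np.sum(Jbar_mu * dW[:, m:], axis=1)    # stochastic integral against the next dW
    den = np.sum(Jbar_mu**2, axis=1) / n_steps
    return num**2 / den

def rejection_rate(c, rho, lam, crit=3.84, **kwargs):
    """Share of draws exceeding the chi-squared(1) 5% critical value."""
    return float(np.mean(simulate_f1_limit(c, rho, lam, **kwargs) > crit))

print(rejection_rate(c=-10.0, rho=0.5, lam=0.05))
```

When $\rho = 0$ the rolling sum is independent of $W$, so in this discretization the statistic is exactly $\chi^2(1)$ conditional on the predictor path; departures from the nominal size arise only through the correlation between the two shocks.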

The proofs of the theorems are provided as supporting information in an online web appendix. The reverse regression Wald statistic does not have a standard $\chi^2$ null limiting distribution, and its distribution depends on the unknown local-to-unit root parameter that is not consistently estimated. Therefore we cannot obtain usable critical values for the asymptotic distribution of $F_1$. Nonetheless, for any parameter configuration, we can compute the size distortions when standard $\chi^2$ critical values are used. These are reported in Table I in the case $p = 1$ (scalar regressor) for different choices of $c$, $\rho$ and $\lambda$, using the 5% nominal significance level. As can be seen from the table, the Hodrick test that uses conventional critical values is close to being asymptotically correctly sized, except when $c$ is equal to zero, or very close ($-1$ or $-2$). And even when $c$ is very close or equal to zero, the effective size is around 20%, at worst. This is in marked contrast to a t-test based on the forward regression which diverges at the rate $T^{1/2}$ for all values of $c$ (Valkanov, 2003). The results for the exact unit root case are entirely consistent with the simulations of Hodrick (1992). To give an idea of the reasonable range for $c$, in the empirical applications below the OLS estimate of $c$ ranges from $-5$ to $-15$.

Table I. Asymptotic probability (%) that F1 exceeds the χ²(1) critical values

λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         5.5    5.8    5.3    4.7    4.1    3.5    4.3    4.0    4.8
ρ = 0.5       4.7    5.1    4.7    4.7    4.7    5.5    7.0    8.8   10.5
ρ = 0.9       7.4    7.4    7.6    8.1    8.7   10.4   14.6   17.7   21.4

λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         4.1    3.9    3.6    3.5    3.3    3.8    4.2    4.0    4.2
ρ = 0.5       4.6    4.3    4.1    4.5    4.8    5.3    7.4    8.2   10.2
ρ = 0.9       6.5    7.0    6.8    7.0    8.0    9.5   13.9   16.3   21.0

λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.5    3.5    3.5    3.6    3.8    3.8    4.3    4.7    5.4
ρ = 0.5       3.9    4.2    4.3    4.1    4.4    5.5    7.3    8.1    9.1
ρ = 0.9       6.1    6.4    6.1    6.9    7.6    9.1   12.4   16.0   19.4

Note: This table shows the probability that the limiting distribution of F1 exceeds 3.84, the upper 5th percentile for the χ²(1) distribution. Entries were obtained by simulating the limiting distribution derived in Theorem 1, replacing continuous processes by discrete approximations of length 1000. Results are shown for different values of c (the local-to-unit root parameter), ρ (the long-run correlation between the shocks in Assumption 1) and λ (the forecast horizon as a fraction of the sample size).

² We think of the case where $c_i < 0\ \forall i$ as the leading case, but can accommodate exact unit roots ($c_i = 0$) as well. The matrix $C^{-1}(\exp(C\lambda) - I)$ is diagonal, and if there are some exact unit roots the relevant diagonal elements of this matrix are set equal to $\lambda$.

Next consider the Wald statistic testing the hypothesis that $\beta^{(h)} = \beta_0$ using Hodrick standard errors 1B:

$$ F_2 = \left(\hat\beta^{(h)} - \beta_0\right)'W_{22}^{-1}\left(\hat\beta^{(h)} - \beta_0\right) \tag{11} $$
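A minimal sketch of the statistic in equation (11) for a scalar predictor is given below. It follows the definition of $w_{t+1}$ as printed in Section 2, so the h-period sum of $\tilde x$ runs over lags $t-1$ to $t-h$; that alignment, the function name and the simulated data are assumptions of this example rather than a statement of the authors' implementation.

```python
import numpy as np

def hodrick_1b_wald(r, x, h, beta0=0.0):
    """Forward OLS slope and the Wald statistic (11) with Hodrick 1B variance."""
    T = len(r)
    cs_r = np.concatenate(([0.0], np.cumsum(r)))
    y = cs_r[h + 1:T + 1] - cs_r[1:T - h + 1]    # h-period returns, t = 0..T-h-1
    X = np.column_stack([np.ones(T - h), x[:T - h]])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]  # forward regression (1)
    xt = np.column_stack([np.ones(T), x])        # x-tilde = (1, x)'
    cs_x = np.vstack([np.zeros((1, 2)), np.cumsum(xt, axis=0)])
    t = np.arange(h, T - 1)
    S = cs_x[t] - cs_x[t - h]                    # sum of x-tilde over lags t-1..t-h
    w = (r[t + 1] - r.mean())[:, None] * S       # w_{t+1}
    XtX_inv = np.linalg.inv(X.T @ X)
    W = XtX_inv @ (w.T @ w) @ XtX_inv            # sandwich variance estimate
    f2 = (coef[1] - beta0) ** 2 / W[1, 1]
    return float(coef[1]), float(f2)

# Hypothetical data with some predictability.
rng = np.random.default_rng(0)
T = 1000
x, r = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
    r[t] = 0.5 * x[t - 1] + rng.standard_normal()
print(hodrick_1b_wald(r, x, h=12, beta0=0.0))
```

Passing a non-zero `beta0` gives the statistic for a more general null, but as the text goes on to explain, comparing it with $\chi^2$ critical values is formally justified only for $\beta_0 = 0$ under stationarity.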

The test that compares $F_2$ with conventional $\chi^2$ critical values is justified only as a test of the null hypothesis that $\beta^{(h)} = 0$, and only under covariance stationarity. Here we are considering its properties with near unit roots and a more general null hypothesis. Theorem 2 provides the null limiting distribution of this test statistic.

Theorem 2. Under the null $\beta^{(h)} = \beta_0$, in the limit as $T$ goes to infinity:

$$ F_2 \Rightarrow \left(\xi_1 + \xi_2\beta_0\right)'\Xi^{-1}\left(\xi_1 + \xi_2\beta_0\right) $$

where

$$ \xi_1 = \left[\int_0^{1-\lambda}J_C^{\mu}(s)J_C^{\mu}(s)'\,ds\right]^{-1}\int_0^{1-\lambda}J_C^{\mu}(s)\left(W(s+\lambda) - W(s)\right)ds, $$

$$ \xi_2 = \sigma^{-1}\left\{\left[\int_0^{1-\lambda}J_C^{\mu}(s)J_C^{\mu}(s)'\,ds\right]^{-1}\int_0^{1-\lambda}J_C^{\mu}(s)\bar J_C(s+\lambda)'\,ds\ \Omega^{1/2\prime}\tilde C - I\right\}, $$

$$ \Xi = \left[\int_0^{1-\lambda}J_C^{\mu}(s)J_C^{\mu}(s)'\,ds\right]^{-1}\left[\int_{\lambda}^{1}\bar J_C^{\mu}(s)\bar J_C^{\mu}(s)'\,ds\right]\left[\int_0^{1-\lambda}J_C^{\mu}(s)J_C^{\mu}(s)'\,ds\right]^{-1} $$

and $\tilde C = [\exp(C\lambda) - I]^{-1}C$.

Table II shows the simulated asymptotic rejection rates from comparing the test statistic $F_2$ to the 5% conventional $\chi^2$ critical values in the case $p = 1$ (scalar regressor) for different choices³ of $c$, $\rho$, $\lambda$ and $\beta_0$. The asymptotic size of this test is increasing in $\beta_0$, $\rho$, and $c$. It is not surprising that the size distortions are increasing in $\beta_0$ because the equivalence of forward and reverse regressions applies only in the case of no predictability. For values of $c$ at or below $-10$, the size distortions are mild: comparing $F_2$ to $\chi^2$ critical values yields a test with an asymptotic size of around 15% or less. With roots closer still to the unit circle, the size distortions from treating $F_2$ as though it were $\chi^2$ distributed get worse, and can be quite large with the combination of exact or near-exact unit roots ($c \approx 0$), a high degree of predictability and a large value of $\rho$ (the long-run correlation between the errors in Assumption 1).

Table II. Asymptotic probability (%) that F2 exceeds the χ²(1) critical values

β0 = 0, λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         5.1    4.9    4.5    4.5    3.9    3.1    4.1    4.0    4.4
ρ = 0.5       4.8    4.8    4.8    4.7    5.0    5.6    7.2    8.8   11.0
ρ = 0.9       6.9    6.7    7.2    7.6    8.9   10.0   14.2   17.4   21.6

β0 = 0, λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.7    3.8    3.7    3.0    3.1    2.7    4.0    4.3    3.8
ρ = 0.5       4.3    4.0    4.0    3.9    4.6    4.6    6.6    8.4   10.6
ρ = 0.9       5.7    5.8    5.9    6.1    7.1    8.1   13.1   16.4   20.7

β0 = 0, λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.1    3.2    3.4    3.2    3.4    3.4    3.6    4.5    4.4
ρ = 0.5       3.8    3.6    3.8    4.0    4.3    4.8    6.8    8.9   10.4
ρ = 0.9       5.8    5.9    5.7    6.5    6.6    8.4   12.4   17.1   21.1

β0 = 0.2, λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         4.8    4.8    4.6    3.8    4.1    3.2    4.4    4.2    4.9
ρ = 0.5       6.5    6.4    6.1    6.4    6.8    7.5   10.5   11.8   15.3
ρ = 0.9      10.2   10.3   10.8   11.3   11.9   15.2   20.9   24.9   29.9

β0 = 0.2, λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         4.0    4.2    3.4    3.6    3.1    3.1    4.1    4.8    4.9
ρ = 0.5       6.2    5.8    5.8    6.1    6.7    7.4   10.4   12.3   14.1
ρ = 0.9       9.6    9.5    9.8   10.2   10.6   13.6   18.9   23.6   28.0

β0 = 0.2, λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.4    3.1    3.4    3.2    3.1    3.4    4.2    4.8    5.1
ρ = 0.5       5.9    5.5    5.7    5.7    5.6    7.4   10.3   12.4   14.2
ρ = 0.9      11.1   10.3   10.2    9.8   10.7   13.5   19.0   23.8   27.1

β0 = 0.5, λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         5.4    5.6    4.7    4.6    4.8    4.5    5.1    5.8    7.1
ρ = 0.5      11.0   10.8    9.8   10.0    9.9   10.7   15.7   17.9   21.3
ρ = 0.9      16.2   16.2   16.5   16.8   17.8   21.1   27.6   34.7   42.3

β0 = 0.5, λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         4.4    4.7    4.4    4.2    4.7    3.9    5.8    6.1    6.3
ρ = 0.5      10.1   10.2    9.9    9.7    9.6   11.4   15.8   18.9   21.6
ρ = 0.9      16.9   16.3   16.1   16.7   17.2   20.6   28.8   34.3   38.8

β0 = 0.5, λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         5.0    4.4    4.3    4.3    4.1    4.6    6.1    6.4    6.9
ρ = 0.5      10.8   10.0    9.9   10.2   10.4   11.9   16.3   19.1   20.9
ρ = 0.9      18.0   17.7   16.4   17.4   16.9   20.0   28.4   32.2   37.5

Note: This table shows the probability that the limiting distribution of F2 exceeds 3.84, the upper 5th percentile for the χ²(1) distribution. Entries were obtained by simulating the limiting distribution derived in Theorem 2, replacing continuous processes by discrete approximations of length 1000. Results are shown for different values of β0 (the hypothesized degree of predictability), c (the local-to-unit root parameter), ρ (the long-run correlation between the shocks in Assumption 1) and λ (the forecast horizon as a fraction of the sample size).

³ For $\rho = 0$ and $\lambda = 0.05$, setting $\beta_0 = 0.5$ as in the lower panels of Table II implies a population $R^2$ value from the forecasting regression of 0.19, 0.32, 0.54, 0.70 and 0.82 for $c = -10, -5, -2, -1$ and $0$ respectively. Setting $\beta_0 = 0.2$, as in the middle panels of Table II, implies population $R^2$ values of 0.04, 0.14, 0.17, 0.28 and 0.44, for these respective values of $c$.

Finally, we conclude this section by considering the limiting distribution of the Wald statistic based on the reverse regression delta method proposed in Section 2.1 (equations (6) and (7)). This Wald statistic testing the null $\beta^{(h)} = \beta_0$ is


$$ F_3 = \left(\hat\theta_2^{-1}\hat\theta_1 - \beta_0\right)'\left\{\left[\hat\theta_2^{-1},\ -\left(\hat\theta_1'\hat\theta_2^{-1}\otimes\hat\theta_2^{-1}\right)D_p\right]\hat V\left[\hat\theta_2^{-1},\ -\left(\hat\theta_1'\hat\theta_2^{-1}\otimes\hat\theta_2^{-1}\right)D_p\right]'\right\}^{-1}\left(\hat\theta_2^{-1}\hat\theta_1 - \beta_0\right) \tag{12} $$

Its null limiting distribution is provided in Theorem 3.

Theorem 3. Under the null $\beta^{(h)} = \beta_0$, in the limit as $T$ goes to infinity:

$$ F_3 \Rightarrow \left(b^* - \beta_0\right)'\left\{\left[\theta_2^{*-1},\ -\left(b^{*\prime}\otimes\theta_2^{*-1}\right)D_p\right]\begin{pmatrix} V_{11}^* & V_{12}^* \\ V_{21}^* & V_{22}^* \end{pmatrix}\left[\theta_2^{*-1},\ -\left(b^{*\prime}\otimes\theta_2^{*-1}\right)D_p\right]'\right\}^{-1}\left(b^* - \beta_0\right) $$

where

$$ \begin{aligned} b^* &= \theta_2^{*-1}\theta_1^* \\ \theta_1^* &= \sigma\Omega^{1/2}\left[\int_{\lambda}^{1}\bar J_C^{\mu}(s)B_C(s)\,ds + \int_{\lambda}^{1}\bar J_C^{\mu}(s)\,dW(s)\right] \\ \theta_2^* &= \int_0^{1-\lambda}H_C(s)\,ds \\ V_{11}^* &= \sigma^2\Bigg\{\int_{-\lambda}^{\lambda}\int_{\lambda-\min(0,\omega)}^{1-\max(0,\omega)}G_C(s,\omega)B_C(s)B_C(s+\omega)\left(1-\frac{|\omega|}{\lambda}\right)ds\,d\omega \\ &\qquad + \int_{\lambda}^{1}\int_{\max(\lambda-s,-\lambda)}^{\min(1-s,\lambda)}G_C(s,\omega)\left(1-\frac{|\omega|}{\lambda}\right)dW(s+\omega)\,dW(s) \\ &\qquad + \int_{\lambda}^{1}\int_{\max(\lambda-s,-\lambda)}^{\min(1-s,\lambda)}G_C(s,\omega)\left(1-\frac{|\omega|}{\lambda}\right)dW(s+\omega)\,B_C(s)\,ds \\ &\qquad + \int_{-\lambda}^{\lambda}\int_{\lambda-\min(0,\omega)}^{1-\max(0,\omega)}G_C(s,\omega)B_C(s+\omega)\left(1-\frac{|\omega|}{\lambda}\right)dW(s)\,d\omega\Bigg\} - \lambda^*\theta_1^*\theta_1^{*\prime} \\ V_{21}^* &= \sigma\Bigg\{\int_{-\lambda}^{\lambda}\int_{-\min(0,\omega)}^{1-\lambda-\max(0,\omega)}\operatorname{vech}(H_C(s))\,\bar J_C^{\mu}(s+\lambda+\omega)'\,\Omega^{1/2\prime}B_C(s+\lambda+\omega)\left(1-\frac{|\omega|}{\lambda}\right)ds\,d\omega \\ &\qquad + \int_0^{1-\lambda}\int_{\max(-s,-\lambda)}^{\min(1-\lambda-s,\lambda)}\operatorname{vech}(H_C(s))\,\bar J_C^{\mu}(s+\lambda+\omega)'\,\Omega^{1/2\prime}\left(1-\frac{|\omega|}{\lambda}\right)dW(s+\lambda+\omega)\,ds\Bigg\} - \lambda^*\operatorname{vech}(\theta_2^*)\theta_1^{*\prime} \\ V_{12}^* &= V_{21}^{*\prime} \\ V_{22}^* &= \int_{-\lambda}^{\lambda}\int_{-\min(0,\omega)}^{1-\lambda-\max(0,\omega)}\operatorname{vech}(H_C(s))\operatorname{vech}(H_C(s+\omega))'\,ds\left(1-\frac{|\omega|}{\lambda}\right)d\omega - \lambda^*\operatorname{vech}(\theta_2^*)\operatorname{vech}(\theta_2^*)' \end{aligned} $$

$B_C(s) = \sigma^{-1}J_C(s)'\left(\Omega^{1/2}\right)'\tilde C\beta_0$, $G_C(s,\omega) = \Omega^{1/2}\bar J_C^{\mu}(s)\bar J_C^{\mu}(s+\omega)'\,\Omega^{1/2\prime}$, $H_C(s) = \Omega^{1/2}J_C^{\mu}(s)J_C^{\mu}(s)'\,\Omega^{1/2\prime}$ and $\lambda^* = \lambda - \tfrac{4}{3}\lambda^2$.

Table III shows the simulated asymptotic rejection rates from comparing the test statistic $F_3$ to the conventional $\chi^2$ critical values in the case $p = 1$ (scalar regressor) for different choices of $c$, $\rho$, $\lambda$ and $\beta_0$. The size of this test is again increasing in $\beta_0$, $\rho$, and $c$. But the size distortions are noticeably milder than for the test comparing $F_2$ to $\chi^2$ critical values (shown earlier in Table II). The asymptotic rejection rate of the test comparing $F_3$ to conventional $\chi^2$ critical values does not exceed 20% for any parameter configuration considered here. Also, it is below 16% for all cases in which $c$ is $-5$ or less.

Table III. Asymptotic probability (%) that F3 exceeds the χ²(1) critical values

β0 = 0, λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         4.4    4.3    3.7    3.7    3.9    3.7    4.1    4.1    4.0
ρ = 0.5       4.7    4.9    4.8    4.9    4.4    4.7    6.5    7.7    8.7
ρ = 0.9       7.9    8.2    7.8    8.3    8.9   10.0   11.6   13.0   15.7

β0 = 0, λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.5    3.5    3.0    3.2    3.0    3.0    3.3    3.9    3.8
ρ = 0.5       4.1    3.6    3.5    3.4    2.9    3.7    5.1    6.4    6.4
ρ = 0.9       6.3    6.0    5.9    6.3    6.3    7.6   10.3   11.3   13.3

β0 = 0, λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         2.4    2.7    2.7    2.6    3.1    4.4    5.2    5.1    5.5
ρ = 0.5       2.5    2.5    2.6    2.8    3.4    4.1    5.3    7.6    7.3
ρ = 0.9       2.9    3.3    3.3    3.2    4.0    5.4    6.5    7.9   10.4

β0 = 0.2, λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.8    4.1    3.9    4.0    4.0    4.6    5.0    4.3    5.9
ρ = 0.5       5.5    5.7    5.7    5.7    5.6    6.8    8.4    9.1   10.8
ρ = 0.9       8.3    8.8    8.6    9.4   10.9   11.8   12.9   14.8   17.0

β0 = 0.2, λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.3    3.6    3.8    4.0    4.2    4.5    5.1    4.7    5.3
ρ = 0.5       6.1    6.2    5.9    5.4    5.8    6.9    9.4   10.5    9.4
ρ = 0.9       9.6    9.4    9.8   10.5   11.5   12.2   13.7   14.8   15.8

β0 = 0.2, λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         3.8    3.7    3.5    3.6    4.0    4.6    5.5    5.9    6.1
ρ = 0.5       5.3    5.2    5.1    5.6    6.1    7.3    8.8    9.7    9.0
ρ = 0.9       9.0    8.4    8.0    9.0    9.6   10.7   11.6   11.9   13.1

β0 = 0.5, λ = 0.02
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         2.7    2.7    2.6    2.7    2.5    2.3    4.2    5.2    4.3
ρ = 0.5       5.6    5.1    5.0    5.1    5.2    6.3    9.8    9.8   10.7
ρ = 0.9       7.7    8.3    8.4    8.6    9.5   10.5   14.4   18.6   17.9

β0 = 0.5, λ = 0.05
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         4.9    5.0    5.4    5.6    5.5    5.9    7.0    8.0    7.9
ρ = 0.5       8.8    8.6    8.3    8.1    8.8   10.1   11.1   12.6   13.3
ρ = 0.9      11.9   12.3   12.6   12.9   14.2   13.9   15.4   16.1   18.9

β0 = 0.5, λ = 0.1
            c=−30  c=−25  c=−20  c=−15  c=−10   c=−5   c=−2   c=−1    c=0
ρ = 0         6.5    6.8    6.8    6.9    6.5    7.8    9.0    8.8   10.4
ρ = 0.5      10.7    9.8    9.9    9.7   10.2   12.0   14.8   14.4   14.4
ρ = 0.9      15.1   14.8   14.6   15.3   15.3   15.5   17.3   17.8   19.4

Note: This table shows the probability that the limiting distribution of F3 exceeds 3.84, the upper 5th percentile for the χ²(1) distribution. Entries were obtained by simulating the limiting distribution derived in Theorem 3, replacing continuous processes by discrete approximations of length 1000. Results are shown for different values of β0 (the hypothesized degree of predictability), c (the local-to-unit root parameter), ρ (the long-run correlation between the shocks in Assumption 1) and λ (the forecast horizon as a fraction of the sample size).

We conclude this section by observing that it would be possible to test the hypothesis of no predictability at multiple horizons jointly, taking the supremum of $F_1$, $F_2$ or $F_3$ over all horizons between $[\underline{\lambda}T]$ and $[\bar{\lambda}T]$, effectively testing for predictability at any horizon in this range. The method of Rossi (2007) can also be used to test predictability at multiple horizons jointly, but is the only other method which can do so in the presence of roots local-to-unity. The limiting distribution of this test is given by the supremum of the distribution in Theorems 1, 2 or 3 over all values of $\lambda$ from $\underline{\lambda}$ to $\bar{\lambda}$. In the stationary predictors case, the limiting distribution is the supremum of a functional of Brownian motions (Richardson and Stock, 1989). We conjecture that, just as $F_1$, $F_2$ and $F_3$ can be reasonably well approximated by conventional $\chi^2$ critical values, so the suprema of $F_1$, $F_2$ and $F_3$ can be reasonably approximated by the critical values of Richardson and Stock (1989) that are in turn free of nuisance parameters, and have been simulated and tabulated.
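The joint test over horizons described above can be sketched as follows for the reverse regression statistic $F_1$ and a scalar predictor. The function names and the way $[\lambda T]$ is computed are assumptions of this example, and the code only returns the supremum and the horizon at which it is attained: it does not supply critical values, which would have to come from a source such as the tabulations the text refers to rather than from the $\chi^2$ distribution.

```python
import numpy as np

def reverse_wald(r, x, h):
    """Reverse regression Wald statistic F1 of equation (4), scalar predictor."""
    T = len(r)
    cs = np.concatenate(([0.0], np.cumsum(x)))
    xs = cs[h:T] - cs[:T - h]                    # x[t] + ... + x[t-h+1]
    xd = xs - xs.mean()
    yd = r[h:T] - r[h:T].mean()                  # demeaned one-period returns r[t+1]
    gamma = (xd @ yd) / (xd @ xd)
    u = yd - gamma * xd
    return float(gamma**2 * (xd @ xd) ** 2 / np.sum((u * xd) ** 2))

def sup_reverse_wald(r, x, lam_lo, lam_hi):
    """Supremum of F1 over horizons between [lam_lo*T] and [lam_hi*T]."""
    T = len(r)
    h_lo = max(1, int(np.floor(lam_lo * T + 1e-9)))
    h_hi = max(h_lo, int(np.floor(lam_hi * T + 1e-9)))
    stats = {h: reverse_wald(r, x, h) for h in range(h_lo, h_hi + 1)}
    h_star = max(stats, key=stats.get)           # horizon attaining the supremum
    return stats[h_star], h_star, stats

# Hypothetical data: persistent predictor, unpredictable returns.
rng = np.random.default_rng(0)
T = 600
r = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.95 * x[t - 1] + rng.standard_normal()
sup_stat, h_star, _ = sup_reverse_wald(r, x, lam_lo=0.02, lam_hi=0.1)
print(sup_stat, h_star)
```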

4. MONTE CARLO SIMULATIONS

The motivation for considering the proposed approaches to inference via reverse regressions is that they may work better in small samples. Like the conventional forward regression methods, their justification is based on an assumption of stationarity, and methods that assume stationarity often fare poorly in the presence of a unit root, or a near unit root, at least in empirically relevant sample sizes. However, the results of the previous section suggest that they might in practice be quite robust to near non-stationarity. The intuition is that they back out the implied coefficient in the long-horizon regression from the correlation between one-period returns and a long-run sum of the predictor, which avoids a spurious regression. How well the reverse regression methods actually work in finite samples with nearly non-stationary predictors is the key practical question that we answer in a Monte Carlo experiment. In this experiment, returns and the predictor follow a VAR(1):

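\[
\begin{pmatrix} r_{t+1} \\ x_{t+1} \end{pmatrix}
= \Phi \begin{pmatrix} r_t \\ x_t \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{r,t+1} \\ \varepsilon_{x,t+1} \end{pmatrix}
\]

where the errors are i.i.d. normal with mean zero and covariance matrix $V_\varepsilon$. Following Campbell (2001), we set

\[
\Phi = \begin{pmatrix} 0 & \alpha \\ 0 & \phi \end{pmatrix}
\quad \text{and} \quad
V_\varepsilon = \begin{pmatrix} \sigma_r^2 & \rho\sigma_r\sigma_x \\ \rho\sigma_r\sigma_x & \sigma_x^2 \end{pmatrix}.
\]

As the units of measurement for returns and the predictors are arbitrary, we can normalize $\sigma_r = \sigma_x = 1$ without loss of generality, leaving three free parameters: α, ρ and φ. The slope coefficient in the long-horizon regression is $\beta(h) = \alpha \frac{1-\phi^h}{1-\phi}$. The population R² in this regression is

\[
R^2 = \frac{\beta(h)^2}{\beta(h)^2 + (1-\phi^2)\sum_{i=1}^{h} e_1' \left(\sum_{j=1}^{i}\Phi^{j-1}\right) V_\varepsilon \left(\sum_{j=1}^{i}\Phi^{j-1}\right)' e_1}
\]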

where e1 = (1, 0)′. So long as we fix the sign of α, R² will be a monotone increasing function of α (holding the other parameters fixed). Figure 1 plots the effective coverage of several different confidence intervals for the long-horizon slope coefficient (β(h)) against the population R² for the case⁴ where α ≥ 0 with different choices of h and ρ. The coverage rates of the confidence sets are of course 1 minus the size of the test that β(h) is equal to its true value. The sample size is T = 500, which corresponds to about 40 years of monthly data, the nominal coverage is 95%, and the parameter φ is 0.98. The confidence intervals considered are: (i) the ordinary confidence intervals based on estimating equation (1), using Newey–West standard errors with a lag truncation parameter of h; (ii) the confidence interval based on estimating equation (1) using standard errors 1B of Hodrick (1992); (iii) confidence intervals using the reverse regression delta method (equations (6) and (7)); (iv) confidence intervals formed using the method proposed by Rossi (2007), building on earlier work of Campbell and Yogo (2006)⁵; and (v) infeasible confidence intervals using the true asymptotic variance of the OLS estimate of the forward regression, if the parameters of the model were known. The supporting information Appendix

⁴ The point of plotting coverage against population R² rather than α is just that this seems easier to interpret.

⁵ Here and throughout this paper, the ‘Rossi’ confidence interval refers to the following approach, proposed by Rossi (2007). Assume that the scalar predictor follows an AR(p) with a root local to unity and with p chosen by the Bayesian information criterion. A 100(1-/2)% confidence interval for this local-to-unit root is formed by the DF-GLS test (Elliott et al., 1996). Armed with this confidence interval, and the Q-test of Campbell and Yogo (2006) for testing hypotheses on β(1) for a given local-to-unit root, we can form the Bonferroni confidence interval for β(1), of coverage at least 100(1-)%, as described by Campbell and Yogo. Finally, equation 2.7 of Rossi (2007) gives an implied confidence interval for β(h) under the assumption that h = O(T). This has asymptotic coverage of at least 100(1-) percent.


[Figure 1: six panels, for h = 24 and h = 48, each with ρ = −0.9, ρ = 0 and ρ = 0.9. Vertical axis: effective coverage, from 0.5 to 1. Horizontal axis: population R², from 0 to 0.4.]

Figure 1. Effective coverage of alternative confidence intervals (nominal level: 95%) for the slope coefficient in an h-step-ahead predictive regression in the Monte Carlo simulation described in the text. In this figure, the sample size is T = 500 and the autoregressive parameter is φ = 0.98. The confidence intervals are as follows: (i) circles, ordinary confidence intervals based on estimating equation (1), using Newey–West standard errors with a lag truncation parameter of h; (ii) solid line, confidence interval based on estimating equation (1) using standard errors 1B of Hodrick (1992); (iii) dotted line, confidence intervals using the reverse regression delta method proposed in this paper (equations (6) and (7)); (iv) squares, confidence intervals using the method of Rossi (2007); (v) dashed line, the infeasible confidence intervals based on estimating equation (1) using the true asymptotic variance of the slope coefficient. In all cases, the coverage is plotted against R², which is a monotone function of α, given the normalization α ≥ 0. Other simulations are included in the supporting information Appendix.

includes analogous figures for different choices of h, T and φ, and considering both positive and negative values of α. The results that we show in Figure 1 are representative of the results that apply in these other cases.

The confidence interval for β(h) formed using conventional Newey–West standard errors has coverage that is considerably too low, regardless of whether there is no predictability or some predictability. In many cases, it has an effective coverage around 60%. The confidence interval formed using Hodrick standard errors 1B generally does much better. It gets the coverage about right in the case of no predictability (β(h) = 0, corresponding to R² = 0). Even with mild predictability, the coverage does not fall below about 80%. It is only when the predictability is considerable that it can have coverage that is substantially too low, but then it can do really badly. All this is consistent with the asymptotic results under near unit roots in the previous section.

The confidence interval formed using the reverse regression delta method approach proposed in this paper always has effective coverage of at least 80%, and usually a good bit more. It fares better than Hodrick standard errors 1B when the predictability is considerable (population R² above about 30%). The Rossi confidence interval has high coverage that is consistently above the nominal level, which is to be expected as it is conservative by design. The infeasible confidence interval that uses the true asymptotic variance of the OLS estimate of the forward regression has coverage that is fairly similar to that of the reverse regression delta method (in most cases 85% or higher), suggesting that the benefits of the reverse regression delta method come from more stable variance estimation.
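As a rough illustration of the Monte Carlo design described in this section (a sketch only, not the authors' code), the following Python functions compute β(h) and the population R² from the formulas in the text and simulate the VAR(1) with σ_r = σ_x = 1. The function names, the use of NumPy, and the burn-in length used to approximate a stationary start are assumptions made for this example.

```python
import numpy as np


def long_horizon_slope(alpha, phi, h):
    """beta(h) = alpha * (1 - phi**h) / (1 - phi), valid for |phi| < 1."""
    return alpha * (1.0 - phi**h) / (1.0 - phi)


def population_r2(alpha, phi, rho, h):
    """Population R^2 of the h-period forward regression (sigma_r = sigma_x = 1)."""
    Phi = np.array([[0.0, alpha], [0.0, phi]])
    V_eps = np.array([[1.0, rho], [rho, 1.0]])
    e1 = np.array([1.0, 0.0])
    beta_h = long_horizon_slope(alpha, phi, h)
    partial_sum = np.zeros((2, 2))  # running value of sum_{j=1}^{i} Phi^(j-1)
    power = np.eye(2)               # Phi^(j-1), starting from Phi^0
    err_var = 0.0
    for _ in range(h):
        partial_sum = partial_sum + power
        power = power @ Phi
        err_var += e1 @ partial_sum @ V_eps @ partial_sum.T @ e1
    return beta_h**2 / (beta_h**2 + (1.0 - phi**2) * err_var)


def simulate_var1(alpha, phi, rho, T, rng, burn_in=200):
    """Simulate (r_t, x_t) from the VAR(1); returns two arrays of length T.

    The burn-in (an assumption of this sketch) is discarded so that the
    retained sample starts close to the stationary distribution.
    """
    V_eps = np.array([[1.0, rho], [rho, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(2), V_eps, size=T + burn_in)
    r = np.zeros(T + burn_in)
    x = np.zeros(T + burn_in)
    for t in range(1, T + burn_in):
        r[t] = alpha * x[t - 1] + shocks[t, 0]
        x[t] = phi * x[t - 1] + shocks[t, 1]
    return r[burn_in:], x[burn_in:]
```

For example, `population_r2(alpha, 0.98, -0.9, 24)` can be evaluated on a grid of α values to locate the α that corresponds to a given point on the horizontal axis of Figure 1, and `simulate_var1(alpha, 0.98, -0.9, 500, np.random.default_rng(0))` draws one sample of the size used in the experiment.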


The supporting information appendix also shows results for confidence intervals based on the forward regression using Newey–West standard errors and lag truncation parameter of h, but using the critical values from the alternative ‘fixed-b’ asymptotics⁶ of Kiefer and Vogelsang (2005). These confidence intervals do better than their conventional Newey–West counterparts, but still have effective coverage that is too low: around 75% at long horizons.

Although in this Monte Carlo simulation we know the true value of β(h), in practice, of course, the researcher does not know the data-generating process and so it is important that the coverage of a confidence interval be as close as possible to the nominal level uniformly in reasonable values of β(h). In this regard, Figure 1 shows that among the feasible approaches to forming confidence intervals the reverse regression delta method and Rossi methods do best, but the Rossi method is only applicable with a single predictor.

Coverage is of course not the only criterion for a confidence interval; precision matters too. The median width of the alternative confidence intervals is shown in Figure 2. As before, results for other parameter configurations are in the supporting information Appendix. There is a familiar tradeoff: confidence intervals with the highest coverage have the disadvantage of being relatively wide, while tighter confidence intervals have lower effective coverage rates. The two confidence intervals based on reverse regressions have comparable width, but both are wider than the Newey–West confidence intervals. That seems to be a price worth paying given that the conventional methods consistently fail to get an effective coverage rate that is even close to the nominal level. The Rossi confidence intervals, being conservative, tend to be wider still.
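To make the contrast between the forward and reverse regressions concrete, the sketch below estimates both on a single sample with a scalar predictor. It relies on the population identity, valid under stationarity, that β(h)·Var(x_t) equals the reverse regression slope times the variance of the h-period sum of the predictor, since both equal the covariance between one-period returns and that sum. The function names are hypothetical, and the sketch covers point estimates only; it does not reproduce the delta method standard errors of equations (6) and (7).

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and a scalar regressor x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def forward_slope(r, x, h):
    """Forward regression: r[t+1] + ... + r[t+h] on x[t]."""
    r, x = np.asarray(r, dtype=float), np.asarray(x, dtype=float)
    T = len(r)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    # csum[t+h+1] - csum[t+1] = r[t+1] + ... + r[t+h], for t = 0, ..., T-h-1
    y = csum[h + 1:] - csum[1:T - h + 1]
    return ols_slope(y, x[:T - h])


def reverse_implied_slope(r, x, h):
    """Reverse regression: r[t+1] on x[t] + ... + x[t-h+1], rescaled to a forward slope."""
    r, x = np.asarray(r, dtype=float), np.asarray(x, dtype=float)
    T = len(r)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    # x_sum[k] = x[k] + ... + x[k+h-1], the backward-looking sum at t = k + h - 1
    x_sum = csum[h:T] - csum[:T - h]
    gamma = ols_slope(r[h:], x_sum)
    # Identity under stationarity: beta(h) * Var(x) = gamma(h) * Var(x_sum)
    return gamma * x_sum.var() / x.var()
```

The two functions use the same cross-products of returns and lagged predictors apart from end-of-sample terms, so their point estimates should be close; the methods differ mainly in how the sampling variance is estimated.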

⁶ This ‘fixed-b’ asymptotics treats the bandwidth as being a fixed fraction of the sample size, with the result that t-statistics have non-standard, but pivotal, asymptotic distributions that can easily be used to construct confidence intervals. A related approach is considered by Phillips et al. (2006), who set the bandwidth equal to the sample size, but downweight the highest-order sample autocovariances by exponentiating the kernel.

5. FORECASTING EXCESS BOND RETURNS

We now apply the reverse regression methodology to an important predictive regression in finance: the prediction of excess bond returns using the term structure of interest rates. Many authors have found predictability in long-horizon excess bond returns. For example, Fama and Bliss (1987) found that the steeper the yield curve, the higher are the subsequent excess returns on holding a long-maturity bond. In an influential paper, Cochrane and Piazzesi (2005) argued that while the slope of the yield curve has some predictive power for bond returns, using a combination of forward rates gives better forecasting performance, and that a ‘tent-shaped’ function of forward rates has remarkable predictive ability for excess bond returns with R² values up to 44%. These results represent strong evidence against the expectations hypothesis of the term structure. Yet one might wonder if they are, at least in part, an artifact of small-sample econometric problems.

Let $P_{n,t}$ be the price of an n-month zero-coupon bond in month t; the per annum continuously compounded yield on this bond is $z_{n,t} = -\frac{12}{n}\log P_{n,t}$. The excess return (over the 1-month risk-free rate) from buying this bond in month t and selling it in month t + 1 is

\[
r_{n,t+1} = \log P_{n-1,t+1} - \log P_{n,t} - z_{1,t}/12
\]

where $z_{1,t}$ is the 1-month yield (at an annualized rate). We can then construct the h-period excess return $r^{(h)}_{n,t+h} = \sum_{j=1}^{h} r_{n,t+j}$. This is very close to, though not exactly the same as, the excess return on holding an n-month zero-coupon bond for h months over the return on holding the h-month bond for that same holding period, considered by Cochrane and Piazzesi (2005) and others.
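The return definitions above translate directly into code. The following is a minimal sketch, assuming that monthly series of annualized continuously compounded yields on the n-month bond, the (n − 1)-month bond and the 1-month bill are available as arrays of equal length; the function and argument names are hypothetical. With data that report only annual maturities, the (n − 1)-month yield has to be approximated in some way.

```python
import numpy as np


def log_price(z, n_months):
    """Log price of an n-month zero-coupon bond: log P = -(n/12) * z."""
    return -(n_months / 12.0) * np.asarray(z, dtype=float)


def one_month_excess_returns(z_n, z_n_minus_1, z_1, n_months):
    """r_{n,t+1} = log P_{n-1,t+1} - log P_{n,t} - z_{1,t}/12, for t = 0, ..., T-2."""
    p_n = log_price(z_n, n_months)
    p_nm1 = log_price(z_n_minus_1, n_months - 1)
    z_1 = np.asarray(z_1, dtype=float)
    return p_nm1[1:] - p_n[:-1] - z_1[:-1] / 12.0


def cumulative_excess_returns(r, h):
    """h-period excess returns: sums of h consecutive one-month excess returns."""
    r = np.asarray(r, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    # Element k equals r[k] + ... + r[k+h-1]
    return csum[h:] - csum[:-h]
```

A simple sanity check on the alignment is that a yield curve that is flat and constant over time implies zero excess returns in every month.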


[Figure 2: six panels, for h = 24 and h = 48, each with ρ = −0.9, ρ = 0 and ρ = 0.9. Vertical axis: median width of the confidence interval. Horizontal axis: population R², from 0 to 0.4.]

Figure 2. Median width of alternative confidence intervals (nominal level: 95%) for the slope coefficient in an h-step-ahead predictive regression in the Monte Carlo simulation described in the text. In this figure, the sample size is T = 500 and the autoregressive parameter is φ = 0.98. The confidence intervals are as follows: (i) circles, ordinary confidence intervals based on estimating equation (1), using Newey–West standard errors with a lag truncation parameter of h; (ii) solid line, confidence interval based on estimating equation (1) using standard errors 1B of Hodrick (1992); (iii) dotted line, confidence intervals using the reverse regression delta method proposed in this paper (equations (6) and (7)); (iv) squares, confidence intervals using the method of Rossi (2007); (v) dashed line, infeasible confidence intervals based on estimating equation (1) using the true asymptotic variance of the slope coefficient. In all cases, the median width is plotted against R², which is a monotone function of α, given the normalization α ≥ 0. Other simulations are included in the supporting information Appendix.

A basic premise of term structure analysis is that today’s yield curve can be used to forecast future yield curves and, hence, the excess returns on long bonds. For example, Fama and Bliss (1987) argue that when the yield curve is steep, long-term bonds can be expected to subsequently have high excess returns. Accordingly, researchers project excess returns onto the term structure of interest rates at the start of the holding period, running regressions of the form

\[
r^{(h)}_{n,t+h} = \alpha + x_t'\beta(h) + \varepsilon^{(h)}_{t+h} \qquad (13)
\]

where $x_t$ is some vector of yields or spreads at time t.

We considered estimates of β(h) formed from estimating equation (13) with the long-term bond maturity, n, ranging from 2 to 5 years and the holding period, h, of 12 months. End-of-month data on zero-coupon bond yields and risk-free rates from the Fama–Bliss dataset were used.⁷ The sample period is 1964:01–2009:12. We first used the spread between the 5-year and 1-month yield as the sole predictor, $x_t$. Panel A of Table IV shows the forward regression estimates (equation (1)) of β(h) along with Newey–West standard errors and Hodrick standard errors 1B. The Newey–West standard errors indicate statistical

⁷ We have also used the yields from the dataset of Gürkaynak et al. (2007), and obtained similar results. Note also that the Fama–Bliss dataset only gives yields at 1, 2, 3, 4 and 5 year maturities (in addition to short-term risk-free rates). Following Campbell and Shiller (1991) and others, we approximate the price of an $n - \frac{1}{12}$ year bond as $\exp\left(-\left(n - \frac{1}{12}\right) z_{n,t}\right)$, although this procedure introduces a small bias that does not go away asymptotically (Bekaert et al., 1997).


Table IV. Regression of n-year excess bond returns on the yield curve slope

                            n = 2            n = 3            n = 4            n = 5
Panel A: Forward regression estimate and SEs
Slope coefficient           0.46             0.94             1.37             1.88
                            (0.28)           (0.39)**         (0.49)***        (0.59)***
                            [0.47]           [0.63]           [0.80]*          [0.92]**
Panel B: Reverse regression Wald statistics
Test statistic              0.75             1.80             2.49             3.64*
Panel C: Reverse regression delta method estimate and SEs
Slope coefficient           0.45             0.93             1.35             1.85
                            (0.40)           (0.55)           (0.69)*          (0.80)**
Panel D: Rossi confidence interval
                            (−0.41, 0.84)    (−0.13, 1.68)    (−0.15, 2.43)    (0.30, 3.36)
Panel E: Forward regression intervals using fixed-b asymptotics
Bandwidth h                 (−0.10, 1.02)    (0.16, 1.72)     (0.37, 2.36)     (0.69, 3.07)
Bandwidth T                 (−0.43, 1.35)    (−0.33, 2.22)    (−0.12, 2.86)    (0.20, 3.56)

Note: This table reports the results of a regression of n-year excess bond returns on the slope of the yield curve with a holding period of h = 12 months. Panel A shows the forward regression estimate (equation (1)) along with the Newey–West standard errors in parentheses and Hodrick standard errors 1B in square brackets. Panel B shows the reverse regression Wald statistics testing the hypothesis that γ(h) = 0 in equation (2), which have a χ²(1) null limiting distribution. Panel C shows the estimates and standard errors (in parentheses) using the reverse regression delta method (equations (6) and (7)). Panel D shows the 95% confidence interval formed by the method of Rossi (2007). Panel E shows the 95% confidence intervals from the forward regression using Newey–West standard errors, but the critical values from the fixed-b asymptotics of Kiefer and Vogelsang (2005), instead of normal critical values, and with lag truncation parameters h and T. The data are Fama–Bliss yields spanning 1964:01–2009:12. Asterisks denote significance at the *10%; **5%; ***1% levels.

significance at the 5% level (except for n = 2). However, using Hodrick standard errors 1B, the slope coefficient is statistically significant at the 5% level only when n = 5. Panel B of Table IV reports the reverse regression Wald test (equation (4)). The hypothesis that β(h) = γ(h) = 0 is not rejected at the 5% level for any maturity n. Panel C shows the reverse regression delta method estimates and standard errors of β(h) (from equations (6) and (7)). These are significant at the 5% level only in the case n = 5. Overall, the use of the reverse regression methods shows only marginal evidence of a relationship between the slope of the yield curve and subsequent excess bond returns.

Panel D of Table IV shows the 95% confidence intervals formed by the method of Rossi (2007) (see footnote 5). These likewise show only marginal evidence of predictability of bond returns, as the confidence intervals for β(h) straddle zero in all cases except n = 5. The final panel shows the confidence intervals from the forward regression with Newey–West standard errors but using the critical values of Kiefer and Vogelsang (2005), rather than standard normal critical values.

5.1. Forecasting Excess Bond Returns with the Term Structure of Forward Rates

We next follow Cochrane and Piazzesi (2005) in estimating equation (13), using as the predictors the 1-year yield, and the 1-year forward rates ending in 2, 3, 4 and 5 years. This regression has five predictors, which limits the number of approaches that are available to handle econometric inference in this context, but is not a problem for the reverse regression methodology. Table V shows p-values from the Wald test of the hypothesis that β(h) = 0 using the Newey–West standard errors, Hodrick standard errors 1B, the reverse-regression Wald test and the reverse regression delta method. In this case, all of the Wald tests are significant at conventional significance levels, but the Newey–West p-values are very extreme (we report them in scientific notation): they are around 10⁻⁶! Meanwhile, the other Wald statistics that are all based on reverse regressions give p-values around 1%. Even these p-values may still somewhat overstate the case for predictability, as it should be remembered that the reverse regression tests are generally modestly oversized.
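For readers who want to see how such p-values are obtained, the sketch below forms a Wald statistic W = b′V⁻¹b from a coefficient estimate b and an estimate V of its covariance matrix, and converts it to a p-value from a χ² distribution with an odd number of degrees of freedom (five in the regression on forward rates, one for a single predictor). The covariance matrix V is taken as given; how it is estimated (Newey–West, Hodrick 1B or a reverse regression method) is what distinguishes the tests. The function names are hypothetical, and the closed-form tail probability is used only to keep the example free of dependencies beyond NumPy; `scipy.stats.chi2.sf` is the more usual choice.

```python
import math

import numpy as np


def wald_statistic(b, V):
    """W = b' V^{-1} b for an estimate b with estimated covariance matrix V."""
    b = np.asarray(b, dtype=float)
    V = np.asarray(V, dtype=float)
    return float(b @ np.linalg.solve(V, b))


def chi2_sf_odd_df(x, k):
    """Upper tail probability P(chi2_k > x) for odd k, using erfc plus a finite sum."""
    if k < 1 or k % 2 != 1:
        raise ValueError("k must be a positive odd integer")
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))                         # chi2(1) tail
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for i in range((k - 1) // 2):
        tail += term                 # adds the x**i / (1*3*...*(2i+1)) term
        term *= x / (2 * i + 3)
    return tail
```

As a check on the formula, `chi2_sf_odd_df(3.84, 1)` and `chi2_sf_odd_df(11.0705, 5)` are both approximately 0.05, matching the usual 5% critical values for one and five degrees of freedom.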


Table V. Cochrane–Piazzesi regression of n-year excess bond returns on forward rates: p-values from various Wald tests for the joint significance of the forward rates

                                    n = 2        n = 3        n = 4        n = 5
Newey–West                          1 × 10⁻⁵     2 × 10⁻⁶     9 × 10⁻⁸     3 × 10⁻⁶
Hodrick standard errors 1B          0.031        0.022        0.008        0.029
Reverse regression Wald             0.021        0.013        0.002        0.014
Reverse regression delta method     0.014        0.005        0.001        0.003

Note: This table reports the p-values obtained from Wald tests of the hypothesis that the slope coefficients are jointly equal to zero in a regression of excess bond returns on 1-year forward rates ending 1, 2, 3, 4 and 5 years hence with a holding period of h = 12 months. The Wald statistics are compared with χ²(5) critical values. The Wald statistics are based on (i) the forward regression (equation (1)) with Newey–West standard errors, (ii) the forward regression with Hodrick standard errors 1B, (iii) the reverse regression Wald statistic, and (iv) the reverse regression delta method (equations (6) and (7)). The data are Fama–Bliss yields spanning 1964:01–2009:12. Asterisks denote significance at the *10%; **5%; ***1% levels, respectively.

Finally, in Table VI, we report the point estimates of the elements of β(h) and the associated standard errors for the regression of excess bond returns on the term structure of forward rates. The top panel shows the forward regression estimates, along with Newey–West standard errors and Hodrick standard errors 1B. The bottom panel shows the reverse regression delta method estimates and standard errors. The two sets of point estimates are virtually identical, and show the ‘tent-shaped’ pattern highlighted by Cochrane and Piazzesi (2005). However, the two sets of standard errors that are

Table VI. Cochrane–Piazzesi regression of n-year excess bond returns on forward rates: alternative estimators and standard errors

                        n = 2          n = 3          n = 4          n = 5
Panel A: Forward regression and standard errors
1-year yield            −0.60          −1.27          −1.78          −2.37
                        (0.38)         (0.54)**       (0.68)***      (0.79)***
                        [0.79]         [1.05]         [1.40]         [1.57]
1–2 years forward       −0.15          −0.25          −0.27          −0.08
                        (0.67)         (0.90)         (1.11)         (1.28)
                        [1.08]         [1.43]         [1.98]         [2.28]
2–3 years forward       1.77           3.23           3.55           4.00
                        (0.61)***      (0.82)***      (0.99)***      (1.18)***
                        [1.19]         [1.69]*        [2.03]*        [2.33]*
3–4 years forward       0.62           0.69           1.85           1.66
                        (0.41)         (0.58)         (0.74)**       (0.87)*
                        [0.64]         [0.88]         [1.23]         [1.47]
4–5 years forward       −1.27          −1.95          −2.83          −2.59
                        (0.44)***      (0.60)***      (0.75)***      (0.87)***
                        [0.55]**       [0.79]**       [1.03]***      [1.21]**
Panel B: Reverse regression delta method
1-year yield            −0.61          −1.28          −1.80          −2.40
                        (0.64)         (0.89)         (1.14)         (1.30)*
1–2 years forward       −0.12          −0.21          −0.20          0.03
                        (1.02)         (1.43)         (1.86)         (2.13)
2–3 years forward       1.79           3.26           3.59           4.06
                        (1.00)*        (1.34)**       (1.64)**       (1.92)**
3–4 years forward       0.61           0.68           1.84           1.64
                        (0.71)         (0.93)         (1.23)         (1.41)
4–5 years forward       −1.30          −2.00          −2.90          −2.72
                        (0.53)**       (0.73)***      (0.93)***      (1.04)***

Note: This table reports the results of a regression of n-year excess bond returns on the 1-year forward rates ending 1, 2, 3, 4 and 5 years hence. The holding period is h = 12 months. Panel A shows the forward regression estimate (equation (1)) along with the Newey–West standard errors in parentheses and Hodrick standard errors 1B in square brackets. Panel B shows the estimates and standard errors (in parentheses) using the reverse regression delta method (equations (6) and (7)). The data are Fama–Bliss yields spanning 1964:01–2009:12. Asterisks denote significance at the *10%; **5%; ***1% levels.


based on the reverse regression methodology are both considerably larger than the Newey–West standard errors. Typically, they are roughly twice as big.⁸ The regression of Cochrane and Piazzesi (2005) implies that the ex ante risk premium on buying a 5-year bond and going short a 1-year bond is both large and volatile. Some find this surprising and implausible (Sack, 2006). In this regard, it seems relevant that the underlying parameters from the return prediction equation appear to be quite imprecisely estimated.
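For reference, the conventional Newey–West standard errors that serve as the benchmark in this comparison can be sketched as follows. This is a generic textbook implementation with Bartlett weights 1 − j/(L + 1) for lags j = 1, ..., L, not the authors' code; conventions for mapping a stated ‘lag truncation parameter’ to L differ across software, so the choice made here is an assumption, and the function names are hypothetical.

```python
import numpy as np


def newey_west_cov(X, resid, lags):
    """HAC covariance matrix of OLS coefficients with Bartlett kernel weights."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T = X.shape[0]
    g = X * resid[:, None]                # moment contributions x_t * u_t
    S = g.T @ g / T                       # lag-0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)        # Bartlett weight
        gamma_j = g[j:].T @ g[:-j] / T    # j-th sample autocovariance of g
        S += w * (gamma_j + gamma_j.T)
    XtX_inv = np.linalg.inv(X.T @ X / T)
    return XtX_inv @ S @ XtX_inv / T


def ols_with_newey_west(y, x, lags):
    """OLS of y on a constant and x; returns (coefficients, Newey-West SEs)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    se = np.sqrt(np.diag(newey_west_cov(X, resid, lags)))
    return coef, se
```

With `lags=0` the estimator reduces to the heteroskedasticity-robust (White) covariance matrix. In a forward regression with overlapping h-period returns, the residuals are serially correlated by construction, which is why the lag length is tied to h.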

6. FORECASTING EXCESS STOCK RETURNS

We also consider the regression of h-month cumulative excess returns for the value-weighted dividend-inclusive CRSP index on the log dividend yield and the 1-month interest rate (Fama–Bliss risk-free rate) at the start of the holding period. The sample period is January 1952–December 2009. The horizons are 6, 12, 24 and 36 months. The supporting information Appendix includes results from the univariate regression of stock returns on the log dividend yield alone, but we focus on this bivariate regression because Ang and Bekaert (2007) and others find that stock predictability is greater for this specification. Table VII reports the coefficient estimates from estimating this regression with both Newey–West standard errors and Hodrick standard errors 1B, as well as the reverse regression delta method estimates and standard errors. Using the forward regression with Newey–West standard errors, the coefficients on the dividend yield and short-term interest rate are significantly positive and negative,

Table VII. Regression of h-month excess stock returns on log dividend yield and 1-month interest rates

                                    h = 6          h = 12         h = 24         h = 36
Panel A: Forward regression and standard errors
Log dividend yield                  7.82           14.46          24.38          29.98
                                    (2.32)***      (4.84)***      (8.54)***      (9.39)***
                                    [2.78]***      [5.53]***      [11.01]**      [16.50]*
One-month rate                      −0.83          −1.32          −2.05          −2.64
                                    (0.35)**       (0.63)**       (0.75)***      (0.88)***
                                    [0.39]**       [0.76]*        [1.39]         [1.88]
Panel B: Reverse regression delta method
Log dividend yield                  7.69           14.62          24.77          25.62
                                    (2.94)***      (6.52)**       (13.59)*       (17.34)
One-month rate                      −0.84          −1.48          −2.14          −1.90
                                    (0.42)**       (0.92)         (1.66)         (1.85)
Panel C: Wald statistics
Newey–West                          15.62***       10.95*         10.62*         14.87**
Hodrick standard errors 1B          10.67*         8.45           6.03           4.49
Reverse regression Wald             10.92*         9.58*          6.68           3.00
Reverse regression delta method     10.26*         6.61           3.83           2.83

Note: This table shows the estimated coefficients in regressions of excess h-month cumulative CRSP value-weighted stock returns (relative to the 1-month rate) on the log dividend yield (divided by 100) and the level of short-term interest rates. Panel A shows the forward regression estimate (equation (1)) along with the Newey–West standard errors in parentheses and Hodrick standard errors 1B in square brackets. Panel B shows the estimates and standard errors (in parentheses) using the reverse regression delta method (equations (6) and (7)). Panel C shows Wald statistics based on (i) the forward regression (equation (1)) with Newey–West standard errors, (ii) the forward regression with Hodrick standard errors 1B, (iii) the reverse regression Wald statistic, and (iv) the reverse regression delta method (equations (6) and (7)). The sample period is 1952:01–2009:12. Asterisks denote significance at the *10%; **5%; ***1% levels.

⁸ Bekaert and Hodrick (2001) and Bekaert et al. (2001) are other papers arguing that some, but not all, of the evidence against the expectations hypothesis of the term structure is owing to small-sample problems. Those papers are, however, considering the tests of Campbell and Shiller (1991), not the forward rate regressions of Cochrane and Piazzesi (2005).


respectively, at all horizons. Using Hodrick standard errors 1B, however, the predictability is significant only at the shorter horizons. This is even more true when using the reverse regression delta method. The Wald statistics give weak evidence for predictability, and only at the shortest horizon. All of this is consistent with Ang and Bekaert (2007), who argue that stock return predictability, such as it is, exists mainly at short horizons.

7. CONCLUSION

In this paper, we have revisited the use of reverse regressions for inference in long-horizon forecasting. The reverse regression methodology of Hodrick (1992) assumes stationary predictors and gives only a test of the null hypothesis of no predictability. In this paper, we have evaluated the properties of reverse regression methodologies (including a new variant of the reverse regression) with some predictability and/or near unit roots. We find, both using local-to-unit root asymptotics and Monte Carlo simulations, that the reverse regression with standard χ² critical values offers an approach to inference in long-horizon predictive regressions that avoids serious size distortions, while working easily with an arbitrary number of predictors. In marked contrast, conventional Wald tests using the standard errors of Newey and West (1987) or Hansen and Hodrick (1980) in a long-horizon forecasting regression reject the null far too often, and indeed diverge to infinity under the null in the presence of local-to-unit roots (Valkanov, 2003).

We have applied these reverse regressions to re-examine the predictability of excess bond returns using the term structure of interest rates, considered by Fama and Bliss (1987) and Cochrane and Piazzesi (2005). We continue to find some predictability of excess bond returns, contradicting the expectations hypothesis of the term structure, and indicating the existence of time-varying term premia. However, the standard errors on the equation coefficients for predicting excess bond returns are much larger than in the forward regression using conventional heteroskedasticity- and autocorrelation-robust standard errors. We also apply these reverse regressions to excess stock returns and find that the confidence intervals for the equation coefficients for predicting stock returns are wide, and include zero at all but the shortest horizons.

ACKNOWLEDGEMENTS

We are grateful to Bob Hodrick, Hashem Pesaran and three anonymous referees for very helpful comments on earlier drafts of this manuscript. The manuscript was previously entitled ‘Confidence Intervals for long-horizon predictive regressions via reverse regressions’. The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any other employee of the Federal Reserve System.

REFERENCES

Ang A, Bekaert G. 2007. Stock return predictability: is it there? Review of Financial Studies 20: 651–707.
Bekaert G, Hodrick RJ. 2001. Expectations hypothesis tests. Journal of Finance 56: 1357–1394.
Bekaert G, Hodrick RJ, Marshall D. 1997. On biases in tests of the expectations hypothesis of the term structure of interest rates. Journal of Financial Economics 44: 309–348.
Bekaert G, Hodrick RJ, Marshall D. 2001. Peso problem explanations for term structure anomalies. Journal of Monetary Economics 48: 241–270.
Campbell JY. 2000. Asset pricing at the millennium. Journal of Finance 55: 1515–1567.
Campbell JY. 2001. Why long horizons? A study of power against persistent alternatives. Journal of Empirical Finance 9: 459–491.
Campbell JY, Shiller RJ. 1991. Yield spreads and interest rate movements: a bird’s eye view. The Review of Economic Studies 58: 495–514.
Campbell JY, Yogo M. 2006. Efficient tests of stock return predictability. Journal of Financial Economics 81: 27–60.
Cochrane JH, Piazzesi M. 2005. Bond risk premia. The American Economic Review 95: 138–160.
Elliott G, Stock JH. 1994. Inference in time series regression when the order of integration of a regressor is unknown. Econometric Theory 10: 672–700.
Elliott G, Rothenberg TJ, Stock JH. 1996. Efficient tests for an autoregressive unit root. Econometrica 64: 813–836.
Fama EF, Bliss RR. 1987. The information in long-maturity forward rates. The American Economic Review 77: 680–692.
Goetzmann W, Jorion P. 1993. Testing the predictive power of dividend yields. Journal of Finance 48: 663–679.
Gürkaynak RS, Sack B, Wright JH. 2007. The U.S. Treasury yield curve: 1961 to the present. Journal of Monetary Economics 54: 2291–2304.
Hansen LP, Hodrick RJ. 1980. Forward exchange rates as optimal predictors of future spot rates: an econometric analysis. Journal of Political Economy 88: 829–853.
Hodrick RJ. 1992. Dividend yields and expected stock returns: alternative procedures for inference and measurement. Review of Financial Studies 5: 357–386.
Kiefer NM, Vogelsang TJ. 2005. A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21: 1130–1164.
Newey WK, West KD. 1987. A simple, positive definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55: 703–708.
Phillips PCB. 1987. Towards a unified asymptotic theory for autoregression. Biometrika 74: 535–547.
Phillips PCB, Sun Y, Jin S. 2006. Spectral density estimation and robust hypothesis testing using steep origin kernels without truncation. International Economic Review 47: 837–894.
Richardson M, Stock JH. 1989. Drawing inferences from statistics based on multiyear returns. Journal of Financial Economics 25: 323–348.
Rossi B. 2005. Testing long-horizon predictive ability with high persistence, and the Meese–Rogoff puzzle. International Economic Review 46: 61–92.
Rossi B. 2007. Expectations hypotheses tests at long horizons. The Econometrics Journal 10: 1–26.
Sack B. 2006. Comment on ‘Can central banks target bond prices?’ In Monetary Policy in an Environment of Low Inflation: Proceedings of the Bank of Korea International Conference 2006, Bank of Korea, Seoul, Korea.
Stambaugh R. 1999. Predictive regressions. Journal of Financial Economics 54: 375–421.
Stock JH. 1991. Confidence intervals for the largest autoregressive root in U.S. macroeconomic time series. Journal of Monetary Economics 28: 435–459.
Stock JH. 1996. VAR, error correction and pretest forecasts at long horizons. Oxford Bulletin of Economics and Statistics 58: 685–701.
Torous W, Valkanov R, Yan S. 2004. On predicting stock returns with nearly integrated explanatory variables. Journal of Business 77: 937–966.
Valkanov R. 2003. Long-horizon regressions: theoretical results and applications. Journal of Financial Economics 68: 201–232.

