Review Session 2: Applied Macroeconometrics
Do-files for TS analysis: ARMA models
Tommaso Trani
September 25, 2009
Contents

1 Do-files: identification and estimation of time series
2 Identification and estimation of an AR(1) model
  2.1 Question 7.a-7.c
  2.2 Question 7.d
  2.3 Question 7.e
3 Identification and estimation of an ARMA(1,1) model
  3.1 Question 8.a
  3.2 Question 8.b
  3.3 Question 8.c
  3.4 Question 8.d
1 Do-files: identification and estimation of time series
This tutorial focuses on the use of do-files for the identification and the estimation of time series processes. In Stata, a do-file is a "program" that acts upon an "input" file (i.e., a data file). This program is a text file that contains a list of Stata commands. Each of the next two sections is based on a do-file, and each do-file serves for the solution of a problem. The problems we will solve are problems 7 and 8 in Enders (2004), chapter 2. By solving these two problems, we will learn:

1. how to import the data (copy/paste, "insheet" & "destring")
2. how to declare that they are time series data ("tsset")
3. how to plot the data ("line")
4. how to analyze the ACF and PACF ("ac", "pac" and "corrgram")
5. how to estimate ("reg" and "arima")
6. how to compute and use the IC (using "display").
2 Identification and estimation of an AR(1) model

2.1 Question 7.a-7.c
{y1} is a time-series of 100 observations. Its graph is the following:

[Figure: line plot of y1 against OBS (observations 1-100)]
The indication given by the graph is that y1 can be modeled as an ARMA(p,q) process without any drift. In fact, there is no clear intercept, and the data do not seem to display a unique trend. At most, there is a break just before the 30th data point. In order to identify this process, i.e., to determine the orders p and q, we analyze its ACF and PACF, which are

[Figure: autocorrelations of y1 up to lag 25 (Bartlett's formula for MA(q) 95% confidence bands)]

[Figure: partial autocorrelations of y1 up to lag 25 (95% confidence bands, se = 1/sqrt(n))]
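These plots come from Stata's "ac" and "pac" commands, but the underlying quantities are easy to compute by hand: the sample ACF from the autocovariances, and the PACF from the ACF via the Durbin-Levinson recursion. A minimal pure-Python sketch; since the problem-set data are not reproduced here, a simulated AR(1) series (coefficient 0.8, made up for illustration) stands in for y1:

```python
import random

def sample_acf(x, nlags):
    """Sample autocorrelations r_0 ... r_nlags."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / (n * c0)
            for k in range(nlags + 1)]

def pacf_from_acf(rho, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    pacf = [1.0, rho[1]]
    phi = [rho[1]]  # phi[j] = (j+1)-th coefficient of the current AR(k) fit
    for k in range(2, nlags + 1):
        num = rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j + 1] for j in range(k - 1))
        phikk = num / den
        phi = [phi[j] - phikk * phi[k - 2 - j] for j in range(k - 1)] + [phikk]
        pacf.append(phikk)
    return pacf

# stand-in data: y_t = 0.8 * y_{t-1} + e_t, 500 observations after a burn-in
random.seed(42)
y = [0.0]
for _ in range(600):
    y.append(0.8 * y[-1] + random.gauss(0.0, 1.0))
y = y[101:]

acf = sample_acf(y, 12)
pacf = pacf_from_acf(acf, 12)
# expected pattern for an AR(1): ACF decays geometrically,
# PACF spikes at lag 1 and then dies out
```

For such a series, pacf[1] should be close to 0.8 and pacf[2] close to zero, mirroring the single significant spike visible in the PACF plot.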
The ACF is informative of the order q of the MA component of the process; in turn, the PACF is informative of its AR component. Therefore, the exponential decay of the ACF and the significant spike at the first lag of the PACF suggest that this can be an AR(1) process, such as

y1_t = a_1 y1_{t-1} + ε_t    (1)

where {ε_t}, t = 0, ..., 100, is a white-noise process. However, the PACF also has a significant spike at lag 12, which can be a sign of seasonality in the case of monthly data. At a first approximation, a way to capture seasonality is to add an MA term at the seasonal frequency; in this case, it is an MA(12) term. Hence, there is an alternative model to estimate, an ARMA(1,(12)) (i.e., an AR(1) with a seasonal MA(12) term):

y1_t = a_1 y1_{t-1} + ε_t + β_12 ε_{t-12}    (2)

The statistical test for the correct identification of an ARMA process is the Ljung-Box Q-statistic. This is a test for the joint significance of all the autocorrelations up to lag s. In the case of {y1}, we test for the joint significance of the first 24 autocorrelations, so that the test can be described as follows:

H0: ρ_1 = ... = ρ_24 = 0
HA: otherwise.
Being a test for joint significance, this is a χ² or an asymptotic F-test. In fact, Q(s) ~ χ²(s). Since we have not yet estimated any model, all the autocorrelations of {y1} are significant. In fact,

Q(8) = 177.58 (0.0000); Q(16) = 197.84 (0.0000); Q(24) = 201.28 (0.0000)

where the numbers in parentheses are the p-values of the corresponding statistics and lead to the rejection of the null hypothesis. Post-estimation, the Q-test is a test on the behavior of the residual series {ε̂_t}, and it is informative of the correct specification of the ARMA model used. If the model is correctly specified, then {ε̂_t} is white noise. This means that the Q-statistics will not be significant anymore. In fact, for the residuals of an ARMA(p,q) model, Q(s) ~ χ²(s - p - q - 1), where the extra 1 accounts for the presence of a drift term. The result of the estimation of the models in (1) and (2) is
        AR(1) - reg command   AR(1) - arima command   ARMA(1,(12))
a_1     0.7905 (0.0000)       0.7840 (0.0000)         0.7874 (0.0000)
β_12                                                  0.0260 (0.8010)
Q(8)    6.1387 (0.6317)       6.3973 (0.6028)         6.4366 (0.5984)
Q(16)   15.743 (0.4710)       15.608 (0.4806)         15.502 (0.4883)
Q(24)   21.024 (0.6373)       21.757 (0.5938)         21.609 (0.6026)
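The Q-statistics in the table follow the Ljung-Box formula Q(s) = T(T+2) Σ_{k=1}^{s} r_k² / (T-k), which is then compared against a χ²(s) distribution (or χ²(s-p-q-1) for residuals, as above). A minimal pure-Python sketch, run on a made-up toy series rather than the actual y1 data:

```python
def ljung_box_q(x, s):
    """Ljung-Box portmanteau statistic Q(s) = T(T+2) * sum_{k=1}^{s} r_k^2 / (T-k)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    # sample autocorrelations r_1 ... r_s
    r = [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
         for k in range(1, s + 1)]
    return n * (n + 2) * sum(rk ** 2 / (n - k)
                             for k, rk in zip(range(1, s + 1), r))

# toy alternating series: r_1 = -0.75, so Q(1) = 4*6*(0.5625/3) = 4.5
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], 1))
```

A large Q relative to the χ² critical value (equivalently, a small p-value) rejects the null that the first s autocorrelations are jointly zero.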
where the numbers in parentheses are the p-values of the corresponding estimates. Both models represent an acceptable ARMA specification because in both cases {ε̂_t} is white noise: the Q-statistics cannot reject the null hypothesis that the autocorrelations are jointly insignificant. However, there does not seem to be any seasonal effect: in model (2), β̂_12 is not statistically significant.

The objective of identification is to find the best specification for the data at hand. As a general rule, priority should be given to the most parsimonious models: the lower the number of parameters to estimate, the smaller the loss in terms of degrees of freedom. The information criteria (IC) are the statistical tools that can be used to select the best fit. The three IC presented in class are the AIC, the SBC and the HQC. These can be computed using either one of two alternative sets of formulas:

AIC = T ln(SSR) + 2k             (3)
SBC = T ln(SSR) + k ln(T)        (4)
HQC = T ln(SSR) + 2k ln(ln(T))   (5)

AIC = -2 ln(L) + 2k              (3')
SBC = -2 ln(L) + k ln(T)         (4')
HQC = -2 ln(L) + 2k ln(ln(T))    (5')

where T is the number of observations, k is the number of parameters to estimate, and L is the maximized likelihood. For the AR(1) and ARMA(1,(12)) models, these criteria are

        AR(1) - reg command   AR(1) - arima command   ARMA(1,(12))
AIC     441.9368              272.7832                274.7442
SBC     444.5319              277.9935                282.5597
HQC     442.9868              274.8919                277.9073
Hence, all three IC are minimized by the AR(1) model, suggesting that this is the one that fits the observations of y_t in the best way.
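Both variants of formulas (3)-(5') take only a few lines to compute. A minimal sketch; note that the SSR-based and likelihood-based values are on different scales, which presumably explains why the reg-based column above is so far from the arima-based one, so only columns computed with the same variant should be compared:

```python
import math

def ic_from_ssr(T, ssr, k):
    """AIC, SBC, HQC from the sum of squared residuals (formulas 3-5)."""
    base = T * math.log(ssr)
    return (base + 2 * k,
            base + k * math.log(T),
            base + 2 * k * math.log(math.log(T)))

def ic_from_loglik(T, loglik, k):
    """AIC, SBC, HQC from the maximized log-likelihood (formulas 3'-5')."""
    return (-2 * loglik + 2 * k,
            -2 * loglik + k * math.log(T),
            -2 * loglik + 2 * k * math.log(math.log(T)))

# with T = 100 and k = 2, the penalties are 2k = 4 < 2k*ln(ln T) ~ 6.11
# < k*ln T ~ 9.21, so the criteria are ordered AIC < HQC < SBC
aic, sbc, hqc = ic_from_ssr(100, 85.0, 2)
print(aic < hqc < sbc)
```

The ordering of the penalty terms shows why the SBC favors the most parsimonious model: its penalty grows fastest in k for any T > e².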
2.2 Question 7.d
The AR(2) model without drift is

y1_t = a_1 y1_{t-1} + a_2 y1_{t-2} + ε_t    (6)
and its estimation yields

y1       |      Coef.   OPG Std. Err.      z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
ARMA     |
  ar L1. |   .7015473       .1075333    6.52    0.000      .4907859    .9123087
     L2. |   .1039888       .1025969    1.01    0.311     -.0970974     .305075
---------+--------------------------------------------------------------------
  /sigma |   .9181842       .0644428   14.25    0.000      .7918786     1.04449
which is in line with the results presented by Enders (2004). In the same way as the AR(1) and the ARMA(1,(12)) models, the AR(2) also correctly identifies the true data-generating process (DGP). In fact, it produces residuals that are WN up to the 24th lag:

Q(8) = 5.0924 (0.7477); Q(16) = 15.408 (0.4950); Q(24) = 20.96 (0.6411)
In any case, the IC indicate that the inclusion of an additional autoregressive component does not augment the explanatory power of the model w.r.t. the AR(1) specification. Indeed, â_2 is not statistically significant.

        AR(1)      AR(2)
AIC     272.7832   273.6943
SBC     277.9935   281.5098
HQC     274.8919   276.8574

Notice, however, that the AR(2) does better than the ARMA(1,(12)).
2.3 Question 7.e
The ARMA(1,1) without drift is

y1_t = a_1 y1_{t-1} + ε_t + β_1 ε_{t-1}    (7)
and its estimation yields

y1       |      Coef.   OPG Std. Err.      z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
ARMA     |
  ar L1. |   .8383654       .0665173   12.60    0.000      .7079939    .9687369
  ma L1. |  -.1462522       .1335881   -1.09    0.274     -.4080801    .1155758
---------+--------------------------------------------------------------------
  /sigma |   .9177205       .0640591   14.33    0.000      .7921671    1.043274
These values are similar to those obtained by Enders (2004), so we reach the same conclusions. It is seemingly true that the residuals are not serially correlated:

Q(8) = 5.1551 (0.7409); Q(16) = 15.382 (0.4969); Q(24) = 20.948 (0.6418)
Since the added MA component does not improve the explanatory power of the model, this alternative specification also does not outperform the AR(1),

        AR(1)      ARMA(1,1)
AIC     272.7832   273.5851
SBC     277.9935   281.4006
HQC     274.8919   276.7482

but it does better than both the ARMA(1,(12)) and the AR(2). This result is not surprising, given that the AR(1) and the ARMA(1,1) are the two "workhorse" specifications of univariate time-series analysis.
3 Identification and estimation of an ARMA(1,1) model

3.1 Question 8.a

The plot of the {y2_t}, t = 1, ..., 100, series is

[Figure: line plot of y2 against OBS (observations 1-100)]
The series fluctuates about zero, crosses this level quite frequently, and does not show any trend. Hence, y2_t seems to be a stationary variable. The absence of an intercept (which would introduce a deterministic trend) is a further signal that the series is probably stationary.
The ACF and PACF are

[Figure: autocorrelations of y2 up to lag 25 (Bartlett's formula for MA(q) 95% confidence bands)]

[Figure: partial autocorrelations of y2 up to lag 25 (95% confidence bands, se = 1/sqrt(n))]

where we have introduced 24 lags as we did in the previous section. These
autocorrelations are all statistically significant. This is the result of the Q-test, whose statistics at lags 8, 16 and 24 are

Q(8) = 188.55 (0.0000); Q(16) = 204.86 (0.0000); Q(24) = 206.37 (0.0000)

The numbers in parentheses are the p-values of the statistics.

The ACF does not decay toward zero slowly. There is a sort of jump between |ρ_1| = |0.8344| and |ρ_2| = |0.5965|. The exponential decay seems to start right from the second autocorrelation; indeed, |ρ_3| = |0.4400|. The jump points to the presence of an MA(1) term. In addition, we expect β_1 < 0 because the ACF oscillates between positive and negative values. In turn, the PACF implies that there is, at least, an autoregressive term of order 1. In fact, though the significant spikes are at lags 1, 2, 3 and 5, there is a clear jump between lag 1 and lag 2. Notice how clearly this example shows that the identification of the correct DGP is much tougher in mixed ARMA structures than in simple ones: both the ACF and the PACF are characterized by a mixture of jumps and exponential decays.
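The jump-then-decay pattern can be checked against the theoretical ACF of an ARMA(1,1). For y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1}, the standard formulas are ρ_1 = (1 + a_1 β_1)(a_1 + β_1) / (1 + 2 a_1 β_1 + β_1²) and ρ_k = a_1 ρ_{k-1} for k ≥ 2. A minimal sketch; the parameter values a1 = -0.7, b1 = -0.6 are made up for illustration, chosen only to mimic the oscillating pattern of y2:

```python
def arma11_acf(a1, b1, nlags):
    """Theoretical ACF of y_t = a1*y_{t-1} + e_t + b1*e_{t-1} (requires |a1| < 1)."""
    rho1 = (1 + a1 * b1) * (a1 + b1) / (1 + 2 * a1 * b1 + b1 ** 2)
    acf = [1.0, rho1]
    for _ in range(nlags - 1):
        acf.append(a1 * acf[-1])  # geometric decay from lag 1 onward
    return acf

# mixed ARMA(1,1): the MA term pushes |rho_1| above |a1|, producing the "jump"
print(arma11_acf(-0.7, -0.6, 4))
# pure AR(1) benchmark: with b1 = 0 the ACF is a1**k, decaying smoothly from lag 0
print(arma11_acf(-0.7, 0.0, 4))
```

With these illustrative values, ρ_1 ≈ -0.84 and ρ_2 ≈ 0.59, an oscillating pattern close to the sample autocorrelations reported above, while the pure AR(1) benchmark decays geometrically from lag 0 with no jump.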
3.2 Question 8.b
The AR(1), ARMA(1,1) and AR(2) models for {y2_t} are similar to those for {y1_t}, that is, equations (1), (7) and (6), respectively. The output obtained from the estimation of these models is summarized as follows:
        AR(1)              ARMA(1,1)          AR(2)
a_1     -0.8502 (0.0000)   -0.7086 (0.0000)   -1.1774 (0.0000)
β_1                        -0.6649 (0.0000)
a_2                                           -0.38413 (0.0000)
Q(8)    17.999 (0.0212)    2.5776 (0.9580)    7.0481 (0.5314)
Q(16)   28.089 (0.0309)    12.897 (0.6803)    15.238 (0.5073)
Q(24)   34.368 (0.0783)    16.36 (0.8747)     18.696 (0.7680)
Only the second and the third models are valid representations of the data. The coefficients of each of the three structures are statistically significant, but the estimated residuals are WN only in the case of the ARMA(1,1) and the AR(2). Between the latter two, the model that minimizes the IC is the ARMA(1,1):

        AR(1)      ARMA(1,1)  AR(2)
AIC     336.6997   312.3036   323.0129
SBC     341.9100   320.1191   330.8284
HQC     338.8084   315.4667   326.1759

This result confirms the analysis made at the previous point.
3.3 Question 8.c
A pure MA(2) model for y2_t is

y2_t = ε_t + β_1 ε_{t-1} + β_2 ε_{t-2}    (8)
Its estimation yields

y2       |      Coef.   OPG Std. Err.      z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
ARMA     |
  ma L1. |  -1.260144       .0909078  -13.86    0.000      -1.43832   -1.081968
     L2. |   .5517165       .0949712    5.81    0.000      .3655763    .7378568
---------+--------------------------------------------------------------------
  /sigma |   1.219542       .0975892   12.50    0.000      1.028271    1.410814
with the usual representative Q-statistics being

Q(8) = 30.094 (0.0002); Q(16) = 41.188 (0.0005); Q(24) = 43.075 (0.0098)

3.4 Question 8.d
Since the Q-test shows that the autocorrelations of the residuals of the MA(2) are significant, the model does not constitute a good fit of the data. Consequently, it compares poorly with the ARMA(1,1):

        MA(2)      ARMA(1,1)
AIC     331.286    312.3036
SBC     339.1015   320.1191
HQC     334.4491   315.4667

Notice that the MA(2) is also outcompeted by the AR(2) (the other correct specification of the data) but not by the AR(1) (the other case in which the residuals are not WN).
References

[1] Enders, W. (2004) Applied Econometric Time Series, 2nd ed., John Wiley & Sons.
[2] Greene, W. H. (2002) Econometric Analysis, 5th ed., Prentice Hall.