Online Appendix to Zipf’s Law for Chinese Cities: Rolling Sample Regressions

Guohua Peng
APPENDIX: Results of Hill Estimator for the Rolling Rank Method

Hill estimator is the maximum likelihood estimator under the null of Pareto distribution (Gabaix and Ioannides, 2004). For a sample of n cities with sizes $S_{(1)} \geq \cdots \geq S_{(r)} \geq \cdots \geq S_{(n)}$, Hill estimator is:

$$\beta = \frac{n-1}{\sum_{r=1}^{n-1}\left(\ln S_{(r)} - \ln S_{(n)}\right)}. \tag{1}$$

And the standard error, $\sigma$, on $\beta$ can be computed by the delta method:

$$\sigma = \beta^{2}\left(\frac{\sum_{r=1}^{n-1}\left(r\left(\ln S_{(r)} - \ln S_{(r+1)}\right) - \frac{1}{\beta}\right)^{2}}{n-2}\right)^{0.5}(n-1)^{-0.5}. \tag{2}$$
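As a minimal sketch of equations (1) and (2), the function below computes the Hill estimate and its delta-method standard error with NumPy. The function name `hill_estimate` and the example series (an exact rank-size rule with sizes proportional to 1/r) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def hill_estimate(sizes):
    """Return (beta, sigma) following equations (1) and (2).

    `sizes` is any sequence of positive city sizes; the name and the
    interface are hypothetical, chosen only for this illustration.
    """
    s = np.sort(np.asarray(sizes, dtype=float))[::-1]  # S_(1) >= ... >= S_(n)
    n = s.size
    if n < 3:
        raise ValueError("need at least 3 observations for equation (2)")
    log_s = np.log(s)

    # Equation (1): beta = (n - 1) / sum_{r=1}^{n-1} (ln S_(r) - ln S_(n))
    beta = (n - 1) / np.sum(log_s[:-1] - log_s[-1])

    # Equation (2): tau_r = r * (ln S_(r) - ln S_(r+1)) for r = 1, ..., n-1
    r = np.arange(1, n)
    tau = r * (log_s[:-1] - log_s[1:])
    sigma = (beta ** 2
             * np.sqrt(np.sum((tau - 1.0 / beta) ** 2) / (n - 2))
             / np.sqrt(n - 1))
    return beta, sigma

# Assumed example: an exact rank-size rule for 654 cities (scale is arbitrary).
beta_full, sigma_full = hill_estimate(1.0 / np.arange(1, 655))
print(round(beta_full, 4), round(sigma_full, 4))
```

Because the Hill estimate depends only on differences of log sizes, the size of the largest city does not affect the result in this example.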
In fact, the Hill estimator is almost the same as the MLE estimator discussed in Newman (2005). Below are the results of the Hill estimator for the Rolling rank method.

We apply the Hill estimator to the Rolling rank method for each year separately, with a beginning sub-sample size of 20 (an arbitrary starting number), and obtain 644 (= 663 - 20 + 1), 573, 645, 636, 631, and 635 exponent estimates, respectively. The mean value of the Pareto exponent for the “Full” sample of each year is roughly 0.3203, which is much smaller than the mean OLS estimate of 0.8354 for the “Full” sample of each year. Table A1 below reports selected components of the results. The last column of Table A1 reports the result for the simulated size series, which is an exact rank-size rule for 654 cities with the largest city being the primacy of 2004, as described at the end of section 2.2 of the paper. Surprisingly, the Hill estimates of the Pareto exponent for this series are significantly different from one for every sub-sample size, although they are very close to 1. This result is contrary to that of OLS.

Figure A1 illustrates the relationship between the Pareto exponent, with its 95% confidence interval, and the truncation point, from the 20th largest city to the smallest city, for each year. The figure suggests that the Pareto exponent is also, for the most part, negatively related to the sub-sample size.

We also perform a nonparametric analysis of the distribution of the Hill estimates of the Pareto exponent for the Rolling rank method, to give a complete description of how the values of the Pareto exponent are distributed, as does Soo (2005). We construct the efficient Epanechnikov kernel function using the optimal window width for the Pareto exponent. Figure A2 shows the kernel density of the Hill estimates of the Pareto exponent for the Rolling rank method.

In sum, the results of the Hill estimator are similar to those of OLS. However, the deviations of the Hill estimator are much larger than those of OLS.

References

Gabaix, X. and Y. Ioannides. 2004. “The evolution of the city size distributions”, In Handbook of Regional and Urban Economics, eds. V. Henderson, J.F. Thisse, 4:2341-78. Oxford: Elsevier.

Newman, M.E.J. 2005. “Power laws, Pareto distributions and Zipf’s law”, Contemporary Physics, 46(5):323-351.

Soo, K.T. 2005. “Zipf’s law for cities: a cross country investigation”, Regional Science and Urban Economics, 35:239-63.
Table A1. Components of the results of Hill estimator for the Rolling rank method

n         1999        2000        2001        2002        2003        2004        Simulation
100       1.6439***   1.6584***   1.6538***   1.7100***   1.6727***   1.6607***   1.0230***
          (0.1626)    (0.1863)    (0.1677)    (0.1647)    (0.1731)    (0.1850)    (0.0042)
200       1.2231***   1.3331***   1.2597***   1.2813***   1.2544***   1.2694***   1.0131***
          (0.0902)    (0.1010)    (0.0983)    (0.1070)    (0.1045)    (0.0973)    (0.0022)
300       0.8956*     0.9180      0.8148***   0.9412      0.9152      1.0198      1.0094***
          (0.0598)    (0.0604)    (0.0604)    (0.0647)    (0.0619)    (0.0657)    (0.0015)
400       0.8666***   0.7838***   0.8121***   0.7854***   0.8141***   0.8343***   1.0074***
          (0.0506)    (0.0476)    (0.0499)    (0.0461)    (0.0470)    (0.0471)    (0.0011)
500       0.8393***   0.7582***   0.7800***   0.7453***   0.7587***   0.7825***   1.0061***
          (0.0477)    (0.0393)    (0.0437)    (0.0407)    (0.0401)    (0.0410)    (0.0009)
Full      0.3601***   0.3411***   0.3410***   0.3014***   0.2851***   0.3332***   1.0049***
          (0.0666)    (0.0560)    (0.0542)    (0.0485)    (0.0547)    (0.0581)    (0.0007)
Average   1.0664      1.0983      1.0571      1.0586      1.0569      1.0774      1.0131
Notes: Numbers in parentheses are standard errors computed from equation (3). n is the sub-sample size, and the “Full” sample size is 663 for 1999, 592 for 2000, 664 for 2001, 655 for 2002, 650 for 2003, and 654 for 2004. “Average” is the average value of all estimated exponents in each year. “Simulation” is a simulation of an exact rank-size rule for 654 cities with the largest city being the primacy of 2004. ***, **, and * indicate that β is significantly different from one at the 1%, 5%, and 10% levels, respectively. The mean Hill estimate for the “Full” sample of each year is 0.3203, which is much smaller than the mean OLS estimate of 0.8354 for the “Full” sample of each year.
[Figure A1: seven panels, one each for 1999, 2000, 2001, 2002, 2003, 2004, and the Simulation series. Vertical axis: Hill estimates of Pareto exponent. Horizontal axis: Truncation point of Rolling rank.]
Fig. A1. Values of Hill estimate of Pareto Exponent with 95% Confidence Interval for the Rolling rank method.
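The series plotted in Fig. A1 can be sketched as follows: for each truncation point k from 20 up to the full sample, apply equations (1) and (2) to the k largest cities and form an interval around the estimate. The function names, the use of a normal-approximation interval (β ± 1.96σ), and the simulated input (an exact rank-size rule for 654 cities) are assumptions made for illustration; the paper does not state how its interval was constructed.

```python
import numpy as np

def hill_from_logs(log_s):
    """Equations (1) and (2) applied to log sizes sorted in descending order."""
    n = log_s.size
    beta = (n - 1) / np.sum(log_s[:-1] - log_s[-1])
    tau = np.arange(1, n) * (log_s[:-1] - log_s[1:])
    sigma = (beta ** 2
             * np.sqrt(np.sum((tau - 1.0 / beta) ** 2) / (n - 2))
             / np.sqrt(n - 1))
    return beta, sigma

def rolling_rank_hill(sizes, start=20, z=1.96):
    """Return rows (k, beta, lower, upper) for k = start, ..., n.

    `start` mirrors the beginning sub-sample size of 20 used in the text;
    `z` = 1.96 is an assumed normal critical value for a 95% interval.
    """
    log_s = np.log(np.sort(np.asarray(sizes, dtype=float))[::-1])
    rows = []
    for k in range(start, log_s.size + 1):
        beta, sigma = hill_from_logs(log_s[:k])  # k largest cities only
        rows.append((k, beta, beta - z * sigma, beta + z * sigma))
    return np.array(rows)

# Assumed input: exact rank-size rule for 654 cities (scale is arbitrary).
simulated = 1.0 / np.arange(1, 655)
result = rolling_rank_hill(simulated)
print(result.shape)          # one row per truncation point
print(result[:, 1].mean())   # mean of the rolling Hill estimates
```

With 654 cities and a starting size of 20, the sketch yields 654 - 20 + 1 = 635 estimates, the same count the text reports for 2004, and for this simulated series the mean of the rolling estimates comes out close to the value in the Average row of the Simulation column.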
[Figure A2: seven panels of Epanechnikov kernel densities, one each for 1999, 2000, 2001, 2002, 2003, 2004, and the Simulation series. Vertical axis: Density. Horizontal axis: Hill estimate of Pareto exponent for Rolling rank. Reported bandwidths h range from 0.1844 to 0.2569 for the yearly panels; h = 0.0034 for the Simulation panel.]
Fig. A2. Kernel Density Function for the Pareto Exponent using the Hill estimator for the Rolling rank method.
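A kernel density of the kind shown in Fig. A2 can be sketched with the Epanechnikov kernel K(u) = 0.75(1 - u^2) for |u| <= 1. The function name, the synthetic sample, and the bandwidth value below are hypothetical; the rule the paper used to choose its "optimal window width" is not stated, so the bandwidth h is simply passed in by the caller.

```python
import numpy as np

def epanechnikov_kde(samples, grid, h):
    """Evaluate an Epanechnikov kernel density estimate on `grid`.

    f(x) = (1 / (m * h)) * sum_i K((x - x_i) / h), with
    K(u) = 0.75 * (1 - u^2) for |u| <= 1 and 0 otherwise.
    """
    x = np.asarray(samples, dtype=float)
    g = np.asarray(grid, dtype=float)
    u = (g[:, None] - x[None, :]) / h            # shape (len(grid), len(samples))
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return k.sum(axis=1) / (x.size * h)

# Synthetic stand-in for a set of rolling Hill estimates (not the paper's data).
rng = np.random.default_rng(0)
estimates = rng.normal(loc=1.0, scale=0.3, size=500)

grid = np.linspace(-1.0, 3.0, 2001)
density = epanechnikov_kde(estimates, grid, h=0.2)  # h = 0.2 is an arbitrary choice

# A density should integrate to about one over a grid that covers its support.
area = density.sum() * (grid[1] - grid[0])
print(area)
```

In practice the `estimates` array would be replaced by the rolling Hill estimates for a given year, and h by whichever bandwidth rule is preferred.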