
The return from Collectibles: An unbiased Index (Variable selection and hedonic art indexes)

R. Zanola♠

ABSTRACT The hedonic-based regression approach has been utilised extensively to investigate the relationship between collectible prices and their characteristics. However, there is concern that variable selection could undermine the method’s ability to accurately predict economic values due to multicollinearity. In this paper the problem is analyzed from a new perspective, using a non-parametric technique, the so-called classification and regression tree (CART) approach, to select the few relevant characteristics determining the hammer prices of collectibles. Using Picasso paintings sold worldwide, this paper shows the usefulness of this technique.

[DRAFT VERSION DO NOT QUOTE]

Key words: CART; smearing factor; Picasso; hedonic. JEL classification: C6, D2, Z1.

♠ Department of Public Policy and Public Choice, University of Eastern Piedmont, Corso Borsalino 83, 15100 Alessandria, Italy; Rimini Centre for Economic Analysis, Rimini, Italy. E-mail: [email protected]

1. Introduction

The hedonic-based regression approach has been extensively utilised to estimate art price indexes with the aim of gaining a better understanding of the art market. The hedonic method recognizes that collectibles are composite items: although characteristics are not sold separately, regressing the hammer price of art items on their properties yields the implicit prices of attributes, which are supposed to reflect consumers’ marginal rates of substitution among attributes (Ginsburgh and Throsby, 2006). However, to obtain unbiased estimates of art price indexes, the regression should be specified correctly with respect to the independent variables. Unfortunately, unlike the case where the choice of covariates is derived from economic theory, in hedonic price theory there are no theoretical arguments in favour of a specific set of independent variables (Anderson, 2000). In traditional hedonic regression models the decision of what features to include is made by the individual researcher and, in principle, all characteristics relevant to the determination of market price should be included (Butler, 1982). Although there is no established rule against the use of all variables in hedonic analysis, the inclusion of numerous independent variables in the model raises concerns regarding multicollinearity (Leggett and Bockstael, 2000) and overfitting (Greene, 2003). Moreover, as noted by Shiller (2008), if there are too many possible hedonic variables that might be included “…one could strategically vary the list of included variables until one found the results one wanted”. To address the potential bias due to variable selection, this paper explores the application of the so-called classification and regression tree (CART) approach. This nonparametric technique selects those variables and their interactions which are most effective in determining an outcome or dependent variable.

Although this approach has developed rapidly in empirical analysis over the past two decades, no application exists within the art market. Compared to the traditional approach to price determinants, CART offers several advantages (Berry and Linoff, 1997; Murthy, 1998; Ganz-Zhi et al. 2009). First, CART handles numerical data that are highly skewed or multi-modal, as well as categorical predictors with either ordinal or non-ordinal structure, and requires no normality or other assumptions concerning the distribution of the data. Secondly, CART results are easily interpreted and make it possible to identify the major variables that partition the target data into different classes. Finally, since CART is powerful in identifying the most significant independent variables in predicting the target variable, it can supplement and complement traditional regression methods.

Taking advantage of this last characteristic, a sample of Picasso paintings sold worldwide is used to identify the relevant characteristics determining the hammer prices of collectibles, which are then used to calculate a quality-adjusted price index. The remainder of this paper is structured as follows. Section 2 discusses the method in more detail. Section 3 describes the data used in the paper. Results are shown in Section 4. Conclusions and directions for further research are discussed in Section 5.

2. Method

The literature on hedonic price techniques emphasizes the issue of separating pure price change from quality change. Basically, the quality of a painting is defined in terms of its characteristics, and for the dummy variable price index, the regression of prices on these characteristics allows one to hold quality constant and, thus, construct constant-quality price measures (Rosen, 1974). A set of quality characteristics $z_{i,k}$, with $k = 1, \ldots, K$, is identified for a regression of the log price of painting $i$, with $i = 1, \ldots, N$, sold at time $t$, with $t = 1, \ldots, T$, on its $K$ characteristics and a set of dummy variables $d_i^{\tau}$, which are equal to 1 if the painting is sold in period $\tau$ and zero otherwise, such that:

$$\log p_i^t = \alpha_0 + \sum_{k=1}^{K} \beta_k z_{i,k} + \sum_{\tau=2}^{T} \delta^{\tau} d_i^{\tau} + \varepsilon_i^t \qquad (1)$$

The parameters $\beta_k$ can be interpreted as the (implicit) prices of the various characteristics describing the painting, $\alpha_0$ is the intercept, and $\varepsilon_i^t$ is a random error term. The estimated time dummy coefficient can be interpreted as a measure of the part of the price that is not attributable to the identified characteristics, and is used as an estimate of the pure price change. Once the implicit prices of the various characteristics are obtained by a regression of the prices on observable characteristics, it is possible to directly calculate the quality-adjusted price index by taking the exponential of the time-dummy coefficients. In particular, the adjacent quality-adjusted price index of a painting between period $t$ and $s$, $PI^{t,s}$, for any given set of characteristics, $z$, is equal to (Triplett, 2006):

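$$
PI^{t,s} = \frac{E\left[p_i^t \mid z, d\right]}{E\left[p_i^s \mid z, d\right]}
= \frac{\exp\left(\hat{\alpha}_0 + \sum_{k=1}^{K} \hat{\beta}_k z_{i,k} + \hat{\delta}^t\right) E\left[\exp(\varepsilon_i^t) \mid z, d\right]}{\exp\left(\hat{\alpha}_0 + \sum_{k=1}^{K} \hat{\beta}_k z_{i,k} + \hat{\delta}^s\right) E\left[\exp(\varepsilon_i^s) \mid z, d\right]}
= \frac{\exp(\hat{\delta}^t)\, E\left[\exp(\varepsilon_i^t) \mid z, d\right]}{\exp(\hat{\delta}^s)\, E\left[\exp(\varepsilon_i^s) \mid z, d\right]} \qquad (2)
$$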
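As an illustration of equations (1) and (2), the following is a minimal sketch of a time-dummy hedonic regression on synthetic data, using only the Python standard library. The helper names (`solve`, `ols`, `design_row`), the single characteristic `area`, and all numerical values are assumptions made for the example and are not taken from the paper's data or software. The resulting index is the naive one, exp of the time-dummy coefficient, without any correction for the expectation of the exponentiated error.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

def design_row(area, period, n_periods):
    """Intercept, one characteristic, and dummies for all periods except the first (baseline)."""
    dummies = [1.0 if period == t else 0.0 for t in range(1, n_periods)]
    return [1.0, area] + dummies

# Synthetic data (illustrative only): 3 periods with true log-price shifts 0, 0.2, -0.1.
random.seed(0)
true_delta = [0.0, 0.2, -0.1]
X, y = [], []
for _ in range(300):
    period = random.randrange(3)
    area = random.uniform(0.1, 2.0)
    log_price = 10.0 + 0.5 * area + true_delta[period] + random.gauss(0.0, 0.1)
    X.append(design_row(area, period, 3))
    y.append(log_price)

coef = ols(X, y)                 # [alpha_0, beta_area, delta_2, delta_3]
delta_hat = [0.0] + coef[2:]     # the baseline period has delta = 0
index = [100.0 * math.exp(d) for d in delta_hat]  # naive index, baseline = 100
```

Because the characteristics are held fixed, the intercept and the characteristic terms cancel in the ratio, so only the time-dummy coefficients enter the index.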

Whenever a change in quality occurs, it is taken care of by the associated characteristics, and the quality-adjusted price change is captured by the product of the exponential of the regression coefficient of the time dummy variable and the conditional expectation of the exponential of the unobserved error term. Ignoring the latter term gives rise to a retransformation bias. To correct for this bias, Duan’s (1983) nonparametric smearing estimator can be applied, which estimates the conditional expectation of $\exp(\varepsilon_i^t)$ by the sample mean of the exponentiated residuals. However, as the error on the log scale of art prices is expected to be heteroskedastic, a variant of Duan’s (1983) nonparametric smearing estimator is used (Jones and Zanola, 2011). This calculates a separate smearing factor for each year, such that:

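$$PI^{t,s} = \frac{\varphi^t}{\varphi^s} \exp\left(\hat{\delta}^t - \hat{\delta}^s\right) \qquad (3)$$

where

$$\varphi^t = \frac{1}{N_t} \sum_{i=1}^{N_t} \exp\left(\hat{\varepsilon}_i^t\right) = \frac{1}{N_t} \sum_{i=1}^{N_t} \exp\left(\log(p_i^t) - \hat{\alpha}_0 - \sum_{k=1}^{K} \hat{\beta}_k z_{i,k} - \hat{\delta}^t\right)$$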
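The year-specific smearing correction in equation (3) can be sketched as follows. This is a minimal illustration, assuming the regression residuals and time-dummy coefficients are already available; the function names `smearing_factors` and `smearing_index` and the numerical inputs are hypothetical and are not the paper's estimates.

```python
import math
from collections import defaultdict

def smearing_factors(residuals, periods):
    """Per-period mean of exp(residual): phi^t = (1/N_t) * sum_i exp(e_i^t)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e, t in zip(residuals, periods):
        sums[t] += math.exp(e)
        counts[t] += 1
    return {t: sums[t] / counts[t] for t in sums}

def smearing_index(delta_hat, phi, t, s):
    """PI^{t,s} = (phi^t / phi^s) * exp(delta^t - delta^s), as in equation (3)."""
    return (phi[t] / phi[s]) * math.exp(delta_hat[t] - delta_hat[s])

# Hypothetical log-scale residuals: tightly spread in 1988, widely spread in 1989.
residuals = [-0.1, 0.1, -1.0, 1.0]
periods = [1988, 1988, 1989, 1989]
delta_hat = {1988: 0.0, 1989: 0.2}   # hypothetical time-dummy coefficients

phi = smearing_factors(residuals, periods)
pi_89_88 = smearing_index(delta_hat, phi, 1989, 1988)
naive_89_88 = math.exp(delta_hat[1989] - delta_hat[1988])
```

In this toy example the residual spread is larger in 1989 than in 1988, so the smearing-adjusted index exceeds the naive index `naive_89_88`; with a single pooled smearing factor the two would coincide.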

Any procedure to estimate equation (1) must comprise an estimator of the hedonic quality of the painting, usually a vector of arithmetic means. However, even in the absence of problems of data availability and quality, hedonic price indexes are subject to multicollinearity arising from correlation between the independent variables. In what follows the problem of variable selection is analyzed from a different perspective, by selecting the few relevant characteristics determining the hammer prices of paintings. In contrast with the regression approach, a nonparametric technique, the so-called Classification And Regression Tree (CART) approach (Breiman et al., 1984), is used to select the variables and their interactions that are most important in determining the dependent variable. Beginning with the root node, which includes all the observations, the CART algorithm checks all possible splitting variables among the control variables and finds the best one to split the node into two child nodes, following a splitting rule which maximizes the average ‘purity’ of the two child nodes. The process is then repeated on each child node until the tree is completed. At the end of the process the initial sample has been split into several sub-groups whose members share common characteristics that influence the dependent variable of interest.

3. Data

The dataset consists of 716 Picasso paintings sold at auction worldwide during the period 1988-2005. The data set is collected from the 2006 edition of the Art Price Index on CD-Rom. It contains records of paintings sold at the world’s major auctions, and provides information on a number of variables: artist’s name, nationality, title of the work, year of production, materials used, date and city of sale, auction price, pre-sale estimate (when available), dimensions, signature, and further information that varies from case to case. No information is provided on the provenance and the previous exhibitions of the items. Prices are gross of the buyers’ and sellers’ transaction fees paid to auction houses and are expressed in USD, deflated using the US CPI (2000=100). The dependent variable is price, which is logged to deal with heteroskedasticity. The explanatory variables included in the study are dimension, style, media, saleroom and year of sale. More precisely, the variables included in the regression are:

o Dimension: the painting surface in meters, area_m, and the squared surface, area_m2, in squared meters.
o Media: a set of dummy variables reflecting the technique adopted: oil on canvas, oil_canvas; oil on panel, oil_panel; mixed technique, mixed; and all other media, other_tech (excluded variable).
o Style: different style periods are identified [Czujack, 1997]: Childhood and Youth (1881-1901), @_1881; Blue and Rose Period (1902-1906), @_1902; Analytical and Synthetic Cubism (1907-1915), @_1907; Camera and Classicism (1916-1924), @_1916; Juggler of the Form (1925-1936), @_1925; Guernica and the ‘Style Picasso’ (1937-1943), @_1937; Politics and Art (1944-1953), @_1944; and The Old Picasso (1954-1973), @_1954 (excluded variable).
o Salerooms: Sotheby’s and Christie’s are known to be the leading auction houses in this kind of transaction, while the most important art auction markets are in New York and London. We therefore consider some interaction dummies between salerooms and cities: chr_ny, for Christie’s New York; chr_lon, for Christie’s London; sot_ny, for Sotheby’s New York; sot_lon, for Sotheby’s London; other_auc, for all other locations (excluded variable).
o Year of sale: a set of yearly dummy variables, dt, with t = 1988,..., 2005, is introduced, one for each year between 1988 and 2005 (1988 baseline variable).

Table 1 reports the main descriptive statistics of the data set described above.

[TABLE 1 ABOUT HERE]
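The variable construction described in this section can be illustrated with a minimal sketch: deflating a nominal price by a CPI level with base 100, taking logs, and coding a categorical attribute as dummies with one excluded category. The function names and the CPI and price figures below are hypothetical and chosen only for the example; they are not values from the dataset.

```python
import math

def real_log_price(nominal_usd, cpi, base=100.0):
    """Deflate a nominal price by a CPI level (base year = 100) and take the natural log."""
    return math.log(nominal_usd * base / cpi)

def dummies(value, categories, omitted):
    """One 0/1 indicator per category, dropping the omitted (reference) category."""
    return {c: int(value == c) for c in categories if c != omitted}

MEDIA = ["oil_canvas", "oil_panel", "mixed", "other_tech"]

# Hypothetical sale: USD 1,100,000 in a year whose CPI level is 110 (2000 = 100).
y = real_log_price(1_100_000.0, cpi=110.0)
media_row = dummies("oil_panel", MEDIA, omitted="other_tech")
```

Dropping one category per group avoids perfect collinearity with the intercept, so each remaining coefficient is read relative to the excluded category.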

4. Results

The SPSS software was used to check all possible splitting variables among the control variables following a splitting rule which maximizes the average ‘purity’ of the two child nodes. We imposed the condition that all terminal nodes had to contain a minimum of 20 subjects. The sample was divided into five subsamples in order to help ensure the fit of the constructed tree and its ability to generalise. Figure 1 displays the results of the CART analysis.

[FIGURE 1 ABOUT HERE]

The total number of nodes generated is 16 (including the root node). The tree can be easily comprehended. The first split occurs with respect to the area variable. When the area is less than 0.248 the observation is assigned to the left side (node 1), and if it is more than 0.248 it goes to the right (node 2). As the next step, node 1 is split with respect to the area again. The process continues until the final leaf nodes are reached. The average log value of the hammer price in a leaf node represents the forecast or regression value of the hammer price of Picasso paintings of a certain category. For instance, for node 4, the average log hammer price of 12.587 indicates the predicted price for a Picasso painting whose area is between 0.030 and 0.248 meter. The relative importance plot visualizes how important the various independent variables are relative to one another in predicting the dependent variable. A tabular plot of the variables in descending order of importance is shown in Figure 2.
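The splitting step described above can be sketched for a single continuous predictor. This is a minimal illustration, assuming that node ‘purity’ is measured by the within-node sum of squared deviations, a common choice for regression trees that the text does not state explicitly. The functions `sse` and `best_split` and the synthetic data are hypothetical and do not reproduce the SPSS procedure or the paper's results.

```python
def sse(values):
    """Sum of squared deviations from the mean (impurity of a regression node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y, min_leaf=20):
    """Return (threshold, impurity_reduction) of the best binary split on x, or None."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    parent = sse(ys)
    best = None
    # Each child node must keep at least min_leaf observations.
    for i in range(min_leaf, len(ys) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # cannot split between identical values
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if best is None or gain > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2.0, gain)
    return best

# Synthetic example (not the paper's data): small paintings with log price 11,
# larger paintings with log price 13.
area = [0.005 * (i + 1) for i in range(30)] + [0.30 + 0.01 * i for i in range(30)]
log_price = [11.0] * 30 + [13.0] * 30
threshold, gain = best_split(area, log_price)
```

A full tree applies the same search to every candidate variable at each node, keeps the variable and threshold with the largest impurity reduction, and recurses on the two child nodes; the predicted value in each leaf is the mean log price of its members.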

[FIGURE 2 ABOUT HERE]

It is clearly shown that the surface, both in meters and squared meters, is the most important variable for determining the hammer price of Picasso paintings. The second most important factor is the technique. Style has less influence, while the auction house where the painting is sold is roughly insignificant, with the only exception of all auction houses other than Sotheby’s and Christie’s. The full set of available characteristics as well as the significant variables selected by the CART approach are then used to estimate equation (1). Table 2 displays the main results.

[TABLE 2 ABOUT HERE]

The time dummy coefficients are then used to build the quality-adjusted price index as in equation (3). The first column of Table 3 displays the Duan’s smearing factor used to calculate the smearing price index and the CART smearing price index, as depicted in Figure 3, where the standard hedonic price index is also reported as a benchmark.

[TABLE 3 ABOUT HERE]

[FIGURE 3 ABOUT HERE]

The calculation of price indexes estimated using the CART approach is straightforward. The standard regression model is expected to be much more susceptible to problems related to high multicollinearity of the exogenous variables, an issue when the estimated models are used for extrapolating price predictions. As a result, the smearing price index yields higher (unrealistic) price predictions than the CART smearing price index.

We are now in a position to evaluate the CART smearing price index. A common way of comparing the goodness of fit of two or more econometric models is to compare the estimated standard deviation of the disturbance term. This is a direct measure of the degree of variation in the dependent variable that is not explained by the econometric model.¹ The estimated standard deviations of the two models are similar (0.051 and 0.053 respectively). Hence, it is useful to introduce a second measure to compare the models: the width of the confidence interval around the predicted price of an average painting. It reflects the precision with which the individual parameters of the model are estimated. Again, the width of a 95 percent confidence interval estimated around the predicted price shows analogous results for both models. Finally, a third measure used to assess the relative precision of both models is the correlation between the actual and predicted values of all paintings included in the dataset. It provides a direct measure of the reliability with which the price of each painting can be predicted from the econometric model (Case and Szymanoski, 1995). The CART smearing model displays a lower value (0.76) than the smearing price index (0.80), although the two are close. In summary, these results show that the CART smearing price index performs analogously to the smearing price index while, at the same time, controlling for the variable selection bias.

¹ Since it is not possible to empirically distinguish between pure random noise and an error component due to inadequate theory, imperfect data, or other practical limitations, the estimated standard deviation of the disturbance term can be thought of as a measure of the combined "unexplained" variation of both components [Case and Szymanoski, 1995].

5. Conclusion

This study focuses on the problem of variable selection by applying the classification and regression tree technique to a sample of Picasso paintings sold at auction worldwide during the period 1988-2005. The proposed method allows the selection of those variables and their interactions which are most effective in determining the dependent variable. According to the empirical evidence shown in this paper, controlling for variable selection leads to lower (and realistic) price predictions compared to the smearing price index. This finding has important implications for investments in the art market. However, a number of issues remain for further research. In particular, a boosting variant of the CART method will be explored to introduce a more flexible technique to select variables.


References

Anderson, D.E. (2000), Hypothesis testing in hedonic price estimation – On the selection of independent variables, Annals of Regional Science, 34, 293-304.
Berry, M.J.A. and Linoff, G. (1997), Data Mining Techniques: For Marketing, Sales, and Customer Support, New York: John Wiley & Sons.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984), Classification and Regression Trees, Belmont, CA: Wadsworth.
Butler, R.V. (1982), The Specification of Hedonic Indexes for Urban Housing, Land Economics, 58(1), 86-108.
Case and Szymanoski (1995).
Czujack, C. (1997), Picasso Paintings at Auction, 1963-1994, Journal of Cultural Economics, 21(3), 229-247.
Davis (2004).
Ginsburgh, V. and Throsby, D. (eds.) (2006), Handbook of the Economics of Art and Culture, Amsterdam: North-Holland.
Greene, W.H. (2003), Econometric Analysis, Upper Saddle River, N.J.: Prentice Hall.
Greenstone and Gallagher (2008).
Jones, A.M. and Zanola, R. (2011), Retransformation bias in the adjacent art price index, mimeo.
Leggett, C.G. and Bockstael, N.E. (2000), Evidence of the Effects of Water Quality on Residential Land Prices, Journal of Environmental Economics and Management, 39(2), 121-144.
Murthy, S.K. (1998), Automatic construction of decision trees from data: a multi-disciplinary survey, Data Mining and Knowledge Discovery, 2, 345-389.
Triplett, J. (2006), Handbook on Hedonic Indexes and Quality Adjustments in Price Indexes, OECD.


TABLE 1. Descriptive statistics

Variable     Mean        Std. Dev.   Description
price        2,347,754   6,469,125   Price of painting
area_m       5,170.973   5,274.767   Area
area_m2      5.45e+07    1.11e+08    Squared area
oil_canvas   .7263       .4462       Oil on canvas
oil_panel    .0614       .2403       Oil on panel
mixed        .0503       .2187       Mixed technique
other_tech   .1844       .3880       Other techniques (omitted category)
chr_lon      .1564       .3635       Sold at Christie’s London
chr_ny       .2654       .4418       Sold at Christie’s New York
sot_lon      .1480       .3554       Sold at Sotheby’s London
sot_ny       .2877       .4530       Sold at Sotheby’s New York
other_auc    .1256       .3316       Sold at other auction houses (omitted category)
@_1881       .0566       .2312       Childhood and Youth (1881-1901)
@_1902       .0189       .1362       Blue and Rose Period (1902-1906)
@_1907       .0610       .2394       Analytical and Synthetic Cubism (1907-1915)
@_1916       .0972       .2965       Camera and Classicism (1916-1924)
@_1925       .1074       .3098       Juggler of the Form (1925-1936)
@_1937       .1495       .3568       Guernica and the ‘Style Picasso’ (1937-1943)
@_1944       .1466       .3539       Politics and Art (1944-1953)
@_1954       .3628       .4812       The Old Picasso (1954-1973) (omitted category)
d88          .0140       .1174       1988 dummy
d89          .0475       .2128       1989 dummy
d90          .0670       .2503       1990 dummy
d91          .0182       .1336       1991 dummy
d92          .0335       .1801       1992 dummy
d93          .0517       .2215       1993 dummy
d94          .0461       .2098       1994 dummy
d95          .0642       .2454       1995 dummy
d96          .0517       .2215       1996 dummy
d97          .0768       .2665       1997 dummy
d98          .1271       .3333       1998 dummy
d99          .0824       .2752       1999 dummy
d00          .0475       .2128       2000 dummy
d01          .0503       .2187       2001 dummy
d02          .0531       .2243       2002 dummy
d03          .0349       .1837       2003 dummy
d04          .0601       .2378       2004 dummy
d05          .0377       .1906       2005 dummy

FIGURE 1. Decision tree results


FIGURE 2. Relative importance plot


TABLE 2. Smearing price index and CART smearing price index results

                          Smearing price index              CART smearing price index
Variable                  Coef.          Robust Std. Err.   Coef.          Robust Std. Err.

Physical characteristics
area_m                    0.0004***      0.0000             .0003***       .0000
area_m2                   -1.13e-08***   1.02e-09           -9.88e-09***   9.99e-10
oil_canvas                0.3853***      0.1161             .1597          .1988
oil_panel                 -0.2831        0.2208             --             --
mixed                     -1.0101***     0.2855             -1.0078***     .3193
other_tech                excluded variable                 -.2492         .2269

Style characteristics
@_1881                    1.6199***      0.2042             --             --
@_1902                    2.2112***      0.3544             1.3512***      .3657
@_1907                    1.8137***      0.1816             --             --
@_1916                    0.9535***      0.1180             --             --
@_1925                    1.2625***      0.1207             .3907***       .1276
@_1937                    0.9340***      0.1232             --             --
@_1944                    0.2699***      0.1090             --             --
@_1954                    excluded variable                 -.3964***      .1276

Sale characteristics
chr_lon                   0.3823***      0.1406             --             --
chr_ny                    0.4037***      0.1401             --             --
sot_lon                   0.3231**       0.1455             --             --
sot_ny                    0.5995***      0.1371             --             --

Year dummies              [incl.]                           [incl.]
cons                      10.1389***     0.2154
R-squared                 0.64                              0.58

Note: ***, **, * significance at .01, .05, and .10 respectively

TABLE 3. Full price and financial price index

Year dummy   Duan's smearing factor   Full smearing price index   CART smearing price index
d88          1.951                    100.00                      100.00
d89          1.196                    109.48                      86.96
d90          1.437                    144.60                      109.95
d91          1.083                    50.52                       40.51
d92          1.371                    49.41                       40.00
d93          1.315                    30.49                       27.28
d94          2.480                    101.34                      56.29
d95          1.558                    32.98                       31.57
d96          1.506                    43.71                       29.80
d97          1.572                    80.66                       67.98
d98          1.471                    44.17                       24.75
d99          1.514                    72.14                       52.31
d00          1.650                    90.63                       63.80
d01          1.416                    58.05                       46.17
d02          1.621                    62.33                       39.69
d03          1.213                    53.38                       45.71
d04          1.261                    108.71                      80.87
d05          1.343                    126.16                      76.57

FIGURE 3. The aesthetic return from Picasso paintings

[Line chart. Series: Hedonic price index; Smearing price index; CART smearing price index. Vertical axis: index value, 0 to 200; horizontal axis: year of sale, d88 to d05.]
