Theory Based Estimation: A Primer Methods and Examples

George-Levi Gayle, Department of Economics, Washington University in St. Louis and The Federal Reserve Bank of St. Louis

August 2015

G.-L. Gayle (Wash. U. and St. Louis Fed)

Theory Based Estimation

08/15

1 / 48

Why Theory Based Estimation?

Observing reality is especially valuable. However, without models, every situation at every time on every variable would be unpredictable. Assumptions allow models and theories to assert constancy. Assumptions distill and simplify reality by dismissing the conspicuous but irrelevant. Abstraction is the precise virtue of an assumption.


Why Theory Based Estimation?

Assumptions are realistic when they produce good theories, satisfactory predictions, valuable implications, and correct recommendations. Output matters far more than input. Realism is only an issue when creatively diagnosing poorly performing models, not when judging model performance. Assumptions are the source of value in empirical analyses.


Why Theory Based Estimation?

If data sets were truly the source of value, empirical research studies would only greatly devalue the raw data by dramatically reducing rich observations to a few meager summary statistics or estimated parameters. There are scientific methods for evaluating model output (i.e., predictions, findings, implications, recommendations) on criteria such as accuracy, reliability, validity, and robustness.


Testing

Empirically, there are three main ways of evaluating the output of a model:

1. Testing a major prediction of the model while leaving the main structure of the model unspecified. Positive correlation test:
   1. In insurance markets: Chiappori and Salanie (2000, 2003); Cardon and Hendel (2001); Cohen (2005); among others.
   2. In performance pay settings: Jensen and Murphy (1990); Hall and Liebman (1998); Aggarwal and Samwick (1999); among others.
2. Specifying the structure and deriving the overidentifying restrictions.
3. Specifying the structure and performing out-of-sample validation.


Beyond Testing

A more structured approach is needed to quantify:

1. Welfare
2. Efficiency
3. The potential impact of policy reforms


Foundation of Theory Based Empirical Research

Identification and empirical content are fundamental to theory based empirical research irrespective of the method used to evaluate the output of theoretical models. Identification presumes the probability distribution defining the population for the data comes directly from an unknown model belonging to a known class of models, and determines how many models within that class generate the same probability distribution. Empirical content determines whether the class of models imposes any restrictions that can be falsified by probability distributions generating the data.


Structure and identification

The first focus of econometrics: Lehfeldt (1914) and Lenoir (1913), Working (1925, 1927), Wright (1928), Tinbergen (1930), Frisch (1934, 1938), Hurwicz (1950), Koopmans and Reiersøl (1950), Koopmans, Rubin and Leipnik (1950).

The identification question: when can data be informative about features of an economic process? Only when there are restrictions on admissible processes; these restrictions are econometric models.

What fundamental elements of econometric models have identifying power? The answer indicates:

- what economic theory must deliver,
- what must be taken on trust,
- the opportunities for detecting model failure.


Plan of this section of the lecture

1. Introduction, concepts, parametric identification.
   1. The nature of the identification problem.
   2. The construction of Leonid Hurwicz and a method for determining the identifying power of a model (Empirical Content).
   3. Identification with parametric restrictions.
2. Nonparametric identification.


Returns to schooling

$W$: log wage, $S$: years of schooling, $X$: a characteristic, $\varepsilon_1$ and $\varepsilon_2$: unobserved.

$$W = \theta S + \beta X + \varepsilon_1, \qquad S = \gamma X + \varepsilon_2$$

Substituting for $S$:

$$W = (\beta + \theta\gamma) X + U_1, \qquad S = \gamma X + U_2$$

where $U_1 = \varepsilon_1 + \theta\varepsilon_2$ and $U_2 = \varepsilon_2$.

Haavelmo (1944), the probability approach to econometrics: data are informative about $F_{WS|X}$. When is this informative about values of parameters?

Two issues:

1. If we knew many values of $W$, $S$, $X$, and also the values of $U_1$ and $U_2$, then we could know $\gamma$ but not $\theta$ and $\beta$, unless there is a restriction, e.g. $\beta = 0$.
2. We never know $U_1$ and $U_2$, so we need a restriction on $F_{U|X}$.

Restrictions on structural equations and on distributions of unobservables: the Hurwicz construction.
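The first issue can be illustrated with a minimal sketch. It builds two structures with different returns to schooling that generate exactly the same observations on $(W, S, X)$. All parameter values, sample sizes, and variable names are hypothetical choices for illustration, and the second structure is admissible only because no restriction is placed on $\beta$ or on the distribution of the unobservables.

```python
import random

random.seed(0)

# Structure A (hypothetical parameter values chosen for illustration)
theta_a, beta_a, gamma = 0.10, 0.50, 2.0
n = 1000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
eps1_a = [random.gauss(0.0, 1.0) for _ in range(n)]
eps2 = [random.gauss(0.0, 1.0) for _ in range(n)]

S = [gamma * x + e2 for x, e2 in zip(X, eps2)]
W_a = [theta_a * s + beta_a * x + e1 for s, x, e1 in zip(S, X, eps1_a)]

# Structure B: a different return to schooling, with beta and eps1 adjusted
# so that the reduced form coefficient (beta + theta * gamma) and U1 are unchanged.
theta_b = 0.30
beta_b = beta_a + (theta_a - theta_b) * gamma
eps1_b = [e1 + (theta_a - theta_b) * e2 for e1, e2 in zip(eps1_a, eps2)]
W_b = [theta_b * s + beta_b * x + e1 for s, x, e1 in zip(S, X, eps1_b)]

# Largest difference between the wages implied by the two structures
max_gap = max(abs(wa - wb) for wa, wb in zip(W_a, W_b))
print(max_gap)  # zero up to floating point rounding
```

Imposing $\beta = 0$ in both structures would rule out the adjustment of `beta_b`, which is the sense in which a restriction restores identification.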


Identification of structures

Regard outcomes ($Y = \{W, S\}$, say) as realisations of random variables with a probability distribution $F_{Y|X}$, conditional on some $X$. A Hurwicz (1950) structure consists of

- equations determining $Y$ given $X$ and $\varepsilon$,
- a probability distribution for $\varepsilon$ given $X$, $F_{\varepsilon|X}$.

Consider a structure, $S_0$, which implies a distribution $F^{S_0}_{Y|X}$.

There may exist an observationally equivalent $S_0'$ with $F^{S_0}_{Y|X} = F^{S_0'}_{Y|X}$. Then knowledge of $F_{Y|X}$ does not allow us to identify $S_0$.


Identifying power of models

Suppose restrictions, embodied in a model, imply admissible structures $S \in \mathcal{M}$.

If for a structure $S_0 \in \mathcal{M}$ and any other admissible $S_0' \in \mathcal{M}$, $F^{S_0}_{Y|X} \neq F^{S_0'}_{Y|X}$, then the model identifies $S_0$. Consider a feature of a structure, $R(S)$.

If $R(S)$ is identical for all observationally equivalent $S \in \mathcal{M}$ then the model identifies $R(S)$. If there exists a functional $G(\cdot)$ such that for all $S \in \mathcal{M}$, $R(S) = r^{*} \implies G(F^{S}_{Y|X}) = r^{*}$, then the model identifies $r^{*}$.


Proof of the proposition

Proof. Suppose there exists a functional $G(F_{Y|X})$ such that for all $S \in \mathcal{M}$, $R(S) = r^{*} \implies G(F^{S}_{Y|X}) = r^{*}$.

Let $S$ and $S'$ be observationally equivalent with $R(S) = r$ and $R(S') = r'$.

Then $G(F^{S}_{Y|X}) = r$ and $G(F^{S'}_{Y|X}) = r'$.

Since $F^{S}_{Y|X} = F^{S'}_{Y|X}$ it follows that $r = r'$. The existence of the functional ensures that $R(S)$ cannot vary across observationally equivalent structures admitted by $\mathcal{M}$. QED

Note: analog estimation follows using $G(\hat{F}_{Y|X})$. Properties of estimators depend on $\hat{F}$ and $G$.

Overidentification if more than one functional.

Example: consider the model

$W$: log wage, $S$: years of schooling, $X$: a characteristic, $\varepsilon_1$ and $\varepsilon_2$: unobserved.

A: $W = \alpha + \beta S + \varepsilon_1$
B: There exists $X$ such that $E[\varepsilon_1 | X = x] = c$
C: $E[S | X = x]$ exists

Then

$$E[W | x_1] = \alpha + \beta E[S | x_1] + c$$
$$E[W | x_2] = \alpha + \beta E[S | x_2] + c$$

and if $E[S | x_1] \neq E[S | x_2]$,

$$\beta = \frac{E[W | x_1] - E[W | x_2]}{E[S | x_1] - E[S | x_2]}.$$

The Wald estimator is the analogue estimator.
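A minimal simulation sketch of the analogue (Wald) estimator follows. The data generating process, the binary $X$, the unobserved `ability` term, and all numerical values are assumptions made for illustration only. They are chosen so that $E[\varepsilon_1 | X = x] = c$ holds while $\varepsilon_1$ is correlated with $S$.

```python
import random

random.seed(1)

alpha, beta, c = 1.0, 0.08, 0.0   # hypothetical true values
n = 20000
rows = []
for _ in range(n):
    x = random.choice([0, 1])                 # binary X with values x1 = 1, x2 = 0
    ability = random.gauss(0.0, 1.0)          # unobserved, shifts both S and W
    s = 10.0 + 2.0 * x + ability + random.gauss(0.0, 1.0)
    eps1 = c + 0.5 * ability + random.gauss(0.0, 0.2)   # E[eps1 | X] = c
    w = alpha + beta * s + eps1
    rows.append((x, s, w))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald(rows):
    """Ratio of differences in conditional sample means of W and S."""
    w1 = mean(w for x, s, w in rows if x == 1)
    w0 = mean(w for x, s, w in rows if x == 0)
    s1 = mean(s for x, s, w in rows if x == 1)
    s0 = mean(s for x, s, w in rows if x == 0)
    return (w1 - w0) / (s1 - s0)


def ols_slope(rows):
    """Slope from regressing W on S, shown only for comparison."""
    s_bar = mean(s for _, s, _ in rows)
    w_bar = mean(w for _, _, w in rows)
    cov = sum((s - s_bar) * (w - w_bar) for _, s, w in rows)
    var = sum((s - s_bar) ** 2 for _, s, _ in rows)
    return cov / var


beta_wald = wald(rows)
beta_ols = ols_slope(rows)
print(beta_wald, beta_ols)
```

In this design the Wald estimate should be close to the assumed $\beta$, while the regression slope is pushed upward because `ability` enters both $S$ and $\varepsilon_1$.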


Set identification

Let $R(S_0)$ be a feature of a structure, $S_0$.

Let $A_0$ be the set of structures admitted by a model and observationally equivalent to $S_0$. If for all $S \in A_0$, $R(S) \in R_0$, then the model set identifies $R(S_0)$. When $R_0$ is a singleton there is point identification.


Example: Static Model of Pure Moral Hazard
Framework

A risk neutral principal proposes a compensation plan to a risk averse agent, an explicit contract or an implicit agreement, which depends on the future realization of gross revenue to the principal.

The agent accepts or rejects the principal's (implicit) offer. If he rejects the offer he receives a fixed utility from an outside option.

If he accepts the offer, the agent chooses between pursuing the principal's objectives of value maximization (working diligently), versus following objectives he would pursue if he was paid a fixed wage (shirking).

The principal observes whether the offer is accepted, but not the agent's work routine.

After revenue is realized, the agent receives compensation according to the explicit contract or implicit agreement, and the principal pockets the remainder as profit.

Static Model of Pure Moral Hazard
Choices of Agent

Denote the workplace employment decision of the agent by an indicator $l_0 \in \{0, 1\}$, where $l_0 = 1$ means the agent rejects the principal's offer. Denote the effort level choices by $l_j \in \{0, 1\}$ for $j \in \{1, 2\}$, where diligent work is defined by setting $l_2 = 1$, and shirking is defined by setting $l_1 = 1$. Since taking the outside option, working diligently and shirking are mutually exclusive activities, $l_0 + l_1 + l_2 = 1$.


Static Model of Pure Moral Hazard
Revenue and Profits of Principal

Gross revenue to the principal is denoted by $x$, a random variable drawn from a probability distribution that is determined by the agent's work routine. After $x$ is revealed to both the principal and the agent at the end of the period, the agent receives compensation according to the contract or implicit agreement. To reflect its potential dependence on (or measurability with respect to) $x$, we denote compensation by $w(x)$. The principal's profit is revenue less compensation, $x - w(x)$.


Static Model of Pure Moral Hazard
Marginal Product of Agent

Denote by $f(x)$ the probability density function for revenue conditional on the agent working diligently, and let $f(x) g(x)$ denote the probability density function for revenue when the agent shirks. We assume:

$$E[x g(x)] \equiv \int x f(x) g(x)\,dx < \int x f(x)\,dx \equiv E[x]$$

The inequality reflects the preference of the principal for diligent work over shirking. Since $f(x)$ and $f(x) g(x)$ are densities, $g(x)$, the ratio of the two densities, is a likelihood ratio. That is, $g(x)$ is nonnegative for all $x$ and:

$$E[g(x)] \equiv \int g(x) f(x)\,dx = 1$$

Static Model of Pure Moral Hazard
Regularity Condition

We assume there is an upper range of revenue that might be achieved with diligence, but is extremely unlikely to occur if the agent shirks. Formally:

$$\lim_{x \to \infty} g(x) = 0$$

Intuitively this assumption states that a truly extraordinary performance by the principal can only be attained if the agent works hard. We assume that $g(x)$ is bounded, an assumption that rules out the possibility of setting a contract that is arbitrarily close to the first best resource allocation, first noted by Mirrlees (1975), by severely punishing the agent when $g(x)$ takes an extremely high value.


Static Model of Pure Moral Hazard
Preferences of Agent

We assume the agent is an expected utility maximizer and utility is exponential in compensation, taking the form:

$$-l_0 - l_1 \alpha_1 E\left[e^{-\gamma w(x)} g(x)\right] - l_2 \alpha_2 E\left[e^{-\gamma w(x)}\right]$$

where without further loss of generality we normalize the utility of the outside option to negative one. Thus $\gamma$ is the coefficient of absolute risk aversion, and $\alpha_j$ is a utility parameter with consumption equivalent $-\gamma^{-1} \log(\alpha_j)$ that measures the distaste from working at level $j \in \{1, 2\}$. We assume $\alpha_2 > \alpha_1$, meaning that shirking gives more utility to the agent than being diligent.

A conflict of interest arises between the principal and the agent because the agent prefers shirking, meaning $\alpha_1 < \alpha_2$, yet the principal prefers diligence since $E[x g(x)] < E[x]$.

Solving the Pure Moral Hazard Model
Participation Constraint

To induce the agent to accept the principal's offer and engage in his preferred activity, shirking, it suffices to propose a contract that gives the agent an expected utility of at least minus one. In this case we require $w(x)$ to satisfy the inequality:

$$\alpha_1 E\left[e^{-\gamma w(x)} g(x)\right] \leq 1$$


Solving the Pure Moral Hazard Model
Participation and Incentive Compatibility Constraints

To elicit diligent work from the agent, the principal must offer a contract that gives the agent an expected utility at least as high as the outside option, and at least as high as shirking. In this case we require:

$$\alpha_2 E\left[e^{-\gamma w(x)}\right] \leq 1$$

and:

$$\alpha_2 E\left[e^{-\gamma w(x)}\right] \leq \alpha_1 E\left[e^{-\gamma w(x)} g(x)\right]$$


Solving the Pure Moral Hazard Model
Cost Minimization to Achieve Diligent Work

Defining $v(x) \equiv \exp[-\gamma w(x)]$, note that:

$$E[w(x)] = -\gamma^{-1} E\{\log[v(x)]\}$$

The participation constraint can be expressed as:

$$\alpha_2 E[v(x)] \leq 1$$

and the incentive compatibility constraint becomes:

$$\alpha_2 E[v(x)] \leq \alpha_1 E[v(x) g(x)]$$

In the transformed problem we maximize a strictly concave objective function with linear constraints. Applying the Kuhn-Tucker theorem, we choose $v(x)$ for each $x$ to maximize:

$$E\{\log[v(x)]\} + \eta_0 E[1 - \alpha_2 v(x)] + \eta_1 E[\alpha_1 g(x) v(x) - \alpha_2 v(x)]$$

Lemma (Margiotta and Miller, 2000)
To minimize the cost of inducing the agent to accept employment and work diligently the board offers the contract:

$$w^{o}(x) \equiv \gamma^{-1} \ln \alpha_2 + \gamma^{-1} \ln\left[1 + \eta\left(\frac{\alpha_2}{\alpha_1}\right) - \eta g(x)\right]$$

where $\eta$ is the unique positive solution to the equation:

$$E\left[\frac{g(x)}{\alpha_2 + \eta[(\alpha_2/\alpha_1) - g(x)]}\right] = E\left[\frac{(\alpha_2/\alpha_1)}{\alpha_2 + \eta[(\alpha_2/\alpha_1) - g(x)]}\right]$$

Differentiate the Lagrangian with respect to $v(x)$ to obtain:

$$v(x)^{-1} = \eta_0 \alpha_2 + \eta_1 \alpha_2 - \eta_1 \alpha_1 g(x)$$

We can show both constraints are met with equality, establishing the formula for $\eta$, and showing $\eta_0 = 1$, to yield:

$$v(x)^{-1} = \alpha_2 + \eta_1 \alpha_2 - \eta_1 \alpha_1 g(x)$$
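As a rough numerical sketch of the cost minimization, the code below solves for the contract on a three-point revenue distribution. The revenue grid, probabilities, likelihood ratio, and parameter values are hypothetical. The unknown is the multiplier $\eta_1$ from the first-order condition with $\eta_0 = 1$ (not the rescaled $\eta$ of the lemma), found by bisection on the binding incentive compatibility constraint.

```python
import math

# Hypothetical primitives for a three-point revenue distribution
x = [0.0, 1.0, 2.0]            # revenue outcomes
f = [0.2, 0.3, 0.5]            # probabilities under diligence
g = [2.5, 5.0 / 3.0, 0.0]      # likelihood ratio: shirking prob. / diligence prob.
alpha1, alpha2, gamma = 1.0, 1.5, 0.5


def expect(values):
    """Expectation under the diligence distribution f."""
    return sum(p * v for p, v in zip(f, values))


def v_of(eta1):
    # First-order condition with eta0 = 1: 1/v = alpha2 + eta1*alpha2 - eta1*alpha1*g
    return [1.0 / (alpha2 + eta1 * (alpha2 - alpha1 * gx)) for gx in g]


def ic_gap(eta1):
    # alpha1*E[v g] - alpha2*E[v]; zero when incentive compatibility binds
    v = v_of(eta1)
    return expect([vx * (alpha1 * gx - alpha2) for vx, gx in zip(v, g)])


# eta1 must keep 1/v positive at the largest g, so bisect on (0, eta_max).
# This bracket assumes max(g) > alpha2/alpha1, which holds for the values above.
eta_max = alpha2 / (alpha1 * max(g) - alpha2)
lo, hi = 0.0, eta_max * (1.0 - 1e-12)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if ic_gap(mid) < 0.0:
        lo = mid
    else:
        hi = mid
eta1 = 0.5 * (lo + hi)

v = v_of(eta1)
w = [-math.log(vx) / gamma for vx in v]        # w(x) = -log(v(x)) / gamma
risk_premium = expect(w) - math.log(alpha2) / gamma
print(eta1, w, risk_premium)
```

At the solution both constraints hold with equality, compensation rises with revenue because $g$ falls with revenue in this example, and the risk premium is positive.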


Solving the Pure Moral Hazard Model
Intuition for Cost Minimizing Contract

There is no point exposing the manager to uncertainty in a shirking contract by tying compensation to revenue. Hence an agent paid to shirk is offered a fixed wage that just offsets his nonpecuniary benefits, $\gamma^{-1} \ln \alpha_1$.

The certainty equivalent of the cost minimizing contract that induces diligent work is $\gamma^{-1} \ln \alpha_2$, higher than the optimal shirking contract to compensate for the lower nonpecuniary benefits because $\alpha_2 > \alpha_1$. Moreover, the agent is paid a positive risk premium of $E[w^{o}(x)] - \gamma^{-1} \ln \alpha_2$.

In this model of pure moral hazard these two factors, that diligence is less enjoyable than shirking and that more certainty in compensation is preferable, explain why compensating an agent to align his interests with the principal is more expensive than merely paying him enough to accept employment.

Solving the Pure Moral Hazard Model
Profit Maximization

Profit maximization by the principal determines which cost minimizing contract the principal should offer the agent. The profits from inducing the agent to work diligently are $x - w^{o}(x)$, while the profits from employing the agent to shirk are $x g(x) - \gamma^{-1} \log(\alpha_1)$. Thus diligent work is preferred by the principal if and only if:

$$\max\{0, \gamma E[x g(x)] - \log(\alpha_1)\} \leq \gamma E[x - w^{o}(x)]$$

while a shirking contract is offered if and only if:

$$\max\{0, \gamma E[x - w^{o}(x)]\} \leq \gamma E[x g(x)] - \log(\alpha_1)$$

Otherwise no contract is offered.
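The two inequalities amount to a three-way comparison, sketched below. The function name, argument names, and the tie-breaking rule (diligence is checked first) are assumptions for illustration; the inputs are the expectations $E[x]$, $E[x g(x)]$, and $E[w^{o}(x)]$.

```python
import math


def choose_contract(mean_x, mean_xg, mean_w_diligent, alpha1, gamma):
    """Return 'diligence', 'shirking', or 'none' from the profit comparison."""
    diligence = gamma * (mean_x - mean_w_diligent)       # gamma * E[x - w_o(x)]
    shirking = gamma * mean_xg - math.log(alpha1)        # gamma * E[x g(x)] - log(alpha1)
    if diligence >= max(0.0, shirking):
        return "diligence"
    if shirking >= max(0.0, diligence):
        return "shirking"
    return "none"


print(choose_contract(2.0, 0.5, 0.9, 1.0, 0.5))
print(choose_contract(1.3, 0.5, 0.9, 1.0, 0.5))
print(choose_contract(0.5, 0.1, 0.9, 2.0, 0.5))
```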


Identification in the Pure Moral Hazard Model
Parameters

The parameters of the model are characterized by $f(x)$ and $g(x)$, which together define the probability density functions of gross profits, $(\alpha_1, \alpha_2)$, the preference parameters for shirking and diligent work (relative to the normalized utility from taking the outside option), as well as the risk aversion parameter $\gamma$.

For the purposes of this introductory example, we assume the data comprise independent draws of profits and compensation, $(x_n, w_n)$, for a sample of $N$ observations generated in equilibrium.

When the principal induces shirking, the density $f(x) g(x)$ can be estimated from observations on profits, the wage is constant at $w_n = \gamma^{-1} \log(\alpha_1)$ for all $n$, but nothing more can be gleaned from the data about the structure of the model.


Identification in the Pure Moral Hazard Model
An Implication of the Regularity Condition

Our analysis focuses on cases when diligence is induced, and compensation $w_n$ depends nontrivially on revenue $x_n$. Hence $f(x)$ is identified, along with $N$ points on the compensation schedule $w_n \equiv w^{o}(x_n)$. Under the assumptions of the model $f(x)$ can be estimated with a nonparametric density estimator.

From the compensation equation, the regularity condition on $g(x)$ and the fact that $g(x)$ is nonnegative, the maximum compensation the agent can receive is:

$$\lim_{x \to \infty} w^{o}(x) = \gamma^{-1} \ln \alpha_2 + \gamma^{-1} \ln\left[1 + \eta\left(\frac{\alpha_2}{\alpha_1}\right)\right] \equiv \bar{w} \qquad (1)$$

Thus $\bar{w}$ is identified, and consistently estimated by the maximum compensation observed in the data. This essentially leaves $\gamma$, $\alpha_1$, $\alpha_2$, and $g(x)$ to identify from $f(x)$, $w^{o}(x)$, and $\bar{w}$.

Identification in the Pure Moral Hazard Model
Approach

Our analysis proceeds in three steps.

1. First we show that if $\gamma$ is known, then $\alpha_1$, $\alpha_2$, and $g(x)$ are identified from the cost minimization problem. This means that the set of observationally equivalent parameters can be indexed by the positive real number $\gamma$, the risk aversion parameter.
2. Second, we show that the firm's preference for diligence over shirking provides an additional inequality that helps delineate the values of observationally equivalent $\gamma$.
3. Third, we prove that the set of restrictions we have derived in the first two steps fully characterizes the identified set.


Identification in the Pure Moral Hazard Model
A Definition of the Likelihood Ratio

Suppose $\gamma$ is known, and define the mappings $g(x, \gamma)$, $\alpha_1(\gamma)$, and $\alpha_2(\gamma)$, starting with:

$$g(x, \gamma) \equiv \frac{e^{\gamma \bar{w}} - e^{\gamma w^{o}(x)}}{e^{\gamma \bar{w}} - E\left[e^{\gamma w^{o}(x)}\right]}$$

All three mappings inherit the basic structure of the model for any positive value of $\gamma$.

Integrating over $x$, we see that $E[g(x, \gamma)] = 1$ for all $\gamma > 0$.

Also by definition $\bar{w} \geq w^{o}(x)$, so $e^{\gamma \bar{w}} \geq E\left[e^{\gamma w^{o}(x)}\right]$ and $e^{\gamma \bar{w}} \geq e^{\gamma w^{o}(x)}$ for all $\gamma > 0$. Therefore $g(x, \gamma) \geq 0$ for all $\gamma > 0$.

Furthermore as $x \to \infty$, we see that $w^{o}(x) \to \bar{w}$, and hence $g(x, \gamma) \to 0$, as stipulated by the regularity condition.

This proves $g(x, \gamma)$ can be interpreted as a likelihood ratio satisfying the regularity condition for all $\gamma > 0$.

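Identification in the Pure Moral Hazard Model
A Definition of the Taste Parameters

Next define $\alpha_1(\gamma)$ and $\alpha_2(\gamma)$ as:

$$\alpha_1(\gamma) \equiv \frac{1 - E\left[e^{\gamma w^{o}(x) - \gamma \bar{w}}\right]}{E\left[e^{-\gamma w^{o}(x)}\right] - e^{-\gamma \bar{w}}}, \qquad \alpha_2(\gamma) \equiv \left\{E\left[e^{-\gamma w^{o}(x)}\right]\right\}^{-1}$$

Clearly $\alpha_2(\gamma) > 0$ because $e^{-\gamma w^{o}(x)} > 0$. Similarly the numerator and denominator of the equation for $\alpha_1(\gamma)$ have the same sign for all $\gamma$, so $\alpha_1(\gamma)$ is also positive. Rearranging the expression for the ratio of the two taste parameters we obtain:

$$\frac{\alpha_1(\gamma)}{\alpha_2(\gamma)} = \frac{e^{\gamma \bar{w}} - E\left[e^{\gamma w^{o}(x)}\right]}{e^{\gamma \bar{w}} - \left\{E\left[e^{-\gamma w^{o}(x)}\right]\right\}^{-1}}$$

Since the inverse function is convex, Jensen's inequality implies $E\left[e^{\gamma w^{o}(x)}\right] > \left\{E\left[e^{-\gamma w^{o}(x)}\right]\right\}^{-1}$, or equivalently $\left\{E\left[e^{\gamma w^{o}(x)}\right]\right\}^{-1} < E\left[e^{-\gamma w^{o}(x)}\right]$. Therefore $\alpha_1(\gamma) < \alpha_2(\gamma)$ for all $\gamma > 0$.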
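The three mappings are simple to compute for a discrete revenue distribution, as in the sketch below. The compensation schedule `w`, the probabilities `probs`, and the helper names are hypothetical, and $\bar{w}$ is taken to be the largest value of the schedule, which is an assumption suited to a finite support.

```python
import math


def expect(values, probs):
    return sum(p * v for p, v in zip(probs, values))


def likelihood_ratio(w, probs, gamma):
    """g(x, gamma) evaluated at each support point of the compensation schedule."""
    w_bar = max(w)
    denom = math.exp(gamma * w_bar) - expect([math.exp(gamma * wi) for wi in w], probs)
    return [(math.exp(gamma * w_bar) - math.exp(gamma * wi)) / denom for wi in w]


def alpha2_of(w, probs, gamma):
    return 1.0 / expect([math.exp(-gamma * wi) for wi in w], probs)


def alpha1_of(w, probs, gamma):
    w_bar = max(w)
    num = 1.0 - expect([math.exp(gamma * (wi - w_bar)) for wi in w], probs)
    den = expect([math.exp(-gamma * wi) for wi in w], probs) - math.exp(-gamma * w_bar)
    return num / den


# Hypothetical compensation schedule on three revenue outcomes
probs = [0.2, 0.3, 0.5]
w = [0.4, 0.7, 1.2]

for gamma in (0.1, 1.0, 5.0):
    g = likelihood_ratio(w, probs, gamma)
    print(gamma, expect(g, probs), alpha1_of(w, probs, gamma), alpha2_of(w, probs, gamma))
```

For every positive $\gamma$ tried, the constructed ratio averages to one, is nonnegative, and vanishes at the top of the schedule, with $\alpha_1(\gamma) < \alpha_2(\gamma)$, matching the properties derived above.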


Identification in the Pure Moral Hazard Model
Using the Cost Minimization Problem in Identification

Summarizing, given a density $f(x)$ for $x$ and a compensation schedule $w^{o}(x)$ satisfying $w^{o}(x) \to \bar{w}$ as $x \to \infty$, identified from observations $(x_n, w_n)$, for any positive $\gamma$ we can construct, as primitives for the principal agent model, a $g(x, \gamma)$, an $\alpha_1(\gamma)$, and an $\alpha_2(\gamma)$. But we can also prove a stronger result:

Theorem (Gayle and Miller, 2015)
Suppose the data on $x_n$ and $w_n$ are generated by a parameterization of the model denoted by $\alpha_1^{*}$, $\alpha_2^{*}$, $\gamma^{*}$, $g^{*}(x)$ and $f^{*}(x)$ in which shareholders induce diligent work by solving the cost minimization problem. Then:

$$\alpha_1^{*} = \alpha_1(\gamma^{*}), \qquad \alpha_2^{*} = \alpha_2(\gamma^{*}), \qquad g^{*}(x) = g(x, \gamma^{*})$$

Identification in the Pure Moral Hazard Model
Intuition for Theorem

Making $g(x)$ the subject of the compensation equation and differentiating with respect to $x$ yields:

$$g'(x) = -\eta^{-1} e^{\gamma w(x)} \frac{\partial w(x)}{\partial x}$$

From this equation it is evident that the slope is defined up to one normalization; a second normalization determines the level of $g(x)$. In our setup the regularity condition provides one normalization; the fact that $E[g(x)] = 1$ provides another.

The formula for $\alpha_2(\gamma^{*})$ is due to the participation constraint being met with equality. Since the incentive compatibility constraint is also met with equality:

$$\alpha_1 E\left[e^{-\gamma w^{o}(x)} g(x)\right] = \alpha_2 E\left[e^{-\gamma w^{o}(x)}\right] = 1$$

and substituting in the formula for $g(x)$ and rearranging to make $\alpha_1$ the subject of the equation produces the formula evaluated at $\gamma^{*}$.

Identification in the Pure Moral Hazard Model
Restrictions from Profit Maximization

Cost minimization places no restrictions on $\gamma$. Imposing profit maximization limits the set of admissible $\gamma$. If paying $w^{o}(x)$ is more profitable than paying $\gamma^{-1} \log(\alpha_1)$ then:

$$E[x] - E[w^{o}(x)] - E[x g(x)] + \gamma^{-1} \ln(\alpha_1) \geq 0$$

Substituting for $g(x) = g(x, \gamma)$ and $\alpha_1 = \alpha_1(\gamma)$ define $Q_0(\gamma)$ as:

$$Q_0(\gamma) \equiv E[x] - E[w^{o}(x)] - E\left[x\,\frac{e^{\gamma \bar{w}} - e^{\gamma w^{o}(x)}}{e^{\gamma \bar{w}} - E\left[e^{\gamma w^{o}(x)}\right]}\right] + \gamma^{-1} \log\left(\frac{1 - E\left[e^{\gamma w^{o}(x) - \gamma \bar{w}}\right]}{E\left[e^{-\gamma w^{o}(x)}\right] - e^{-\gamma \bar{w}}}\right)$$

From the theorem $Q_0(\gamma^{*}) \geq 0$.

Identification in the Pure Moral Hazard Model
Sharp and Tight Bounds

The inequality $Q_0(\gamma^{*}) \geq 0$ restricts the set of admissible $\gamma$. Are there any other restrictions? The short answer is no. Define $\Gamma$, a Borel set of risk aversion parameters, as:

$$\Gamma \equiv \{\gamma > 0 : Q_0(\gamma) \geq 0\}$$

Theorem (Gayle and Miller, 2015)
Consider any data generating process for $(x_n, w_n)$. If $\Gamma$ is not empty, it indexes the set of observationally equivalent parameters for a pure moral hazard model where the principal maximizes expected profits by inducing the agent to work diligently. Otherwise $\Gamma$ is empty and the data are not generated by such a model.


Empirical Content

Identification is concerned with recovering parameters of interest from data generated by a model. Empirical content determines whether the model can be rejected by data generated by a different model.

We now suppose the data are not necessarily generated by the pure moral hazard (PMH1) model.

- Under the null hypothesis, that the data could have been generated by a PMH1 model, $\Gamma_1$ is not empty.
- Under the alternative hypothesis the data could not have been generated by a PMH1 model and $\Gamma_1$ is empty.

A corollary establishes PMH1 models have empirical content: they can be rejected.


Empirical Content

The PMH model is flexible enough to entertain a non-monotone mapping from revenue to compensation in the equilibrium optimal contract.

An agency problem only exists because the principal expects higher revenue if the agent works, but the agent prefers to shirk.

Frequently observing high levels of compensation paired with low revenue outcomes, and vice versa, seems counterintuitive to the predictions of a PMH1 model. Such empirical regularities provide the basis for rejecting it.

The cost minimization problem implies the data generating process identifies a set of parameters that fully characterizes a PMH model up to any positive $\gamma$. Only the profit condition could be violated.

Empirical Content

The profit condition is violated if the principal motivates the agent to work even though inducing work is more expensive to the principal and their goals are perfectly aligned when the agent shirks! The potential for observing this contradiction underlies the model's empirical content.

Formally, the profit condition is violated if compensation is monotone decreasing in revenue.

Corollary
There exist joint distributions of $(W, X)$ such that $\Gamma_1$ is empty.


Restrictions and Structure for Identification and Estimation

1. Semiparametric Set (Gayle and Miller, 2015, Review of Economic Studies)
   - Take it straight to the data!
2. Full Solution Parametric model (Gayle and Miller, 2009, American Economic Review)
   - Parametric restrictions on distribution and observed heterogeneity in preferences
3. Semiparametric two-step point identification (Gayle, Golan, Miller, 2015, Econometrica)
   - Get more data!! Turnover, promotion, retirement, work histories, education
   - With more data put economic structure: general equilibrium


Semiparametric Set Estimation (Gayle and Miller, 2015, ReStud)
Approximating the Q function

The identified set of risk parameters $\Gamma$ has a simple empirical analogue. Suppose we have $N$ cross sectional observations on $(x_n, w_n)$ on identical firms and their managers. To estimate $Q_0(\gamma)$, we replace $\bar{w}$ with $\bar{w}^{(N)} \equiv \max\{w_1, \ldots, w_N\}$ and substitute sample moments for their corresponding population expectations, to obtain upon rearrangement:

$$\begin{aligned}
Q_0^{(N)}(\gamma) \equiv{} & \frac{1}{N}\sum_{n=1}^{N}(x_n - w_n) - \frac{\sum_{n=1}^{N} x_n \left(e^{\gamma \bar{w}^{(N)}} - e^{\gamma w_n}\right)}{\sum_{n=1}^{N}\left(e^{\gamma \bar{w}^{(N)}} - e^{\gamma w_n}\right)} \\
& + \gamma^{-1} \log\left[\sum_{n=1}^{N}\left(e^{\gamma \bar{w}^{(N)}} - e^{\gamma w_n}\right)\right] - \gamma^{-1} \log\left[\sum_{n=1}^{N} e^{\gamma\left(\bar{w}^{(N)} - w_n\right)} - N\right]
\end{aligned}$$
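The sample criterion can be coded directly from its definition as the analogue of $Q_0(\gamma)$. The sketch below is illustrative only: the function name `q_hat` and the six observations are hypothetical, the last two terms are the logarithm of the sample analogue of $\alpha_1(\gamma)$ scaled by $\gamma^{-1}$, and no guard is included against overflow for large $\gamma w_n$ or against samples where every $w_n$ is identical.

```python
import math


def q_hat(x, w, gamma):
    """Sample analogue of Q_0(gamma) from observations (x_n, w_n)."""
    n = len(x)
    w_bar = max(w)                                   # estimate of the maximum compensation
    d = [math.exp(gamma * w_bar) - math.exp(gamma * wn) for wn in w]
    mean_profit = sum(xn - wn for xn, wn in zip(x, w)) / n
    shirk_revenue = sum(xn * dn for xn, dn in zip(x, d)) / sum(d)
    log_alpha1 = math.log(sum(d)) - math.log(
        sum(math.exp(gamma * (w_bar - wn)) for wn in w) - n
    )
    return mean_profit - shirk_revenue + log_alpha1 / gamma


# Hypothetical observations on revenue and compensation
x_obs = [0.0, 1.0, 2.0, 2.0, 1.0, 2.0]
w_obs = [0.4, 0.7, 1.2, 1.2, 0.7, 1.2]

for gamma in (0.2, 0.8, 2.0):
    print(gamma, q_hat(x_obs, w_obs, gamma))
```

Evaluating `q_hat` on a grid of $\gamma$ values gives the raw ingredient for the set estimate; values of $\gamma$ with a clearly negative criterion are candidates for exclusion.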

Semiparametric Set Estimation
Convergence of the Approximation

Our tests are based on the fact that if $\gamma \in \Gamma$ then sampling error is the only explanation for why $Q_0^{(N)}(\gamma)$ might be negative.

Clearly $Q_0^{(N)}(\gamma)$ converges at the rate of its slowest converging component. For simplicity suppose there exists some $\bar{x} < \infty$ such that $g(x) = 0$ for all $x > \bar{x}$. In words, there is a revenue threshold that shirking cannot achieve. Thus compensation is flat at $\bar{w}$ for all profit levels above $\bar{x}$, and $\bar{w}^{(N)}$ converges to $\bar{w}$ at a faster rate than $\sqrt{N}$.

Since all the other components of $Q_0^{(N)}(\gamma)$ are sample moments, we conclude $Q_0^{(N)}(\gamma)$ converges at rate $\sqrt{N}$.


Semiparametric Set Estimation
A Test

Denote by $\Gamma_\delta^{(N)}$ the set of risk aversion parameters that asymptotically covers the observationally equivalent set of $\gamma > 0$ with probability $1 - \delta$. For the critical value $c_\delta$ associated with test size $\delta$, this set is defined as:

$$\Gamma_\delta^{(N)} \equiv \left\{\gamma > 0 : \left[\min\left\{0, \sqrt{N}\, Q_0^{(N)}(\gamma)\right\}\right]^2 \leq c_\delta\right\}$$

A consistent estimate of $c_\delta$ can be determined numerically by following subsampling procedures in Gayle and Miller (2010).

Intuitively, if $\sqrt{N}\, Q_0^{(N)}(\gamma)$ is negative and large in absolute value for all $\gamma > 0$ we reject the null hypothesis that the pure moral hazard model generated the data.

On the other hand if $\sqrt{N}\, Q_0^{(N)}(\gamma^{*})$ is small in absolute value, or positive, we do not reject the null hypothesis that $\gamma^{*}$ belongs to the identified set.
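Given a critical value, the set can be approximated on a grid, as in the sketch below. The function name `confidence_set`, the grid, and the toy criterion are hypothetical. The critical value `c_delta` is taken as an input here, and the subsampling step that would estimate it is not implemented.

```python
import math


def confidence_set(q_hat, n_obs, gamma_grid, c_delta):
    """Grid points gamma with [min(0, sqrt(N) * Q_N(gamma))]^2 <= c_delta."""
    kept = []
    for gamma in gamma_grid:
        stat = min(0.0, math.sqrt(n_obs) * q_hat(gamma)) ** 2
        if stat <= c_delta:
            kept.append(gamma)
    return kept


# Toy criterion standing in for the estimated Q function (illustration only)
toy_q = lambda gamma: 1.0 - gamma
print(confidence_set(toy_q, 100, [0.5, 1.0, 1.1, 2.0], 4.0))
```

An empty returned list corresponds to rejecting the null hypothesis on the chosen grid, since the statistic then exceeds the critical value at every candidate $\gamma$.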


Full Solution Parametric (Gayle and Miller, 2009, AER)

An observation on compensation, denoted $\tilde{w}_t$, is the sum of true compensation $w_t$ plus an independently distributed error $\varepsilon_t$, assumed orthogonal to the other variables of interest:

$$\tilde{w}_t = w_t + \varepsilon_t$$

We parameterized $f_1(x)$ and $f_2(x)$, the distributions of abnormal returns under shirking and working respectively, as truncated normal with support bounded below by $\psi$, setting:

$$f_j(x) = \left[\Phi\left(\frac{\mu_j - \psi}{\sigma}\right) \sigma \sqrt{2\pi}\right]^{-1} \exp\left[-\frac{(x - \mu_j)^2}{2\sigma^2}\right]$$

where $j \in \{1, 2\}$ denotes shirking and working respectively, where $\Phi$ is the standard normal distribution function, and where $(\mu_j, \sigma^2)$ denotes the mean and variance of the parent normal distribution.

Full Solution Parametric

Denoting by $\phi$ the standard normal probability density function, the implicit function for $\mu_2$ is given by

$$0 = E(x_t | l_{2t} = 1) = \mu_2 + \frac{\sigma \phi[(\psi - \mu_2)/\sigma]}{1 - \Phi[(\psi - \mu_2)/\sigma]}.$$

We estimate the bankruptcy return $\psi$, the mean of the parent normal distribution under shirking $\mu_1$, the common variance of the parent normal $\sigma^2$, the risk aversion parameter $\rho$, the ratio of nonpecuniary benefits from working to shirking $\alpha_2/\alpha_1$, and the ratio of nonpecuniary benefits from working to quitting $\alpha_2/\alpha_0$.
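The truncated normal density and the implicit equation for $\mu_2$ can be checked numerically. The sketch below is not the authors' estimation code: the function names and the values of $\sigma$ and $\psi$ are hypothetical, and the bisection bracket assumes $\psi$ is sufficiently negative (roughly $\psi < -0.19\sigma$) for the truncated mean to be negative at the lower end.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_density(x, mu, sigma, psi):
    """Normal density with parent mean mu and s.d. sigma, truncated below at psi."""
    if x < psi:
        return 0.0
    scale = std_normal_cdf((mu - psi) / sigma) * sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / scale


def truncated_mean(mu, sigma, psi):
    """mu + sigma * phi(z) / (1 - Phi(z)) with z = (psi - mu) / sigma."""
    z = (psi - mu) / sigma
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z), computed stably
    return mu + sigma * std_normal_pdf(z) / survival


def solve_mu2(sigma, psi):
    """Bisect for the parent mean that makes the truncated mean equal to zero."""
    lo, hi = psi - 5.0 * sigma, 0.0    # truncated mean is increasing in mu
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, sigma, psi) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma, psi = 0.3, -0.5                 # hypothetical values
mu2 = solve_mu2(sigma, psi)
print(mu2, truncated_mean(mu2, sigma, psi))
```

Because truncation from below raises the mean, the solved $\mu_2$ is slightly negative for these values.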


Identification and Estimation
Equilibrium Turnover and Promotion (Gayle, Golan, and Miller, 2015, Econometrica)
Overview: certainty equivalent, sorting and compensating differentials

Sample analogs were constructed for the conditional choice probabilities, compensation schedule, and conditional and unconditional densities of the abnormal return.

Construct moment conditions from the participation constraint using an optimal two-step GMM for each of the firm/rank types.

Intuitively, estimation exploits the idea that when risk averse managers make rational choices between different uncertain outcomes or lotteries they are revealing their attitude towards risk.

These choices also reveal the trade-off between current and future payoffs.


Research in Accounting

- A lot of models in top accounting journals
- What happened in Marketing, Political Science and Finance?
- Up to you guys

