∗

Yuichiro Waki [email protected] Department of Economics, University of Minnesota

Abstract This paper considers optimal history-dependent income taxation when skills are endogenously formed. Skills are endogenous in that individuals invest time in a Ben-Porath type human capital accumulation technology over the life cycle. Following the literature of the New Dynamic Public Finance, I show the optimal taxation problem with history-dependent income taxes can be formulated as a dynamic mechanism design problem in which people’s skills (human capital) and their human capital investment are private information. This clarifies the two tradeoffs that arise from the endogenous skill assumption: One between the skill accumulation incentive and insurance, and the other between current labor income and future high skills. To quantify how these tradeoffs affect efficient risk-sharing implications, I calibrate models with and without human capital accumulation to the U.S. data, numerically solve for optimal allocations and taxes in these two models, and contrast the policy implications. Computational difficulty is overcome using an approach developed by Fukushima and Waki [2010].

∗ This version: February 23, 2011. I’m indebted to Chris Phelan and Narayana Kocherlakota for their invaluable advice. I thank Toni Braun, Kenichi Fukushima, Fatih Guvenen, Ellen McGrattan, Ctirad Slavik, and participants in Public Economics Workshop for comments and suggestions. All remaining errors are mine.

1 Introduction

One role of an income tax is to provide insurance against income risk, by redistributing resources from high income earners to low income earners. Justification for such a role is grounded on the view that individual labor incomes differ due to unobserved differences in ability and unpredictably fluctuate over time, that private insurance markets are not working most effectively, and that the costs of distorting incentives are outweighed by the benefits of insurance. Numerous projects on optimal income taxation have considered how the government should address the incentive-insurance tradeoff most effectively. Quantitative studies generally set up a dynamic general equilibrium model in which individuals are hit by uninsurable idiosyncratic skill shocks, specify the details of the model to capture relevant aspects of the reality, and characterize an optimal tax system numerically, under exogenously imposed parametric assumptions on the tax system (e.g. Conesa and Krueger [2006] and Conesa, Kitao, and Krueger [2009]). Although a parametric assumption facilitates a characterization of an optimal tax system, it is imposed basically for a computational reason and might also be overly restrictive. A recently developing literature of New Dynamic Public Finance (NDPF) addresses this issue and considers an optimal taxation problem under a weaker restriction: labor and capital income taxes can depend on the history of labor incomes in an arbitrary manner.1 Neither a functional form restriction nor history/age independence is imposed on the set of available tax systems. Theoretically, it establishes the necessity of a negative interaction between labor income and capital income tax, which is in many cases excluded by parametric assumptions (Kocherlakota [2005])2 ; Quantitatively, it finds a big welfare gain by switching to an optimal history-dependent tax system, even from a tax system that is optimal under a parametric assumption (Fukushima [2010]). 
1 Golosov, Tsyvinski, and Werning [2006], Kocherlakota [2010a], and Kocherlakota [2010b] provide excellent reviews of this literature.

2 Kitao [2010] considers an optimal taxation problem under a parametric assumption, allowing capital income tax to depend linearly on labor incomes, and finds a large long-run welfare gain from negative correlation between capital income tax and labor income.

These findings suggest that when addressing the fundamental tradeoff between incentives and insurance it is essential to isolate it from an artificial constraint on a tax system. In this paper I consider an optimal history-dependent income taxation problem in the environment where skills are endogenously formed and quantify its implications, following the NDPF approach. Skills are endogenous in that individuals accumulate skills by investing time in a risky, Ben-Porath
type human capital accumulation technology over the life cycle.3 I generalize Kocherlakota [2005] and show that under a certain condition an optimal taxation problem with history-dependent income taxes is equivalent to a dynamic mechanism design problem in which an individual’s (agent’s) skill and human capital investment activity are private information. Formulating an optimal taxation problem as a dynamic mechanism design clarifies the key tradeoffs that arise from the endogeneity of skills. When skills are exogenous the key tradeoff is the one between the work incentive and insurance. By insurance I mean consumption smoothing across different realization of skills and a shift of work hours from the unproductive to the productive. By the work incentive I mean the need of positive correlation between consumption and skill in supporting unequal work hours across different skills, for skills are private information. When skills are endogenous, there arise two additional tradeoffs: one between insurance and the skill accumulation incentive, and the other between future skills and current income. The former tradeoff arises since positive correlation between future consumption and future skill is needed to encourage current human capital investment, and the latter arises from the assumption that the time used in human capital investment doesn’t produce labor income. The goal here is to quantify whether and how these additional tradeoffs affect the policy implications of the NDPF. Since the quantitative studies in this literature so far have largely focused on exogenous skill cases, this also serves as a robustness analysis of the NDPF’s implications to an empirically plausible, alternative specification of skill process. To examine the robustness quantitatively, I derive the policy implications using the NDPF approach, using calibrated models with and without endogenous skills. 
3 The question of how labor income tax should be designed to provide insurance against uninsurable human capital shocks dates back, at least, to Eaton and Rosen [1980]. Considering optimal income taxation under the restriction that the labor income tax function is affine, they make the point that optimality calls for some distortion: starting from a non-distortionary tax, they find that at the margin the benefits of insurance exceed the costs of distortion.

Toward this end, I specify preferences and technologies that nest both endogenous and exogenous skill cases. Two sets of parameters are calibrated, assuming that skills are endogenous in one and exogenous in the other, so that in a market economy where people trade risk-free bonds under a borrowing constraint and a calibrated U.S. tax system the equilibrium allocation reproduces selected features of the U.S. data. In each economy I derive policy recommendations and compare the results. The main findings so far are summarized as follows. First, when skills are endogenous the lifetime income tax is, in some sense, less progressive. For each possible path of labor incomes over the life cycle, I define the lifetime tax rate as the present value of taxes along that income path divided by
the present value of labor incomes. I regress it on the present value of labor incomes and constant. The intercept term is higher and the coefficient of lifetime labor income is lower in the endogenous skill model, implying less progressive tax rate than in the exogenous skill model. Second, when skills are endogenous, the lifetime labor income inequality is lower, and the lifetime consumption inequality relative to labor income’s is higher. Relative inequality in consumption is measured as the variance of log of lifetime consumption divided by that of lifetime income. Higher relative inequality in consumption is a direct consequence of the less progressive tax; Under a less progressive income tax the consumption distribution keeps track of the income distribution more closely, thereby resulting in higher relative inequality in consumption. There are two forces behind the lower labor inequality. One is the assumption that people need to forego some labor income to accumulate future skills. The other is that the reward for human capital investment in an early stage of the life cycle is also provided through a relatively low work effort for the productive in a later stage. A challenge for quantitative analysis in the NDPF is a computational one. Allowing arbitrary history-dependence makes an optimal taxation problem insolvable. The crux of the NDPF approach is first to show that the best allocation achievable under an optimal tax system is obtained by solving a dynamic mechanism design problem, and second to construct an optimal tax system from that allocation. This translates an optimization problem over an infinite dimensional space into a mechanism design problem. Though this simplifies the original problem, obtaining a full characterization of a solution to the dynamic mechanism design problem is still a complicated task because of the “curse of dimensionality” problem, when the private shocks are persistent (See Fukushima and Waki [2010] for details). 
This computational challenge has led researchers to focus on a particular kind of risk at a time, and sometimes even to simplify the incentive problems. In particular, quantitative studies in the NDPF literature have largely focused on the idiosyncratic skill/productivity risk, assuming it is an exogenous stochastic process,4 and the risk-sharing implications of endogenous skill formation have not been fully investigated. A few exceptions include Kapicka [2006] and Kapicka [2009]. These papers focus on a situation where an individual’s productivity depends on his innate ability, which is fixed over time, and on human capital that is accumulated without any risk, thereby abstracting from the insurance problem arising from unpredictable fluctuations in skills. Kapicka [2006] simplifies an optimal taxation problem by imposing a restriction that the labor income tax can depend only on the current income, and focuses on a steady state in which for each innate ability level the optimal human capital is constant over time. These assumptions enable him to translate the optimal taxation problem into a static mechanism design problem with exogenous skills. Kapicka [2009] allows for history-dependence of the labor income tax, but assumes that utility is quasi-linear in consumption. When the social welfare function is linear in individuals’ utilities, the first-best allocation is implementable. Hence he further imposes type-specific utility bounds, which make the first best unachievable. The key friction there is the binding utility bounds.

To overcome this computational difficulty without adding constraints to a mechanism design problem, I use an approach proposed by Fukushima and Waki [2010]. They identify the class of type transition kernels that permit a recursive formulation with a low dimensional state variable, while imposing all the incentive compatibility constraints. Although this requires an approximation of the transition kernel, it has an advantage relative to the first-order approach (Kapicka [2010]), which reduces the dimension of the state variable by requiring a correct conjecture, before solving the model, of which incentive constraints bind. Identifying the binding constraints is particularly hard in the setting of this paper because agents have two levers to manipulate: human capital investment and a report about their types.

4 Examples include Golosov and Tsyvinski [2006], Weinzierl [2009], Ales and Maziero [2009], Slavik and Wiseman [2009], Huggett and Parra [2010], Fukushima [2010], and Golosov, Tsyvinski, and Troshkin [2010].

2 Model and Optimal Tax System

In this section I set up an environment with human capital accumulation: during the working age, people spend time on human capital investment, and hours worked are not counted as the human capital investment activity.

2.1 Preference and Technology

A continuum of individuals (agents) live for a finite number of periods from $t = 0$ to $T$. I focus on a single cohort and abstract from intergenerational redistribution. Hence I also refer to $t$ as an individual’s age. There is an exogenous retirement age $T^R$ ($\le T$): people work for $t = 0, \dots, T^R$ and cannot work from $T^R + 1$ on. At each point in time, people differ in their productivity (human capital level) $\theta$. I also call $\theta$ the agent’s type in a period. For any period $t$, the agent’s type $\theta_t$ belongs to a finite set $\Theta_t$. I denote the history of $\theta$’s from period $s$ to $t$ by $\theta_s^t = (\theta_s, \dots, \theta_t)$ and the full history up to period $t$ by $\theta^t = \theta_0^t$. A $\theta$-type individual can convert $l$ units of time into $y = \theta l$ units of labor income.

In each period $t$, an individual receives a shock $\theta_t$, produces $y_t \in Y_t$, consumes $c_t \in C$, and invests time in human capital $e_t \in E_t$. I assume the sets $E_t$ are finite so that the planning problem in Section 2.2.2 has a finite number of constraints. The total time endowment is normalized to one and I assume $E_t \subset [0, 1]$ and $0 \in E_t$.

The type $\theta$ evolves stochastically over time. An individual before retirement can affect the probability distribution of $\theta$ by devoting a fraction of time $e$ to human capital production. The type $\theta$ is Markov, i.e. for all $t$, $\theta_{t+1}$ is independent of the history up to period $t$, except for $(\theta_t, e_t)$. The conditional probability of a $\theta_t$-type agent in period $t$ becoming the $\theta_{t+1}$-type in the next period given the current period human capital investment $e_t$ is denoted by $\pi_{t+1}(\theta_{t+1} \mid \theta_t, e_t)$. The initial distribution of $\theta_0$ is denoted $\pi_0$. I assume full support, i.e. $\pi_t(\theta' \mid \theta, e) > 0$ for all $t$ and $(\theta', \theta, e) \in \Theta_t \times \Theta_{t-1} \times E_t$. Also, $\pi_0(\theta) > 0$ for all $\theta \in \Theta_0$.

An investment strategy is a sequence of functions $e = \{e_t\}_{t=0}^{T}$ such that for all $t$, $e_t : \Theta^t \to E_t$. The set of all investment strategies is denoted by $\mathbf{E}$. An investment strategy determines the probability distribution of $\theta^t$. The probability of history $\theta^t$ given $e \in \mathbf{E}$ is denoted by $\Pr(\theta^t \mid e) = \pi_0(\theta_0)\, \pi_1(\theta_1 \mid \theta_0, e_0(\theta^0)) \cdots \pi_t(\theta_t \mid \theta_{t-1}, e_{t-1}(\theta^{t-1}))$.

An allocation is $(x, e)$, where $x = (c, y) = \{c_t, y_t\}_{t=0}^{T}$ such that for all $t$, $c_t : \Theta^t \to C$ and $y_t : \Theta^t \to Y_t$. The set of all $c$ ($y$) is denoted by $\mathbf{C}$ ($\mathbf{Y}$, respectively). The set of all allocations is $\mathbf{X} = \mathbf{C} \times \mathbf{Y}$. For periods after retirement, $t \ge T^R + 1$, no production is possible, i.e. $Y_t = \{0\}$. The pair $(\theta, e)$ is private information to an agent, while $(c, y)$ is not. An individual’s (ex-ante) utility from an allocation $(x, e)$ is given by

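\[
U(x, e) = \sum_{t=0}^{T} \sum_{\theta^t \in \Theta^t} \beta^t \left\{ u\big(c_t(\theta^t)\big) - v\!\left( \frac{y_t(\theta^t)}{\theta_t} + e_t(\theta^t) \right) \right\} \Pr(\theta^t \mid e).
\]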

I assume that u is continuous, increasing, and concave, and that v is continuous, increasing, and convex. The assumption of separability of the period utility in consumption and leisure facilitates the construction of tax systems that implement an optimal allocation. I discuss this issue in Section 2.3. The cost associated with an allocation (x, e) is

\[
J(x, e) = \sum_{t=0}^{T} \sum_{\theta^t \in \Theta^t} R^{-t} \left\{ c_t(\theta^t) - y_t(\theta^t) \right\} \Pr(\theta^t \mid e).
\]

Let $J_0$ be the available resources in period 0, i.e. $-J_0$ is the total amount of resources in period 0 which this cohort owes; for example, $-J_0$ may be equal to the present value of government expenditure, $\sum_t R^{-t} G_t$. I say an allocation is feasible if $J(x, e) \le J_0$ and refer to this inequality as the feasibility constraint. Note that this definition of costs implies that intergenerational redistribution is disregarded or incorporated as exogenous in $J_0$. Also incorporated in the above definition is the assumption of a small open economy with a constant interest rate $R$. Though it is interesting and important to see how a tax reform changes the real interest rate and skill price through changes in the distribution of human capital, it is left for now as a future exercise.5

This specification is consistent with e.g. the post-school training in Heckman, Lochner, and Taber [1998b], Huggett, Ventura, and Yaron [2009], and Guvenen, Kuruscu, and Ozkan [2010]. This complements the analysis by Grochulski and Piskorski [2010], who consider optimal income taxation in an environment with once-and-for-all human capital investment in the form of goods, not time, which is more consistent with an interpretation as educational expenditure. Kapicka [2006] and Kapicka [2009] consider the optimal taxation problem in a similar setting. In these papers, people differ in their production skills, which are constant over time, and in their human capital level, and an agent’s productivity is the product of these two. The human capital accumulation technology is riskless and thus all the uncertainty is resolved in the first period, thereby abstracting from insurance against shocks that hit people over time. Another big difference between these papers and this paper is the way the mechanism design problem is set up. I discuss this issue in Section 2.2.2.

Note that this environment clearly nests an exogenous skill case, where $E_t = \{0\}$ for all $t$.
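The definitions of $\Pr(\theta^t \mid e)$, $U(x, e)$ and $J(x, e)$ above can be illustrated by brute-force enumeration of type histories. The following Python sketch is only an illustration: the two-point type space, the two-period horizon, the functional forms for `u`, `v`, `pi0` and `pi`, and all numbers are hypothetical choices, not the paper's calibration.

```python
import itertools
import math

# Hypothetical toy primitives (NOT from the paper): two types, periods t = 0, 1.
THETA = (1.0, 2.0)      # type space, identical in every period
T = 1                   # last period
BETA, R = 0.95, 1.05    # discount factor and gross interest rate (assumed)

def u(c):
    """Utility of consumption (log utility is an assumption)."""
    return math.log(c)

def v(h):
    """Disutility of total time h = y/theta + e (quadratic is an assumption)."""
    return h ** 2 / 2.0

def pi0(theta):
    """Initial distribution over types (uniform, assumed)."""
    return 0.5

def pi(theta_next, theta, e):
    """Assumed transition kernel: investment e raises the chance of the high type."""
    p_high = min(0.9, 0.3 + 0.2 * (theta - 1.0) + 0.8 * e)
    return p_high if theta_next == THETA[1] else 1.0 - p_high

def histories(t):
    """All type histories theta^t = (theta_0, ..., theta_t)."""
    return itertools.product(THETA, repeat=t + 1)

def prob(hist, e):
    """Pr(theta^t | e): product of pi0 and the investment-dependent kernels."""
    p = pi0(hist[0])
    for s in range(1, len(hist)):
        p *= pi(hist[s], hist[s - 1], e(hist[:s]))
    return p

def utility_and_cost(c, y, e):
    """Ex-ante utility U(x, e) and resource cost J(x, e) of an allocation."""
    U = J = 0.0
    for t in range(T + 1):
        for h in histories(t):
            p = prob(h, e)
            U += BETA ** t * (u(c(h)) - v(y(h) / h[-1] + e(h))) * p
            J += R ** (-t) * (c(h) - y(h)) * p
    return U, J

# Hypothetical allocation: consumption and output rise with the current type,
# and time is invested in human capital only in period 0.
c = lambda h: 0.4 + 0.2 * h[-1]
y = lambda h: 0.3 * h[-1]
e = lambda h: 0.1 if len(h) == 1 else 0.0

U, J = utility_and_cost(c, y, e)
```

Setting `e` to return 0 for every history corresponds to the exogenous skill case $E_t = \{0\}$.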

2.2 Market Economy and Planning Problem

I consider two mechanisms that determine an allocation. One is a market mechanism where there is a decentralized market for a risk-free bond and individuals trade them subject to a borrowing constraint and a given tax function. An allocation is determined in a competitive equilibrium fashion. I call this a market economy. The other is an optimal direct mechanism which a hypothetical planner specifies so as to maximize the ex-ante utility of an individual subject to incentive constraints and feasibility. I call this a planning problem.

A market economy is later used to calibrate preference and technology parameters to the U.S. economy, so that the equilibrium allocation under a calibrated U.S. tax system reproduces selected features of the U.S. data. It also provides a benchmark allocation for comparison. A planning problem is, as shown later, equivalent to an optimal taxation problem where the government can choose among history-dependent income tax systems and an allocation is determined by a market mechanism. I show that one can construct an optimal history-dependent tax system from a solution to the planning problem, in the same way as in Kocherlakota [2005].

5 Heckman, Lochner, and Taber [1998a] and Heckman, Lochner, and Taber [1999] point out that the analysis under a partial equilibrium assumption can generate misleading results. I later examine the average human capital level before and after the reform.

2.2.1 Market Economy

In a market economy, individuals trade risk-free bonds with gross return $R$ under an exogenously given borrowing constraint and history-dependent income tax functions $\tau = \{\tau_t^k(\cdot), \tau_t^l(\cdot)\}_{t=0}^{T}$ such that for all $t$, $(\tau_t^k(\cdot), \tau_t^l(\cdot)) : Y^t \to \mathbb{R}^2$, where $\tau_t^k(y^t)$ denotes a linear wealth tax rate after history $y^t$ and $\tau_t^l(y^t)$ denotes a labor income tax after history $y^t$. Let $TAX$ denote the set of all such taxes. An allocation is determined in a competitive equilibrium. An individual’s problem is to choose a contingent plan $(c, y, e)$ and assets $b = \{b_{t+1}\}_{t=0}^{T}$, with $b_{t+1} : \Theta^t \to \mathbb{R}$ for all $t$, to maximize $U(x, e)$ subject to the budget constraint
\[
c_t(\theta^t) + b_{t+1}(\theta^t) \le \left\{ 1 - \tau_t^k(y^t(\theta^t)) \right\} R\, b_t(\theta^{t-1}) + y_t(\theta^t) - \tau_t^l(y^t(\theta^t)), \qquad \forall t, \theta^t,
\]
the borrowing constraint $b_{t+1}(\theta^t) \ge 0$, and the initial and terminal conditions $b_0 = b_{T+1}(\theta^T) = 0$. The government budget constraint is
\[
\sum_{t=0}^{T} \sum_{\theta^t \in \Theta^t} R^{-t} \left\{ \tau_t^l(y^t(\theta^t)) + \tau_t^k(y^t(\theta^t))\, R\, b_t(\theta^{t-1}) \right\} \Pr(\theta^t \mid e) + J_0 = 0.
\]

An equilibrium given a tax system $\tau$ is an allocation $(x, e)$ and an asset plan $b$ that solve the individual’s problem and satisfy the government budget constraint and the feasibility constraint with equality. (There is no reference to a price system because this is a small open economy.) Let $EQM(\tau)$ be the set of allocations $(x, e)$ such that there exists $b$ such that $(x, e, b)$ is an equilibrium given $\tau$. When the ex-ante utility of an individual is used as a social welfare function, an optimal taxation problem is
\[
\max_{\tau \in TAX} \; \max_{(x, e) \in EQM(\tau)} U(x, e).
\]

This problem is hard to solve directly, since the set T AX is very large. It is shown, however, that this problem is in some sense equivalent to the planning problem defined below.

2.2.2 Planning Problem

In the planning problem, an allocation is chosen by a planner who maximizes the ex-ante utility of an individual, taking into account incentive problems that arise from private information, as well as the feasibility constraint. Without loss of generality, I focus on a direct mechanism where the agents make reports about their type $\theta$ and the planner sends a recommendation of action as a function of the report history (Myerson [1982]). An allocation has to satisfy the incentive compatibility constraint in addition to the feasibility constraint.

A reporting strategy $r = \{r_t\}_{t=0}^{T}$ is a sequence of functions such that for all $t$, $r_t : \Theta^t \to \Theta_t$, and $r^t(\theta^t) = (r_0(\theta^0), \dots, r_t(\theta^t)) \in \Theta^t$. The set of all reporting strategies is denoted by $\mathbf{R}$. An allocation is incentive compatible if and only if it satisfies $U(x, e) \ge U(x \circ r, \tilde{e})$ for any reporting strategy $r \in \mathbf{R}$ and any investment strategy $\tilde{e} \in \mathbf{E}$.

The notion of optimality depends on the choice of social welfare function. The social welfare function I use is the ex-ante lifetime utility of the agent in a given cohort. The mechanism design problem is $\max_{x, e} U(x, e)$ subject to the incentive compatibility constraints and the feasibility constraint $J(x, e) \le J_0$. A solution to this problem is called an optimal allocation. I find a solution to this problem by solving instead $\min_{x, e} J(x, e)$ subject to incentive compatibility and $U(x, e) \ge U_0$, with $U_0$ chosen so that the minimized cost equals $J_0$. I call the constraint $U(x, e) \ge U_0$ the promise-keeping constraint.6 Obtaining a numerical solution to this dynamic mechanism design problem is known to be difficult, due to persistent private information and a hidden action.

Kapicka [2006] and Kapicka [2009] use similar environments, but set up different, simpler planning problems. In these papers the only source of uncertainty is the innate skill that shifts the productivity level, and it is resolved in the first period. Therefore these papers abstract from insurance against unpredictable fluctuations in skills. Kapicka [2006] considers an economy populated with infinitely-lived agents, and assumes that the labor income tax can depend only on the current income, thereby abstracting from history-dependence. In the numerical exercises, he assumes a steady state, i.e. for each innate skill level, the initial condition for human capital is such that an optimal allocation specifies a constant human capital level. Under these assumptions the planner’s problem is shown to be equivalent to a static one with exogenous skill and distorted preference and technology. Kapicka [2009] uses a life cycle version of Kapicka [2006] and considers an optimal labor income tax that can be history-dependent. He assumes, however, that the individual’s utility is linear in consumption. When the social welfare function is linear in individuals’ utilities (as it is here) the first-best allocation is implementable. Hence he further assumes there is a set of promised utilities, one for each innate skill level, which makes the first best unachievable. The relevant friction there is the binding utility bounds that are newly introduced.

Though setting up a planning problem differently is one way to simplify the computation, I take an alternative route: I assume the transition kernels $\pi_t$ belong to a class that reduces the computational burden (Fukushima and Waki [2010]). This allows me to maintain the standard planning problem and to compare the policy implications across exogenous and endogenous skill assumptions.

6 Also, when I numerically solve the planning problem, I introduce lotteries to make the problem convex. (See the appendix for details.)
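The incentive compatibility requirement $U(x, e) \ge U(x \circ r, \tilde{e})$ defined in this section can be illustrated in its simplest special case. The Python sketch below assumes a single period, two types, and no human capital investment, so a deviation is just a misreport of the current type; the functional forms and numbers are hypothetical and the check is far simpler than the dynamic problem with hidden investment studied in the paper.

```python
import math

# Hypothetical static special case (NOT the paper's dynamic problem):
# one period, two types, no human capital investment.
THETA = (1.0, 2.0)

def u(c):
    """Utility of consumption (log, assumed)."""
    return math.log(c)

def v(h):
    """Disutility of hours h = y / theta (quadratic, assumed)."""
    return h ** 2 / 2.0

def payoff(true_theta, reported_theta, c, y):
    """Utility of a true_theta agent who receives the bundle meant for reported_theta."""
    return u(c[reported_theta]) - v(y[reported_theta] / true_theta)

def is_incentive_compatible(c, y):
    """True if no type gains by reporting any other type."""
    return all(
        payoff(th, th, c, y) >= payoff(th, rep, c, y)
        for th in THETA
        for rep in THETA
    )

# Output is shifted toward the productive type in both examples.
y = {1.0: 0.3, 2.0: 1.0}

# Full consumption insurance: the high type prefers to mimic the low type.
full_insurance = {1.0: 1.0, 2.0: 1.0}

# Consumption positively correlated with skill restores truthful reporting.
rewarding_skill = {1.0: 0.8, 2.0: 1.0}

print(is_incentive_compatible(full_insurance, y))
print(is_incentive_compatible(rewarding_skill, y))
```

The two example allocations show the work incentive at play: equal consumption with unequal output is not incentive compatible, while rewarding the productive type with higher consumption is.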

2.2.3 Key Tradeoffs

When skills are exogenous, the key tradeoff is the one between insurance and the work incentive: the planner shifts work hours from the unproductive to the productive, and rewards the productive with high consumption. When people accumulate skills by investing time, two other tradeoffs emerge.

One tradeoff is between insurance and the human capital investment incentive. When the planner wants an agent to invest time in human capital, she needs to reward him with high utility when his productivity becomes high, and may also punish him with low utility when his productivity becomes low. Compared to the exogenous skill case, future consumption and future labor income need to be more tightly linked. This affects the implications for redistribution across people with different labor incomes.

The other tradeoff is the one between current income (production) and future skills (human capital investment). This arises from the technological constraint that time allocated to human capital investment doesn’t produce goods. The transition kernel $\pi_{t+1}(\theta_{t+1} \mid \theta_t, e_t)$ captures the idea that a high $\theta_t$-type can produce more goods from given work hours than a low $\theta_t$-type, and more human capital on average from a given level of human capital investment. Simply shifting work hours from the unproductive to the productive to achieve high current output results in high marginal disutility of work for the productive, thereby making it harder to have them accumulate skills and to achieve high future output.

2.3 Decentralizing the Planning Solution through the Market Economy

The above planning problem is relevant in considering optimal taxation in the following sense: there exists a tax system $\tau^* \in TAX$ such that the planning solution $(x^*, e^*)$ is in $EQM(\tau^*)$. (In Appendix I state this as a formal proposition and provide a proof.) In other words, the planning solution can be implemented as an equilibrium allocation in the market economy, with an appropriate setting of the history-dependent tax system. Since the market economy given a tax system $\tau$ is a mechanism and any $(x, e) \in EQM(\tau)$ is incentive compatible and feasible, the planning solution is no worse than any market economy equilibrium allocation, i.e. $U(x^*, e^*) \ge U(x, e)$ for any $\tau \in TAX$ and $(x, e) \in EQM(\tau)$. This means
\[
\tau^* \in \arg\max_{\tau \in TAX} \; \max_{(x, e) \in EQM(\tau)} U(x, e).
\]

Thus solving this optimal taxation problem is equivalent to finding a tax system $\tau^*$ such that the planning solution $(x^*, e^*)$ belongs to $EQM(\tau^*)$. For cases where skills are exogenous, Kocherlakota [2005] provides a way to construct $\tau^* \in TAX$ from the planning solution $x^*$. In the technical appendix I show that the same construction works when skills are endogenous as specified in Section 2.1 (see footnote 7). This has an important consequence: whether skills are endogenous or exogenous, there is an optimal tax system that belongs to the class $TAX$. This is important because it enables us to compare optimal tax systems under different settings on the same ground. Planning solutions differ under different assumptions on skill formation, and this difference leads to different tax rates for any given history of labor incomes.

3 Design of Experiments

The design of the experiments is as follows. First I calibrate two market economies to the U.S. data, assuming skills are endogenous in one and exogenous in the other. Calibration targets are chosen so that each model reproduces selected characteristics of the U.S. data, when the tax function is parameterized to approximate the U.S. one. Second, in each model economy I numerically derive the policy implications by solving the dynamic mechanism design problems.

7 What’s crucial is that utility is separable in consumption and leisure and that human capital investment is in the form of time.

4 Calibrating the Market Economies to the U.S.

The computational burden associated with solving the dynamic mechanism design problem restricts the set of models I can handle. I set $T^R = 4$ and $T = 6$, so the model has seven periods. Agents can work only up to period 4 and cannot work in the last two periods. One model period corresponds to 7 years. Period 0 ($t = 0$) corresponds to ages 25-31, and period $T^R$ to ages 53-59.

Tax Function: The labor income tax function $\tau_t^l(\cdot)$ has the form
\[
\tau_t^l(y^t) =
\begin{cases}
\phi_0 \left( y_t - \left( y_t^{-\phi_1} + \phi_2 \right)^{-1/\phi_1} \right) + \tau_{ss}\, y_t, & \forall t \le T^R, \\
-SS, & \forall t \ge T^R + 1.
\end{cases}
\]

This functional form has been widely used since Gouveia and Strauss (1994). I use the values $(\phi_0, \phi_1) = (0.258, 0.768)$ taken from Gouveia and Strauss. The social security payroll tax $\tau_{ss}$ is set to 0.124 but, for simplicity, is assumed to be linear at any income level, and the social security benefit $SS$ is assumed to be constant and independent of the labor income history. The wealth tax $\tau_t^k(\cdot)$ is set to $\tau_t^k(y^t) = (R - 1)/R \times \phi_k$, i.e. it is a linear tax on net interest income with rate $\phi_k$.

Utility Function: I specify
\[
u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma} \quad \text{and} \quad v(l, e) = \nu\, \frac{(l + e)^{1+\eta}}{1 + \eta}.
\]
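The functional forms above translate directly into code. The following Python sketch uses the values of $\phi_0$, $\phi_1$, $\tau_{ss}$, $\gamma$ and $\eta$ stated in the text; the values of `PHI2`, `nu` and `ss_benefit` are hypothetical placeholders, since the paper calibrates $\phi_2$ and $\nu$ and their values are not reported in this section.

```python
import math

PHI0, PHI1 = 0.258, 0.768   # Gouveia-Strauss parameters stated in the text
TAU_SS = 0.124              # social security payroll tax stated in the text
PHI2 = 0.03                 # HYPOTHETICAL placeholder: phi_2 is calibrated in the paper

def labor_tax(y, working=True, ss_benefit=0.0):
    """Labor income tax for current income y.

    While working: Gouveia-Strauss tax plus a linear payroll tax.
    After retirement: a constant transfer, i.e. a tax of -SS.
    ss_benefit is a hypothetical placeholder for SS.
    """
    if not working:
        return -ss_benefit
    if y <= 0.0:
        return 0.0  # the formula tends to 0 as y tends to 0
    gouveia_strauss = PHI0 * (y - (y ** (-PHI1) + PHI2) ** (-1.0 / PHI1))
    return gouveia_strauss + TAU_SS * y

def u(c, gamma=1.0):
    """CRRA utility of consumption; gamma = 1 is the log limit."""
    if abs(gamma - 1.0) < 1e-12:
        return math.log(c)
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

def v(l, e, nu=1.0, eta=2.0):
    """Disutility of total time l + e; nu = 1.0 is a hypothetical placeholder."""
    return nu * (l + e) ** (1.0 + eta) / (1.0 + eta)

# Average tax rates rise with income and stay below PHI0 + TAU_SS.
average_rates = [labor_tax(y) / y for y in (0.5, 1.0, 2.0, 10.0)]
```

Under this form the average tax rate increases with income and approaches $\phi_0 + \tau_{ss}$ from below, which is the sense in which the calibrated schedule is progressive.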

Parametric Assumption on π’s: Although the type space is discrete in the dynamic mechanism design problem I solve, it is convenient for calibration purposes to use a continuous type space and to make a parametric assumption on the set of conditional probabilities. The assumption I employ is that, conditional on $(\theta_t, e_t)$, $\theta_{t+1}$ is log-normal: $\log \theta_{t+1}$ has mean $\log[\theta_t + \zeta(\theta_t \cdot e_t)^{\alpha}]$ and variance $\sigma_\epsilon^2$. This is consistent with a stochastic Ben-Porath technology of the form
\[
\theta_{t+1} = \exp(\epsilon_{t+1}) \left\{ \theta_t + \zeta (\theta_t \cdot e_t)^{\alpha} \right\}, \qquad \epsilon_{t+1} \sim \text{i.i.d. Normal}(0, \sigma_\epsilon^2),
\]
which is taken from Huggett, Ventura, and Yaron [2009]. A parametric assumption reduces the number of parameters to be calibrated, and having a continuous function makes it easier to solve for an equilibrium in the market economy. Under this functional form, high productivity $\theta_t$ also implies high marginal productivity of human capital investment.

Initial Condition: Studies using Ben-Porath models of human capital accumulation have found that heterogeneity in $\zeta$ is important in accounting for the fanning-out of wage/earnings inequality (Huggett, Ventura, and Yaron [2009], Guvenen, Kuruscu, and Ozkan [2010]). In the calibration I allow such heterogeneity and set $\zeta$ to its calibrated mean in the numerical exercises. Each agent’s initial human capital $\theta_0$ and the learning ability $\zeta$ are drawn from a bivariate log-normal distribution. Following Huggett, Ventura, and Yaron [2009], I allow for a correlation between these two parameters and denote the correlation by $\rho_0$.

Government budget constraint parameter $J_0$: $J_0$ is chosen so that $-J_0$ is equal to 20% of the present value of total labor income. This is equivalent to a 20% share of government spending in total labor income.

I exogenously fix parameter values for $(\beta, \gamma, \eta, R, \phi_0, \phi_1, \alpha, \phi_k)$. Sensitivity analysis using different values, particularly for the risk aversion $\gamma$ and the labor disutility $\eta$, needs to be conducted, since Golosov, Tsyvinski, and Werning [2006] find through numerical exercises that these are important determinants of optimal wedges in an exogenous skill model.
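The stochastic Ben-Porath law of motion described above can be simulated in a few lines. The Python sketch below takes $\alpha = 0.7$ from the paper's list of exogenously fixed parameters; the values used for $\zeta$, the shock standard deviation, the initial type and the investment path are hypothetical, since the calibrated values are not reported in this section.

```python
import math
import random

ALPHA = 0.7  # human capital elasticity fixed exogenously in the paper

def next_theta(theta, e, zeta, sigma_eps, rng):
    """One step of theta' = exp(eps) * (theta + zeta * (theta * e)**ALPHA)."""
    eps = rng.gauss(0.0, sigma_eps)
    return math.exp(eps) * (theta + zeta * (theta * e) ** ALPHA)

def simulate_path(theta0, e_path, zeta, sigma_eps, rng):
    """Human capital path for a given sequence of time investments."""
    path = [theta0]
    for e in e_path:
        path.append(next_theta(path[-1], e, zeta, sigma_eps, rng))
    return path

rng = random.Random(0)

# HYPOTHETICAL values for zeta, sigma_eps, theta0 and the investment path.
path = simulate_path(theta0=1.0, e_path=[0.2, 0.15, 0.1, 0.05],
                     zeta=0.5, sigma_eps=0.1, rng=rng)

# Without shocks, the gain from a given investment is larger for a higher type.
gain_low = next_theta(1.0, 0.2, 0.5, 0.0, rng) - next_theta(1.0, 0.0, 0.5, 0.0, rng)
gain_high = next_theta(2.0, 0.2, 0.5, 0.0, rng) - next_theta(2.0, 0.0, 0.5, 0.0, rng)
```

Switching off the shock (`sigma_eps = 0`) makes the complementarity visible: the same time investment adds more human capital for a higher current type.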

4.1 Calibration Method

Parameters that are calibrated using simulations include $(\sigma_\epsilon, E(\zeta), \sigma_\zeta, \sigma_{\theta_0}, \rho_0, \nu, \phi_2)$. I use the method of simulated moments to calibrate these parameters. The individual’s policy function is calculated using a weighted residual method.

Data: I use the Panel Study of Income Dynamics (PSID). A detailed description of the sample selection and variable construction is found in the Data Appendix.

Moments to match: I define the wage in the model as $y/(l + e)$, i.e. “hours worked” reported is interpreted as the sum of the actual hours worked $l$ and time for human capital investment $e$. I follow Huggett, Ventura, and Yaron [2009] and use the cross-sectional variance and the average of the growth rate of hourly wages between periods 3 and 4 (ages 46-52 to 53-59) to identify $\sigma_\epsilon$. This is because human capital investment later in life is close to zero and the movements in human capital are driven by $\epsilon$ in the model. The average growth rate in the first two periods is informative about $E(\zeta)$, and the variance of growth rates in the same periods is informative about $\mathrm{var}(\zeta)$. The utility weight on hours $\nu$ is chosen so that average hours equals 30% of the time endowment (i.e. 0.3). The correlation between $\log w_0$ and $\log w_1 - \log w_0$ together with the unconditional distribution of $\log w_0$ are informative about the initial joint distribution of $(\log \theta_0, \log \zeta)$. The labor tax parameter $\phi_2$ is chosen so that the government budget constraint is satisfied.

Parameters that are fixed exogenously

Parameter                  Symbol   Value
Discount factor            β        0.98 per annum
Risk aversion              γ        1.0
Labor elasticity           η        2.0
Real interest rate         r        2% per annum
Tax parameter              φ0       0.258, Gouveia-Strauss (1994)
Tax parameter              φ1       0.768, Gouveia-Strauss (1994)
Human capital elasticity   α        0.7 (Huggett et al. (2009))
SS payroll tax             τss      0.124

Calibrated parameters and target moments

Parameter                          Target moment
Mean learning ability E(ζ)         mean wage growth from 25-31 to 39-45
Std. of log ζ                      std. wage growth from 25-31 to 39-45
Std. of h.c. shock σ_ε             std. wage growth from 46-52 to 53-59
Utility weight on hours ν          average hours = 0.3
Std. of log θ0                     Initial dist. of wages
Corr. b.w. log θ0 and ζ            Corr. of initial wage and wage growth

4.2

Discretization

I obtain discrete probabilities using the data obtained from simulating the above model with ζ = E(ζ). I discretize the set of human capital investment Et, using the unconditional distribution of the simulated data of et's. More precisely, for each t, I include four points in Et: 0, the median of the simulated distribution of et's, and ±10% of the median. From the simulated data, I calculate for each age the unconditional mean and variance of human capital and labor income. Following Tauchen [1986], I choose Θ0 using the mean and the variance of the initial distribution and assign probabilities $\pi^*_0$ to each point in Θ0.8 For t ≥ 0, I do the same but use the mean and the variance of the simulated distribution of θt+1 to obtain a discrete set Θt+1. For each (θt, et), the parametric assumption on the human capital accumulation technology defines a probability distribution over R+, and the discrete transition kernel $\pi^*_{t+1}(\cdot \mid \theta_t, e_t)$ is obtained by discretizing that distribution. Discrete sets Yt's are also constructed in the same way, using the mean and the variance of the simulated distribution of yt's.

Moments: Target vs Simulated

Moment                                        Target    Simulated
E(log w39−45 − log w25−31)                    0.322     0.349
std(log w39−45 − log w25−31)                  0.619     0.626
std(log w53−59 − log w46−52)                  0.441     0.38
Average hours                                 0.3       0.306
corr(log w25−31, log w32−38 − log w25−31)     -0.370    -0.364
std(log w25−31)                               0.413     0.254

For a computational reason which I'll discuss shortly, I am interested in the following class of transition kernels. $\pi_{t+1}(\cdot \mid \cdot, \cdot)$ has a K-mixture representation if and only if

\[
\pi_{t+1}(\theta' \mid \theta, e) = \sum_{k=1}^{K} p_{t+1,k}(\theta')\, \omega_{t+1,k}(\theta, e)
\]

with $p_{t+1,k}(\cdot) \in \Delta(\Theta_{t+1})$ and $\omega_{t+1,\cdot}(\theta, e)$ being a nonnegative weight over {1, 2, ..., K} for all (θ, e) ∈ Θt × Et. Why am I interested in this class? In the environment considered here, the agent's type is persistent and depends on a hidden action. When the type takes on many values, it is well known that the recursive formulation of the planning problem suffers from the "curse of dimensionality," with the dimension of the state space being equal to the cardinality of the type space. Fukushima and Waki [2010] show that when the transition kernel has a K-mixture representation, this dimension can be reduced to K. Although this requires an approximation of the transition kernel, it has an advantage relative to the first-order approach (e.g. Kapicka [2010]), which requires one to conjecture correctly what the relevant incentive constraints are before solving the model. Identifying the relevant constraints is particularly hard in this context because the agent has two levers to manipulate (report and action).

8 Details of the approximation method are as follows. Let x be the one-dimensional real random variable to be approximated, $\bar{x}$ its unconditional mean, and $\sigma_x$ its unconditional standard deviation. For a prefixed cardinality n of the discrete set and a positive real number λ, set $x_1 = \bar{x} - \lambda\sigma_x$ and $x_n = \bar{x} + \lambda\sigma_x$. A discrete set $X^d$ is constructed as an equally spaced grid of size n between $x_1$ and $x_n$. Let $d_x = x_{i+1} - x_i$. Approximate a continuous (possibly conditional) CDF $F_x$ with a discrete distribution p over $X^d$, where $p_1 = F(x_1 + \frac{d_x}{2})$, $p_n = 1 - F(x_n - \frac{d_x}{2})$, and $p_i = F(x_i + \frac{d_x}{2}) - F(x_i - \frac{d_x}{2})$.
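The grid construction in footnote 8 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and a normal CDF is used as the default purely for concreteness (in the paper the relevant CDF comes from the parametric human capital technology).

```python
import math

def normal_cdf(z):
    # Standard normal CDF; an illustrative default, not the paper's distribution.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discretize(mean, std, n, lam, cdf=None):
    """Equally spaced n-point grid on [mean - lam*std, mean + lam*std]
    with probabilities p_1 = F(x_1 + dx/2), p_n = 1 - F(x_n - dx/2),
    and p_i = F(x_i + dx/2) - F(x_i - dx/2) otherwise (requires n >= 2)."""
    if cdf is None:
        cdf = lambda x: normal_cdf((x - mean) / std)
    x1 = mean - lam * std
    xn = mean + lam * std
    dx = (xn - x1) / (n - 1)
    grid = [x1 + i * dx for i in range(n)]
    probs = []
    for i, x in enumerate(grid):
        if i == 0:
            probs.append(cdf(x + dx / 2))
        elif i == n - 1:
            probs.append(1.0 - cdf(x - dx / 2))
        else:
            probs.append(cdf(x + dx / 2) - cdf(x - dx / 2))
    return grid, probs

# Hypothetical inputs: mean 0, standard deviation 1, five points, lambda = 2.
grid, probs = discretize(0.0, 1.0, 5, 2.0)
```

Because the interior bins share their boundaries, the probabilities sum to one by construction, whatever CDF is supplied.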


Approximation is done as follows. For a given K and for each t, I solve

\[
\min_{\pi_t} \sum_{\theta', \theta, e} \left| \pi^*_t(\theta' \mid \theta, e) - \pi_t(\theta' \mid \theta, e) \right|^2
\]

subject to the restriction that $\pi_t$ has a K-mixture representation. As one increases K, the accuracy of this approximation improves. This, however, comes at a price. With a larger K, one needs to use a larger grid on the K-dimensional state space to approximate the value function well. Because I use a lottery to make the problem convex, and its size is linear in the grid size on the state space, setting K to a high value significantly increases the burden of value function iteration. For this reason I set K = 2.
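To make the objective concrete, the sketch below builds a kernel from a 2-mixture representation and evaluates the sum of squared errors. It is not the paper's solution procedure: the function names and numbers are hypothetical, the weights ω are assumed to sum to one across k so that each row of the kernel is a probability distribution, and only the weights are optimized here, holding the component distributions p fixed.

```python
import numpy as np

def mixture_kernel(p, w):
    """p: (K, n_next), each row a distribution over next-period skills.
    w: (n_state, K), each row nonnegative weights over k (assumed to sum to 1).
    Returns the (n_state, n_next) kernel sum_k w[s, k] * p[k, :]."""
    return w @ p

def sq_error(target, p, w):
    # Sum of squared elementwise deviations from the target kernel.
    return float(np.sum((target - mixture_kernel(p, w)) ** 2))

def best_weights_k2(target, p):
    """For K = 2 and fixed p, the least-squares weight on component 1 is the
    projection of (target - p[1]) onto (p[0] - p[1]), clipped to [0, 1]."""
    d = p[0] - p[1]
    w1 = np.clip((target - p[1]) @ d / (d @ d), 0.0, 1.0)
    return np.column_stack([w1, 1.0 - w1])

# Hypothetical example: 3 next-period skill levels, 4 (theta, e) pairs.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
w_true = np.array([[1.0, 0.0], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]])
target = mixture_kernel(p, w_true)   # a kernel that is exactly a 2-mixture
w_hat = best_weights_k2(target, p)
err = sq_error(target, p, w_hat)
```

With K = 2 the planner's state carries two promised utilities rather than one per skill level, which is the dimension reduction described above.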

4.3

Exogenous Skill Transition

I calibrate an exogenous skill model so that the skill transition mimics that in the endogenous skill model under the calibrated U.S. tax system. I simulate the endogenous skill model with a 2-mixture representation under the U.S. tax system, and calculate the frequency of transitions from θt to θt+1, for all t, θt, θt+1. I use this frequency as $\pi^{ex}_t(\theta_{t+1} \mid \theta_t)$.9 I further approximate $\pi^{ex}_t$ with a 2-mixture representation, minimizing the sum of squared errors. Elementwise approximation errors are less than 1%. This means that, under the calibrated U.S. tax system, the endogenous and exogenous skill models thus calibrated show almost the same skill transitions. The labor disutility parameter ν is adjusted so that average hours equal 0.3 in a market equilibrium.

5

Quantitative Exercise: Results

5.1

Properties of An Optimal Allocation

5.1.1

Who Is Investing In The Human Capital?

One way to understand who invests more in human capital is to look at the correlation between skills and human capital investment. In period 0, this correlation is 0.71. This means that when skills are endogenous the productive agents spend more time on human capital investment and produce less than they do in the exogenous skill model. A related question is, who is earning more in period 0? Table 1 shows how skills are correlated with labor income in period 0. Although the correlation is positive in both specifications, it is significantly weaker when skills are endogenous. When skills are endogenous, the high skill people use their time so intensively in human capital investment in period 0 that they don't use much time in production. This property creates a difference in the history-dependence of the optimal tax system, as shown later in Section 5.2.2.

t = 0     corr(θ, y)     corr(θex, yex)
          0.17           0.71

Table 1: Correlation Structure of Labor Income and Skill

9 Note that this procedure assumes the productivity is correctly measured. An alternative calibration strategy is to define the "measured" productivity as y/(l + e) and calculate its transition probabilities.

5.1.2

Total Lifetime Consumption and Income

Next I look at the inequalities of lifetime consumption and income. For any history of shocks $\theta^T$, the total lifetime consumption in period-0 present value is $\sum_{t=0}^{T} R^{-t} c_t(\theta^t)$ and the total lifetime income in period-0 present value is $\sum_{t=0}^{T} R^{-t} y_t(\theta^t)$. I measure the inequality of each variable by the variance of its log.

Quantitative studies in the NDPF literature have found that, in exogenous skill models, an optimal tax reform results in a shift of hours worked from the unproductive to the productive. This creates high labor income inequality in an optimal allocation, as compared to a market economy allocation with a progressive tax system. The work incentive is provided by consumption that is positively correlated with labor income. When skills are endogenous, the shift of hours worked is expected to be much more modest. The incentive for human capital investment needs to be provided either through consumption that correlates with skills more strongly than in the exogenous skill model, or through lower production, or both. The variance ratio of lifetime labor income shows this is indeed the case. The variance of optimal lifetime labor income in the endogenous skill model divided by that in the exogenous skill model is around 0.61, and the difference in variances is around 0.07. Due to incentive problems, it is more costly to achieve large income inequality when skills are endogenous.

There is another force behind lower income inequality in the endogenous skill model: the assumption that people need to forgo labor income when they invest in human capital. This reduces top income earners' labor income, thereby reducing inequality.

The variance of consumption is a measure of insurance. The numerical exercise shows that this measure is higher in the endogenous skill model: the variance ratio is 1.6, and the variance differential is 0.04. This is consistent with the incentive provision story: the incentive for human capital investment is provided through consumption that is more strongly correlated with skill. How lifetime consumption and labor income are correlated with each other has implications for the optimal tax system, and this discussion is deferred to the next subsection.

5.2

Properties of Optimal Tax System

Next I turn to the policy recommendation in terms of taxes.

5.2.1

Progressivity of Lifetime Tax Rate

I look at how much resources in present value are taken away from an individual as a function of his lifetime labor income. For a particular history of shocks $\theta^T$, the lifetime taxes paid are $\sum_{t=0}^{T} R^{-t} \{y_t(\theta^t) - c_t(\theta^t)\}$. I calculate the lifetime tax rate by dividing the lifetime taxes by the lifetime income $\sum_{t=0}^{T} R^{-t} y_t(\theta^t)$. The way this tax rate is related to the total lifetime income serves as a measure of the progressivity of taxes. I measure progressivity by regressing the lifetime tax rate on the log of lifetime income and a constant. The slope coefficient measures how the tax rate responds to a proportional change in lifetime income, and it is lower when skills are endogenous (0.35) than when skills are exogenous (0.42). This means the total lifetime tax rate as a function of total lifetime labor income is less upward-sloping in the endogenous skill model, and hence is less progressive.

What underlies this result is the incentive provision for human capital investment. When skills are endogenous, people who earn large lifetime income in an optimal allocation tend to be those who invested heavily in human capital in the early stage of their life cycle and became productive later. If high lifetime income earners are taxed at a high rate, they are discouraged from investing in human capital and becoming more productive. Under the exogenous skill assumption, by contrast, the government doesn't need to care about such a disincentive and is able to smooth consumption across different realizations of skills with a more progressive tax rate.
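The progressivity measure can be sketched as below. This is an illustration rather than the paper's code: the function names are hypothetical, the arrays are assumed to hold one row per simulated history and one column per period, and the example data are made up so that the true slope is known.

```python
import numpy as np

def present_value(x, R):
    """x: (n_histories, T+1) array; period t is discounted by R**(-t)."""
    t = np.arange(x.shape[1]).astype(float)
    return x @ (R ** (-t))

def progressivity_slope(y, c, R):
    """Slope from regressing the lifetime tax rate on log lifetime income
    and a constant, where lifetime taxes are the present value of y - c."""
    pv_y = present_value(y, R)
    rate = present_value(y - c, R) / pv_y
    X = np.column_stack([np.ones_like(pv_y), np.log(pv_y)])
    coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
    return coef[1]

# Made-up data: three histories, two periods, with a tax rate that is
# exactly 0.1 + 0.2 * log(lifetime income) by construction.
R = 1.02
y = np.array([[1.0, 1.0], [2.0, 3.0], [4.0, 1.0]])
true_rate = 0.1 + 0.2 * np.log(present_value(y, R))
c = y * (1.0 - true_rate[:, None])
slope = progressivity_slope(y, c, R)
```

A larger slope means the lifetime tax rate rises faster with lifetime income, which is the sense in which the exogenous skill model (0.42) is more progressive than the endogenous skill model (0.35).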


5.2.2

History-Dependence of Lifetime Tax Rate

Next I examine how the lifetime tax rate responds to the initial (period-0) labor income. This tells us how the government should use information about the initial labor income in designing taxes. I regress the lifetime tax rate on the log of initial labor income and a constant. I find that the slope coefficient is negative when skills are endogenous, while it is positive when skills are exogenous. This is related to the property that high lifetime income earners in the endogenous skill model tend to be those who are initially productive and spend more time on human capital investment. This creates a negative correlation between the initial labor income and the lifetime income. In the exogenous skill model, by contrast, optimality calls for a shift of work hours from the unproductive to the productive regardless of age, and hence the initial income is positively correlated with the lifetime labor income. This difference produces opposite responses of tax rates to initial labor incomes.

6

Conclusion

This paper considers optimal history-dependent income taxation in a model where people's skills are endogenously formed, instead of being determined by an exogenous stochastic process. The main contributions of the present paper are twofold. First, I show that an optimal taxation problem with history-dependent income taxes in an endogenous skill model is equivalent to a dynamic mechanism design problem where people's skills and human capital investment are private information. This result clarifies two new tradeoffs that the endogeneity of skills adds. When skills are exogenous, the key tradeoff lies between work incentive and insurance. When skills are endogenous, there are two additional tradeoffs: one between insurance and the human capital investment incentive, and the other between future skill and current labor income.

Second, I quantify the effects of these tradeoffs on efficient risk-sharing and examine the robustness of policy implications to an empirically plausible, alternative specification of the skill process. I contrast the policy implications derived under the endogenous skill assumption with those derived under the exogenous skill assumption. Two sets of parameters are calibrated, assuming that skills are endogenous in one and exogenous in the other. I calibrate parameters so that the market economy can reproduce selected facts about individual wages in the U.S. data. Using the approach proposed by Fukushima and Waki [2010], I numerically solve for an optimal allocation without imposing other artificial constraints on the optimal taxation problem.

The assumption of endogenous skills alters some properties of efficient risk-sharing. Due to the added tradeoff between current labor income (production) and future high skills (human capital investment), the most productive type is not necessarily the one who earns the most in a given period. In fact, in the initial period all types but the least productive one engage in human capital investment to achieve on-average high skills in the future. As a result, the least productive type is the one who produces the most in the initial period. This doesn't happen in the exogenous skill model, where optimality generally calls for a shift of work hours from the unproductive to the productive in any period. Although productive types don't produce much in the initial period, they derive higher disutility from human capital investment and have to be rewarded by high consumption. This results in a low correlation between consumption and labor income in the initial period.

The other added tradeoff, between the human capital investment incentive and insurance, also alters the risk-sharing implications. In contrast to the exogenous skill case, where optimal risk-sharing implies a large labor income inequality and a relatively low consumption inequality, optimal risk-sharing in the endogenous skill case calls for a smaller labor income inequality and a higher relative consumption inequality. When the planner wants to provide an incentive for high human capital investment, she needs to reward the agent when his skill becomes high. This is done through a higher consumption and a lower labor income for a high skill agent.

To compare the optimal tax systems, I look at a measure of redistribution: the total lifetime taxes as a function of the total lifetime labor income. Regressing the former on the latter, I find that the intercept is higher and the coefficient on lifetime income is lower in the endogenous skill model than in the exogenous skill model. This is naturally understood from the planner's vantage point: if taxes are made more responsive to labor income, they provide more insurance but at the same time create a disincentive to become more productive.

So far I have focused mainly on the implications for lifetime income, lifetime consumption, and lifetime tax, instead of the life cycle properties. This is because the life cycle patterns of an optimal allocation and taxes are found to be somewhat sensitive to the choice of grid size and grid points. Looking more closely at the life cycle patterns is an interesting task, but for now it is left as a future exercise.


References

L. Ales and P. Maziero. Accounting for private information. 2009. Working Paper.

J. C. Conesa and D. Krueger. On the optimal progressivity of the income tax code. Journal of Monetary Economics, 53(7):1425–1450, 2006.

J. C. Conesa, S. Kitao, and D. Krueger. Taxing capital? Not a bad idea after all! American Economic Review, 99(1):25–48, 2009.

M. Doepke and R. M. Townsend. Dynamic mechanism design with hidden income and hidden actions. Journal of Economic Theory, 126(1):235–285, 2006.

J. Eaton and H. Rosen. Taxation, human capital, and uncertainty. American Economic Review, 70(4):705–715, 1980.

K. Fukushima. Quantifying the welfare gains from flexible dynamic income tax systems. Working Paper, 2010.

K. Fukushima and Y. Waki. Computing dynamic optimal mechanisms when private shocks are persistent. Working Paper, 2010.

M. Golosov and A. Tsyvinski. Designing optimal disability insurance: A case for asset testing. Journal of Political Economy, 2006.

M. Golosov, A. Tsyvinski, and I. Werning. New dynamic public finance: A user's guide. NBER Macroeconomics Annual, 2006.

M. Golosov, A. Tsyvinski, and M. Troshkin. Optimal dynamic taxes. 2010. Working Paper.

B. Grochulski and T. Piskorski. Risky human capital and deferred capital income taxation. Journal of Economic Theory, 145(3):908–943, 2010.

F. Guvenen, B. Kuruscu, and S. Ozkan. Taxation of human capital and wage inequality: A cross-country analysis. 2010. Working Paper.

J. J. Heckman, L. Lochner, and C. Taber. Tax policy and human-capital formation. American Economic Review Papers and Proceedings, 88(2):293–297, 1998a.

J. J. Heckman, L. Lochner, and C. Taber. Explaining rising wage inequality: Explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents. Review of Economic Dynamics, 1(1):1–58, 1998b.

J. J. Heckman, L. Lochner, and C. Taber. Human capital formation and general equilibrium treatment effects: A study of tax and tuition policy. Fiscal Studies, 20(1):25–40, 1999.

M. Huggett and J. C. Parra. How well does the U.S. social insurance system provide social insurance? Journal of Political Economy, 118(1):76–112, 2010.

M. Huggett, G. Ventura, and A. Yaron. Sources of lifetime inequality. mimeo, 2009.

M. Kapicka. Optimal income taxation with human capital accumulation and limited record keeping. Review of Economic Dynamics, 9:612–639, 2006.

M. Kapicka. The dynamics of optimal taxation when human capital is endogenous. 2009. Working Paper.

M. Kapicka. Efficient allocations in dynamic private information economies with persistent shocks: A first-order approach. 2010. Working Paper.

S. Kitao. Labor-dependent capital income taxation. Journal of Monetary Economics, 57(8):959–974, 2010.

N. R. Kocherlakota. Zero expected wealth taxes: A Mirrlees approach to dynamic optimal taxation. Econometrica, 73(5):1587–1621, 2005.

N. R. Kocherlakota. Advances in dynamic optimal taxation. Advances in Economics and Econometrics: Theory and Application, Ninth World Congress, I:269–299, 2010a.

N. R. Kocherlakota. The New Dynamic Public Finance. 2010b. Princeton University Press.

R. B. Myerson. Optimal coordination mechanisms in generalized principal-agent problems. Journal of Mathematical Economics, 10:67–81, 1982.

C. Slavik and K. Wiseman. Tough love for lazy kids. Working Paper, 2009.

G. Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20:177–181, 1986.

M. Weinzierl. The surprising power of age-dependent taxes. Working Paper, 2009.


7

Appendix

7.1

Tax Implementation a la Kocherlakota (2005)

In this section I derive a tax implementation result, the inverse Euler equation, and the zero expected wealth tax. The environment here generalizes the shock structure as well as the information friction in Kocherlakota [2005]. The endogenous skill and the hidden action require that the human capital investment be induced by appropriately designing taxes and markets. As is mentioned in Kocherlakota [2005], one can implement the optimal allocation by designing taxes in exactly the same way as in his paper, with a labor income tax and a linear wealth tax, both of which depend on the whole history of labor incomes.

The mechanism at work is quite simple. In a market economy, individuals are allowed to borrow and save. The crux of his tax system is that the wealth tax is designed so that the ex-post Euler equation holds for each realization of tomorrow's shock, i.e.

\[
u'(c^*_t(\theta^t)) = \beta R \{1 - \tau^{**}_{t+1}(\theta^t, \theta_{t+1})\}\, u'(c^*_{t+1}(\theta^t, \theta_{t+1})), \qquad \forall (\theta^t, \theta_{t+1}) \in \Theta^{t+1},
\]

to prevent an individual from a "double deviation": saving more or less today and misreporting tomorrow. In the environment here, the agent has another lever to manipulate: human capital investment. This lever, however, merely changes the weights placed on tomorrow's tax-adjusted marginal utilities, which are constant across realizations. Hence such a manipulation is not beneficial when the above equations hold.

In the environment with exogenous skills, the individual's optimization problem with those taxes subject to the budget constraint is essentially equivalent to choosing the optimal reporting strategy under the optimal mechanism. Incentive compatibility of the optimal allocation guarantees that truth-telling is optimal. The same logic applies here: the individual's optimization problem subject to the budget constraint is shown to be equivalent to choosing the optimal reporting and investment strategies under the optimal mechanism, and incentive compatibility implies that truth-telling and obedience are optimal.


7.1.1

Market Equilibrium Given Taxes

The labor income tax $\tau^l_t : Y^t \to \mathbb{R}$ is allowed to be nonlinear, while the capital income tax is restricted to be linear, with the marginal tax rate denoted by $\tau^k_t : Y^t \to \mathbb{R}$. Let $b = \{b_{t+1}\}_{t=0}^{T}$ be a sequence of functions such that for all t, $b_{t+1} : \Theta^t \to \mathbb{R}$, and let B be the set of all such b's.

Individual's problem: Taking $(\tau^k, \tau^l)$ as given,

\[
\max_{(x,e,b) \in X \times E \times B} \sum_{t=0}^{T} \sum_{\theta^t \in \Theta^t} \beta^t \left\{ u(c_t(\theta^t)) - v\!\left( \frac{y_t(\theta^t)}{\theta_t}, e_t(\theta^t) \right) \right\} \Pr(\theta^t \mid e)
\]

subject to

\[
c_t(\theta^t) + b_{t+1}(\theta^t) \le \{1 - \tau^k_t(y^t(\theta^t))\} R\, b_t(\theta^{t-1}) + y_t(\theta^t) - \tau^l_t(y^t(\theta^t)), \qquad \forall t, \theta^t,
\]

\[
b_0(\theta^{-1}) = 0, \quad b_{t+1}(\theta^t) \ge 0, \quad b_{T+1}(\theta^T) = 0.
\]

Labor income choice y is said to be budget-feasible if and only if there exists (c, b) with which the above three constraints are satisfied. Note that the initial wealth $b_0$ is the same across individuals.

An equilibrium is (x, e, b) such that (1) (x, e, b) solves an agent's problem given $(\tau^k, \tau^l)$, and (2) markets clear: for all t,

\[
\sum_{\theta^t \in \Theta^t} c_t(\theta^t) \Pr(\theta^t \mid e) + B_{t+1} = R B_t + Y_t - \sum_{\theta^t \in \Theta^t} \tau^l_t(y^t(\theta^t)) \Pr(\theta^t \mid e),
\]

where

\[
B_t = \sum_{\theta^{t-1} \in \Theta^{t-1}} b_t(\theta^{t-1}) \Pr(\theta^{t-1} \mid e) \quad \text{and} \quad Y_t = \sum_{\theta^t \in \Theta^t} y_t(\theta^t) \Pr(\theta^t \mid e).
\]

7.1.2

Designing Taxes

Let $(x^*, e^*)$ be an optimal allocation. I construct a tax system $(\tau^k, \tau^l)$ so that the optimal allocation $(x^*, e^*)$ together with some $b^* \in B$ constitutes an equilibrium under $(\tau^k, \tau^l)$. Following Kocherlakota [2005], I denote

\[
DOM_t = \{ y^t \in Y^t \mid \exists\, \theta^t \in \Theta^t,\ y^t = y^{*t}(\theta^t) \},
\]

where $y^{*t}(\theta^t) = \{y^*_0(\theta_0), \ldots, y^*_t(\theta^t)\}$, and assume the following:

Assumption 1 There exists $\hat{c}^* = \{\hat{c}^*_t\}_{t=0}^{T}$ such that for all t, $\hat{c}^*_t : DOM_t \to C$ and

\[
c^*_t(\theta^t) = \hat{c}^*_t(y^{*t}(\theta^t)), \qquad \forall \theta^t \in \Theta^t.
\]

This means that the optimal consumption depends on the history of reports only through the history of optimal labor incomes.

I construct capital income taxes as follows: Define $\tau^{k*}_0 = 0$, and for all t, define $\tau^{k*}_{t+1} : \mathbb{R}^{t+1}_+ \to \mathbb{R}$ by

\[
\tau^{k*}_{t+1}(y^{t+1}) =
\begin{cases}
1 - \dfrac{u'(\hat{c}^*_t(y^t))}{\beta R\, u'(\hat{c}^*_{t+1}(y^{t+1}))} & \text{if } y^{t+1} \in DOM_{t+1}, \\[2ex]
1 & \text{otherwise.}
\end{cases}
\]

Then I define $(\tau^{l*}, \hat{b}^*)$ recursively as follows: for all t and $y^t \in DOM_t$,

\[
\hat{c}^*_t(y^t) + \hat{b}^*_{t+1}(y^t) = (1 - \tau^{k*}_t(y^t)) R\, \hat{b}^*_t(y^{t-1}) + y_t - \tau^{l*}_t(y^t),
\]

\[
\sum_{\theta^t \in \Theta^t} \hat{b}^*_{t+1}(y^{*t}(\theta^t)) \Pr(\theta^t \mid e^*) = B^*_{t+1},
\]

and for $y^t \notin DOM_t$, $\tau^{l*}_t(y^t) = 1 + y_t$, with the initial condition $\hat{b}^*_0(y^{-1}) = 0$. Let $b^*_{t+1}(\theta^t) = \hat{b}^*_{t+1}(y^{*t}(\theta^t))$ for all t, $\theta^t \in \Theta^t$.

I now show that $(x^*, e^*, b^*)$ is an equilibrium given $(\tau^{k*}, \tau^{l*})$. The following lemma, which is a generalization of Proposition 2 in Kocherlakota [2005], shows that for any budget-feasible choice of $y'$ and any investment strategy $e'$, it is optimal for the agent to choose consumption and asset levels which are "consistent" with $y'$.

Lemma 1 Given that the agent chooses a budget-feasible $y'$ and an investment strategy $e'$, then given tax system $(\tau^{k*}, \tau^{l*})$, an optimal choice of (c, b) for the agent is $(c', b')$ such that for all t and $\theta^t \in \Theta^t_{DOM}$,

\[
c'_t(\theta^t) = \hat{c}^*_t(y'^t(\theta^t))
\]


and $b'_{t+1}(\theta^t) = \hat{b}^*_{t+1}(y'^t(\theta^t))$.

Proof Labor income choice $y'$ is budget-feasible by assumption. This implies that for all t and $\theta^t$, $y'^t(\theta^t) \in DOM_t$, although the agent here is allowed to borrow, unlike in Kocherlakota [2005]. Suppose to the contrary that $y'^t(\theta^t) \notin DOM_t$ for some $\theta^t$. Then after such a history $\theta^t$, his wealth is taken away whenever it's positive, his labor income taxes are strictly higher than his earnings from that period on, and he can't die in debt ($b_{T+1} \ge 0$). This is not budget-feasible.

Next I argue that, given $y'$, the first order conditions are satisfied at $(c', b')$ for all t and $\theta^t$. The flow budget constraint is satisfied with equality by construction. By construction of taxes, for any $y^{t+1} \in DOM_{t+1}$,

\[
u'(\hat{c}^*_t(y^t)) = \beta R \{1 - \tau^*_{t+1}(y'^{t+1}(\theta^{t+1}))\}\, u'(\hat{c}^*_{t+1}(y^{t+1})),
\]

which implies the Euler equation:

\[
u'(c'_t(\theta^t)) = \beta R\, E\big[ \{1 - \tau^*_{t+1}(y'^{t+1}(\theta^{t+1}))\}\, u'(c'_{t+1}(\theta^t, \theta_{t+1})) \,\big|\, \theta^t, e_t(\theta^t) \big].
\]

It follows that $(c', b')$ is an optimal choice given $y'$. Q.E.D.

With this lemma, I can prove the second welfare theorem:

Proposition 1 (Tax Implementation) The triple $(x^*, e^*, b^*)$ constitutes an equilibrium given the tax system $(\tau^{k*}, \tau^{l*})$.

(Since I started with a symmetric initial condition, there is no need for redistribution in this case.) This follows from the fact that, given Lemma 1, choosing $(y', e')$ is equivalent to choosing the report and investment strategies $(r, e')$ given an optimal mechanism $x^*$, and truth-telling and obedience are optimal by incentive compatibility.

This result may seem to contradict the finding of Grochulski and Piskorski [2010] that the deferred capital income tax is necessary for the implementation. The difference arises from their assumption that the human capital investment is private information and requires consumption goods. This makes consumption and the intertemporal marginal rates of substitution private information. In my environment, the agent's utility is separable in consumption and unobservables (human capital investment and labor), and the intertemporal marginal rate of substitution is public information.

Some other results are obtained: Together with the inverse Euler equation, the expected wealth tax (or the net revenue from the wealth tax) is shown to be zero.

Proposition 2 (Inverse Euler equation) Let $(x^*, e^*)$ be an optimal allocation. Then for all t = 0, 1, ..., T − 1 and $\theta^t \in \Theta^t$ such that $c^*_t(\theta^t)$ and $c^*_{t+1}(\theta^t, \theta_{t+1})$ are in the interior of C for all $\theta_{t+1}$ with $(\theta^t, \theta_{t+1}) \in \Theta^{t+1}$, we have

\[
\frac{1}{u'(c^*_t(\theta^t))} = \frac{q}{\beta}\, E\left[ \frac{1}{u'(c^*_{t+1}(\theta^t, \theta_{t+1}))} \,\middle|\, \theta^t, e^*_t(\theta^t) \right].
\]

Corollary 1 (Zero Expected Wealth Tax) For all t and $\theta^t$ which satisfy the condition in Proposition 2,

\[
E[\tau^{**}_{t+1}(\theta^{t+1}) \mid \theta^t, e^*_t(\theta^t)] = 0.
\]
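The link between the inverse Euler equation and the zero expected wealth tax can be checked numerically in a small example. The sketch below is only illustrative and rests on assumptions that are not stated in this section: log utility, so that u'(c) = 1/c, and q = 1/R. The consumption levels, probabilities, and function names are made up.

```python
def wealth_taxes(c_today, c_next, beta, R):
    """Ex-post wealth tax 1 - u'(c_t) / (beta * R * u'(c_{t+1})) for each
    realization of next-period consumption, assuming u'(c) = 1/c."""
    return [1.0 - c_n / (beta * R * c_today) for c_n in c_next]

def expected(values, probs):
    # Probability-weighted average over next-period realizations.
    return sum(v * p for v, p in zip(values, probs))

beta, R = 0.98, 1.02            # values from the calibration table
c_next = [0.8, 1.0, 1.5]        # hypothetical next-period consumption levels
probs = [0.3, 0.5, 0.2]         # hypothetical conditional probabilities

# Inverse Euler equation with log utility and q = 1/R:
# c_t = E[c_{t+1}] / (beta * R).
c_today = expected(c_next, probs) / (beta * R)

taus = wealth_taxes(c_today, c_next, beta, R)
mean_tau = expected(taus, probs)
```

In this example the tax is positive after the low-consumption realization and negative after the high one, while its conditional expectation is zero, as in Corollary 1.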

7.2

Recursive formulation

Note that once θt realizes, an agent's preference over continuation allocations doesn't depend on either past shocks θt−1 or choice of et−1. Moreover, any θt can occur with positive probability, regardless of the actual choice of et−1. This allows us to prove the principle of optimality of the following form.

Lemma 2 (Principle of Optimality) An allocation (x, e) is incentive compatible if and only if for all t, $\theta^{t-1}$, $\theta_t$,

\[
(e_t(\theta^t), \theta_t) \in \arg\max_{(e', \theta')} \Bigg[ u\Big( c_t|_{\theta^{t-1}}(\theta'), \frac{y_t|_{\theta^{t-1}}(\theta')}{\theta_t}, e' \Big)
+ \beta \sum_{s=t+1}^{T} \sum_{\theta^s_{t+1}} \beta^{s-t-1}\, u\Big( c_s|_{(\theta^{t-1},\theta')}(\theta^s_{t+1}), \frac{y_s|_{(\theta^{t-1},\theta')}(\theta^s_{t+1})}{\theta_s}, e_s|_{(\theta^{t-1},\theta')}(\theta^s_{t+1}) \Big)
\times \Pr\big(\theta^s_{t+1} \mid \theta_t, e', e|_{(\theta^{t-1},\theta')}\big) \Bigg].
\]

Proof The only-if part is immediate because of the full-support assumption. The if part obtains by backward induction. Q.E.D.

I assume that for all t, $\pi_{t+1}(\theta_{t+1} \mid \theta_t, e_t)$ has a K-mixture representation.


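Define $\{A_t\}_{t=1}^{T+1}$ recursively as follows:

\[
A_{T+1} = \{0\} \subset \mathbb{R}^K,
\]

\[
A_t = B(A_{t+1}) = \left\{ a \in \mathbb{R}^K :
\begin{array}{l}
\exists (c, y, e, a^+) \text{ s.t.} \\[0.5ex]
(e(\theta), \theta) \in \arg\max_{(e',\theta')} \Big[ u(c(\theta')) - v\big(\tfrac{y(\theta')}{\theta}, e'\big) + \beta a^+(\theta') \cdot \omega(\theta, e') \Big], \ \forall \theta, \\[0.5ex]
a = \sum_{\theta} \big\{ u(c(\theta)) - v\big(\tfrac{y(\theta)}{\theta}, e(\theta)\big) + \beta a^+(\theta) \cdot \omega(\theta, e(\theta)) \big\}\, p(\theta), \\[0.5ex]
(e(\theta), a^+(\theta)) \in E_t \times A_{t+1}, \ \forall \theta
\end{array}
\right\}
\]

for t = 1, ..., T. Lastly,

\[
A_0 = B_0(A_1) = \left\{ U_0 \in \mathbb{R} :
\begin{array}{l}
\exists (c, y, e, a^+) \text{ s.t.} \\[0.5ex]
(e(\theta), \theta) \in \arg\max_{(e',\theta')} \Big[ u(c(\theta')) - v\big(\tfrac{y(\theta')}{\theta}, e'\big) + \beta a^+(\theta') \cdot \omega(\theta, e') \Big], \ \forall \theta, \\[0.5ex]
U_0 = \sum_{\theta} \big\{ u(c(\theta)) - v\big(\tfrac{y(\theta)}{\theta}, e(\theta)\big) + \beta a^+(\theta) \cdot \omega(\theta, e(\theta)) \big\}\, \pi_0(\theta), \\[0.5ex]
(e(\theta), a^+(\theta)) \in E_0 \times A_1, \ \forall \theta
\end{array}
\right\}.
\]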

Lemma 3 An allocation (x, e) satisfies the constraints of the planner's problem given $U_0$ if and only if there exists $a = \{a_t\}_{t=1}^{T+1}$, $a_t : \Theta^{t-1} \to A_t$, such that ((x, e), a) satisfies the temporal incentive compatibility constraints: for all t and $\theta^t$,

\[
(e_t(\theta^t), \theta_t) \in \arg\max_{(e',\theta')} \Big[ u\Big( c_t(\theta^{t-1}, \theta'), \frac{y_t(\theta^{t-1}, \theta')}{\theta_t}, e' \Big) + \beta a_{t+1}(\theta^{t-1}, \theta') \cdot \omega(\theta_t, e') \Big],
\]

and the temporal promise-keeping constraints: for all t ≥ 1 and $\theta^{t-1}$,

\[
a_t(\theta^{t-1}) = \sum_{\theta_t} \Big\{ u\Big( c_t(\theta^t), \frac{y_t(\theta^t)}{\theta_t}, e_t(\theta^t) \Big) + \beta a_{t+1}(\theta^t) \cdot \omega(\theta_t, e_t(\theta^t)) \Big\}\, p(\theta_t),
\]

\[
U_0 \le \sum_{\theta_0} \Big\{ u\Big( c_0(\theta_0), \frac{y_0(\theta_0)}{\theta_0}, e_0(\theta_0) \Big) + \beta a_1(\theta_0) \cdot \omega(\theta_0, e_0(\theta_0)) \Big\}\, \pi_0(\theta_0).
\]

Proof First I prove the only-if part. Fix (x, e) in the constraint set of the planner's problem. For all t and $\theta^{t-1}$, let

\[
a_t(\theta^{t-1}) = \sum_{\theta_t} \Bigg\{ u\Big( c_t(\theta^t), \frac{y_t(\theta^t)}{\theta_t}, e_t(\theta^t) \Big)
+ \beta \sum_{s=t+1}^{T} \sum_{\theta^s_{t+1}} \beta^{s-t-1}\, u\Big( c_s(\theta^t, \theta^s_{t+1}), \frac{y_s(\theta^t, \theta^s_{t+1})}{\theta_s}, e_s(\theta^t, \theta^s_{t+1}) \Big) \Pr\big(\theta^s_{t+1} \mid \theta_t, e|_{\theta^t}\big) \Bigg\}\, p(\theta_t).
\]

The temporal promise-keeping constraints are satisfied by construction. Lemma 2 implies the temporal incentive compatibility constraints. That $a_t(\theta^{t-1}) \in A_t$ for all t and $\theta^{t-1}$ is shown by backward induction.

Now I go on to prove the if part. Suppose ((x, e), a) satisfies both the temporal incentive compatibility and promise-keeping constraints. I want to show that (x, e) is in the constraint set of the planner's problem. By iteratively substituting the temporal promise-keeping constraints into the one with t = 0, (x, e) is shown to satisfy the promise-keeping constraint. For all t, $\theta^{t-1}$, and $\theta_t$, by iteratively substituting the temporal promise-keeping constraints into the temporal incentive compatibility constraint after $\theta_t$, one can obtain the condition in Lemma 2. This proves (x, e) is incentive compatible. Taken together, (x, e) is in the constraint set of the planner's problem. Q.E.D.

Finally, I define value functions $\{J_t\}$ as follows:

JT +1 (0, θ− , e− ) = 0.

For t = 1, ..., T , for (θ− , e− , a) ∈ Θt−1 × Et−1 × At ,

Jt (a, θ− , e− ) = min

c,y,e,a+

Xn o c(θ) − y(θ) + qJt+1 a+ (θ), θ, e(θ) π(θ|θ− , e− ) θ∈Θt

subject to i h y(θ0 ) 0 , e ) + βa+ (θ0 ) · ω(θ, e0 ) , ∀θ (e(θ), θ) ∈ arg max u(c(θ0 )) − v( θ (e0 ,θ0 ) X y(θ) a = {u(c(θ)) − v( , e(θ)) + βa+ (θ) · ω(θ, e(θ))}p(θ), θ θ

(e(θ), a+ (θ)) ∈ Et × At+1 ,

∀θ.

29

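Lastly,

\[
J_0(U_0) = \min_{c, y, e, a^+} \sum_{\theta \in \Theta_0} \Bigl\{ c(\theta) - y(\theta) + q J_1\bigl( a^+(\theta), \theta, e(\theta) \bigr) \Bigr\} \pi_0(\theta)
\]

subject to

\[
(e(\theta), \theta) \in \arg\max_{(e', \theta')} \left[ u(c(\theta')) - v\!\left( \frac{y(\theta')}{\theta}, e' \right) + \beta a^+(\theta') \cdot \omega(\theta, e') \right], \quad \forall \theta,
\]

\[
U_0 \leq \sum_{\theta} \left\{ u(c(\theta)) - v\!\left( \frac{y(\theta)}{\theta}, e(\theta) \right) + \beta a^+(\theta) \cdot \omega(\theta, e(\theta)) \right\} \pi_0(\theta),
\]

\[
(e(\theta), a^+(\theta)) \in E_0 \times A_1, \quad \forall \theta.
\]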

The following proposition then follows immediately.

Proposition 3 This recursive problem is equivalent to the planner's problem.

7.2.1

Convexifying the Problem

The above formulation is hard to implement on a computer in practice, because the sets $A_t$ are not necessarily convex. I obtain convexity by changing variables (the planner chooses $u(c)$ instead of $c$), discretizing $Y_t$ and the $A_t$'s, and then introducing lotteries over $(e, y, a^+)$. I also introduce auxiliary choice variables, following Doepke and Townsend [2006].10

Timing within a period is as follows. First, the agent reports his type $\theta$. The planner assigns to an agent with report $\theta$ a pair consisting of consumption $c(\theta)$ and a lottery $\gamma(\cdot \mid \theta)$ over $(y, e, a^+)$. Then a recommended human capital investment $e$ is drawn from this lottery. Both the agent and the planner observe this draw. Next, the agent chooses $e'$. Finally, the labor income $y$ and the continuation utility $a^+$ are drawn from $\gamma(\cdot \mid \theta, e)$.

I introduce auxiliary choice variables $W^{BD}(\theta, \theta', e)$ for all $(\theta, \theta', e)$, which define the upper bound of continuation utility for type $\theta$ given a report $\theta'$ and a recommendation $e$. Doepke and Townsend [2006] call them "off-path utility bounds." To simplify the notation, let

\[
\delta(\theta, y, e, a^+) = -v\!\left( \frac{y}{\theta}, e \right) + \beta a^+ \cdot \omega(\theta, e).
\]

Let $A$ be a finite set. Define $B(A)$ as the set of $a \in \mathbb{R}^K$ for which there exists $(u, \gamma, W^{BD})$ satisfying the following conditions:

10 The problem in this paper is different from theirs, in that this paper allows the outcome of the hidden action (income in their model and the human capital level in mine) to affect the probability distribution of the future outcome.


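(1) Promise-keeping constraint: for all $k$,

\[
a_k = \sum_{\theta} \Biggl[ u(\theta) + \sum_{(y, a^+, e) \in Y \times A \times E} \gamma(y, a^+, e \mid \theta)\, \delta(\theta, y, e, a^+) \Biggr] p_k(\theta),
\]

(2) Obedience under truth-telling: for all $\theta$, $e$, and $e' \neq e$,

\[
\sum_{y, a^+} \gamma(y, a^+, e \mid \theta)\, \delta(\theta, y, e, a^+) \geq \sum_{y, a^+} \gamma(y, a^+, e \mid \theta)\, \delta(\theta, y, e', a^+),
\]

(3) Off-path utility bound: for all $\theta$, $\theta' \neq \theta$, $e$, and $e'$,

\[
W^{BD}(\theta, \theta', e) \geq \sum_{y, a^+} \gamma(y, a^+, e \mid \theta')\, \delta(\theta, y, e', a^+),
\]

(4) Lying is not beneficial: for all $\theta$ and $\theta'$,

\[
u(\theta) + \sum_{(y, a^+, e) \in Y \times A \times E} \gamma(y, a^+, e \mid \theta)\, \delta(\theta, y, e, a^+) \geq u(\theta') + \sum_{e} W^{BD}(\theta, \theta', e),
\]

(5) Probability restriction:

\[
\sum_{(y, a^+, e) \in Y \times A \times E} \gamma(y, a^+, e \mid \theta) = 1
\]

for all $\theta$, and $0 \leq \gamma(y, a^+, e \mid \theta) \leq 1$ for all $\theta, y, a^+, e$; and

(6) Feasibility: $u(\theta) \in U$ for all $\theta$.

I construct the $A_t$'s so that $\mathrm{co}(A_t) \subset B(A_{t+1})$ for all $t$. I also discretize the consumption set, using a grid size of 300. By introducing a lottery over that set, the problem becomes a linear program.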
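As an illustration of why conditions (2) and (5) are linear in the lottery $\gamma$, the following minimal Python sketch checks them for a given candidate lottery. It is not the implementation used in the paper. The functional forms of `v` and `omega`, the value of `BETA`, the choice $K = 2$, and the tiny grids are all hypothetical and chosen only so the numbers are easy to follow.

```python
from itertools import product

BETA = 0.9  # assumed discount factor, for illustration only


def v(labor, e):
    # Hypothetical disutility of labor supply and investment time.
    return 0.5 * (labor + e) ** 2


def omega(theta, e):
    # Hypothetical K = 2 weight vector: more investment shifts weight to index 1.
    return (1.0 - e, e)


def delta(theta, y, e, a_plus):
    # delta(theta, y, e, a+) = -v(y / theta, e) + beta * a+ . omega(theta, e)
    dot = sum(a * w for a, w in zip(a_plus, omega(theta, e)))
    return -v(y / theta, e) + BETA * dot


def is_probability(gamma, tol=1e-9):
    # Condition (5): gamma(. | theta) is a probability distribution for each theta.
    return all(
        abs(sum(dist.values()) - 1.0) <= tol
        and all(-tol <= p <= 1.0 + tol for p in dist.values())
        for dist in gamma.values()
    )


def is_obedient(gamma, E, tol=1e-9):
    # Condition (2): after a truthful report and a recommendation e,
    # no deviation e' != e raises the gamma-weighted sum of delta.
    for theta, dist in gamma.items():
        for e, e_dev in product(E, E):
            if e_dev == e:
                continue
            cells = [(y, a, p) for (y, a, ee), p in dist.items() if ee == e]
            on_path = sum(p * delta(theta, y, e, a) for y, a, p in cells)
            off_path = sum(p * delta(theta, y, e_dev, a) for y, a, p in cells)
            if on_path < off_path - tol:
                return False
    return True


# gamma[theta][(y, a_plus, e)] = probability, keyed in the order (y, a+, e).
E = (0.0, 0.5)
steep = {1.0: {(1.0, (0.0, 2.0), 0.5): 1.0}}  # continuation utility rewards index 1
flat = {1.0: {(1.0, (1.0, 1.0), 0.5): 1.0}}   # continuation utility is flat

print(is_probability(steep), is_obedient(steep, E), is_obedient(flat, E))
```

In this toy example the lottery `steep` satisfies obedience, while `flat` does not: with a flat continuation utility vector the agent gains from deviating to zero investment. Both checks involve only sums of $\gamma$ multiplied by fixed coefficients, which is what makes the discretized problem a linear program.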

8

Data Appendix

This Data Appendix describes the sample selection and variable construction.

I use the Panel Study of Income Dynamics from 1968 to 2007. I use the core sample only and drop all SEO samples. I use male heads. The variables used are labor income and total annual hours of the head. Labor income includes the labor part of business income. When either of these two variables is missing in some year, that observation is dropped. I require the head to be observed for at least 10 years between ages 20 and 60. Real labor income is obtained using the CPI.

I focus on household heads aged 25-59. Since one model period is taken to be approximately 7 years, I construct 7-year age bins (25-31, 32-38, 39-45, 46-52, 53-59). For each age bin, an individual is counted if he is observed for at least three years in that age bin.

When constructing the moments, I control for the cohort effect. For statistics that vary with age (e.g. the mean or variance of wages at particular ages), I calculate a statistic $\mathrm{stat}_{yb,ia}$ for each birth year $yb$ and age bin $ia$. For statistics that do not (e.g. the correlation of $\log w_{ia=0}$ and $d \log w_{ia=1}$), I construct a statistic $\mathrm{stat}_{yb}$ for each birth year $yb$. I then take the average of the statistics across birth years, excluding $(yb, ia)$-bins that contain fewer than 20 individuals.
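The binning and cohort-averaging steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the record layout `(person_id, birth_year, age, value)` is hypothetical, the statistic is taken to be a cross-sectional mean, and each individual's value within a bin is assumed to be the average over his observed years in that bin (the text does not specify this).

```python
from collections import defaultdict
from statistics import mean

AGE_BINS = [(25, 31), (32, 38), (39, 45), (46, 52), (53, 59)]
MIN_YEARS_IN_BIN = 3   # an individual counts if observed at least three years
MIN_INDIVIDUALS = 20   # (yb, ia)-bins with fewer individuals are excluded


def age_bin(age):
    # Return the index ia of the 7-year age bin, or None if outside 25-59.
    for ia, (lo, hi) in enumerate(AGE_BINS):
        if lo <= age <= hi:
            return ia
    return None


def cohort_averaged_mean(records):
    # records: iterable of (person_id, birth_year, age, value); hypothetical layout.
    per_person = defaultdict(list)
    for pid, yb, age, value in records:
        ia = age_bin(age)
        if ia is not None:
            per_person[(yb, ia, pid)].append(value)

    # Keep individuals observed at least MIN_YEARS_IN_BIN times in the bin.
    # Assumption: an individual's value is his mean over those years.
    cells = defaultdict(list)
    for (yb, ia, pid), values in per_person.items():
        if len(values) >= MIN_YEARS_IN_BIN:
            cells[(yb, ia)].append(mean(values))

    # stat_{yb,ia}: mean across individuals, dropping thin (yb, ia)-bins.
    by_bin = defaultdict(list)
    for (yb, ia), person_means in cells.items():
        if len(person_means) >= MIN_INDIVIDUALS:
            by_bin[ia].append(mean(person_means))

    # Average the cohort-specific statistics across birth years.
    return {ia: mean(stats) for ia, stats in sorted(by_bin.items())}
```

Averaging within each birth year first and only then across birth years gives every retained cohort equal weight, which is one simple way to keep cohort composition from driving the age profile.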
