Multiple Treatments with Strategic Interaction*

Sukjin Han
Department of Economics
University of Texas at Austin
[email protected]

First Draft: February 23, 2016
This Draft: April 25, 2017

Abstract

We develop an empirical framework in which we identify and estimate the effects of treatments on outcomes of interest when the treatments are results of strategic interaction (e.g., bargaining, oligopolistic entry, decisions in the presence of peer effects). We consider a model where agents play a discrete game with complete information whose equilibrium actions (i.e., binary treatments) determine a post-game outcome in a nonseparable model with endogeneity. Due to the simultaneity in the first stage, the model as a whole is incomplete and the selection process fails to exhibit the conventional monotonicity. Without imposing parametric restrictions or large support assumptions, this poses challenges in recovering treatment parameters. To address these challenges, we first analytically characterize regions that predict equilibria in the first-stage game with possibly more than two players, whereby we find a certain monotonic pattern of these regions. Based on this finding, we derive bounds on the average treatment effects (ATE's) under nonparametric shape restrictions and the existence of excluded variables. We also introduce and point identify a multi-treatment version of local average treatment effects (LATE's).

JEL Numbers: C14, C35, C57
Keywords: Multiple treatments, strategic interaction, endogeneity, heterogeneous treatment effects, average treatment effects, local average treatment effects.

1 Introduction

We develop an empirical framework in which we identify and estimate the heterogeneous effects of treatments on outcomes of interest where the treatments are results of strategic interaction (e.g., bargaining, oligopolistic entry, decisions in the presence of peer effects or strategic effects). Treatments are determined as an equilibrium of a game and these strategic

* The author is grateful to Tim Armstrong, Steve Berry, Jorge Balat, Áureo de Paula, Phil Haile, Karam Kang, Juhyun Kim, Yuichi Kitamura, Konrad Menzel, Francesca Molinari, Azeem Shaikh, Jesse Shapiro, Dean Spears, Ed Vytlacil, Haiqing Xu, and participants in the 2016 Texas Econometrics Camp, the 2016 North American Summer Meeting of the Econometric Society, Interactions Conference 2016 at Northwestern University, and seminars at Yale, Brown, UBC, and UNC for helpful comments and discussions.


decisions of players endogenously affect common or player-specific outcomes. For example, one may be interested in the effects of newspaper entry on local political behaviors, the effects of entry of carbon-emitting companies on local air pollution and health outcomes, the effects of the presence of potential entrants in nearby markets on pricing or investment decisions of incumbents, the effects of large supermarkets' exit decisions on local health outcomes, and the effects of provision of limited resources where individuals make participation decisions under peer effects as well as based on their own gains from the treatment. As reflected in some of these examples, our framework allows us to study externalities of strategic decisions, such as societal outcomes resulting from firm behaviors. Ignoring strategic interaction in treatment selection processes may lead to biased, or at least less informative, conclusions about the effects of interest.

We consider a model where agents play a discrete game with complete information, whose equilibrium actions (i.e., a profile of binary endogenous treatments) determine a post-game outcome in a nonseparable model with endogeneity. We are interested in various treatment parameters in this model. In recovering these parameters, the setting of this paper poses several challenges. First, the first-stage game posits a structure in which binary dependent variables are simultaneously determined, thereby making the model as a whole incomplete. Second, due to this simultaneity, the selection process does not exhibit the conventional monotonic property à la Imbens and Angrist (1994). Furthermore, we make no assumptions on the joint distribution of the unobservables, nor parametric restrictions on the payoff function of each player or on how treatments affect the outcome. In nonparametric models with multiplicity and/or endogeneity, identification may be achieved with excluded instruments of large support. Even when such a requirement is met in practice, estimation and inference can still be problematic (Khan and Tamer (2010), Andrews and Schafgans (1998)). We thus allow instruments and other exogenous variables to be discrete and have small supports.

As a crucial first step to address these challenges, we analytically characterize regions that predict equilibria in the first-stage game. Under symmetry and strategic substitutability restrictions on the payoff functions, we fully characterize the geometric properties of the regions in the space of unobservables, which describe the properties of equilibria in the game. More importantly, we show that these regions exhibit a monotonic pattern in terms of the number of players who choose to take the action, for example, the number of entrants in an entry game. A complete analytical characterization of the equilibrium regions has not appeared in the literature for the case of more than two players.1

Having characterized the equilibrium regions, we show how the model structure and the data can be informative about treatment parameters, such as the average treatment effects (ATE's) and the local ATE's (LATE's). We first establish bounds on the ATE and other related parameters with possibly discrete instruments of small support. We also show that tighter bounds on the ATE can be obtained by introducing (possibly discrete) exogenous variables excluded from the first-stage game. This is especially motivated in the context of the externalities mentioned above. When instruments have a joint support that is rectangular, we can derive sharp bounds as long as the outcome variable is binary. Further, with continuous

1 To estimate payoff parameters, Berry (1992) partly characterizes equilibrium regions. To calculate the bounds on these parameters, Ciliberto and Tamer (2009) simulate the moment inequality model implied by the shape of these regions, especially the regions for multiple equilibria. While their approaches suffice for the purpose of their analyses, full analytical results are critical for the identification analysis of the current paper.


instruments of large support, we show that multiplicity and endogeneity become irrelevant and the ATE is point identified. To derive informative bounds, we impose nonparametric shape restrictions on the outcome function, such as conditional symmetry and monotonicity. The symmetry assumption can be relaxed either when strategic interaction occurs only within subgroups of players, thus allowing for partial symmetry, or when the first-stage equilibrium selection is stable with respect to changes in the instruments. The latter is trivially guaranteed when instruments vary enough to offset the effect of strategic substitutability. We also introduce and point identify a multi-treatment version of the LATE. The simultaneity in the selection process does not permit the usual equivalence result of Vytlacil (2002) between the specification of a threshold-crossing selection rule and Imbens and Angrist (1994)'s monotonicity assumption. A monotonic pattern found in the equilibrium regions, however, enables us to recover a version of the LATE for a treatment of "dichotomous states."

Partial identification in single-agent nonparametric triangular models with binary endogenous variables has been studied in Shaikh and Vytlacil (2011) and Chesher (2005), among others. Shaikh and Vytlacil (2011) provide bounds on the ATE in this setting. In a slightly more general model, Vytlacil and Yildiz (2007) achieve point identification with an exogenous variable that is excluded from the selection equation and has a large support. Our bound analysis builds on these papers, but we study a multi-agent model with strategic interaction as a key component of the model.
A few existing studies have extended a single-treatment model to a multiple-treatment setting, but their models maintain monotonicity in the selection process and none of them allow simultaneity among the multiple treatments resulting from agents' interaction as we do in this paper.2 In interesting recent work, Pinto (2015), Heckman and Pinto (2015), and Lee and Salanié (2016) relax or generalize the monotonicity of the selection process in multi-valued treatment settings, but they generally consider different types of treatment selection mechanisms than ours. Pinto (2015) and Heckman and Pinto (2015) introduce unordered monotonicity, while Lee and Salanié (2016) consider more general non-monotonicity. The latter paper does mention entry games as one example of their treatment selection process, but by assuming known payoffs they sidestep the multiplicity of equilibria, which is one of the main focuses of this paper. Also, Lee and Salanié (2016)'s main focus is on point identification of marginal treatment effects, in which case continuous instruments are required and the ATE is recovered under a large support assumption. Their LATE under discrete instruments also differs from ours; see Section 5 for details. Without triangular structures, Manski (1997), Manski and Pepper (2000) and Manski (2013) also propose bounds on the ATE with multiple treatments under various monotonicity assumptions, including an assumption on the sign of the treatment response. We take an alternative approach that is more explicit about the interaction among treatments while remaining agnostic about the direction of treatment response. Our results suggest that, provided there exists exogenous variation excluded from the selection process, the bounds calculated from our approach can be more informative than those from theirs.
Among these papers, Manski (2013) is the closest to ours in that it considers multiple treatments and multiple agents with simultaneous interaction, but with an important difference from our approach. The interaction in his setting is through individuals, which are the unit of observation. Our setting, on the other hand, features interaction through the treatment/player unit, and the unit of observation is i.i.d. markets or regions in which the first-stage game is played and from which the outcome variable may emerge.

2 Heckman et al. (2006) consider multiple treatments in a framework of marginal treatment effects and local instrumental variables. See also Jun et al. (2011) for vector endogenous regressors in a triangular model.

Identification in models for binary games with complete information has been studied in Tamer (2003), Ciliberto and Tamer (2009), and Bajari et al. (2010), among others. The present paper contributes to this literature by considering post-game outcomes in the model, especially those that are not the game players' direct concerns, such as externalities. As related work that considers post-game outcomes, Ciliberto et al. (2016) introduce a model where firms make simultaneous decisions of entry and pricing upon entry. As a result, their model can be seen as a multi-agent extension of a sample selection model. The model considered in this paper, on the other hand, is a multi-agent extension of a model for endogenous treatments. Unlike Ciliberto and Tamer (2009), who assume a parametric distribution for the unobservables in the game, we impose no restrictions on the joint distribution of the unobservables in the game and the outcome equation. We also employ a different approach to partial identification under multiplicity, as their approach is not applicable to the particular setting of this paper. Lastly, it is typically hard to estimate bounds on underlying payoff parameters in complete information games even with two players (Ciliberto and Tamer (2009, p. 1813)); i.e., partial identification analysis in this setting may not be constructive. In contrast, our parameters of interest are functionals of the primitives (excluding the game parameters), and thus can be easily estimated even in a model with many players.

The paper is organized as follows. Section 2 introduces the model, the parameters of interest, and motivating examples. Section 3 delivers the main results of this paper.
We start by conducting the bound analysis on the ATE's for a two-player case with a binary dependent variable as an illustration. Then we extend the results to a many-player case with a more general dependent variable. The analytical characterization of equilibrium regions for many players is presented in this section. Section 4 relaxes the symmetry assumption and discusses an extension of the model, point identification under large support, and the relationship to Manski (2013). The LATE parameter is introduced and identified in Section 5. Section 6 presents a numerical illustration.

For a generic S̃-vector v ≡ (v_1, ..., v_S̃), let v_{−s} denote the (S̃ − 1)-vector in which the s-th element is dropped from v, i.e., v_{−s} ≡ (v_1, ..., v_{s−1}, v_{s+1}, ..., v_S̃). When no confusion arises, we sometimes change the order of entries and write v = (v_s, v_{−s}) for convenience. For a multivariate function f(v), the integral ∫_A f(v)dv is understood as a multi-dimensional integral over a set A contained in the space of v. Vectors in this paper are row vectors.

2 Setup and Motivating Examples

Let D ≡ (D_1, ..., D_S) ∈ 𝒟 ⊆ {0,1}^S be an S-vector of binary treatments and d ≡ (d_1, ..., d_S) be its realization, where S is fixed. We assume that D is predicted as a pure strategy Nash equilibrium of a complete information game with S players who make entry decisions or individuals who choose whether to receive treatments.3 Let Y be a post-game outcome that results from profile D of endogenous treatments. It can be an outcome common to all players or an outcome specific to each player. Let (X, Z_1, ..., Z_S) be exogenous variables in the model. Let

3 While mixed strategy equilibria are not considered in this paper, it may be possible to extend the setup to incorporate mixed strategies following the argument in Ciliberto and Tamer (2009).


s ∈ {1, ..., S} be an index for players or, interchangeably, for treatments. We assume there is a variable in Z_s that is excluded from the equation for Y. For the partial identification analysis, we may also assume there is a variable in X that is excluded from all the equations for D_s. We consider a model of a semi-triangular system:

Y = θ(D, X, ε_D),   (2.1)
D_s = 1[ν^s(D_{−s}, Z_s) ≥ U_s],   s ∈ {1, ..., S},   (2.2)

where θ : R^{S+d_x+1} → R and ν^s : R^{S−1+d_{z_s}} → R are functions nonseparable in their arguments. As implied by the complete information game, player s's decision D_s depends on the decisions D_{−s} of all other players, and thus D is determined by a simultaneous system. The unobservables (ε_D, U_1, ..., U_S) are arbitrarily dependent on one another. Without loss of generality, we normalize the scalar U_s to be distributed Unif(0,1) and ν^s : R^{S−1+d_{z_s}} → (0,1]. The scalar ε_D is dependent on D. This unobservable is allowed to be a vector, which would be important when considering Y itself as an equilibrium outcome of strategic interaction among players. In this case, a vector of player-specific unobservables may enter the equation for Y in a reduced-form fashion.4 The unit of observation, indexed by market or geographical region i, is suppressed in all the expressions. As mentioned, the model (2.1)–(2.2) is incomplete due to the possible existence of multiple equilibria in the first-stage game of treatment selection. Moreover, the conventional monotonicity in the sense of Imbens and Angrist (1994) is not exhibited in the selection process due to simultaneity; see Section 5. The potential outcome of receiving D = d can be written as

Y_d = θ(d, X, ε_d),   d ∈ 𝒟,

and ε_D = Σ_{d∈𝒟} 1[D = d] ε_d. We are interested in the ATE and related parameters. With the average structural function (ASF)

E[Y_d | X = x]   (2.3)

for vector d ∈ 𝒟, the ATE can be written as

E[Y_d − Y_{d'} | X = x] = E[θ(d, x, ε_d) − θ(d', x, ε_{d'})],   (2.4)

for d, d' ∈ 𝒟. Another parameter of interest is the average treatment effect on the treated (ATT): E[Y_d − Y_{d'} | D = d'', Z = z, X = x] for d, d', d'' ∈ 𝒟. Unlike for the ATT or the treatment effect on the untreated in the single-treatment case, d'' does not necessarily equal d or d' here. One might also be interested in the sign of the ATE, which in this multi-treatment case essentially amounts to establishing an ordering among the ASF's. Lastly, we are interested in the LATE, which will be considered later after the necessary concepts are introduced. As an example of the ATE, we may choose d = (1, ..., 1) and d' = (0, ..., 0) to measure some cancelling-out effect, or we may be interested in more general nonlinear effects. Another example would be choosing d = (1, d_{−s}) and d' = (0, d_{−s}) for given d_{−s}. In the latter example, we can learn the interaction effects of treatments, i.e., how much the average gain

4 In Section 4, we discuss this in the context of player-specific Y. Having a scalar unobserved type U_s in each player's decision process D_s may be relatively innocuous since the interaction is explicitly modeled through D_{−s}, which contains other players' unobserved types.


(ATE) from treatment s is affected by the other treatments: suppressing the conditioning on X = x,

E[Y_{1,d_{−s}} − Y_{0,d_{−s}}] − E[Y_{1,d'_{−s}} − Y_{0,d'_{−s}}],

where Y_d is interchangeably written as Y_{d_s, d_{−s}} here. For example, with d_{−s} = (1, ..., 1) and d'_{−s} = (0, ..., 0), complementarity between treatment s and all the other treatments can be represented as E[Y_{1,d_{−s}} − Y_{0,d_{−s}}] − E[Y_{1,d'_{−s}} − Y_{0,d'_{−s}}] > 0. Suppose we instead want to focus on learning about complementarity between two treatments, while averaging over the other S − 2 treatments. This can be handled with a more general framework for defining the ASF and ATE. Define a partial counterfactual outcome as follows: with a partition D = (D^1, D^2) ∈ 𝒟^1 × 𝒟^2 = 𝒟 and its realization d = (d^1, d^2),

Y_{d^1, D^2} ≡ Σ_{d^2 ∈ 𝒟^2} 1[D^2 = d^2] Y_{d^1, d^2}.   (2.5)

This is a counterfactual outcome that is fully observed once D^1 = d^1 is realized. Then for each d^1 ∈ 𝒟^1, the partial ASF can be defined as

E[Y_{d^1, D^2}] = Σ_{d^2 ∈ 𝒟^2} E[Y_{d^1, d^2} | D^2 = d^2] Pr[D^2 = d^2]   (2.6)

and the partial ATE between d and d' as

E[Y_{d^1, D^2} − Y_{d^{1′}, D^2}].   (2.7)
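As a concrete check of definitions (2.5)–(2.7), the following sketch (ours, not the paper's; all functional forms and distributions are made up) computes the partial ASF by sample means in a simulated dataset with S = 3 and partition D^1 = D_1, D^2 = (D_2, D_3):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Realized treatment profile D = (D_1, D_2, D_3); its joint law is arbitrary here.
D = rng.integers(0, 2, size=(n, 3))
noise = rng.normal(size=n)

# Hypothetical potential outcomes Y_{d1,d2} with a D_1-D_2 interaction term.
def y_pot(d1, d2a, d2b, e):
    return 0.3 * d1 + 0.2 * d2a + 0.2 * d2b + 0.15 * d1 * d2a + e

# Partial counterfactual outcome (2.5): set D^1 = d1, keep the realized D^2.
def partial_cf(d1):
    return y_pot(d1, D[:, 1], D[:, 2], noise)

# Partial ASF (2.6) as a sample mean, and the partial ATE (2.7).
partial_asf = {d1: partial_cf(d1).mean() for d1 in (0, 1)}
partial_ate = partial_asf[1] - partial_asf[0]

# Equivalent cell-by-cell aggregation over D^2, as in (2.6).
agg = sum(y_pot(1, a, b, noise[(D[:, 1] == a) & (D[:, 2] == b)]).mean()
          * ((D[:, 1] == a) & (D[:, 2] == b)).mean()
          for a in (0, 1) for b in (0, 1))
```

In this made-up design the partial ATE averages the D_1-effect 0.3 + 0.15 D_2 over the realized distribution of D_2, so it lies strictly between the two conditional effects, which is exactly what (2.7) is meant to capture.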

Using this concept, we can consider complementarity concentrated on, e.g., the first two treatments: E[Y_{11, D^2} − Y_{01, D^2}] > E[Y_{10, D^2} − Y_{00, D^2}]. In identifying these treatment parameters, suppose we attempt to recover the effect of a single treatment, with D^1 being a scalar in model (2.1)–(2.2) conditional on D^2 = D_{−s} = d_{−s}, and then to recover the effects of multiple treatments by transitively using these effects of single treatments. This strategy is not valid, since D^2 is a function of D^1 and also because of multiplicity. Therefore, the approaches in the literature with single-treatment, single-agent triangular models are not directly applicable, and a new theory is needed in this more general setting.

We provide examples to which model (2.1)–(2.2) may apply. For concreteness, let Z_s = (Z_{1s}, W) and X = (X_1, W), where variables commonly present in all the equations are collected in W.

Example 1 (Externality of airline entry). In this example, we are interested in the effects of airline competition on local air quality and health. Consider multiple airline companies making entry decisions in local market i, defined as a route that connects a pair of cities. Let Y_i denote the air pollution level or average health outcome of this local market. Let D_{s,i} denote airline s's decision to enter market i, which is correlated with some unobserved characteristics of the local market that affect Y_i. The parameter E[Y_{d,i} − Y_{d',i}] captures the effects of a market structure on pollution or health. One interesting question would be whether

the ATE is nonlinear in the number of airlines, as companies may operate more efficiently when facing more competition. As related work, Schlenker and Walker (2015) document how sensitively local health outcomes, such as acute respiratory diseases, are affected by changes in airline schedules. Economic activity variables, such as population and income, can be included in W_i, since they affect not only the outcomes but also the entry decisions. The excluded variable X_{1i} can be characteristics of the local market that directly affect pollution or health levels, such as weather shocks or the share of pollution-related industries in the local economy. We assume that, conditional on W_i, these factors affect the outcome but do not enter the payoff functions of the airlines. The instruments Z_{1s,i} are cost shifters that affect entry decisions. When Y_i is a health outcome, pollution levels can be included in X_{1i}.

Example 2 (Incumbents' response to potential entrants). In this example, we are interested in how market i's incumbents respond to the threat of entry by potential competitors. Let Y_i be an incumbent firm's pricing or investment decision and D_{s,i} be an entry decision by firm s in "nearby" markets, which can be formally defined in each context. For example, in airline entry, nearby markets are defined as city pairs that share the endpoints with the city pair of an incumbent (Goolsbee and Syverson (2008)). That is, potential entrants are airlines that operate in one (or both) of the endpoints of the incumbent's market i, but have not connected these endpoints. Then the parameter E[Y_{d,i} − Y_{d',i}] captures the incumbent's response to the threat, specifically whether it responds by lowering the price or making an investment. As in Example 1, Z_{1s,i} are cost shifters, and X_{1i} are other factors affecting the price of the incumbent that are excluded from nearby markets, conditional on W_i.
The characteristics of the incumbent's market can be candidates for X_{1i}, such as the distance between the endpoints of the incumbent's market in the airline example.

Example 3 (Media and political behavior). In this example, the question is how media affect political participation or electoral competitiveness. In county or market i, either Y_i ∈ [0,1] can denote voter turnout, or Y_i ∈ {0,1} can denote whether an incumbent is re-elected or not. Let D_{s,i} denote a market entry decision by local newspaper type s, which is correlated with unobserved characteristics of the county. In this example, Z_{1s,i} is the neighborhood counties' population size and income, which is common to all players (Z_{11,i} = · · · = Z_{1S,i}). In the Appendix, we show how the analysis in this paper can incorporate instruments that are common to all players. Lastly, X_{1i} can include changes in voter ID regulations. Using a linear panel data model, Gentzkow et al. (2011) show that the number of newspapers in the market significantly affects voter turnout but find no evidence that it affects the re-election of incumbents. More explicit modeling of the strategic interaction among newspaper companies can be important for capturing competition effects on the political behavior of readers.

Example 4 (Food desert). Let Y_i denote a health outcome, such as diabetes prevalence, in region i, and D_{s,i} be the exit decision by large supermarket s in the region. Then E[Y_{d,i} − Y_{d',i}] measures the effects of the absence of supermarkets on the health of residents. Conditional on other factors W_i, the instrument Z_{1s,i} can include changes in the local government's zoning plans, and X_{1i} can include the region's health-related variables, such as the number of hospitals and the obesity rate. This problem is related to the literature on "food deserts" (e.g., Walker et al. (2010)).

Example 5 (Ground water and agriculture).
In this example, we are interested in the impact of access to groundwater on economic outcomes in rural areas (Foster and Rosenzweig

(2008)). In each Indian village i, symmetric wealthy farmers (of the same caste) make irrigation decisions D_{s,i}, i.e., whether or not to buy motor pumps, in the presence of peer effects and learning spillovers. Since ground water is a limited resource that is seasonally recharged and depleted, other farmers' entry may negatively affect one's payoff. The adoption of the technology affects Y_i, which can be the average of local wages of peasants or prices of agricultural products, or a village development or poverty level. In this example, the continuous or binary instrument Z_{1s,i} can be the depth to groundwater, which is exogenously given (Sekhri (2014)), or the provision of electricity for pumping in a randomized field experiment. X_{1,i} can be village-level characteristics that villagers do not know ex ante or do not care about.5
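Before turning to identification, the objects defined in this section can be made concrete in a minimal simulation sketch (ours, not the paper's). The outcome function θ, the joint law of the unobservables, and the distribution of the treatment profile below are all illustrative assumptions; the point is only the switching structure Y = Y_D with ε_D = Σ_d 1[D = d] ε_d:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
profiles = [(a, b) for a in (0, 1) for b in (0, 1)]  # D = {0,1}^2, i.e., S = 2

# Profile-specific unobservables eps_d (independent here only for simplicity;
# the model allows arbitrary dependence) and a realized profile D, whose
# selection mechanism is deliberately left unspecified.
eps = {d: rng.normal(size=n) for d in profiles}
D = [profiles[k] for k in rng.integers(0, 4, size=n)]
x = rng.normal(size=n)

# Made-up nonseparable outcome function theta(d, x, e).
def theta(d, x, e):
    return d[0] + d[1] + x * (1 + d[0] * d[1]) + e

# Potential outcomes Y_d = theta(d, X, eps_d) and the observed outcome
# Y = theta(D, X, eps_D) with the switching unobservable eps_D.
Y_pot = {d: theta(d, x, eps[d]) for d in profiles}
eps_D = np.array([eps[D[i]][i] for i in range(n)])
Y_obs = np.array([theta(D[i], x[i], eps_D[i]) for i in range(n)])
```

By construction, the observed outcome coincides with the potential outcome of the realized profile, Y_obs[i] = Y_pot[D[i]][i], which is the representation used in the ASF and ATE definitions above.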

3 Partial Identification of the ATE

To characterize the bounds on the treatment parameters, we make the following assumptions. Let Z ≡ (Z_1, ..., Z_S) and U ≡ (U_1, ..., U_S). Let 𝒳 and 𝒵 be the supports of X and Z, respectively.

Assumption IN. (X, Z) ⊥ (ε_d, U) ∀d ∈ 𝒟.

Assumption E. (ε_d, U) are continuously distributed ∀d ∈ 𝒟.

Assumption R. For any d, d' ∈ 𝒟, either (a) ε_d = ε_{d'} = ε; or (b) F_{ε_d|U} = F_{ε_{d'}|U}.

Assumption R(a) and (b) are the rank invariance and rank similarity conditions, respectively (Chernozhukov and Hansen (2005)). We proceed with (a); we can easily extend the analysis to case (b). We now impose shape restrictions on the outcome function θ(d, x, ε) and the first-stage payoff ν^s(d_{−s}, z_s). For the outcome function, we impose shape restrictions on ϑ(d, x; u) ≡ E[θ(d, x, ε)|U = u] a.e. u instead of on θ(d, x, ε) a.e. ε. These restrictions on the conditional mean are weaker than those directly imposed on θ(d, x, ε). Unless otherwise noted, the assumptions below hold for each s ∈ {1, ..., S}; this statement is omitted for brevity. Let 𝒵_s be the support of Z_s. The next assumption is a monotonicity assumption.

Assumption M. (i) For every x ∈ 𝒳, either ϑ(1, d_{−s}, x; u) ≥ ϑ(0, d_{−s}, x; u) a.e. u ∀d_{−s} ∈ 𝒟_{−s}, or ϑ(1, d_{−s}, x; u) ≤ ϑ(0, d_{−s}, x; u) a.e. u ∀d_{−s} ∈ 𝒟_{−s}; (ii) For all z_s, z_s' ∈ 𝒵_s, either ν^s(d_{−s}, z_s) ≥ ν^s(d_{−s}, z_s') ∀d_{−s} ∈ 𝒟_{−s} and ∀s ∈ {1, ..., S}, or ν^s(d_{−s}, z_s) ≤ ν^s(d_{−s}, z_s') ∀d_{−s} ∈ 𝒟_{−s} and ∀s ∈ {1, ..., S}.

Assumption M(i) can be stated in two parts: (a) for every x and d_{−s}, either ϑ(1, d_{−s}, x; u) ≥ ϑ(0, d_{−s}, x; u) a.e. u, or ϑ(1, d_{−s}, x; u) ≤ ϑ(0, d_{−s}, x; u) a.e. u; (b) for every x, each inequality in (a) holds for all d_{−s}. For an outcome function with a scalar index, θ(d, x, ε) = θ̃(µ(d, x), ε), part (a) is implied by E[θ̃(t, ε)|U = u] being strictly increasing (decreasing) in t a.e. u.6 Functions that satisfy the latter assumption include strictly monotonic functions

5 Especially in this example, the number of players/treatments S_i is allowed to vary across villages. We assume in this case that players/treatments are symmetric (in a sense that becomes clear later) and ν^1(·) = · · · = ν^{S_i}(·) = ν(·).

6 A single-treatment version of the latter assumption appears in Vytlacil and Yildiz (2007) (Assumption A-4), which is weaker than assuming θ̃(t, ε) is strictly increasing (decreasing) a.e. ε; see Vytlacil and Yildiz (2007) for related discussions.


such as transformation models θ̃(t, ε) = r(t + ε), where the unknown r(·) is strictly increasing, and functions that are not strictly monotonic, such as limited dependent variable models θ̃(t, ε) = 1[t ≥ ε] or θ̃(t, ε) = 1[t ≥ ε](t − ε). There can be, however, functions that violate the latter assumption but satisfy part (a). For example, consider a threshold crossing model with a random coefficient: θ(d, x, ε) = 1[γ(ε)d_s + d_{−s}δ⊤ > xβ⊤], where the random coefficient γ(ε) is nondegenerate. When γ(ε) ≥ 0, then E[θ(1, d_{−s}, x, ε) − θ(0, d_{−s}, x, ε)|U = u] = Pr[γ(ε) > xβ⊤ − d_{−s}δ⊤ ≥ 0 | U = u] and is thus nonnegative a.e. u, and vice versa. Part (a) also does not impose any monotonicity of θ in ε, and thus ε is allowed to be a vector.

Part (b) of Assumption M(i) imposes mild uniformity. Uniformity is required across different values of d_{−s} but not across s, which means that different treatments can have different directions of monotonicity. More importantly, knowledge of the direction of the monotonicity is not necessary, unlike in Manski (1997) or Manski (2013), where the semi-monotone treatment response is assumed for possibly multiple treatments. In Assumption M(ii), even though uniformity is required not only across d_{−s} but also across s, it is justifiable especially when z_s is chosen to be of the same kind for all players. For example, in an entry game, if z_s is chosen to be each player's entry cost, then the payoffs would decrease in their costs for all players. Note that this monotonicity is weaker than a conventional monotonicity that requires ν^s(d_{−s}, ·) to be either non-decreasing or non-increasing in z_s for all d_{−s} and s.

Assumption SS. For every z_s ∈ 𝒵_s, ν^s(d_{−s}, z_s) is strictly decreasing in each element of d_{−s}.

Assumption SS asserts that the agents' decisions are produced in a game of strategic substitutes in the first stage.

Assumption SY. (i) For every x ∈ 𝒳, ϑ(d, x; u) = ϑ(d̃, x; u) a.e. u for any permutation d̃ of d; (ii) For every z_s ∈ 𝒵_s, ν^s(d_{−s}, z_s) = ν^s(d̃_{−s}, z_s) for any permutation d̃_{−s} of d_{−s}.

Assumption SY imposes symmetry in the functions as long as the observed characteristics X or Z_s (and W, common to both functions, which is suppressed) remain the same. This conditional symmetry assumption is useful for making our incomplete model tractable. Assumption SY(i) is relaxed in Section 4.1. An assumption related to SY(i) is also found in Manski (2013). SY(ii) trivially holds in the two-player case and becomes crucial with many players. SY(ii) is related to the exchangeability assumption in classical entry games (e.g., Berry (1992), Kline and Tamer (2012)), which imposes that the payoff of a player is a function of the number of other entrants,7 or the "anonymity" assumption in large games (e.g., Kalai (2004), Menzel (2016)), which imposes that the payoff depends on the empirical distribution of other players' decisions. In the language of Ciliberto and Tamer (2009), although SY(ii) restricts heterogeneity in the fixed competitive effects (i.e., how each of the other entrants affects one's payoff), the nonseparability between d_{−s} and z_s in ν^s(d_{−s}, z_s) allows heterogeneity in how each player is affected by other entrants; this heterogeneity is related to the variable competitive effects.

Assumption NU. For each d_{−s} ∈ 𝒟_{−s}, ν^s(d_{−s}, Z_s)|X is nondegenerate.

7 This assumption is imposed as part of a monotonicity assumption (Assumption 3.2) in Kline and Tamer (2012). The "symmetry of payoffs" has a different meaning in their paper.


Assumption NU is related to the exclusion restriction and the relevance condition of the instruments Z_s.

Theorem 3.1. In model (2.1)–(2.2), suppose Assumptions IN, E, R, M, SS, SY and NU hold. Then the sign of the ATE is identified, and the upper and lower bounds on the ASF and ATE with d, d̃ ∈ 𝒟 are

L_d(x) ≤ E[Y_d | X = x] ≤ U_d(x)

and

L_d(x) − U_d̃(x) ≤ E[Y_d − Y_d̃ | X = x] ≤ U_d(x) − L_d̃(x),

where, for given d† ∈ 𝒟,

U_{d†}(x) ≡ inf_{z∈𝒵} { E[Y | D = d†, Z = z, X = x] Pr[D = d† | Z = z]
    + Σ_{d'≠d†} inf_{x'∈𝒳^U_{d†}(x;d')} E[Y | D = d', Z = z, X = x'] Pr[D = d' | Z = z] },

L_{d†}(x) ≡ sup_{z∈𝒵} { E[Y | D = d†, Z = z, X = x] Pr[D = d† | Z = z]
    + Σ_{d'≠d†} sup_{x'∈𝒳^L_{d†}(x;d')} E[Y | D = d', Z = z, X = x'] Pr[D = d' | Z = z] },

with 𝒳^U_{d†}(x; d') and 𝒳^L_{d†}(x; d') specified in (3.32)–(3.34) below.

with XdU† (x; d0 ) and XdL† (x; d0 ) specified in (3.32)–(3.34) below. Heuristically, the following is the idea of the bound analysis. For given d 2 D, consider E[Yd |X] = E[Yd |Z, X] = E[Y |D = d, Z, X] Pr[D = d|Z] X + E[Yd |D = d0 , Z, X] Pr[D = d0 |Z],

(3.1)

d0 6=d

where the first equality is by Assumption IN. In this expression, the counterfactual term E[Y_d | D = d', Z, X] can be bounded as long as Y is bounded by a known interval (Manski (1990)). Instruments in Z that are excluded from the equation for Y can then be used to narrow the bounds. The goal is to derive tighter bounds on the ATT's E[Y_d | D = d', Z, X] in (3.1) by fully exploiting the structure of the model. These bounds can then be used to construct bounds on the ATE. Suppose S = 2 and θ(d, x, ε) = 1[µ(d, x) ≥ ε] for illustrative purposes, and let µ_d(x) ≡ µ(d, x) for simplicity. Suppose that we know the direction of monotonicity in Assumption M(i), say µ_{10}(x) ≥ µ_{00}(x). Then we can derive the upper

10

bound on, e.g., E[Y00 |D = (1, 0), Z, X] as Pr[Y00 = 1|D = (1, 0), Z = z, X = x] = Pr[✏  µ00 (x)|D = (1, 0), Z = z, X = x]  Pr[✏  µ10 (x)|D = (1, 0), Z = z, X = x]

(3.2)

= Pr[Y = 1|D = (1, 0), Z = z, X = x],

which is smaller than one, the upper bound without the knowledge of the direction. Below, in a general setting, we show that the direction of monotonicity can actually be identified from the data. Moreover, by additionally using variation in $X$, we exploit more sophisticated monotonicity than that involved in Assumption M(i). The initial motivation of the approach is similar to Shaikh and Vytlacil (2011) and Vytlacil and Yildiz (2007) in that it jointly exploits the selection model and the existence of $X$ excluded from the selection model to determine how $\theta(d, x, \epsilon)$ behaves. The procedure, however, differs from theirs, as we need to deal with the nonmonotonic treatment selection process and the incompleteness of the model.
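The instrument-narrowed worst-case bound implicit in (3.1) can be sketched numerically. The probabilities below are hypothetical identified quantities at a fixed evaluation point $x$ (not estimates from any actual data); with binary $Y$, each counterfactual term $\Pr[Y_d = 1 \mid D = d', Z = z, X = x]$ is trivially bounded by one, and taking the infimum over $z$ tightens the bound:

```python
# Hypothetical identified quantities at a fixed x, for two instrument values.
D_VALS = [(0, 0), (1, 0), (0, 1), (1, 1)]
p_joint = {"z":  {(0, 0): .10, (1, 0): .15, (0, 1): .15, (1, 1): .20},  # Pr[Y=1, D=d | Z, X=x]
           "zp": {(0, 0): .18, (1, 0): .10, (0, 1): .10, (1, 1): .05}}
p_sel   = {"z":  {(0, 0): .20, (1, 0): .25, (0, 1): .25, (1, 1): .30},  # Pr[D=d | Z]
           "zp": {(0, 0): .45, (1, 0): .20, (0, 1): .20, (1, 1): .15}}

def upper_bound(d):
    # Worst-case upper bound on Pr[Y_d = 1 | X = x]: replace each unobserved
    # counterfactual term in (3.1) by its trivial bound Pr[D = d' | Z = z],
    # then take the infimum over the instrument values.
    return min(p_joint[z][d] + sum(p_sel[z][dp] for dp in D_VALS if dp != d)
               for z in p_joint)

ub = upper_bound((0, 0))  # here "zp" is the more informative instrument value
assert abs(ub - 0.73) < 1e-9
```

The shape restrictions developed below replace the trivial bound $\Pr[D = d' \mid Z = z]$ by observed quantities, which is how the bounds in Theorem 3.1 improve on this worst case.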

3.1 Analysis with $S = 2$ and Binary $Y$

To illustrate how to determine the direction of monotonicity of the outcome function with respect to the treatments, this section considers the simple case of $S = 2$ with binary $Y$ and scalar $\epsilon$; Section 3.2 considers the general case of many players with possibly non-binary $Y$ and vector $\epsilon$. The model (2.1)–(2.2) then simplifies to
\[ Y = 1[\mu(D_1, D_2, X) \ge \epsilon], \tag{3.3} \]
\[ D_1 = 1[\nu^1(D_2, Z_1) \ge U_1], \tag{3.4} \]
\[ D_2 = 1[\nu^2(D_1, Z_2) \ge U_2]. \tag{3.5} \]

We first define quantities that are identified directly from the data. For $x \in \mathcal{X}$ and $z, z' \in \mathcal{Z}$, where $\mathcal{Z}|_{X=x} = \mathcal{Z}$ by Assumption NU, define
\[ h(z, z', x) \equiv E[Y \mid Z = z, X = x] - E[Y \mid Z = z', X = x] = \Pr[Y = 1 \mid Z = z, X = x] - \Pr[Y = 1 \mid Z = z', X = x], \tag{3.6} \]
which records the change in the distribution of $Y$ as $Z$ changes. Also, define
\[ h^D_{11}(z, z') \equiv \Pr[D = (1,1) \mid Z = z] - \Pr[D = (1,1) \mid Z = z'], \]
\[ h^D_{00}(z, z') \equiv \Pr[D = (0,0) \mid Z = z] - \Pr[D = (0,0) \mid Z = z']. \]
Let the function $\mathrm{sgn}\{h\}$ take values $-1$, $0$, and $1$ when $h$ is negative, zero, and positive, respectively. Recall $\mu_{d_1 d_2}(x) \equiv \mu(d_1, d_2, x)$.

Lemma 3.1. Suppose $S = 2$ with model (3.3)–(3.5). Under the assumptions of Theorem 3.1, for $z, z' \in \mathcal{Z}$ such that $h^D_{11}(z, z') > 0$ and $-h^D_{00}(z, z') > 0$, and for $x \in \mathcal{X}$,
\[ \mathrm{sgn}\{h(z, z', x)\} = \mathrm{sgn}\{\mu_{11}(x) - \mu_{01}(x)\} = \mathrm{sgn}\{\mu_{10}(x) - \mu_{00}(x)\}. \]
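A simulation consistent with this sign-matching result can be sketched as follows; the payoff thresholds, the outcome indices, and the equilibrium selection rule in the multiplicity region are all hypothetical choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def nu(d_other, z):
    # nu^s(d_{-s}, z_s): entering is less attractive when the opponent enters,
    # i.e., nu(1, z) < nu(0, z) (strategic substitutes, Assumption SS).
    return 0.3 + 0.4 * z - 0.25 * d_other

def equilibrium(u1, u2, z1, z2):
    # Pure-strategy equilibria of (3.4)-(3.5); when both (0,1) and (1,0) are
    # equilibria, an (arbitrary) selection rule picks the first one found.
    eqs = [(d1, d2) for d1 in (0, 1) for d2 in (0, 1)
           if d1 == (u1 <= nu(d2, z1)) and d2 == (u2 <= nu(d1, z2))]
    return eqs[0]

n = 100_000
U1, U2 = rng.uniform(size=n), rng.uniform(size=n)
eps = rng.normal(size=n)                        # outcome unobservable
mu = {(0, 0): 0.0, (1, 0): 0.4, (0, 1): 0.4, (1, 1): 0.8}  # monotone, SY(i)

def moments(z1, z2):
    D = np.array([equilibrium(a, b, z1, z2) for a, b in zip(U1, U2)])
    Y = (eps <= np.array([mu[tuple(d)] for d in D])).astype(float)
    return Y.mean(), (D == (1, 1)).all(1).mean(), (D == (0, 0)).all(1).mean()

py_z, p11_z, p00_z = moments(1.0, 1.0)
py_zp, p11_zp, p00_zp = moments(0.0, 0.0)
h, hD11, hD00 = py_z - py_zp, p11_z - p11_zp, p00_z - p00_zp
# With mu increasing in each treatment, the lemma predicts sgn{h} = +1
# whenever hD11 > 0 and -hD00 > 0, which the simulated estimands exhibit.
assert hD11 > 0 and -hD00 > 0 and h > 0
```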

Given the result of this lemma, since $h$, $h^D_{11}$ and $h^D_{00}$ can be recovered from the data, we recover the signs of $\mu_{11}(x) - \mu_{01}(x)$ and $\mu_{10}(x) - \mu_{00}(x)$, i.e., the direction of monotonicity
in Assumption M(i). Then we can calculate bounds on the unknown conditional mean terms (the ATT's), as seen in (3.2). Under Assumption NU, the existence of $(z, z')$ such that $h^D_{11}(z, z') > 0$ and $-h^D_{00}(z, z') > 0$ is guaranteed by Assumption M(ii); i.e., for $(z, z')$ such that $h^D_{11}(z, z') > 0$, it must be that $-h^D_{00}(z, z') > 0$ by Assumption M(ii) (and vice versa). M(ii) can be tested from the data.$^{8}$

Under Assumption SS, $(1,0)$ and $(0,1)$ are the values of $D$ that can be realized as possible multiple equilibria; see, e.g., Figure 1(a). Given this knowledge, we define
\[ h_M(z, z', x) \equiv \Pr[Y = 1, D \in \{(1,0), (0,1)\} \mid Z = z, X = x] - \Pr[Y = 1, D \in \{(1,0), (0,1)\} \mid Z = z', X = x], \]
and
\[ h_{11}(z, z', x) \equiv \Pr[Y = 1, D = (1,1) \mid Z = z, X = x] - \Pr[Y = 1, D = (1,1) \mid Z = z', X = x], \]
\[ h_{00}(z, z', x) \equiv \Pr[Y = 1, D = (0,0) \mid Z = z, X = x] - \Pr[Y = 1, D = (0,0) \mid Z = z', X = x], \]
so that $h(z, z', x) = h_{11}(z, z', x) + h_{00}(z, z', x) + h_M(z, z', x)$. Making use of the conditional symmetry assumption (SY(i)), combining $D = (1,0)$ and $D = (0,1)$ will manage the multiple equilibria problem. We define regions in the support $\mathcal{U} \equiv (0,1]^2$ of $U = (U_1, U_2)$ that predict the pure strategy equilibria $(1,1)$, $(0,0)$, $(1,0)$, and $(0,1)$ for $D$: with $\nu^s_{d_{-s}}(z_s) \equiv \nu^s(d_{-s}, z_s)$ for brevity,
\[ R_{11}(z) \equiv \{U : U_1 \le \nu^1_1(z_1),\ U_2 \le \nu^2_1(z_2)\}, \qquad R_{00}(z) \equiv \{U : U_1 > \nu^1_0(z_1),\ U_2 > \nu^2_0(z_2)\}, \]
\[ R_{10}(z) \equiv \{U : U_1 \le \nu^1_0(z_1),\ U_2 > \nu^2_1(z_2)\}, \qquad R_{01}(z) \equiv \{U : U_1 > \nu^1_1(z_1),\ U_2 \le \nu^2_0(z_2)\}. \]
By Assumption SS, $R_{11}$ and $R_{00}$ are regions of a unique equilibrium, and $R_{10} \cup R_{01}$ contains regions of multiple equilibria; formal proofs of this argument and of other arguments below concerning the equilibrium regions can be found in Proposition 3.1 in a general setup of $S \ge 2$. By Assumption IN, suppressing the arguments $(z, z', x)$ on the l.h.s.,
\[ h_{11} + h_{00} = \Pr[\epsilon \le \mu_{11}(x), U \in R_{11}(z)] + \Pr[\epsilon \le \mu_{00}(x), U \in R_{00}(z)] - \Pr[\epsilon \le \mu_{11}(x), U \in R_{11}(z')] - \Pr[\epsilon \le \mu_{00}(x), U \in R_{00}(z')], \tag{3.7} \]
where the equality uses $R_{11}$ and $R_{00}$ being disjoint and regions of unique equilibrium. By Assumption SY(i), which gives $\mu_{10} = \mu_{01}$, we have
\[ h_M = \Pr[\epsilon \le \mu_{10}(x), U \in R_{10}(z) \cup R_{01}(z)] - \Pr[\epsilon \le \mu_{10}(x), U \in R_{10}(z') \cup R_{01}(z')]. \tag{3.8} \]
The main insight in obtaining the results of Lemma 3.1 is as follows. By (3.6), $h$ captures how $\Pr[Y = 1 \mid Z = z, X = x]$ changes in $z$. By $h = h_{11} + h_{00} + h_M$ and (3.7)–(3.8), such a change can be translated into shifts in the regions of equilibria, with the thresholds of $\epsilon$ in each of

8 Obviously, even if Assumption M(ii) is violated, the results of the lemma, and hence that of Theorem 3.1, will follow as long as there exists $(z, z')$ satisfying $h^D_{11}(z, z') > 0$ and $-h^D_{00}(z, z') > 0$.
Figure 1: Illustration of $h_M$ in the proof of Lemma 3.1. (Panels: (a) when $Z = z$; (b) when $Z = z'$; (c) the difference of (a) and (b), showing $\Delta^+(z, z')$ and $\Delta^-(z, z')$; axes $U_1$, $U_2$.)
Figure 2: Illustration of $h_{11} + h_{00}$ in the proof of Lemma 3.1. (Panels: (a) when $Z = z$; (b) when $Z = z'$; (c) the difference of (a) and (b); axes $U_1$, $U_2$.)

$h_{11}$, $h_{00}$ and $h_M$ remaining unchanged by the exclusion restriction. Therefore, by inspecting how $\Pr[Y = 1 \mid Z = z, X = x]$ changes in $z$ (i.e., the sign of $h$) relative to the changes in the equilibrium regions $R_{11}$ and $R_{00}$ (i.e., the signs of $h^D_{11}$ and $h^D_{00}$), we recover the signs of $\mu_{11}(x) - \mu_{01}(x)$ and $\mu_{10}(x) - \mu_{00}(x)$. In doing so, we use the crucial fact that the changes in the region $R_{10} \cup R_{01}$ are offset by the changes in $R_{11}$ and $R_{00}$.

To be specific, suppose that $(z, z')$ are chosen such that $h^D_{11}(z, z') > 0$ and $-h^D_{00}(z, z') > 0$. Then by Assumption M(ii), $R_{11}(z) \supseteq R_{11}(z')$ and $R_{00}(z) \subset R_{00}(z')$. Then
\[ \Delta^+(z, z') \equiv \{R_{10}(z) \cup R_{01}(z)\} \setminus \{R_{10}(z') \cup R_{01}(z')\} = R_{00}(z') \setminus R_{00}(z), \tag{3.9} \]
\[ \Delta^-(z, z') \equiv \{R_{10}(z') \cup R_{01}(z')\} \setminus \{R_{10}(z) \cup R_{01}(z)\} = R_{11}(z) \setminus R_{11}(z'), \tag{3.10} \]

because, as $z$ changes, an inflow into one region is an outflow from a region next to it. This set algebra is illustrated in Figures 1 and 2. Then (3.8) becomes
\[ h_M = \Pr[\epsilon \le \mu_{10}(x), U \in \Delta^+(z, z')] - \Pr[\epsilon \le \mu_{10}(x), U \in \Delta^-(z, z')], \tag{3.11} \]

by the following general rule: for a uniform random vector $\tilde{U}$ in $\mathcal{U}$, two sets $B$ and $B'$ contained in $\mathcal{U}$, a r.v. $\epsilon$, and a set $A \subset \mathcal{E}$,
\[ \Pr[\epsilon \in A, \tilde{U} \in B] - \Pr[\epsilon \in A, \tilde{U} \in B'] = \Pr[\epsilon \in A, \tilde{U} \in B \setminus B'] - \Pr[\epsilon \in A, \tilde{U} \in B' \setminus B]. \tag{3.12} \]
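The rule (3.12) holds realization by realization, since $1\{B\} - 1\{B'\} = 1\{B \setminus B'\} - 1\{B' \setminus B\}$ pointwise; a quick numerical confirmation with arbitrary (hypothetical) sets:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.uniform(size=(100_000, 2))        # stand-in for the uniform vector U-tilde
eps = rng.normal(size=100_000)

A = eps <= 0.5
B = (U[:, 0] <= 0.6) & (U[:, 1] <= 0.7)   # two arbitrary rectangular sets
Bp = (U[:, 0] <= 0.4) & (U[:, 1] <= 0.9)

lhs = np.mean(A & B) - np.mean(A & Bp)
rhs = np.mean(A & B & ~Bp) - np.mean(A & Bp & ~B)
assert np.isclose(lhs, rhs)               # (3.12), exact under the empirical measure
```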

Therefore, by combining (3.11) with (3.7) and applying (3.12) once more, we have
\[ h(z, z', x) = \Pr[\epsilon \le \mu_{11}(x), U \in \Delta^-(z, z')] - \Pr[\epsilon \le \mu_{10}(x), U \in \Delta^-(z, z')] + \Pr[\epsilon \le \mu_{10}(x), U \in \Delta^+(z, z')] - \Pr[\epsilon \le \mu_{00}(x), U \in \Delta^+(z, z')]. \tag{3.13} \]
Now, given Assumption E, Assumption M(i) holds with $\mu(1, d_{-s}, x) > \mu(0, d_{-s}, x)$ for any $d_{-s}$ if and only if
\[ h(z, z', x) = \Pr[\mu_{01}(x) \le \epsilon \le \mu_{11}(x), U \in \Delta^-(z, z')] + \Pr[\mu_{00}(x) \le \epsilon \le \mu_{10}(x), U \in \Delta^+(z, z')], \]

which is positive, as it is the sum of two probabilities. One can analogously show this for the other signs, and we have the result of Lemma 3.1.$^{9}$ Lastly, to gain efficiency in determining the sign of $h(z, z', x)$, define the integrated version of $h$ as
\[ H(x) \equiv E[h(Z, Z', x) \mid h^D_{11}(Z, Z') > 0,\ h^D_{00}(Z, Z') < 0]; \tag{3.14} \]
then $\mathrm{sgn}\{H(x)\} = \mathrm{sgn}\{\mu_{11}(x) - \mu_{01}(x)\} = \mathrm{sgn}\{\mu_{10}(x) - \mu_{00}(x)\}$.

Now, consider calculating the upper bound on $\Pr[Y_{00} = 1 \mid X = x]$. For the chosen evaluation point $x$, suppose $H(x) \ge 0$. Then by Lemma 3.1, $\mu_{00}(x) \le \mu_{10}(x)$, $\mu_{00}(x) \le \mu_{01}(x)$, and $\mu_{00}(x) \le \mu_{10}(x) \le \mu_{11}(x)$. Recall that $\mu_{00} \le \mu_{10}$ implies that the upper bound on $\Pr[Y_{00} = 1 \mid D = (1,0), Z, X]$ is $\Pr[Y = 1 \mid D = (1,0), Z, X]$ by (3.2). Likewise, using $\mu_{00} \le \mu_{01}$ and $\mu_{00} \le \mu_{11}$, we can calculate upper bounds on the other unobserved terms $\Pr[Y_{00} = 1 \mid D = d, Z, X]$ for $d \ne (0,0)$ in (3.1). Consequently, we have $\Pr[Y_{00} = 1 \mid X = x] \le \Pr[Y = 1 \mid Z = z, X = x]$. Likewise, we can derive lower bounds on $\Pr[Y_{00} = 1 \mid X = x]$ when $H(x) \le 0$.$^{10}$

So far, we have not exploited the variation of $X$; we now derive tighter bounds using this variation. For (possibly different) evaluation points $x_0, x_1, x_2 \in \mathcal{X}$, define
\[ h(z, z'; x_0, x_1, x_2) \equiv h_{00}(z, z', x_0) + h_M(z, z', x_1) + h_{11}(z, z', x_2). \tag{3.15} \]

For $x_0$, $x_1$ and $x_2$ to take different values, it is important to have a variable in $X$ that is excluded from the first-stage game, i.e., the ability to vary $X$ given $Z$. Note that for $x_0 = x_1 = x_2 = x$, $h(z, z'; x, x, x) = h(z, z', x)$. Since Assumption M(i) only compares $\mu_{d_s, d_{-s}}(x)$ and $\mu_{d'_s, d_{-s}}(x')$ for $x = x'$, the result of Lemma 3.1 and its proof strategy are not applicable to $h(z, z'; x_0, x_1, x_2)$ when $x_0$, $x_1$ and $x_2$ take different values. It would not be desirable to modify Assumption M(i) to incorporate monotonicity between $\mu_{d_s, d_{-s}}(x)$ and $\mu_{d'_s, d_{-s}}(x')$ for $x \ne x'$, as this would create an overly stringent assumption. Instead, we propose a procedure that compares the signs of $h(z, z', x)$ and $h(z, z'; x_0, x_1, x_2)$ for some $x, x_0, x_1$ and $x_2$ to infer the signs of $\mu_d(x) - \mu_{\tilde{d}}(x')$ for some $x$ and $x'$. This procedure is unique to this multi-treatment setting with strategic interaction and is thus distinct from how the variation of $X$ is exploited in a single-treatment setting as in Shaikh and Vytlacil (2011)

9 Note that in deriving the result of the lemma, a player-specific exclusion restriction is not crucial, and one may be able to relax it.

10 When $H(x) \ge 0$, the lower bound on $\Pr[Y_{00} = 1 \mid X = x]$ is trivially zero.

and Vytlacil and Yildiz (2007).$^{11}$

Lemma 3.2. Suppose $S = 2$ with model (3.3)–(3.5), and suppose the assumptions of Theorem 3.1 hold. For $z, z' \in \mathcal{Z}$ such that $h^D_{11}(z, z') > 0$ and $-h^D_{00}(z, z') > 0$, and for $x_0, x_1, x_2 \in \mathcal{X}$, suppose (3.15) is well-defined. For $\iota \in \{-1, 0, 1\}$: (i) if $\mathrm{sgn}\{h(z, z'; x_0, x_1, x_2)\} = \mathrm{sgn}\{\mu_{10}(x_1) - \mu_{11}(x_2)\} = \iota$, then $\mathrm{sgn}\{\mu_{10}(x_1) - \mu_{00}(x_0)\} = \iota$; (ii) if $\mathrm{sgn}\{h(z, z'; x_0, x_1, x_2)\} = \mathrm{sgn}\{\mu_{00}(x_0) - \mu_{10}(x_1)\} = \iota$, then $\mathrm{sgn}\{\mu_{11}(x_2) - \mu_{10}(x_1)\} = \iota$.

This lemma is proved in a general setup later. The rules established in Lemma 3.2 can be used to narrow the bounds previously obtained with variation of $Z$ only. Analogous to (3.14), define
\[ H(x_0, x_1, x_2) \equiv E[h(Z, Z'; x_0, x_1, x_2) \mid h^D_{11}(Z, Z') > 0,\ h^D_{00}(Z, Z') < 0]. \tag{3.16} \]

Fix evaluation points $x, x' \in \mathcal{X}$ and suppose $H(x, x', x') \le 0$. Also suppose $H(x') \ge 0$, which implies $\mu_{11}(x') \ge \mu_{10}(x')$ by Lemma 3.1 and (3.14). Then $\mathrm{sgn}\{H(x, x', x')\} = \mathrm{sgn}\{\mu_{10}(x') - \mu_{11}(x')\} \in \{0, -1\}$, and therefore $\mu_{10}(x') \le \mu_{00}(x)$ by Lemma 3.2(i). Using this result, we can derive a lower bound on $E[Y_{00} \mid D = (1,0), Z = z, X = x]$:
\[ \Pr[Y_{00} = 1 \mid D = (1,0), Z = z, X = x] = \Pr[\epsilon \le \mu_{00}(x) \mid D = (1,0), Z = z, X = x] \ge \Pr[\epsilon \le \mu_{10}(x') \mid D = (1,0), Z = z, X = x'] = \Pr[Y = 1 \mid D = (1,0), Z = z, X = x']. \tag{3.17} \]

Note that this bound cannot be achieved when $x_0 = x_1 = x_2$, as the sign conditions in Lemma 3.2(i) and (ii) then fail. Therefore, $X$ needs to have excluded variation given $Z = z$ and $Z = z'$.$^{12}$ Without exploiting variation $x' \ne x$ of $X$, the conditional probability in (3.17) only has a lower bound of zero (when $H(x) \ge 0$). Now, we can collect all $x' \in \mathcal{X}$ that yield $\mu_{10}(x') \le \mu_{00}(x)$ and further tighten the bound in (3.17) by taking the supremum over all $x'$ in this set. Other bounds on $\Pr[Y_{00} = 1 \mid D = d, Z, X]$ for $d \ne (0,0)$ can be derived in a similar manner and combined based on (3.1) to calculate the bounds on $\Pr[Y_{00} = 1 \mid X]$. This will be explored in full generality in the next section.

3.2 General Analysis

In this section, we prove the main theorem (Theorem 3.1) for the full model (2.1)–(2.2), in which $Y$ may no longer be binary and the number of players may exceed two. We first introduce a generalized version of the sign-matching results (Lemma 3.1). Recall, for $z, z' \in \mathcal{Z}$ and $x \in \mathcal{X}$,
\[ h(z, z', x) \equiv E[Y \mid Z = z, X = x] - E[Y \mid Z = z', X = x]. \]

11 In these papers, with a scalar treatment $D$, the sign of a single object $h_1(x') + h_0(x)$ (where $h_d(x) \equiv \Pr[Y = 1, D = d \mid X = x, Z = z] - \Pr[Y = 1, D = d \mid X = x, Z = z']$) conveniently determines the direction of monotonicity between two outcome index functions evaluated at $x$ and $x'$.

12 The candidates for variables that are excluded from all the equations for $D_s$ are discussed in the context of Examples 1–5 in Section 2. One can easily extend the analysis of this paper to a setting where there is no variable in $X$ excluded from the $D_s$ equations but there are exogenous variables common to the $Y$ equation and each $D_s$ equation (or common to the $Y$ equation and all the $D_s$ equations); see Remark 3.4.

For $k = 1, \ldots, S$, let $e_k$ be an $S$-vector of all zeros except for the $k$-th element, which is unity, and let $e_0 \equiv (0, \ldots, 0)$. For $j = 0, \ldots, S$, define $e^j \equiv \sum_{k=0}^{j} e_k$, which is an $S$-vector whose first $j$ elements are unity and whose remaining elements are zero. For some positive integers $n_s$, define a permutation function $\sigma: \{n_1, \ldots, n_S\} \to \{n_1, \ldots, n_S\}$, which is one-to-one. For example,
\[ \begin{pmatrix} n_1 & n_2 & n_3 & n_4 & n_5 \\ \sigma(n_1) & \sigma(n_2) & \sigma(n_3) & \sigma(n_4) & \sigma(n_5) \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 5 & 3 & 4 \end{pmatrix}. \]
Let $\Sigma$ be the set of all possible permutations. Define the set of all possible permutations of $e^j = (e^j_1, \ldots, e^j_S)$ as
\[ M_j \equiv \big\{ d^j : d^j = (\sigma(e^j_1), \ldots, \sigma(e^j_S)) \text{ for some } \sigma(\cdot) \in \Sigma \big\} \tag{3.18} \]
for $j = 0, \ldots, S$. Note that $M_j$ is the set of all equilibria with $j$ treatments selected, or $j$ entrants, and $\bigcup_{j=0}^{S} M_j = \mathcal{D}$. There are $S!/j!(S-j)!$ distinct $d^j$'s in $M_j$. For example, with $S = 3$, $d^2 \in M_2 = \{(1,1,0), (1,0,1), (0,1,1)\}$ and $d^0 \in M_0 = \{(0,0,0)\}$. Note that $d^0 = e^0 = (0, \ldots, 0)$ and $d^S = e^S = (1, \ldots, 1)$. Define
\[ h_j(z, z', x) \equiv E[Y \mid D \in M_j, Z = z, X = x] \Pr[D \in M_j \mid Z = z] - E[Y \mid D \in M_j, Z = z', X = x] \Pr[D \in M_j \mid Z = z'], \tag{3.19} \]
\[ h^D_j(z, z') \equiv \Pr[D \in M_j \mid Z = z] - \Pr[D \in M_j \mid Z = z']. \tag{3.20} \]

Since the $M_j$'s are disjoint, $\sum_{j=0}^{S} \Pr[D \in M_j \mid Z = \cdot] = 1$, and thus $h(z, z', x) = \sum_{j=0}^{S} h_j(z, z', x)$. Let $\mathbf{x} = (x_0, \ldots, x_S) \in \mathcal{X}^{S+1}$ be a collection of (possibly different) evaluation points, i.e., each evaluation point $X = x_j$ is in $\mathcal{X}$ for $j = 0, \ldots, S$, and define
\[ h(z, z'; \mathbf{x}) \equiv \sum_{j=0}^{S} h_j(z, z', x_j). \]

Recall $\vartheta(d, x; u) \equiv E[\theta(d, x, \epsilon) \mid U = u]$, and for succinctness let $\vartheta_j(x; u) \equiv \vartheta(e^j, x; u)$, as $e^j$ is the only relevant set of treatments under Assumption SY(i). We state the main lemma of this section.

Lemma 3.3. In model (2.1)–(2.2), suppose Assumptions IN, E, R, M, SS, SY and NU hold. For $z, z' \in \mathcal{Z}$ such that $\sum_{k=j'}^{S} h^D_k(z, z') > 0$ for all $j' = 1, \ldots, S$, and for $x \in \mathcal{X}$ and $\mathbf{x} \in \mathcal{X}^{S+1}$, suppose $h(z, z'; \mathbf{x})$ is well-defined. For $j = 1, \ldots, S$, it satisfies that (i) $\mathrm{sgn}\{h(z, z', x)\} = \mathrm{sgn}\{\vartheta_j(x; u) - \vartheta_{j-1}(x; u)\}$ a.e. $u$; (ii) for $\iota \in \{-1, 0, 1\}$, if $\mathrm{sgn}\{h(z, z'; \mathbf{x})\} = \mathrm{sgn}\{\vartheta_{k-1}(x_{k-1}; u) - \vartheta_k(x_k; u)\} = \iota$ for all $k \ne j$, then $\mathrm{sgn}\{\vartheta_j(x_j; u) - \vartheta_{j-1}(x_{j-1}; u)\} = \iota$ a.e. $u$.

The existence of $(z, z')$ such that $\sum_{k=j'}^{S} h^D_k(z, z') > 0$ for all $j'$ is guaranteed by Assumptions M(ii) and NU; see the discussion after Corollary 3.1 below. For Lemma 3.3(ii), variation in $X$ conditional on $Z$ is important. To show Lemma 3.3, as an important intermediate step we formally characterize the regions that predict equilibria of the first-stage strategic interaction. The analytical characterization of the equilibrium regions when $S > 2$ can generally be complicated (Ciliberto and Tamer (2009, p. 1800)) and has not been fully studied in the literature.
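The equilibrium classes $M_j$ in (3.18) are simply the distinct permutations of $e^j$, which can be enumerated directly (a small sketch using Python's standard library):

```python
from itertools import permutations
from math import comb

def M(j, S):
    """All equilibria with j entrants: the distinct permutations of e^j, as in (3.18)."""
    ej = (1,) * j + (0,) * (S - j)
    return sorted(set(permutations(ej)))

assert M(2, 3) == [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
assert M(0, 3) == [(0, 0, 0)]
# There are S!/j!(S-j)! distinct d^j's in M_j:
assert all(len(M(j, 5)) == comb(5, j) for j in range(6))
```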

In the case of more than two players, the conditional symmetry of the payo↵ functions in opponents’ decisions (Assumption SY(ii)) plays an important role in the characterization by simplifying the regions of multiple equilibria. Recall, ⌫ds s (zs ) ⌘ ⌫ s (d s , zs ). Let e˜j be a (S 1)-vector where the first j elements are unity and the rest are zero for j = 0, ..., S 1. Note that by Assumption SY(ii), ⌫es˜j (zs ) is the only relevant payo↵ function to define the regions, therefore for notational simplicity, let ⌫js (zs ) ⌘ ⌫es˜j (zs ). Now, for each equilibrium profile, we define regions of U ⌘ (U1 , ..., US ) in U ⌘ (0, 1]S . These regions are defined as Cartesian products that are subsets of U : RdS (z) ⌘ Rd0 (z) ⌘

S Y

s=1 S Y

0, ⌫Ss

1 (zs )



,

(⌫0s (zs ), 1] ,

s=1

and, given dj = ( (ej1 ), ..., (ejS )) for some (·) 2 ⌃13 and j = 1, ..., S 1, 8 ( j ) 8 S < < Y ⇣ i Y⇣ (s) (s) Rdj (z) = U : (U (1) , ..., U (S) ) 2 0, ⌫j 1 (z (s) ) ⇥ ⌫j (z : : s=1

(s) ), 1

s=j+1

99 i= = ;;

.

(3.21)

For example, for (·) such that d1 = ( (1), (0), (0)) = (0, 1, 0), ⇤ ⇤ ⇤ R010 (z) = ⌫11 (z1 ), 1 ⇥ 0, ⌫02 (z2 ) ⇥ ⌫13 (z3 ), 1 .

Lastly, define the region of all equilibria with j treatments selected or j entrants as [ Rj (z) ⌘ Rd (z).

(3.22)

d2Mj

Now we establish the geometric properties of these regions. Definition 3.1. Sets A and B are neighboring sets when there exists a point in one set whose open "-ball has nonempty intersection with the other set for any " > 0. Two sets with a nonempty intersection are trivially neighboring sets. Two disjoint sets can possibly be neighboring sets when they share a “border”. Proposition 3.1. Consider the first-stage game (2.2). Under Assumptions SS and SY(ii), the following holds: For every z 2 Z (which is suppressed), (i) Rj \ Rj 0 = ; for j, j 0 = 0, ..., S with j 6= j 0 ; (ii) Rj and Rj 1 are neighboring sets for j = 1, ..., S; (iii) Rj and Rj t are not neighboring sets for j = t, ..., S and t 2; S (iv) Sj=0 Rj = U . 13

Sometime we use the notation dj to emphasize the permutation function (·) from which dj is generated.

17

1

U3

1

0

U2 U1

1

(a) R0 ("); R3 (#)

(b) R1

(c) R2

Figure 3: Illustration of Proposition 3.1 for S = 3.

(d)

S3

j=0

Rj = U

This proposition fully characterizes the equilibrium regions. Figure 3 illustrates the results of Proposition 3.1 for S = 3 with R0 = R000 , R1 = R100 [R010 [R001 , R2 = R110 [R101 [R011 and R3 = R111 ; also see Figures 5 and 6 in the Appendix for relevant figures and for a figure that depicts regions of multiple equilibria for this case. For concreteness, we henceforth discuss Proposition 3.1 in terms of an entry game. By (i) and the fact that MS and M0 are singleton, one can conclude that RdS and Rd0 are regions of unique equilibrium. For j = 1, ..., S 1, however, Rdj \ Rd˜j is not necessarily empty for dj = ( (ej1 ), ..., (ejS )) and d˜j = (˜ (ej1 ), ..., ˜ (ejS )) with dj 6= d˜j . In particular, Rdj \ Rd˜j are regions of multiple equilibria. By (i), there is no multiple equilibria where one equilibrium has j entrants and another has j 0 entrants for j 0 6= j. This is reminiscent of Berry (1992) and Bresnahan and Reiss (1990, 1991) in that the equilibrium is unique in terms of the number of entrants. In other words, even if D = dj for j = 1, ..., S 1 is not uniquely predicted by U 2 Rdj (z) as Rdj (z) contains a region of multiple equilibria, D 2 Mj is uniquely predicted by U 2 Rj (z).14 In the present paper, this result is obtained under substantially weaker conditions on the payo↵ function than those in Berry (1992). In addition to (i), Proposition 3.1(ii)–(iii) are important for the analyses of this paper. They assert that regions are neighboring sets when the number of entrants di↵ers by one, but they are not when the number of entrants di↵ers by more than one. By (i), neighboring sets in (ii) are disjoint neighboring sets. Let A ⇠ B denote that A and B are neighboring sets. Note that A ⇠ B implies B ⇠ A and vice versa. Then (i)–(iii) immediately imply that Rj ’s are disjoint regions that lie in U in a monotonic fashion, where all possible neighboring relationships are expressed as R 1 ⇠ R2 ⇠ · · · ⇠ R S

1

⇠ RS .

(3.23)

Given this characterization, we can track the inflow and outflow in each Rj (z) when the value of Z changes, as we do in the S = 2 case. This is formalized in a corollary below. Proposition 3.1(iv) implies that an equilibrium always exists in this game of strategic substitute, regardless of the number of players or the shape of the distribution of unobservables. In this sense, a discrete game with any number of players and strategic substitutability is shown to be coherent (Tamer (2003); Chesher and Rosen (2012)), which extends the finding in a game 14 Therefore, even if Pr[D = dj |Z = z] 6= Pr[U 2 Rdj (z)], it satisfies that Pr[D 2 Mj |Z = z] = Pr[U 2 Rj (z)].

18

Player s

1

2

3

4

5

Decision djs Decision djs 1

1 1

1 0

0 0

1 0

0 1

Table 1: An example of equilibria that di↵er by one entrant with S = 5 and j = 3. with two players in the literature. Proposition 3.1(i) and (iv) imply that Rj for j = 1, ..., S partition the entire U . Note that, reversion (or crossing) of the “border” of the partition does not occur, otherwise it violates (iii). As becomes clear in the proof of the proposition, this result is essentially due to Assumption SS that ⌫js < ⌫js t for t 1. Proposition 3.1(i), (ii) and (iii) are implied by similar statements that satisfy for all 0 individual pairs between two regions: (i0 ) Rdj \ Rdj 0 = ; 8dj 2 Mj and 8dj 2 Mj 0 with j 6= j 0 ; (ii0 ) Rdj and Rdj 1 are neighboring sets 8dj 2 Mj and 8dj 1 2 Mj 1 ; (iii0 ) Rdj and Rdj t are not neighboring sets 8dj 2 Mj and 8dj t 2 Mj t with t 2. These results can be shown of sets defined as Cartesian products: e.g., for two Cartesian products Qusing properties Q R = Ss=1 rs and Q = Ss=1 qs with rs and qs being intervals in R, it satisfies that R ⇠ Q if and only if rs ⇠ qs 8s. This property can be used to prove (ii0 ) and thus (ii) with R = Rdj and Q = Rdj 1 . To show rs ⇠ qs 8s, we show that each pair of rs and qs is associated with the decision of player s that falls into one of the categories that are exemplified in Table 1. Each entry of the table is the s-th element of equilibria dj and dj 1 with S = 5 and j = 3. For any S and j in general, there always exists a player s⇤ such that djs⇤ = 1 and djs⇤ 1 = 0. In this example, s⇤ = 2 (or 4). For all other players, it is easy to see that their equilibrium decisions must be one of the four pairs displayed in Table 1, i.e., (djs , djs 1 ) 2 {(1, 1), (0, 0), (1, 0), (0, 1)} 8s 6= s⇤ . Then it can be shown that for each of these pairs, the corresponding rs and qs satisfy rs ⇠ qs , whether or not they have a nonempty intersection. This is formally shown in the Appendix as part of the proof of Proposition 3.1. 
For j = 0, ..., S, define the region of all equilibria with at most j entrants as Rj (z) ⌘

j [

Rk (z).

k=0

Although this region is hard to express explicitly in general, it has a simple property: Corollary 3.1. Under the assumptions of Proposition 3.1, for j = 0, ..., S 1, Rj (z) is expressed as a union across (·) ⇣2 ⌃ of Cartesian products, each of which is a product of i (s) intervals that are either (0, 1] or ⌫j (z (s) ), 1 for some s = 1, ...S.

This corollary asserts that the region which predicts all equilibria with at most j entrants is solely determined by the payo↵s of players who stay out facing j entering opponents. Since deriving the explicit expression of Rj can be cumbersome, we infer its form by focusing on the “border” of Rj and using the results of Proposition 3.1; see the proof in the Appendix.15 Corollary 3.1 establishes a version of monotonicity in the treatment selection process. This corollary plays a crucial role in calculating the bounds on the treatment parameters, in 15

Berry (1992) derives the probability of an event that the number of entrants is less than a certain value, which can be written as Pr[U 2 Rj (z)] using our notation. This result is not sufficient for the purpose of our paper.

19

showing sharpness, and in introducing the LATE later. Its key implication is the following: by Corollary 3.1 and Assumption M(ii), for any given z, z 0 2 Z, either Rj (z) ✓ Rj (z 0 )

or

Rj (z) ◆ Rj (z 0 ).

(3.24)

This result can be used, among other things, in proving Lemma 3.3 by equating inflows and outflows of Rj ’s with those of Rj ’s in calculating (3.19) (and thus h(z, z 0 ; x)), which can be written as hj (z, z 0 , x) = E[Y |U 2 Rj (z), Z = z, X = x] Pr[U 2 Rj (z)]

E[Y |U 2 Rj (z 0 ), Z = z 0 , X = x] Pr[U 2 Rj (z 0 )],

(3.25)

by Assumption IN. This approach is analogous to the simpler analysis in Section 3.1. For Lemma 3.3, the result (3.24) also guarantees the existence of z, z 0 2 Z such that S X

k=j 0

n 0 hD k (z, z ) = 1

Pr[U 2 Rj

0

1

(z)]

o

n

1

Pr[U 2 Rj

0

1

o (z 0 )] > 0

8j 0 = 1, ..., S by Assumptions IN and NU. We introduce a lemma that establishes the connection between Corollary 3.1 (or Proposition 3.1) and Lemma 3.3. P Lemma 3.4. Based on the results in Proposition 3.1, h(z, z 0 ; x) ⌘ Sj=0 hj (z, z 0 , xj ) satisfies 0

h(z, z ; x) =

S Z X j=1

where

j 1 (z 0 , z)

⌘ Rj

j

1 (z 0 ,z)

{#j (xj ; u)

#j

1 (xj 1 ; u)} du,

1 (z 0 )\Rj 1 (z).

As a special case of this lemma, h(z 0 , z; x, ..., x) = h(z 0 , z, x) = expressed as 0

(3.26)

h(z , z, x) =

S Z X j=1

j

1 (z 0 ,z)

{#j (x; u)

#j

PS

j=0 hj (z

1 (x; u)} du.

0 , z, x)

can be

(3.27)

Now we are ready to prove Lemma 3.3. For part (i), suppose that #j (x; u) #j 1 (x; u) > 0 a.e. u 8j = 1, ..., S. Then by (3.27), h > 0. Conversely, if h > 0 then it should be that #j (x; u) #j 1 (x; u) > 0 a.e. u 8j = 1, ..., S. Suppose not and suppose #j (x; u) #j 1 (x; u)  0 with positive measure for some j. Then by Assumption M(i), this implies that #j (x; u) #j 1 (x; u)  0 8j a.e. u, and thus h  0 which is contradiction. By applying similar arguments for other signs, we have the desired result. The proof for Lemma 3.3(ii) is in the Appendix. Using Lemma 3.3, we can obtain the results in Theorem 3.1. First the sign of the ATE is identified by Lemma 3.3(i) since E[Yd |X = x] = E[#(d, x; U )]. Next, we calculate the

20

bounds on E[Yd |X = x] with d = dj for a given dj 2 Mj for some j = 0, ..., S. Consider E[Ydj |X = x] = E[Y |D = dj , Z = z, X = x] Pr[D = dj |Z = z] X + E[Ydj |D = d0 , Z = z, X = x] Pr[D = d0 |Z = z].

(3.28)

d0 6=dj

Note that for d0 2 Mj , E[Ydj |D = d0 , Z = z, X = x] = E[Y |D = d0 , Z = z, X = x]

(3.29)

by Assumption SY(i). In order to bound E[Ydj |D = d0 , Z = z, X = x] for d0 2 / Mj in (3.28), we systematically use the results of Lemma 3.3. Define the integrated version of h(z, z 0 , x) and h(z, z 0 ; x) as 2 3 S X 0 0 5 H(x) ⌘ E 4h(Z, Z 0 , x) hD k (Z, Z ) > 0 for all j = 1, ..., S , k=j 0

2

H(x) ⌘ E 4h(Z, Z 0 ; x)

S X

k=j 0

3

0 0 5 hD k (Z, Z ) > 0 for all j = 1, ..., S ,

and define the following sets of evaluation points of X that satisfy the conditions in Lemma 3.3: for j = 1, ..., S, 0 Xj,j

1 (◆) 1 Xj,j 1 (◆)

t Xj,j

1 (◆)

.. .

⌘ {(xj , xj

1)

⌘ {(xj , xj

1)

⌘ {(xj , xj

1)

t Note that Xj,j Lemma 3.3,

if (xj , xj

1 (◆)

1)

: sgn{H(x)} = ◆, x0 = · · · = xS }, : sgn{H(x)} = ◆, (xk , xk

1)

0 2 Xk,k

: sgn{H(x)} = ◆, (xk , xk

1)

t 1 t 1 2 Xk,k 1 ( ◆) 8k 6= j} [ Xj,j 1 (◆).

t+1 ⇢ Xj,j 1 (◆) for any t. Define Xj,j

2 Xj,j

1 (◆),

then sgn{#j (xj ; u)

#j

1 (◆)

1(

0 ◆) 8k 6= j} [ Xj,j

t ⌘ limt!1 Xj,j

1 (xj 1 ; u)}

1 (◆).

= ◆ a.e. u.

16

1 (◆),

Then by (3.30)

0

Consider j 0 < j for E[Ydj |D = dj , Z, X] in (3.28). Then, for example, if (xk , xk 1 ) 2 Xk,k 1 ( 1) [ Xk,k 1 (0) for j 0 + 1  k  j, then #j (x; u)  #j 0 (x0 ; u) where x = xj and 16

t In practice, the formula for Xj,j 1 provides a natural algorithm to construct the set Xj,j 1 for the computaT tion of the bounds. Practitioners can employ truncation t  T for some T and use Xj,j 1 as an approximation for Xj,j 1 .

21

x0 = xj 0 by transitively applying (3.30). Therefore 0

E[Ydj |D = dj , Z = z, X = x] = E[✓(dj , x, ✏)|U 2 Rdj 0 (z), Z = z, X = x] Z 1 = #j (x; u)du Pr[U 2 Rdj 0 (z)] R j 0 (z) Z d 1  #j 0 (x0 ; u)du Pr[U 2 Rdj 0 (z)] R j 0 (z) d

0

= E[✓(dj , x0 , ✏)|U 2 Rdj 0 (z), Z = z, X = x0 ] 0

= E[Y |D = dj , Z = z, X = x0 ].

(3.31)

Symmetrically, for j 0 > j, if (xk , xk 1 ) 2 Xk,k 1 (1) [ Xk,k 1 (0) for j + 1  k  j 0 , then #j (x; u)  #j 0 (x0 ; u) where x = xj and x0 = xj 0 . Therefore the same bound as (3.31) is derived. Given these results, to collect all x0 2 X that yield #j (x; u)  #j 0 (x0 ; u), we can construct a set x0 2 xj 0 : (xk , xk

1)

[ xj 0 : (xk , xk

2 Xk,k

1)

1(

2 Xk,k

1) [ Xk,k

1 (1)

1 (0)

[ Xk,k

1 (0)

for j 0 + 1  k  j, xj = x

for j + 1  k  j 0 , xj = x .

Then we can further shrink the bound in (3.31) by taking infimum over all x0 in this set. The 0 lower bound on E[Ydj |D = dj , Z = z, X = x] can be constructed by simply choosing the opposite signs in the preceding argument. In conclusion, for bounds on the ATE E[Ydj |X = x], we express the sets XdLj (x; d0 ) and U Xdj (x; d0 ) for d0 6= dj introduced in Theorem 3.1 as follows: for d0 2 Mj 0 with j 0 6= j, XdLj (x; d0 ) ⌘ xj 0 : (xk , xk XdUj (x; d0 )

1)

[ xj 0 : (xk , xk

1)

[ xj 0 : (xk , xk

1)

⌘ xj 0 : (xk , xk

and for d0 2 Mj ,

1)

2 Xk,k

1(

2 Xk,k

1 (1)

2 Xk,k

1(

2 Xk,k

1) [ Xk,k

1 (1)

[ Xk,k

[ Xk,k

1 (0)

1 (0)

1) [ Xk,k

1 (0)

for j 0 + 1  k  j, xj = x

for j + 1  k  j 0 , xj = x ,

(3.32)

0

for j + 1  k  j, xj = x

1 (0)

for j + 1  k  j 0 , xj = x , (3.33)

XdLj (x; d0 ) = XdUj (x; d0 ) ⌘ {x},

(3.34)

where the last display is by (3.29). Remark 3.1. When X does not have enough variation, an assumption that Y 2 [Y , Y ] with known endpoints can be introduced to calculate the bounds. To see this, suppose we do not use the variation in X and suppose H(x) 0. Then #k (x; u) #k 1 (x; u) 8k = 1, ..., S by

22

Lemma 3.3(i) and by transitivity, #j 0 E[Ydj |X = x]  +

X

d2Mj

E[Y |D = d, Z, X = x] Pr[D = d|Z]

X

d0 2Mj 0 :j 0 >j

+

#j for any j 0 > j. Therefore, we have

X

d0 2Mj 0 :j 0
E[Y |D = d0 , Z, X = x] Pr[D = d0 |Z] E[Ydj |D = d0 , Z, X = x] Pr[D = d0 |Z].

(3.35)

Without using variation in X, we can bound the last term in (3.35) by Y 2 [Y , Y ]. This case is illustrated in Section 3.1 with ✓(d, x, ✏) = 1[µd (x) ✏] and #j (x; u) = F✏|U (µej (x)|u). Another example is when Y 2 [0, 1] as in Example 3. Remark 3.2. By using a similar approach as in Vytlacil and Yildiz (2007), it may be possible to point identify the ATE by extending the result of Theorem 3.1 using X with larger support. For example, if we can find x0 such that #j (x; u) = #j 0 (x0 ; u) (j 6= j 0 ) then we can point identify the ATT: Z 1 j0 E[Ydj |D = d , Z = z, X = x] = #j (x; u)du Pr[U 2 Rdj 0 (z)] R j 0 (z) Z d 1 = #j 0 (x0 ; u)du Pr[U 2 Rdj 0 (z)] R j 0 (z) d

0

= E[Y |D = dj , Z = z, X = x0 ].

The existence of such x0 requires sufficient variation of X conditional on Z. This approach is alternative to the identification at infinity that uses the large variation of Z for point identification, which is discussed in Section 4.4 below. Remark 3.3. An alternative approach to partial identification of the ATE is to embrace the framework established in Manski (1997) and Manski and Pepper (2000) under support conditions on Yd and stronger monotonicity restrictions. This approach will not require the specification of the first stage (2.2) and hence no issue of multiplicity arises. It, however, requires a semi-MTR assumption that the treatment response is monotonic in a semi-ordered set, which inevitably restricts the sign of some ATE’s. The current approach, on the other hand, does not require such an assumption in exchange of other shape restrictions, and can deliver tighter bounds even in the presence of multiplicity. This is especially true when one can exploit the variability of X given Z. Second, we want to explicitly consider a multi-agent model and impose a simultaneity structure in the first stage to better fit the economic applications we consider. This enables us to incorporate further knowledge about payo↵ structures, such as strategic substitutability. Lastly, our framework can relax the restriction that Y is bounded between a known interval.

3.3

Sharp Bounds

We derive sharp bounds on the ASF and ATE under a stronger support condition. Sharp bounds on the mean treatment parameters in this model of a triangular structure can only be obtained for binary Y. We first simplify the bounds obtained in Theorem 3.1 using variation in Z only. We consider the case H(x) > 0; the case H(x) < 0 is symmetric, and the case H(x) = 0 is straightforward. For given d^j ∈ M_j, we first calculate (3.32)–(3.34) for the bounds on E[Y_{d^j} | X = x] = Pr[Y_{d^j} = 1 | X = x]. For ease of notation, let M^{≤j} ≡ ∪_{k=0}^{j} M_k and M^{>j} ≡ ∪_{k=j+1}^{S} M_k = D\M^{≤j}. Then the bounds become

U_{d^j}(x) ≡ inf_{z∈Z} { Pr[Y = 1, D ∈ M_j | Z = z, X = x] + Σ_{d′∈M^{>j}} Pr[Y = 1, D = d′ | Z = z, X = x] + Σ_{d′∈M^{≤j−1}} Pr[D = d′ | Z = z, X = x] },   (3.36)

L_{d^j}(x) ≡ sup_{z∈Z} { Pr[Y = 1, D ∈ M_j | Z = z, X = x] + Σ_{d′∈M^{≤j−1}} Pr[Y = 1, D = d′ | Z = z, X = x] },   (3.37)

where in U_{d^j}(x), we use

Σ_{d′∈M_j} inf_{x′∈X^U_{d^j}(x;d′)} Pr[Y_{d^j} = 1, D = d′ | Z = z, X = x′] = Σ_{d′∈M_j} Pr[Y = 1, D = d′ | Z = z, X = x] = Pr[Y = 1, D ∈ M_j | Z = z, X = x]

by (3.34), and similarly in L_{d^j}(x).

Assumption C. μ_d(·) and ν^s_{d_{−s}}(·) are continuous.

Assumption ZZ. Z is compact and satisfies Z = ∏_{s=1}^{S} Z_s.

The assumption of rectangular support in Assumption ZZ is the key additional support condition. Under Assumptions C and M(ii) and the compactness of Z_s in Assumption ZZ, for given s ∈ {1, ..., S}, there exist vectors z̄_s and z̲_s such that ν^s_{j′}(z̄_s) = max_{z_s∈Z_s} ν^s_{j′}(z_s) and ν^s_{j′}(z̲_s) = min_{z_s∈Z_s} ν^s_{j′}(z_s) for all j′ = 0, ..., S−1. Define a joint propensity score as p_M(z) ≡ Pr[D ∈ M | Z = z]. Note that

p_{M^{>j′}}(z) = Pr[U ∈ U\R^{≤j′}(z)].   (3.38)

For a given z_{−s} ∈ Z_{−s}, it satisfies Z_s | z_{−s} = Z_s by Assumption ZZ, and thus the vectors z̄_s and z̲_s defined above satisfy

p_{M^{>j′}}(z̄_s, z_{−s}) = max_{z_s∈Z_s} p_{M^{>j′}}(z_s, z_{−s}),   p_{M^{>j′}}(z̲_s, z_{−s}) = min_{z_s∈Z_s} p_{M^{>j′}}(z_s, z_{−s}),

by Corollary 3.1. By iteratively applying the same argument w.r.t. the other elements of z_{−s} under Assumption M(ii), the vectors z̄ ≡ (z̄_1, ..., z̄_S) and z̲ ≡ (z̲_1, ..., z̲_S) satisfy

p_{M^{>j′}}(z̄) = max_{z∈∏_s Z_s} p_{M^{>j′}}(z),   p_{M^{>j′}}(z̲) = min_{z∈∏_s Z_s} p_{M^{>j′}}(z).   (3.39)

Now using z̄ and z̲, we can simplify the bounds in Theorem 3.1 and show their sharpness in model (3.3)–(3.5), where X is assumed to be fixed at x in the data generating process (DGP). The bounds where variation in X is additionally exploited may be shown to be sharp with a different version of the bounds; see Remark 3.4 below.

Theorem 3.2. Given model (3.3)–(3.5) conditional on X = x, suppose the assumptions of Theorem 3.1 and Assumptions ZZ and C hold. Also suppose that, for z, z′ ∈ Z such that Σ_{k=j′}^{S} h^D_k(z, z′) > 0 for all j′ = 1, ..., S, it satisfies h(z, z′, x) ≥ 0. Then the bounds U_{d^j}(x) and L_{d^j}(x) in (3.36) and (3.37) simplify as

U_{d^j}(x) = Pr[Y = 1, D ∈ M^{>j−1} | Z = z̄, X = x] + Pr[D ∈ M^{≤j−1} | Z = z̄],
L_{d^j}(x) = Pr[Y = 1, D ∈ M^{≤j} | Z = z̲, X = x],

and these bounds, and thus the bounds on the ATE, are sharp.

Shaikh and Vytlacil (2011) use the propensity score as a scalar conditioning variable, which summarizes all the exogenous variation in the selection process and is convenient in simplifying the bounds. In the context of the current paper, however, this approach is invalid, since Pr[D_s = 1 | Z_s = z_s, D_{−s} = d_{−s}] cannot be written in terms of a propensity score of player s, as D_{−s} is endogenous. We instead use the vector Z as conditioning variables and establish a partial ordering of the relevant conditional probabilities (those defining the lower and upper bounds) w.r.t. the joint propensity score (3.38). In proving the sharpness of the bounds, Corollary 3.1 plays an important role. Even though D is a vector determined by simultaneous decisions, Corollary 3.1 combined with the partial ordering above establishes "monotonicity" of the event U ∈ R^{≤j}(z) (and U ∈ U\R^{≤j}(z)) w.r.t. z; see the proof for details.

In model (3.3)–(3.5) where X is no longer fixed, the bounds derived using variation of X given Z, as explored earlier, are narrower than the sharp bounds established in Theorem 3.2. The resulting bounds, however, are not automatically implied to be sharp by Theorem 3.2, since they are based on a different DGP and the additional exclusion restriction.

Remark 3.4. Maintaining that Y is binary, sharp bounds on the ATE with variation in X can be derived assuming that the signs of ϑ(d, x; u) − ϑ(d′, x′; u) are identified for d, d′ ∈ D and x, x′ ∈ X via Lemma 3.3. To see this, define

X̃^U_d(x; d′) ≡ {x′ : ϑ(d, x; u) − ϑ(d′, x′; u) ≤ 0 a.e. u},
X̃^L_d(x; d′) ≡ {x′ : ϑ(d, x; u) − ϑ(d′, x′; u) ≥ 0 a.e. u},

which are identified by assumption. Then, by replacing X^i_d(x; d′) with X̃^i_d(x; d′) (for i ∈ {U, L}) in Theorem 3.1, we may be able to show that the resulting bounds are sharp in a fully general model where excluded variation in X is also in use. Since Lemma 3.3 implies that X^i_{d^j}(x; d′) ⊂ X̃^i_{d^j}(x; d′) but not necessarily X^i_{d^j}(x; d′) ⊃ X̃^i_{d^j}(x; d′), these modified bounds and the original bounds do not coincide. When variation of X is not part of the DGP, Lemma 3.3(i) establishes equivalence between the two signs, and thus X^i_{d^j}(x; d′) = X̃^i_{d^j}(x; d′) for i ∈ {U, L}, which results in Theorem 3.2. Relatedly, we can also exploit variation from variables that are common to both X and Z (with or without exploiting excluded variation of X). This is related to the analysis of Chiburis (2010) and Mourifié (2015) in a single-treatment setting. The results of Lemma 3.3 can be extended to accommodate the discussion of this remark, but we do not pursue it in the current paper for succinctness.
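The intersection-bound mechanics underlying (3.36)–(3.37) can be illustrated with their worst-case (Manski-type) building block: for each instrument value z, the observed joint probability gives a lower bound on the ASF, adding the probability of the unobserved selection states gives an upper bound, and the bounds are then intersected over z. The sketch below uses hypothetical probabilities on a finite grid and omits the shape restrictions of Theorem 3.1 that tighten these bounds.

```python
import numpy as np

# Worst-case intersection bounds on Pr[Y_d = 1 | X = x] for binary Y:
#   Pr[Y = 1, D = d | z]  <=  Pr[Y_d = 1]  <=  Pr[Y = 1, D = d | z] + Pr[D != d | z]
# for every z, so the identified set is [sup_z lower(z), inf_z upper(z)].

def intersection_bounds(p_y1_and_d, p_d):
    """p_y1_and_d[i] = Pr[Y=1, D=d | Z=z_i]; p_d[i] = Pr[D=d | Z=z_i]."""
    p_y1_and_d = np.asarray(p_y1_and_d, dtype=float)
    p_d = np.asarray(p_d, dtype=float)
    lower = p_y1_and_d.max()                  # sup over z of the lower bounds
    upper = (p_y1_and_d + (1.0 - p_d)).min()  # inf over z of the upper bounds
    return lower, upper

# Toy numbers (illustrative only): three points of a scalar instrument grid.
lo, hi = intersection_bounds([0.20, 0.30, 0.35], [0.40, 0.55, 0.70])
```

The variation in z shrinks both ends of the interval; the role of (3.36)–(3.37) is to sharpen exactly this construction by signing the unobserved terms across equilibrium classes.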

4

Discussions

4.1

Relaxing Symmetry

We propose two different ways of relaxing the conditional symmetry assumption in the outcome function (Assumption SY(i)) introduced in the preceding section.

4.1.1

Partial Symmetry: Interactions Within Groups

In some cases, strategic interaction may occur within groups of players (i.e., treatments). In the airline example, it may be that larger airlines interact with one another as a group, as do smaller airlines as a different group, but there is no interaction across the groups.17 In general, for G groups of players/treatments, we consider, with player index s = 1, ..., S_g and group index g = 1, ..., G,

Y = θ(D^1, ..., D^G, X, ε_D),   (4.1)
D^g_s = 1[ν^{s,g}(D^g_{−s}, Z^g_s) ≥ U^g_s],   (4.2)

where each D^g ≡ (D^g_1, ..., D^g_{S_g}) is the treatment vector of group g and D ≡ (D^1, ..., D^G). This model generalizes the model (2.1)–(2.2). It can also be seen as a special case of exogenously endowing an incomplete undirected network structure, where players interact with one another within each complete sub-network. In this model, each group can differ in its number (S_g) and identity of players (under which the entry decision is denoted D^g_s). Also, the unobservables U^g ≡ (U^g_1, ..., U^g_{S_g}) can be arbitrarily correlated across groups, in addition to the fact that the U^g_s can be correlated within group g and U ≡ (U^1, ..., U^G) can be correlated with ε_D. This partly relaxes the independence assumption across markets that is frequently imposed in the entry game literature.

To calculate the bounds on the ATE E[Y_d − Y_{d′} | X = x], we apply the results in Theorem 3.1, adapting its assumptions to the current extension. Assumption SY(i) can then be relaxed by assuming that (the conditional mean of) the outcome function is symmetric within each group but not across groups. In terms of notation, let D^{−g} ≡ (D^1, ..., D^{g−1}, D^{g+1}, ..., D^G) and let its realization be d^{−g}. Then such an assumption is stated as:

Assumption SY2. For each g = 1, ..., G, (i) for every x ∈ X, ϑ(d^g, d^{−g}, x; u) = ϑ(d̃^g, d^{−g}, x; u) a.e. u for any permutation d̃^g of d^g; (ii) for every z^g_s ∈ Z^g_s, ν^{s,g}(d^g_{−s}, z^g_s) = ν^{s,g}(d̃^g_{−s}, z^g_s) for any permutation d̃^g_{−s} of d^g_{−s}.

Under this partial conditional symmetry assumption, the bounds on the ASF can be calculated by iteratively applying the previous results to each group. Assumptions Y, M, SS and NU can be modified so that they hold for within-group treatments and interaction. In particular, Assumption NU can be modified as follows: for each d^g_{−s} ∈ D^g_{−s}, ν^{s,g}(d^g_{−s}, Z^g_s) | X, Z^{−g} is nondegenerate, where Z ≡ (Z^g, Z^{−g}). That is, there must be group-specific instruments that are excluded from other groups.18

We sketch the idea here with binary Y for simplicity. Analogous to the previous notation, let M^g_j be the set of equilibria with j entrants in group g, and let M^{g,≤j} ≡ ∪_{k=0}^{j} M^g_k. Suppose G = 2, d^1 ∈ {0, 1}^{S_1} and d^2 ∈ {0, 1}^{S_2}. Consider the ASF E[Y_d | X] = E[Y_{d^1,d^2} | X] with d^1 ∈ M^1_{j−1} and d^2 ∈ M^2_{k−1} for some j = 1, ..., S_1 and k = 1, ..., S_2. To calculate its bounds, we can bound E[Y_d | D = d̃, Z, X] in (3.1) for d̃ ≠ d by sequentially applying the analysis of Section 3 within each group. First consider d̃ = (d̃^1, d^2) with d̃^1 ∈ M^1_j. We apply Lemma 3.3 to the D^1 portion while holding D^2 = d^2. Suppose

Pr[Y = 1 | D^2 = d^2, Z^1 = z^1, Z^2, X] − Pr[Y = 1 | D^2 = d^2, Z^1 = z^{1′}, Z^2, X] ≥ 0,
Pr[D^1 ∈ M^{1,>j−1} | Z^1 = z^1] − Pr[D^1 ∈ M^{1,>j−1} | Z^1 = z^{1′}] > 0;

17 We can also easily extend the model so that smaller airlines take larger airlines' entry decisions as given and play their own entry game, which may be more reasonable to assume.

then we have μ_{d̃^1,d^2}(x) ≥ μ_{d^1,d^2}(x). The proof of Lemma 3.3 can be adapted by holding D^2 = d^2 in this case, because there is no strategic interaction across groups and therefore the multiple equilibria problem only occurs within each group. Note that this strategy still allows dependence between D^1 and D^2 even after conditioning on (Z, X), due to dependence between U^1 and U^2. Then,

Pr[Y_{d^1,d^2} = 1 | D = (d̃^1, d^2), Z, X = x] = Pr[ε ≤ μ_{d^1,d^2}(x) | D = (d̃^1, d^2), Z, X = x]
≤ Pr[ε ≤ μ_{d̃^1,d^2}(x) | D = (d̃^1, d^2), Z, X = x]   (4.3)
= Pr[Y = 1 | D = (d̃^1, d^2), Z, X = x].

Next, consider d = (d^1, d^2) and d̃ = (d̃^1, d̃^2) with d̃^2 ∈ M^2_k and the other elements as previously determined. Then, by applying Lemma 3.3 this time to the D^2 portion while holding D^1 = d̃^1, we have μ_{d̃^1,d̃^2}(x) ≥ μ_{d̃^1,d^2}(x) by supposing

Pr[Y = 1 | D^1 = d̃^1, Z^1, Z^2 = z^2, X] − Pr[Y = 1 | D^1 = d̃^1, Z^1, Z^2 = z^{2′}, X] ≥ 0,
Pr[D^2 ∈ M^{2,>k−1} | Z^2 = z^2] − Pr[D^2 ∈ M^{2,>k−1} | Z^2 = z^{2′}] > 0.

Then,

Pr[Y_{d^1,d^2} = 1 | D = (d̃^1, d̃^2), Z, X = x] ≤ Pr[ε ≤ μ_{d̃^1,d^2}(x) | D = (d̃^1, d̃^2), Z, X = x]
≤ Pr[ε ≤ μ_{d̃^1,d̃^2}(x) | D = (d̃^1, d̃^2), Z, X = x]   (4.4)
= Pr[Y = 1 | D = (d̃^1, d̃^2), Z, X = x],

where the first inequality is by (4.3). Note that, in deriving the upper bound in (4.4), it is important that at least the two groups share the same signs of the within-group h's and h^D's. This is clearly a weaker requirement than imposing Assumption SY(i).

18 We maintain Assumption R in the current setting since the assumption is equivalent to assuming rank invariance within each group, i.e., ε_{d^g,d^{−g}} = ε_{d̃^g,d^{−g}} for all d^g, d̃^g ∈ {0, 1}^{S_g} and g = 1, ..., G.

4.1.2

Stable Equilibrium Selection

Assumption SY(i) can also be relaxed when the equilibrium selection is stable with respect to changes in the instruments. As Z changes, the distribution of D changes. This change occurs as players change their decisions in the face of different entry payoffs, and as the equilibrium selection rule changes. Under the conditional symmetry assumption SY(i), it is enough to be concerned with the change in the joint propensity score,

Pr[D ∈ M_j | Z = z] − Pr[D ∈ M_j | Z = z′] = Pr[U ∈ R_j(z)] − Pr[U ∈ R_j(z′)] = Pr[U ∈ R_j(z)\R_j(z′)] − Pr[U ∈ R_j(z′)\R_j(z)],

which is purely determined by the change in the players' payoffs. When the symmetry assumption is dropped, we need to learn about the change in the individual propensity score,

Pr[D = d^j | Z = z] − Pr[D = d^j | Z = z′] = Pr[U ∈ R*_{d^j}(z)] − Pr[U ∈ R*_{d^j}(z′)],

where R*_d(·) is the region that predicts D = d.19 In general, this change is also determined by the equilibrium selection rule that partitions R_j(z) = ∪_{d∈M_j} R*_d(z) and R_j(z′) = ∪_{d∈M_j} R*_d(z′). Netting out the change in the payoffs, one can show that the remaining difference of the propensity scores can be attributed to the change in the selection rule within R_j(z) ∩ R_j(z′). This is because R_j(z)\{R_j(z)\R_j(z′)} = R_j(z′)\{R_j(z′)\R_j(z)} = R_j(z) ∩ R_j(z′), where the terms R_j(z)\R_j(z′) and R_j(z′)\R_j(z) appear in the joint propensity scores above. The region R_j(z) ∩ R_j(z′) can be seen as a common support of U with j entrants for Z being z or z′, and hence a relevant region in which to consider the equilibrium selection as Z changes. We assume that the equilibrium selection is stable within this region.

Assumption ES. For j = 1, ..., S−1, there exist z, z′ ∈ Z such that the region that predicts D = d^j is invariant for Z ∈ {z, z′} within R_j(z) ∩ R_j(z′) for all d^j ∈ M_j, i.e., R*_{d^j}(z) ∩ {R_j(z) ∩ R_j(z′)} = R*_{d^j}(z′) ∩ {R_j(z) ∩ R_j(z′)} for all d^j ∈ M_j.

Since the portion of R*_{d^j}(z) that predicts a unique equilibrium is never affected by the selection rule, the assumption is only relevant to the regions of multiple equilibria (i.e., the union of R_{d^j}(z) ∩ R_{d̃^j}(z) across all pairs d^j, d̃^j ∈ M_j, and similarly with z′) that intersect with R_j(z) ∩ R_j(z′); this region is hatch-patterned in Figure 4(a) with S = 2. Therefore, this assumption trivially holds when Z varies sufficiently that this intersection is empty, or equivalently, that R_j(z) ∩ R_j(z′) does not contain the regions of multiple equilibria. Specifically, this occurs when the following condition holds:

Assumption ES*. For j = 1, ..., S−1, there exist z, z′ ∈ Z such that ν^s_{j−1}(z′_s) ≤ ν^s_j(z_s) for all s.

19 Unlike R_d(z), which is purely determined by the payoffs ν^s_{d_{−s}}(z_s), R*_d(z) is unknown to the econometrician even if all the players' payoffs were known, since the equilibrium selection rule is unknown.


Figure 4: Illustration of Assumptions ES and ES*.

Lemma 4.1. Assumption ES* implies Assumption ES.

The sufficiency of Assumption ES* is illustrated in Figure 4(b) with ν^s_0(z′_s) < ν^s_1(z_s) for s = 1, 2. Assumption ES* states that the change in Z is large enough to offset the effect of strategic substitutability. For example, in an entry game with Z_s being entry cost, Assumption ES* may hold with z′_s > z_s for all s. In this example, all players become less profitable with the increase in cost, while one player becomes unprofitable to enter; this player's absence does not overturn the decrease in the other firms' profits. A necessary condition for Assumption ES is that, for d ∈ D, R*_d(z) is a function of z only through (ν^1_{d_{−1}}(z_1), ..., ν^S_{d_{−S}}(z_S)). That is, with the same primitives and observables, the same equilibrium is selected; see, e.g., Bajari et al. (2010) and de Paula (2013) for related discussions. Under Assumption ES and without Assumption SY(i), we can apply a proof strategy analogous to the symmetric case to determine the direction of monotonicity and ultimately calculate the bounds on the ATE. Recall ϑ(d, x; u) ≡ E[θ(d, x, ε) | U = u].

Lemma 4.2. In model (2.1)–(2.2), suppose Assumptions IN, E, R, Y, M, SS, SY(ii), NU and ES hold. For any (z, z′, x) such that Σ_{k=j′}^{S} h^D_k(z, z′) > 0 for all j′ = 1, ..., S, it satisfies that

sgn{h(z, z′, x)} = sgn{ϑ(1, d_{−s}, x; u) − ϑ(0, d_{−s}, x; u)}

a.e. u, for all d_{−s} ∈ D_{−s} and all s = 1, ..., S.

Again, when Assumption SY(i) holds, the result of this lemma is satisfied without Assumption ES. Suppose S = 2 and Y is binary for illustration of this lemma. In place of h_M(z, z′, x), which is used to prove Lemma 3.1, introduce

h_{10}(z, z′, x) ≡ Pr[Y = 1, D = (1, 0) | Z = z, X = x] − Pr[Y = 1, D = (1, 0) | Z = z′, X = x],
h_{01}(z, z′, x) ≡ Pr[Y = 1, D = (0, 1) | Z = z, X = x] − Pr[Y = 1, D = (0, 1) | Z = z′, X = x].

Then h defined in (3.6) satisfies h = h_{11} + h_{00} + h_{10} + h_{01}; in fact, h_M = h_{10} + h_{01}. Denote as R*_{10} and R*_{01} the regions that predict D = (1, 0) and D = (0, 1), respectively. For (z, z′) such that h^D_{11}(z, z′) > 0 and −h^D_{00}(z, z′) > 0, we have R_{11}(z) ⊃ R_{11}(z′) and R_{00}(z) ⊂ R_{00}(z′), respectively, by Corollary 3.1. Since R*_{10} ∪ R*_{01} = R_{10} ∪ R_{01} = R_1, (3.9) and (3.10) can alternatively be expressed as

Δ_+(z, z′) ≡ {R*_{10}(z) ∪ R*_{01}(z)}\R_1(z′),   (4.5)
Δ_−(z, z′) ≡ {R*_{10}(z′) ∪ R*_{01}(z′)}\R_1(z).   (4.6)

Consider partitions Δ_+(z, z′) = Δ^1_+(z, z′) ∪ Δ^2_+(z, z′) and Δ_−(z, z′) = Δ^1_−(z, z′) ∪ Δ^2_−(z, z′) such that

Δ^1_+(z, z′) ≡ R*_{10}(z)\R_1(z′),   Δ^1_−(z, z′) ≡ R*_{10}(z′)\R_1(z),
Δ^2_+(z, z′) ≡ R*_{01}(z)\R_1(z′),   Δ^2_−(z, z′) ≡ R*_{01}(z′)\R_1(z).

That is, Δ^1_+(z, z′) and Δ^1_−(z, z′) are regions of R*_{10} exchanged with the regions for D = (0, 0) and D = (1, 1), respectively, and Δ^2_+(z, z′) and Δ^2_−(z, z′) are the analogous regions for R*_{01}. By Assumption IN and suppressing the argument (z, z′, x) on the l.h.s.,

h_{10} = Pr[ε ≤ μ_{10}(x), U ∈ R*_{10}(z)] − Pr[ε ≤ μ_{10}(x), U ∈ R*_{10}(z′)]
= Pr[ε ≤ μ_{10}(x), U ∈ R*_{10}(z)\R*_{10}(z′)] − Pr[ε ≤ μ_{10}(x), U ∈ R*_{10}(z′)\R*_{10}(z)]
= Pr[ε ≤ μ_{10}(x), U ∈ Δ^1_+(z, z′)] − Pr[ε ≤ μ_{10}(x), U ∈ Δ^1_−(z, z′)],

where the second equality is by (3.12) and the last equality is by the following derivation:

R*_{10}(z)\R*_{10}(z′) = [R*_{10}(z) ∩ R_1(z′)^c ∩ R*_{10}(z′)^c] ∪ [R*_{10}(z) ∩ R_1(z′) ∩ R*_{10}(z′)^c]
= [R*_{10}(z) ∩ R_1(z′)^c] ∪ [R*_{10}(z′) ∩ R_1(z) ∩ R*_{10}(z′)^c]
= Δ^1_+(z, z′),

where the first equality is by the distributive law and U = R_1(z′)^c ∪ R_1(z′), the second equality is by R_1^c = R*_{10}^c ∩ R*_{01}^c (the first term) and by Assumption ES (the second term), and the last equality is by the definition of Δ^1_+(z, z′) and {R*_{10}(z′) ∩ R_1(z)} ∩ R*_{10}(z′)^c being empty. Analogously, one can show that R*_{10}(z′)\R*_{10}(z) = Δ^1_−(z, z′) using Assumption ES and the definition of Δ^1_−(z, z′). Likewise,

h_{01} = Pr[ε ≤ μ_{01}(x), U ∈ R*_{01}(z)] − Pr[ε ≤ μ_{01}(x), U ∈ R*_{01}(z′)]
= Pr[ε ≤ μ_{01}(x), U ∈ R*_{01}(z)\R*_{01}(z′)] − Pr[ε ≤ μ_{01}(x), U ∈ R*_{01}(z′)\R*_{01}(z)]
= Pr[ε ≤ μ_{01}(x), U ∈ Δ^2_+(z, z′)] − Pr[ε ≤ μ_{01}(x), U ∈ Δ^2_−(z, z′)].

Also, by the definitions of the partitions,

h_{11} = Pr[ε ≤ μ_{11}(x), U ∈ Δ_−(z, z′)] = Pr[ε ≤ μ_{11}(x), U ∈ Δ^1_−(z, z′)] + Pr[ε ≤ μ_{11}(x), U ∈ Δ^2_−(z, z′)]

and

h_{00} = −Pr[ε ≤ μ_{00}(x), U ∈ Δ_+(z, z′)] = −Pr[ε ≤ μ_{00}(x), U ∈ Δ^1_+(z, z′)] − Pr[ε ≤ μ_{00}(x), U ∈ Δ^2_+(z, z′)].

Now combining all the terms yields

h(z, z′, x) = Pr[ε ≤ μ_{11}(x), U ∈ Δ^1_−(z, z′)] − Pr[ε ≤ μ_{10}(x), U ∈ Δ^1_−(z, z′)]
+ Pr[ε ≤ μ_{11}(x), U ∈ Δ^2_−(z, z′)] − Pr[ε ≤ μ_{01}(x), U ∈ Δ^2_−(z, z′)]
+ Pr[ε ≤ μ_{10}(x), U ∈ Δ^1_+(z, z′)] − Pr[ε ≤ μ_{00}(x), U ∈ Δ^1_+(z, z′)]
+ Pr[ε ≤ μ_{01}(x), U ∈ Δ^2_+(z, z′)] − Pr[ε ≤ μ_{00}(x), U ∈ Δ^2_+(z, z′)].
Then, by Assumption M(i), the differences μ_{1,d_{−s}}(x) − μ_{0,d_{−s}}(x) share the same sign for all s and all d_{−s} ∈ {0, 1}^{S−1}, and therefore sgn{h(z, z′, x)} = sgn{μ_{1,d_{−s}}(x) − μ_{0,d_{−s}}(x)}. Lastly, to exploit the variation of X, define20

h̃(z, z′; x_0, x_1, x̃_1, x_2) ≡ Pr[ε ≤ μ_{11}(x_2), U ∈ Δ^1_−(z, z′)] − Pr[ε ≤ μ_{10}(x_1), U ∈ Δ^1_−(z, z′)]
+ Pr[ε ≤ μ_{11}(x_2), U ∈ Δ^2_−(z, z′)] − Pr[ε ≤ μ_{01}(x̃_1), U ∈ Δ^2_−(z, z′)]
+ Pr[ε ≤ μ_{10}(x_1), U ∈ Δ^1_+(z, z′)] − Pr[ε ≤ μ_{00}(x_0), U ∈ Δ^1_+(z, z′)]
+ Pr[ε ≤ μ_{01}(x̃_1), U ∈ Δ^2_+(z, z′)] − Pr[ε ≤ μ_{00}(x_0), U ∈ Δ^2_+(z, z′)];

then the results of Lemma 3.2 will hold with h replaced by h̃.

Remark 4.1. Even if we assume that the joint distribution of the first-stage unobservables is known, the bounding strategy of Ciliberto and Tamer (2009) does not work in our setting. To see this, following Ciliberto and Tamer (2009),

h_{10} + h_{01} = Pr[ε ≤ μ_{10}(x), U ∈ R*_{10}(z)] + Pr[ε ≤ μ_{01}(x), U ∈ R*_{01}(z)] − Pr[ε ≤ μ_{10}(x), U ∈ R*_{10}(z′)] − Pr[ε ≤ μ_{01}(x), U ∈ R*_{01}(z′)]

has a lower bound

Pr[ε ≤ μ_{10}(x), U ∈ R^min_{10}(z)] + Pr[ε ≤ μ_{01}(x), U ∈ R^min_{01}(z)] − Pr[ε ≤ μ_{10}(x), U ∈ R^max_{10}(z′)] − Pr[ε ≤ μ_{01}(x), U ∈ R^max_{01}(z′)],

where R^min_d and R^max_d denote the minimal and maximal possible regions that predict d ∈ {(1, 0), (0, 1)}. Note that the first two terms and the last two terms cannot be combined. The first and third terms, as well as the second and last terms, can be combined, but they do not yield the difference of regions that corresponds to the regions yielded by h_{11} + h_{00}.

4.2

Player-Specific Outcomes

So far, we have considered a scalar Y that may represent an outcome common to all players in a given market or geographical region. The outcome, however, can also be specific to each player. In this regard, consider a vector of outcomes Y = (Y_1, ..., Y_S), where each element Y_s is a player-specific outcome. An interesting example of this setting is one where Y is also an equilibrium outcome from strategic interaction, not only through D but also through Y itself. In this case, it would become important to have a vector of unobservables even after assuming, e.g., rank invariance, since we may want to include ε_D = (ε_{1,D}, ..., ε_{S,D}), where ε_{s,D} is an unobservable directly affecting Y_s.21 We may also want to include a vector of observables of all players, X = (X_1, ..., X_S), where X_s directly affects Y_s. Then interaction among the Y_s can be modeled via a reduced-form representation:

Y_s = θ_s(D, X, ε_D),   s ∈ {1, ..., S}.

In firms' entry, the first-stage scalar unobservable U_s may represent each firm's unobserved fixed cost (while Z_s captures observed fixed cost). The vector of unobservables in the player-specific outcome equation represents multiple shocks, such as the player's demand shock and variable cost shock, and other firms' variable cost shocks and demand shocks. Unlike in a linear model, it would be hard to argue that these errors all aggregate into a scalar variable in this nonlinear outcome model, since it is not known in what fashion they enter the equation.

20 Note that we cannot assign a different value of x to μ_{10}(x) and μ_{01}(x); otherwise, we cannot apply the assertion of Assumption M(i) in the proof.

4.3

Relation to Manski (2013)

Manski (2013) introduces a framework for social interaction where responses (i.e., outcomes) of agents depend on one another through their treatments. The framework relaxes the stable unit treatment value assumption (SUTVA) by allowing interaction across units. Our framework is similar to Manski (2013) in that we also allow interaction among the outcomes of players through their treatments, as we discuss in Section 4.2. The difference is that we consider interaction across the treatment/player unit s, whereas he considers interaction across the observational unit i. Furthermore, we explicitly model the selection process of how treatments are determined simultaneously through players' strategic interaction; his model, following his earlier work (e.g., Manski and Pepper (2000)), stays silent about this process. Despite the difference, the two settings share a similar spirit of departing from the SUTVA.

The shape restrictions we impose are related to the assumptions of Manski (2013) for the treatment response, which we compare here. First of all, Assumption SY(i) appears in Manski as an anonymity assumption. Also, we find that Assumptions SY(i) and SY2(i) are related to the constant TR (CTR) assumption in Manski, although he assumes anonymity separately from this assumption. The CTR assumption states that, with d = (d_i)_{i=1}^N,

c(d) = c(d′) ⟹ Y_d = Y_{d′}.

As noted in Manski, c(d) is an effective treatment in that, as long as c(d) stays constant, the response does not change. SY(i) and SY2(i) can be restated using this concept with a particular choice of c(d): with d = (d_s)_{s=1}^S,

c(d) = c(d′) ⟹ E[Y_d | X = x, U = u] = E[Y_{d′} | X = x, U = u]

for given x ∈ X and a.e. u, where c(d) is chosen such that the game for treatment decisions has a unique equilibrium in terms of c(d). The conditional symmetry assumption (Assumption SY(i)) can be seen as one example of this, where the game has a unique equilibrium in terms of a c(d) that is invariant to permutation, such as the number of players who choose to take the action (c(d) = Σ_{s=1}^S d_s). Likewise, SY2(i) corresponds to c(d) = (c_1(d), ..., c_G(d)) with c_g(d) = Σ_{s=1}^{S_g} d^g_s. There can certainly be other choices of c(d) that deliver a unique equilibrium in each game, although we do not explore this further.

21 In this case, Assumption R should be imposed on ε_{s,D} for each s.

4.4

Point Identification of the ATE

When there exist player-specific excluded instruments with large support, we can point identify the ATE's. In this case, the shape restrictions (especially those on the outcome function) are not needed. The following assumption holds for each s ∈ {1, ..., S}.

Assumption NU2. For each d_{−s} ∈ D_{−s}, ν^s(d_{−s}, ·) | (X, Z_{−s}) has an everywhere positive Lebesgue density.

Assumption NU2 is stronger than Assumption NU. It imposes not only the exclusion restriction of NU but also a player-specific exclusion restriction and large support.

Theorem 4.1. In model (2.1)–(2.2), suppose Assumptions IN, E and NU2 hold. Then the ATE in (2.4) is identified.

The identification strategy is to employ an identification-at-infinity argument based on Assumption NU2, which simultaneously solves the multiple equilibria problem and the endogeneity problem. Suppose S = 2 and Z_s is scalar for illustration; the general case can be proved analogously. For example, to identify E[Y_{11}|X], consider

E[Y | D = (1, 1), X = x, Z = z] = E[Y_{11} | D = (1, 1), X = x, Z = z]
= E[θ(1, 1, x, ε_{11}) | ν^1(1, z_1) ≥ U_1, ν^2(1, z_2) ≥ U_2]
→ E[θ(1, 1, x, ε_{11})] = E[Y_{11} | X = x],

where the second equality is by Assumption IN, and the convergence is by Assumption NU2 with z_1 → ∞ and z_2 → ∞. Likewise, E[Y_{00} | X = x] can be identified. The identification of E[Y_{10} | X = x] and E[Y_{01} | X = x] can be achieved by similar reasoning. Note that D = (1, 0) or D = (0, 1) can be predicted as an outcome of multiple equilibria. When either (z_1, z_2) → (∞, −∞) or (z_1, z_2) → (−∞, ∞), however, a unique equilibrium is guaranteed as a dominant strategy, i.e., D = (1, 0) or D = (0, 1), respectively. Based on these results, we can (point) identify all the ATE's.

5

The LATE

The results of Proposition 3.1 can be used to establish a framework that defines the LATE parameter for multiple treatments generated by strategic interaction. In this section, given model (2.1)–(2.2), we only maintain the assumptions on the equations for D_s, such as the conditional symmetry of the payoff functions, but not the assumptions on the equation for Y. In particular, we no longer require Assumptions R, M(i) and SY(i). In the case of a single binary treatment, there is a well-known equivalence between the LATE monotonicity assumption and the specification of a selection equation (Vytlacil (2002)). This equivalence result is inapplicable to our setting due to the simultaneity in the first stage.22 But Proposition 3.1 implies that, under Assumptions SS and SY(ii), there is in fact a monotonic pattern in the way the equilibrium regions lie in the space of U, as written in (3.23). This monotonicity, formalized in Corollary 3.1, allows us to establish equivalence between a version of the LATE monotonicity assumption and the simultaneous selection model (2.2).

We first introduce a relevant counterfactual outcome that can be used in defining the LATE parameter. For M ⊆ D, introduce a selection variable D_M ∈ M that selects an equilibrium D_M = d when facing a set of equilibria M. This variable is useful in decomposing the event D = d into two sequential events: D = d is equivalent to the event that D ∈ M and D_M = d. Trivially, we have D_D = D. When M ⊊ D is not a singleton, D_M is not observed, precisely because the equilibrium selection mechanism is not observed in general.23 Using D_M, we define a joint counterfactual outcome Y_M as the outcome had D been an element of M:

Y_M = Σ_{d∈M} 1[D_M = d] Y_d.   (5.1)

Conditional on D ∈ M, Y_M is assigned to be one of the usual counterfactual outcomes Y_d based on the equilibrium being selected. When M = D, we can write Y = Y_D = Σ_{d∈D} 1[D = d] Y_d, which yields the standard expression relating the observed outcome to the potential outcomes. Moreover, for any partition {M̃_k}_{k=1}^K such that ∪_{k=1}^K M̃_k = D, we can express

Σ_{d∈D} 1[D = d] Y_d = Σ_{k=1}^K Σ_{d∈M̃_k} 1[D ∈ M̃_k] 1[D_{M̃_k} = d] Y_d = Σ_{k=1}^K 1[D ∈ M̃_k] Y_{M̃_k},

where the first equality is by the equivalence of the events mentioned above and the second equality is by (5.1). Therefore, we can establish the following relationship:

Y = Σ_{k=1}^K 1[D ∈ M̃_k] Y_{M̃_k},   (5.2)

22 For example, in a two-player entry game, when entry costs Z_1 and Z_2 increase, it may be the case that in one market only the first player enters given this increase, as her monopolistic profit offsets the increased cost, while in another market only the second player enters, by the same reasoning applied to this player. The direction of monotonicity is reversed in these two markets.

that is, Y_{M̃_k} is observed when D ∈ M̃_k.

Now, consider a treatment of dichotomous states (e.g., dichotomous market structures): for j = 0, ..., S−1,

D ∈ M^{>j} vs. D ∈ M^{≤j},

where M^{≤j} ≡ ∪_{k=0}^{j} M_k and M^{>j} ≡ ∪_{k=j+1}^{S} M_k are previously defined; e.g., for S = 2 and j = 1, M^{≤1} = {(1, 0), (0, 1), (0, 0)} and M^{>1} = {(1, 1)}. Consider a corresponding treatment effect:

Y_{M^{>j}} − Y_{M^{≤j}},

where Y = 1[D ∈ M^{>j}] Y_{M^{>j}} + 1[D ∈ M^{≤j}] Y_{M^{≤j}} by (5.2). This quantity is the effect of being treated with an equilibrium of at least j + 1 entrants relative to being treated with an equilibrium of at most j entrants.

We now establish that a version of the LATE monotonicity assumption for this treatment 1[D ∈ M^{>j}] of dichotomous states is implied by the model specification (2.2), using Corollary 3.1. Let D(z) ≡ (D_1(z_1), ..., D_S(z_S)), where D_s(z_s) is the potential treatment decision had player s been assigned Z_s = z_s.

Lemma 5.1. Under Assumptions M(ii), SS and SY(ii), the first-stage game (2.2) implies that, for any z, z′ ∈ Z and j = 0, ..., S−1,

D(z′) ∈ M^{>j} ⟹ D(z) ∈ M^{>j} w.p.1, or D(z) ∈ M^{>j} ⟹ D(z′) ∈ M^{>j} w.p.1.   (5.3)

Condition (5.3) is a generalized version of Imbens and Angrist (1994)'s monotonicity assumption.

Proof. By Corollary 3.1, the intervals (0, ν^s_j(z_s)] for all s are the only relevant intervals in defining R^{>j}(z) = U\R^{≤j}(z). For given z, z′ ∈ Z, suppose without loss of generality that in Assumption M(ii), ν^s_{d_{−s}}(z_s) ≥ ν^s_{d_{−s}}(z′_s) for all d_{−s} and all s. Then it follows that R^{>j}(z) ⊇ R^{>j}(z′), and thus w.p.1,

1[D(z) ∈ M^{>j}] = 1[U ∈ R^{>j}(z)] ≥ 1[U ∈ R^{>j}(z′)] = 1[D(z′) ∈ M^{>j}].

23 Alternatively, following the notation of Heckman et al. (2006), we can introduce an equilibrium selection indicator D_{M,d} that indicates that an equilibrium d is selected among the equilibria in a set M: D_{M,d} = 1 if d ∈ M is selected, and D_{M,d} = 0 otherwise. Then D_M = d if and only if D_{M,d} = 1.

Now, Lemma 5.1 allows us to give the IV estimand an LATE interpretation in our model:

Theorem 5.1. Given model (2.1)–(2.2), suppose Assumptions IN, NU, M(ii), SS and SY(ii) hold. Then it satisfies that, for any j = 0, ..., S−1,

(E[Y | Z = z] − E[Y | Z = z′]) / (Pr[D ∈ M^{>j} | Z = z] − Pr[D ∈ M^{>j} | Z = z′]) = h(z, z′) / Σ_{k>j} h^D_k(z, z′)
= E[Y_{M^{>j}} − Y_{M^{≤j}} | D(z) ∈ M^{>j}, D(z′) ∈ M^{≤j}].

The LATE parameter E[Y_{M^{>j}} − Y_{M^{≤j}} | D(z) ∈ M^{>j}, D(z′) ∈ M^{≤j}] is the average of the treatment effect Y_{M^{>j}} − Y_{M^{≤j}} for the subgroup of "markets" that form more competitive markets (with at least j + 1 entrants) when players face Z = z, but form less competitive markets (with at most j entrants) when players face Z = z′. For concreteness, suppose S = 2 and j = 1, let Z_s be each airline company's entry cost, and let Y be the pollution level in a market. The LATE

E[Y_{{(1,1)}} − Y_{{(1,0),(0,1),(0,0)}} | D(z) = (1, 1), D(z′) ∈ {(1, 0), (0, 1), (0, 0)}]

is the effect of the existence of competition on pollution levels for markets that consist of "compliers."24 It is the average difference of potential pollution levels in a duopolistic market (i.e., competition) versus a monopolistic or non-operating market (i.e., no competition) for the subgroup of markets that form a duopoly when companies face low costs (Z = z) but form a monopoly or do not operate when facing high costs (Z = z′). Figure 7 depicts this subgroup of markets. In this example, the LATE monotonicity assumption (implied by the entry game of strategic substitutes with symmetric payoffs) rules out markets that respond to entry costs as "defiers." The LATE becomes the ATE when

1 = Pr[D(z) = (1, 1), D(z′) ∈ {(1, 0), (0, 1), (0, 0)}],

which holds when Pr[D = (1, 1) | Z = z] = Pr[D ∈ {(1, 0), (0, 1), (0, 0)} | Z = z′] = 1; this is related to the identification at infinity argument in Theorem 4.1. In general, the LATE can be defined with Y_M − Y_{M′} for any two partitioning sets M and M′ of D (i.e., D = M ∪ M′ with M ∩ M′ = ∅) as long as 1[D(z) ∈ M] = 1 − 1[D(z) ∈ M′] satisfies the LATE monotonicity assumption. Lemma 5.1 ensures that our simultaneous selection model imposes this monotonicity for the particular partition M = M^{>j} and M′ = M^{≤j}.

Remark 5.1. Similarly, it may be possible to recover the marginal treatment effect (MTE) of Heckman and Vytlacil (1999, 2005, 2007). Given our setting, it would be a transition-specific MTE for Y_{M_j} − Y_{M_{j−1}}. The identification of this MTE would require continuous variation of Z. For discrete Z, the approach of Brinch et al. (2017) can be applied by imposing structure on the MTE function. The MTE approach is also related to Lee and Salanié (2016); note that the LATE can be expressed as an integrated MTE. The LATE parameter considered in this paper is different from their LATE parameter.
Instead of considering the ATE with two different treatment values for what they call the "super-complier" group, we consider an average effect of a dichotomous treatment for the group of compliers between the dichotomous choices. Whether one version is more appropriate than the other depends on the application.

Remark 5.2. The equilibrium selection mechanism may differ across different counterfactual worlds. In terms of our notation, $D^M(z)$ may differ from $D^M(z')$, where $D^M(z)$ is the counterfactual version of $D^M$. Not only may the equilibrium being selected differ, but the selection mechanism itself can also differ. This feature can be emphasized by writing $D^M(z) = \lambda_z(z, U)$, where the functional form of the equilibrium selection function $\lambda_z$ may also change with $z$. By considering $Y_M$ instead of $Y_d$, however, we can be agnostic about the selection mechanism, i.e., about the specification of $\lambda_z(\cdot, \cdot)$. The definition (5.1) asserts that $Y_d$ can be meaningfully analyzed within the current framework only when the equilibrium being selected is known.

6 Numerical Studies

To illustrate the main results of this paper, we calculate the bounds on the ATE using the following data generating process:

$$Y_d = 1\{\tilde{\mu}_d + \gamma X \ge \epsilon\},$$
$$D_1 = 1\{\delta_2 D_2 + \beta_1 Z_1 \ge V_1\},$$
$$D_2 = 1\{\delta_1 D_1 + \beta_2 Z_2 \ge V_2\},$$

where $(\epsilon, V_1, V_2)$ are drawn from a joint normal distribution with zero means and each pairwise correlation coefficient equal to 0.5, independently of $(X, Z)$. We draw $Z_s$ ($s = 1, 2$) and $X$ from multinomial distributions, allowing $Z_s$ to take two values, $\mathcal{Z}_s = \{-1, 1\}$, and $X$ to take either three values, $\mathcal{X} = \{-1, 0, 1\}$, or fifteen values, $\mathcal{X} = \{-1, -\tfrac{6}{7}, -\tfrac{5}{7}, \dots, \tfrac{5}{7}, \tfrac{6}{7}, 1\}$. Consistent with Assumptions M and SY, we choose $\tilde{\mu}_{11} > \tilde{\mu}_{10} = \tilde{\mu}_{01} > \tilde{\mu}_{00}$, and consistent with Assumption SS, we choose $\delta_1 < 0$ and $\delta_2 < 0$. Without loss of generality, we choose positive values for $\beta_1$, $\beta_2$, and $\gamma$. Specifically, $\tilde{\mu}_{11} = 0.25$, $\tilde{\mu}_{10} = \tilde{\mu}_{01} = 0$ and $\tilde{\mu}_{00} = -0.25$; as default values, $\delta_1 = \delta_2 \equiv \delta = -0.1$, $\beta_1 = \beta_2 \equiv \beta = 1$ and $\gamma = 0.5$. In this exercise, we focus on the following ATE:

$$E[Y_{11} - Y_{00}|X = 0] = \Phi(\tilde{\mu}_{11}) - \Phi(\tilde{\mu}_{00}),$$

²⁴ In this multi-agent, multi-treatment scenario, compliers are defined as those players whose behaviors are such that market structures are formed in conformance with the LATE monotonicity assumption (5.3). Unlike the traditional setting (Imbens and Angrist (1994)), where compliers are defined as a subset of a population of individuals, the subpopulation in the present setting is the collection of markets consisting of complying players.

where $\Phi(\cdot)$ is the CDF of the standard normal distribution. Given the parameter values, $E[Y_{11} - Y_{00}|X = 0] \approx 0.2$. For $h(z, z', x)$, we consider $z = (1, 1)$ and $z' = (-1, -1)$. Note that $H(x) = h(z, z', x)$ and $H(x, x', x'') = h(z, z'; x, x', x'')$ since $Z_s$ is binary. Then we can derive the sets $\mathcal{X}_d^U(0; d')$ and $\mathcal{X}_d^L(0; d')$ for each $d \in \{(1,1), (0,0)\}$ and $d' \ne d$ in Theorem 3.1. Based on our design, $H(0) > 0$, and thus the bounds when we use $Z$ only are, with $x = 0$,

$$\max_{z \in \mathcal{Z}} \Pr[Y=1, D=(0,0)|z, x] \le \Pr[Y_{00}=1|x] \le \min_{z \in \mathcal{Z}} \Pr[Y=1|z, x],$$

and

$$\max_{z \in \mathcal{Z}} \Pr[Y=1|z, x] \le \Pr[Y_{11}=1|x] \le \min_{z \in \mathcal{Z}} \big\{\Pr[Y=1, D=(1,1)|z, x] + 1 - \Pr[D=(1,1)|z, x]\big\}.$$
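The true ATE implied by these parameter values has the closed form $\Phi(0.25) - \Phi(-0.25)$ and can be verified directly; a minimal check in Python (standard library only; our own illustration, not part of the paper's code):

```python
from math import erf, sqrt

def std_normal_cdf(x):
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mu11, mu00 = 0.25, -0.25  # tilde-mu values from the design, evaluated at X = 0
ate = std_normal_cdf(mu11) - std_normal_cdf(mu00)
print(round(ate, 4))  # 0.1974, i.e. roughly 0.2
```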

Using both $Z$ and $X$, we obtain narrower bounds. For example, when $|\mathcal{X}| = 3$, with $H(0, 1, 1) < 0$, the lower bound on $\Pr[Y_{00}=1|X=0]$ becomes

$$\max_{z \in \mathcal{Z}} \big\{\Pr[Y=1, D=(0,0)|z, 0] + \Pr[Y=1, D \in \{(1,0),(0,1)\}|z, 1]\big\}.$$

With $H(1, 1, 0) < 0$, the upper bound on $\Pr[Y_{11}=1|X=0]$ becomes

$$\min_{z \in \mathcal{Z}} \big\{\Pr[Y=1, D=(1,1)|z, 0] + \Pr[Y=1, D \in \{(1,0),(0,1)\}|z, 1] + \Pr[D=(0,0)|z, 0]\big\}.$$

For comparison, we calculate the bounds of Manski (1990) using $Z$. The Manski bounds are

$$\max_{z \in \mathcal{Z}} \Pr[Y=1, D=(0,0)|z, x] \le \Pr[Y_{00}=1|x] \le \min_{z \in \mathcal{Z}} \big\{\Pr[Y=1, D=(0,0)|z, x] + 1 - \Pr[D=(0,0)|z]\big\},$$

and

$$\max_{z \in \mathcal{Z}} \Pr[Y=1, D=(1,1)|z, x] \le \Pr[Y_{11}=1|x] \le \min_{z \in \mathcal{Z}} \big\{\Pr[Y=1, D=(1,1)|z, x] + 1 - \Pr[D=(1,1)|z]\big\}.$$
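As a sanity check on the logic of these bounds, one can simulate a two-player version of the DGP (with $X$ fixed at 0) and verify that the Manski bounds contain the true $\Pr[Y_{11}=1]$. The sketch below is our own illustration: in particular, the uniform-at-random selection among pure-strategy equilibria is an auxiliary assumption of ours, not part of the paper's design.

```python
import random
from math import sqrt

random.seed(0)
DELTA, BETA = -0.1, 1.0  # strategic-interaction and instrument coefficients (design defaults)
MU = {(1, 1): 0.25, (1, 0): 0.0, (0, 1): 0.0, (0, 0): -0.25}  # tilde-mu at X = 0

def correlated_normals():
    # (eps, V1, V2): zero-mean normals, each pairwise correlation 0.5,
    # built from iid draws with a hand-coded Cholesky factor.
    g1, g2, g3 = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    return g1, 0.5 * g1 + sqrt(0.75) * g2, 0.5 * g1 + 0.2886751345948129 * g2 + 0.816496580927726 * g3

def equilibrium(z1, z2, v1, v2):
    # Pure-strategy profiles of D1 = 1{d*D2 + b*Z1 >= V1}, D2 = 1{d*D1 + b*Z2 >= V2};
    # with strategic substitutes at least one exists; pick uniformly if multiple.
    eqs = [(d1, d2) for d1 in (0, 1) for d2 in (0, 1)
           if d1 == int(DELTA * d2 + BETA * z1 >= v1)
           and d2 == int(DELTA * d1 + BETA * z2 >= v2)]
    return random.choice(eqs)

cells = {(z1, z2): [] for z1 in (-1, 1) for z2 in (-1, 1)}
n, hits = 40000, 0
for _ in range(n):
    eps, v1, v2 = correlated_normals()
    z = (random.choice((-1, 1)), random.choice((-1, 1)))
    d = equilibrium(z[0], z[1], v1, v2)
    cells[z].append((int(MU[d] >= eps), d))  # observed (Y, D)
    hits += int(MU[(1, 1)] >= eps)           # latent potential outcome Y_{(1,1)}

p11_true = hits / n
lb = max(sum(y for y, d in obs if d == (1, 1)) / len(obs) for obs in cells.values())
ub = min(sum(y for y, d in obs if d == (1, 1)) / len(obs)
         + 1 - sum(d == (1, 1) for _, d in obs) / len(obs)
         for obs in cells.values())

assert lb <= p11_true <= ub  # the Manski (1990) bounds contain the truth
```

The bounds hold for every instrument value separately, so intersecting over $z$ (the max/min above) tightens them without ever excluding the true value.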

We also compare the estimated ATE using the specification of a standard linear IV model in which the nonlinearity of the true DGP is ignored:

$$Y = \pi_0 + \pi_1 D_1 + \pi_2 D_2 + X\theta + \epsilon,$$
$$\begin{pmatrix} D_1 \\ D_2 \end{pmatrix} = \begin{pmatrix} \gamma_{10} \\ \gamma_{20} \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} + \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}.$$

Here the first stage is the reduced-form representation of a linear simultaneous equations model of strategic interaction. Under this specification, the ATE becomes $E[Y_{11} - Y_{00}|X = 0] = \pi_1 + \pi_2$, which is estimated via two-stage least squares (TSLS).

The bounds calculated for the ATE are shown in Figures 8–11. Figure 8 shows how the bounds on the ATE change as the value of $\beta$ changes from 0 to 2.5; the larger $\beta$ is, the stronger the instrument $Z$ is. The first conspicuous result is that the TSLS estimate of the ATE is biased due to misspecification. Next, as expected, the Manski bounds and our proposed bounds converge to the true value of the ATE as the instrument becomes stronger. Overall, our bounds, with or without exploiting the variation of $X$, are much narrower than the Manski bounds.²⁵ Notice that, in contrast to the Manski bounds, the sign of the ATE is identified over the whole range of $\beta$, as predicted by the first part of Theorem 3.1. By using the additional variation from $X$ with $|\mathcal{X}| = 3$, we further narrow the bounds, in particular the upper bounds on the ATE in this simulation design. Figure 9 depicts the bounds using $X$ with $|\mathcal{X}| = 15$, which yields narrower bounds than $|\mathcal{X}| = 3$. Figure 10 shows how the bounds change as the value of $\gamma$ changes from 0 to 1.5, where a larger $\gamma$ corresponds to a stronger exogenous variable $X$. The jumps in the upper bound are associated with sudden changes in the signs of $H(-1, 0, 1)$ and $H(0, 1, 1)$. At least in this simulation design, the strength of $X$ is not a crucial factor in obtaining narrower bounds. In fact, based on other simulation results (omitted from the paper), we conclude that the number of values $X$ can take matters more than the dispersion of $X$ (unless we pursue point identification of the ATE). Figure 11 shows how the width of the bounds is related to the extent to which the opponents' actions $D_{-s}$ affect one's payoff, captured by $\delta$. We vary the value of $\delta$ from $-2$ to 0; when $\delta = 0$, the players solve single-agent optimization problems. Thus, heuristically, the bounds at this point should be similar to those obtainable when Shaikh and Vytlacil (2011) is extended to a multiple-treatment setting with no simultaneity. In the figure, as the magnitude of $\delta$ gets smaller, the bounds get narrower.

²⁵ Although we do not make a rigorous comparison of the assumptions here, note that the bounds of Manski and Pepper (2000) under the semi-MTR assumption are expected to be similar to ours. Their bounds, however, require assuming the direction of the monotonicity.

38

References

Andrews, D. W., Schafgans, M. M., 1998. Semiparametric estimation of the intercept of a sample selection model. The Review of Economic Studies 65 (3), 497–517.

Bajari, P., Hong, H., Ryan, S. P., 2010. Identification and estimation of a discrete game of complete information. Econometrica 78 (5), 1529–1568.

Berry, S. T., 1992. Estimation of a model of entry in the airline industry. Econometrica: Journal of the Econometric Society, 889–917.

Bresnahan, T. F., Reiss, P. C., 1990. Entry in monopoly markets. The Review of Economic Studies 57 (4), 531–553.

Bresnahan, T. F., Reiss, P. C., 1991. Entry and competition in concentrated markets. Journal of Political Economy, 977–1009.

Brinch, C., Mogstad, M., Wiswall, M., 2017. Beyond LATE with a discrete instrument. Journal of Political Economy, forthcoming.

Chernozhukov, V., Hansen, C., 2005. An IV model of quantile treatment effects. Econometrica 73 (1), 245–261.

Chesher, A., 2005. Nonparametric identification under discrete variation. Econometrica 73 (5), 1525–1550.

Chesher, A., Rosen, A. M., 2012. Simultaneous equations models for discrete outcomes: coherence, completeness, and identification. CeMMAP working paper, Centre for Microdata Methods and Practice.

Chiburis, R. C., 2010. Semiparametric bounds on treatment effects. Journal of Econometrics 159 (2), 267–275.

Ciliberto, F., Murry, C., Tamer, E., 2016. Market structure and competition in airline markets. Working paper, University of Virginia, Penn State University, Harvard University.

Ciliberto, F., Tamer, E., 2009. Market structure and multiple equilibria in airline markets. Econometrica 77 (6), 1791–1828.

de Paula, A., 2013. Econometric analysis of games with multiple equilibria. Annual Review of Economics 5 (1), 107–131.

Foster, A., Rosenzweig, M., 2008. Inequality and the sustainability of agricultural productivity growth: Groundwater and the green revolution in rural India. In: Prepared for the India Policy Conference at Stanford University.

Gentzkow, M., Shapiro, J. M., Sinkinson, M., 2011. The effect of newspaper entry and exit on electoral politics. The American Economic Review 101 (7), 2980–3018.

Goolsbee, A., Syverson, C., 2008. How do incumbents respond to the threat of entry? Evidence from the major airlines. The Quarterly Journal of Economics 123 (4), 1611–1633.

Heckman, J., Pinto, R., 2015. Unordered monotonicity. Working paper, University of Chicago.

Heckman, J. J., Urzua, S., Vytlacil, E., 2006. Understanding instrumental variables in models with essential heterogeneity. The Review of Economics and Statistics 88 (3), 389–432.

Heckman, J. J., Vytlacil, E., 2005. Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73 (3), 669–738.

Heckman, J. J., Vytlacil, E. J., 1999. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proceedings of the National Academy of Sciences 96 (8), 4730–4734.

Heckman, J. J., Vytlacil, E. J., 2007. Econometric evaluation of social programs, part I: Causal models, structural models and econometric policy evaluation. Handbook of Econometrics 6, 4779–4874.

Imbens, G. W., Angrist, J. D., 1994. Identification and estimation of local average treatment effects. Econometrica 62 (2), 467–475.

Jun, S. J., Pinkse, J., Xu, H., 2011. Tighter bounds in triangular systems. Journal of Econometrics 161 (2), 122–128.

Kalai, E., 2004. Large robust games. Econometrica 72 (6), 1631–1665.

Khan, S., Tamer, E., 2010. Irregular identification, support conditions, and inverse weight estimation. Econometrica 78 (6), 2021–2042.

Kline, B., Tamer, E., 2012. Bounds for best response functions in binary games. Journal of Econometrics 166 (1), 92–105.

Lee, S., Salanié, B., 2016. Identifying effects of multivalued treatments. Working paper, Columbia University.

Manski, C. F., 1990. Nonparametric bounds on treatment effects. The American Economic Review 80 (2), 319–323.

Manski, C. F., 1997. Monotone treatment response. Econometrica: Journal of the Econometric Society, 1311–1334.

Manski, C. F., 2013. Identification of treatment response with social interactions. The Econometrics Journal 16 (1), S1–S23.

Manski, C. F., Pepper, J. V., 2000. Monotone instrumental variables: With an application to the returns to schooling. Econometrica 68 (4), 997–1010.

Menzel, K., 2016. Inference for games with many players. The Review of Economic Studies 83, 306–337.

Mourifié, I., 2015. Sharp bounds on treatment effects in a binary triangular system. Journal of Econometrics 187 (1), 74–81.

Pinto, R., 2015. Selection bias in a controlled experiment: The case of moving to opportunity. Working paper, University of Chicago.

Schlenker, W., Walker, W. R., 2015. Airports, air pollution, and contemporaneous health. The Review of Economic Studies, rdv043.

Sekhri, S., 2014. Wells, water, and welfare: The impact of access to groundwater on rural poverty and conflict. American Economic Journal: Applied Economics 6 (3), 76–102.

Shaikh, A. M., Vytlacil, E. J., 2011. Partial identification in triangular systems of equations with binary dependent variables. Econometrica 79 (3), 949–955.

Tamer, E., 2003. Incomplete simultaneous discrete response model with multiple equilibria. The Review of Economic Studies 70 (1), 147–165.

Vytlacil, E., 2002. Independence, monotonicity, and latent index models: An equivalence result. Econometrica 70 (1), 331–341.

Vytlacil, E., Yildiz, N., 2007. Dummy endogenous variables in weakly separable models. Econometrica 75 (3), 757–779.

Walker, R. E., Keane, C. R., Burke, J. G., 2010. Disparities and access to healthy food in the United States: A review of food deserts literature. Health & Place 16 (5), 876–884.
A Appendix

A.1 Proof of Proposition 3.1

The following proposition is useful in proving Proposition 3.1:

Proposition A.1. Let $R$ and $Q$ be sets defined by Cartesian products, $R = \prod_{s=1}^{S} r_s$ and $Q = \prod_{s=1}^{S} q_s$, where $r_s$ and $q_s$ are intervals in $\mathbb{R}$. Then the following hold: (i) if $r_s \cap q_s = \emptyset$ for some $s$, then $R \cap Q = \emptyset$; (ii) if $r_s \sim q_s$ $\forall s$, then $R \sim Q$; (iii) if $r_s \nsim q_s$ for some $s$, then $R \nsim Q$; (iv) $R \cap Q = \prod_{s=1}^{S} (r_s \cap q_s)$; (v) $cl(R) = \prod_{s=1}^{S} cl(r_s)$, where $cl(\cdot)$ denotes the closure of its argument.

The proof of this proposition follows directly from the definitions of $R$ and $Q$. Before proving Proposition 3.1(i), we prove (i′), which appears in the text below Proposition 3.1. We first show a simple case as a reference: $\tilde{R}_j \cap \tilde{R}_{j-1} = \emptyset$ for $j = 1, \dots, S$. Note that

$$\tilde{R}_j(z) = \prod_{s=1}^{j} \big(0, \nu_{j-1}^s(z_s)\big] \times \prod_{s=j+1}^{S} \big(\nu_j^s(z_s), 1\big], \qquad \tilde{R}_{j-1}(z) = \prod_{s=1}^{j-1} \big(0, \nu_{j-2}^s(z_s)\big] \times \prod_{s=j}^{S} \big(\nu_{j-1}^s(z_s), 1\big],$$

and the $j$-th coordinates are $(0, \nu_{j-1}^j(z_j)]$ in $\tilde{R}_j$ and $(\nu_{j-1}^j(z_j), 1]$ in $\tilde{R}_{j-1}$. Since these two intervals are disjoint, by Proposition A.1(i) we can conclude that $\tilde{R}_j \cap \tilde{R}_{j-1} = \emptyset$.

Now, to prove (i′), we equivalently prove $R_{d_j} \cap R_{d_{j-t}} = \emptyset$ for $t \ge 1$ and $0 \le j-t \le S-t$, drawing insights from the simple case. Note that $d_{j-t}$ contains $S-j+t$ zeros. Then there exists $s^*$ such that $d_{j,s^*} = 1$ but $d_{j-t,s^*} = 0$, i.e., $U_{s^*} \in (0, \nu_{j-1}^{s^*}(z_{s^*})]$ in $R_{d_j}$ but $U_{s^*} \in (\nu_{j-t}^{s^*}(z_{s^*}), 1]$ in $R_{d_{j-t}}$. Suppose not. Then, for every $s$ such that $d_{j,s} = 1$, it must hold that $d_{j-t,s} = 1$. This implies that $d_{j-t}$ has at least as many elements equal to one as $d_j$, which is a contradiction since $t \ge 1$. When $t \ge 2$, by Assumption SS, $\nu_{j-t}^{s^*}(z_{s^*}) > \nu_{j-1}^{s^*}(z_{s^*})$, and therefore $(\nu_{j-t}^{s^*}(z_{s^*}), 1]$ and $(0, \nu_{j-1}^{s^*}(z_{s^*})]$ are disjoint; when $t = 1$, $(\nu_{j-1}^{s^*}(z_{s^*}), 1]$ and $(0, \nu_{j-1}^{s^*}(z_{s^*})]$ are obviously disjoint. In either case, $R_{d_j}$ and $R_{d_{j-t}}$ are disjoint. This proves (i′).

For Proposition 3.1(i), one can conclude from (i′) that $R_{d_j}$ is disjoint from $R_{d_{j'}}$ for any $d_{j'} \in M_{j'}$, and hence is disjoint from $\bigcup_{d \in M_{j'}} R_d$. This is true $\forall d_j \in M_j$, and therefore $\bigcup_{d \in M_j} R_d$ is disjoint from $\bigcup_{d \in M_{j'}} R_d$.
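Proposition A.1 can be sanity-checked on a discretized version of $(0,1]^S$. The sketch below is our own illustration (the interval endpoints are arbitrary); it verifies parts (i) and (iv) for $S = 3$ by representing each half-open interval by its grid points.

```python
from itertools import product

# Discretize (0, 1] by a fine grid; an interval (a, b] is the set of its grid points.
GRID = [k / 20 for k in range(1, 21)]

def interval(a, b):
    return frozenset(x for x in GRID if a < x <= b)

# Two Cartesian products R = r1 x r2 x r3 and Q = q1 x q2 x q3 (S = 3).
r = [interval(0.0, 0.4), interval(0.2, 0.9), interval(0.5, 1.0)]
q = [interval(0.3, 0.8), interval(0.1, 0.6), interval(0.4, 1.0)]

R = set(product(*r))
Q = set(product(*q))

# Proposition A.1(iv): R intersect Q equals the product of coordinatewise intersections.
assert R & Q == set(product(*[rs & qs for rs, qs in zip(r, q)]))

# Proposition A.1(i): make one coordinate pair disjoint and the products become disjoint.
q[1] = interval(0.9, 1.0)  # now r2 and q2 have empty intersection
assert not (R & set(product(*q)))
```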

To prove (ii′) of the text for two Cartesian products, by Proposition A.1(ii), one needs to show that each pair of intervals in the same coordinate are neighboring intervals. This is immediately true for $\tilde{R}_j$ and $\tilde{R}_{j-1}$ above, since (a) for coordinates $1 \le s \le j-1$, $(0, \nu_{j-1}^s(z_s)] \sim (0, \nu_{j-2}^s(z_s)]$ with a nonempty intersection, since $(0, \nu_{j-1}^s(z_s)] \subset (0, \nu_{j-2}^s(z_s)]$; (b) for coordinate $s = j$, $(0, \nu_{j-1}^j(z_j)] \sim (\nu_{j-1}^j(z_j), 1]$ and they are disjoint; and (c) for coordinates $j+1 \le s \le S$, $(\nu_j^s(z_s), 1] \sim (\nu_{j-1}^s(z_s), 1]$ with a nonempty intersection, since $(\nu_j^s(z_s), 1] \supset (\nu_{j-1}^s(z_s), 1]$.

Now consider $R_{d_j}$ and $R_{d_{j-1}}$. In $d_j$ and $d_{j-1}$, there exists $s^*$ such that $d_{j,s^*} = 1$ but $d_{j-1,s^*} = 0$, by the same argument as above with $t = 1$. The rest of the elements of $d_j$ and $d_{j-1}$ fall into one of four types: for $s \ne s^*$, (a′) $d_{j,s} = d_{j-1,s} = 1$; (b′) $d_{j,s} = 1$ but $d_{j-1,s} = 0$; (c′) $d_{j,s} = d_{j-1,s} = 0$; and (d′) $d_{j,s} = 0$ but $d_{j-1,s} = 1$. See Table 1 in the main text for an example of this result. We want to express the corresponding intervals of $U_s$ that generate these values of $d_{j,s}$ and $d_{j-1,s}$. By definition, the numbers of ones (and zeros) in $d_j$ and $d_{j-1}$ differ only by one, which occurs in each vector's $s^*$-th element. Knowing this, for the pairs of $d_{j,s}$ and $d_{j-1,s}$ in (a′)–(d′), we can determine the decision of the opponents of player $s$ (i.e., the value of $j$ in $\nu_j^s$), which is useful to construct the payoff of $s$, and thus we can determine the corresponding interval of $U_s$. Specifically, the corresponding interval pairs are: (a″) $(0, \nu_{j-1}^s(z_s)]$ and $(0, \nu_{j-2}^s(z_s)]$; (b″) $(0, \nu_{j-1}^s(z_s)]$ and $(\nu_{j-1}^s(z_s), 1]$; (c″) $(\nu_j^s(z_s), 1]$ and $(\nu_{j-1}^s(z_s), 1]$; (d″) $(\nu_j^s(z_s), 1]$ and $(0, \nu_{j-2}^s(z_s)]$. It is straightforward that the pairs in (a″)–(c″) are neighboring sets, by the same arguments as for (a)–(c). The pair in (d″) are also neighboring sets, because $\nu_j^s(z_s) < \nu_{j-2}^s(z_s)$ by Assumption SS. Lastly, for coordinate $s^*$, $(0, \nu_{j-1}^{s^*}(z_{s^*})] \sim (\nu_{j-1}^{s^*}(z_{s^*}), 1]$ as in (b″). Therefore, $R_{d_j} \sim R_{d_{j-1}}$.

For Proposition 3.1(ii), one can conclude from (ii′) that $R_{d_j}$ neighbors $R_{d_{j-1}}$ for any $d_{j-1} \in M_{j-1}$, and hence neighbors $\bigcup_{d \in M_{j-1}} R_d$. This is true $\forall d_j \in M_j$, and therefore $\bigcup_{d \in M_j} R_d \sim \bigcup_{d \in M_{j-1}} R_d$.

The result in Proposition 3.1(iii) follows from the proof of (i′) above: there exists $s^*$ such that $d_{j,s^*} = 1$ but $d_{j-t,s^*} = 0$, i.e., $U_{s^*} \in (0, \nu_{j-1}^{s^*}(z_{s^*})]$ in $R_{d_j}$ but $U_{s^*} \in (\nu_{j-t}^{s^*}(z_{s^*}), 1]$ in $R_{d_{j-t}}$. When $t \ge 2$, by Assumption SS, $\nu_{j-t}^{s^*}(z_{s^*}) > \nu_{j-1}^{s^*}(z_{s^*})$, and therefore $(0, \nu_{j-1}^{s^*}(z_{s^*})] \nsim (\nu_{j-t}^{s^*}(z_{s^*}), 1]$, which implies, by Proposition A.1(iii), that their Cartesian products are not neighboring sets.

Lastly, we prove Proposition 3.1(iv). We consider an $S$-dimensional hyper-grid for $(0, 1]^S$ that runs through all possible values of $\nu_j^s$ across $j = 0, \dots, S$ for each $s = 1, \dots, S$. Specifically, under Assumption SS and conveniently letting $\nu_S^s = 0$ and $\nu_{-1}^s = 1$, the hyper-grid is a Cartesian product of one-dimensional grids defined by $0 = \nu_S^s < \nu_{S-1}^s < \cdots < \nu_0^s < \nu_{-1}^s = 1$ for each coordinate $s$. Let each hyper-cube in this hyper-grid be represented as

$$r_1(j_1) \times r_2(j_2) \times \cdots \times r_S(j_S) \equiv \big(\nu_{j_1}^1, \nu_{j_1-1}^1\big] \times \big(\nu_{j_2}^2, \nu_{j_2-1}^2\big] \times \cdots \times \big(\nu_{j_S}^S, \nu_{j_S-1}^S\big],$$

where the intervals $r_s(\cdot)$ are implicitly defined in the equation and labeled with $j_s = 0, \dots, S$.
Using this notation, $\tilde{R}_j$ for $j = 0, \dots, S$ can be expressed as

$$\tilde{R}_j = \Bigg\{ U : U \in \prod_{s=1}^{j} \bigcup_{k=j}^{S} r_s(k) \times \prod_{s=j+1}^{S} \bigcup_{k=0}^{j} r_s(k) \Bigg\} \tag{A.1}$$

$$\phantom{\tilde{R}_j} = \Bigg\{ U : U \in \bigcup_{j_1=j}^{S} \cdots \bigcup_{j_j=j}^{S} \; \bigcup_{j_{j+1}=0}^{j} \cdots \bigcup_{j_S=0}^{j} r_1(j_1) \times \cdots \times r_S(j_S) \Bigg\}, \tag{A.2}$$

where the second equality is obtained by iteratively applying the following: for sets $A$, $B$ and $C$ that are Cartesian products (including intervals as a trivial case), $(A \cup B) \times C = (A \times C) \cup (B \times C)$. More generally, $R_{d_j}$ for some permutation $\sigma(\cdot) \in \Sigma$ can be defined as

$$R_{d_j} = \Bigg\{ U : (U_{\sigma(1)}, \dots, U_{\sigma(S)}) \in \prod_{s=1}^{j} \bigcup_{k=j}^{S} r_{\sigma(s)}(k) \times \prod_{s=j+1}^{S} \bigcup_{k=0}^{j} r_{\sigma(s)}(k) \Bigg\} \tag{A.3}$$

$$\phantom{R_{d_j}} = \Bigg\{ U : (U_{\sigma(1)}, \dots, U_{\sigma(S)}) \in \bigcup_{j_1=j}^{S} \cdots \bigcup_{j_j=j}^{S} \; \bigcup_{j_{j+1}=0}^{j} \cdots \bigcup_{j_S=0}^{j} r_{\sigma(1)}(j_1) \times \cdots \times r_{\sigma(S)}(j_S) \Bigg\}. \tag{A.4}$$

Below we show that any hyper-cube $r_1(j_1) \times r_2(j_2) \times \cdots \times r_S(j_S)$ is contained in one of the $R_{d_j}$'s for some $j$ and $\sigma(\cdot)$. We first proceed by showing that there are hyper-cubes contained in the $\tilde{R}_j$'s. We then show that any hyper-cube can be transformed, using a permutation function, into a hyper-cube contained in some $\tilde{R}_j$, which means that the original hyper-cube is contained in some $R_{d_j}$ that is a "permuted version" of $\tilde{R}_j$.

Claim 1: For $j_1 \ge j_2 \ge \cdots \ge j_S$, $r_1(j_1) \times r_2(j_2) \times \cdots \times r_S(j_S)$ is contained in $\tilde{R}_j$ for some $j \le j_1$.

Claim 2: For any $\{j_1, \dots, j_S\}$, $r_1(j_1) \times r_2(j_2) \times \cdots \times r_S(j_S)$ is contained in $R_{d_j}$ for some $j \le \max\{j_1, \dots, j_S\}$.

Proof of Claim 1: Start with the hyper-cube at a corner:

$$r_1(0) \times r_2(0) \times \cdots \times r_S(0) \equiv \big(\nu_0^1, 1\big] \times \big(\nu_0^2, 1\big] \times \cdots \times \big(\nu_0^S, 1\big].$$

This hyper-cube is contained in $\tilde{R}_0$, as the two in fact coincide. Consider the next hyper-cube on the grid along the 1st coordinate: $r_1(1) \times r_2(0) \times \cdots \times r_S(0)$.
This hyper-cube is contained in $\tilde{R}_1$, since

$$\tilde{R}_1 = \bigcup_{j_1=1}^{S} \bigcup_{j_2=0}^{1} \cdots \bigcup_{j_S=0}^{1} r_1(j_1) \times \cdots \times r_S(j_S).$$

We move to the 2nd coordinate, holding the 1st coordinate fixed. Then $r_1(1) \times r_2(1) \times r_3(0) \times \cdots \times r_S(0)$ is still contained in $\tilde{R}_1$. Likewise, from $r_1(1) \times r_2(1) \times r_3(1) \times r_4(0) \times \cdots \times r_S(0)$ all the way to $r_1(1) \times \cdots \times r_S(1)$, these hyper-cubes are contained in $\tilde{R}_1$. Now consider the next hyper-cube along the 1st coordinate, i.e., $r_1(2) \times r_2(0) \times \cdots \times r_S(0)$. This is contained in $\tilde{R}_1$. We move to the next coordinates, holding the 1st coordinate fixed. Then $r_1(2) \times r_2(1) \times r_3(0) \times \cdots \times r_S(0)$ and $r_1(2) \times r_2(1) \times r_3(1) \times r_4(0) \times \cdots \times r_S(0)$ through $r_1(2) \times r_2(1) \times \cdots \times r_S(1)$ are still contained in $\tilde{R}_1$. But the next hyper-cube, $r_1(2) \times r_2(2) \times r_3(0) \times \cdots \times r_S(0)$, is no longer contained in $\tilde{R}_1$; it is contained in

$$\tilde{R}_2 = \bigcup_{j_1=2}^{S} \bigcup_{j_2=2}^{S} \bigcup_{j_3=0}^{2} \cdots \bigcup_{j_S=0}^{2} r_1(j_1) \times r_2(j_2) \times \cdots \times r_S(j_S).$$

Likewise, following the same sequential rule, $r_1(2) \times r_2(2) \times r_3(1) \times r_4(0) \times \cdots \times r_S(0)$ and $r_1(2) \times r_2(2) \times r_3(1) \times r_4(1) \times r_5(0) \times \cdots \times r_S(0)$ through $r_1(2) \times \cdots \times r_S(2)$ are all contained in $\tilde{R}_2$. This argument can be applied iteratively to all other hyper-cubes $r_1(j_1) \times r_2(j_2) \times \cdots \times r_S(j_S)$ generated by the same sequential rule while maintaining $j_1 \ge j_2 \ge \cdots \ge j_S$. This proves Claim 1.

Proof of Claim 2: In general, consider $r_1(j_1) \times \cdots \times r_S(j_S)$ for given $j_1, \dots, j_S$. There exist a permutation $\sigma(\cdot)$ and a sequence $\{k_s\}_{s=1}^{S}$ such that $j_s = k_{\sigma(s)}$ and $k_1 \ge k_2 \ge \cdots \ge k_S$. Then the hyper-cube

$$r_{\sigma(1)}(j_1) \times \cdots \times r_{\sigma(S)}(j_S) = r_{\sigma(1)}(k_{\sigma(1)}) \times \cdots \times r_{\sigma(S)}(k_{\sigma(S)})$$

in the space of $(U_{\sigma(1)}, \dots, U_{\sigma(S)})$, or equivalently $r_1(k_1) \times \cdots \times r_S(k_S)$ in the space of $(U_1, \dots, U_S)$, satisfies the condition of Claim 1 and thus is contained in $\tilde{R}_j$ for some $j \le k_1$ by Claim 1. Let $\sigma^{-1}(\cdot)$ be the inverse of $\sigma(\cdot)$; note that $\sigma^{-1}(\cdot)$ is itself a permutation function. In general, for a permutation $\tilde{\sigma}(\cdot)$, if $r_1(k_1) \times \cdots \times r_S(k_S)$ is contained in $\tilde{R}_j$ for some $j$, then $r_{\tilde{\sigma}(1)}(k_1) \times \cdots \times r_{\tilde{\sigma}(S)}(k_S)$ is contained in $R_{d_j}$ by definition. Therefore, since $r_{\sigma^{-1}(\sigma(s))}(j_s) = r_s(j_s)$ $\forall s$, we can conclude that $r_1(j_1) \times \cdots \times r_S(j_S)$ is contained in $R_{d_j}$ for some $j \le k_1 = j_{\sigma^{-1}(1)}$. This proves Claim 2.
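The containment logic behind Claims 1 and 2 can be checked mechanically for a small number of players. The sketch below is our own verification code, not from the paper: a hyper-cube is encoded by its index vector $(j_1, \dots, j_S)$, and membership in $\tilde{R}_j$ reduces to the index restriction "$j_s \ge j$ on the first $j$ coordinates and $j_s \le j$ on the rest."

```python
from itertools import product, permutations

S = 3  # number of players; cube indices j_s range over 0..S

def in_R_tilde(j, cube):
    # cube (j_1,...,j_S) lies in R~_j  iff  j_s >= j for s <= j and j_s <= j for s > j
    return all(k >= j for k in cube[:j]) and all(k <= j for k in cube[j:])

# Claim 1: every cube with weakly decreasing indices lies in some R~_j with j <= j_1.
for cube in product(range(S + 1), repeat=S):
    if all(cube[s] >= cube[s + 1] for s in range(S - 1)):
        js = [j for j in range(S + 1) if in_R_tilde(j, cube)]
        assert js and min(js) <= cube[0]

# Claim 2: every cube lies in some permuted region with j <= max index
# (some permutation reorders the indices weakly decreasingly, then Claim 1 applies).
for cube in product(range(S + 1), repeat=S):
    found = False
    for sigma in permutations(range(S)):
        permuted = tuple(cube[sigma[s]] for s in range(S))
        for j in range(max(cube) + 1):
            if in_R_tilde(j, permuted):
                found = True
    assert found
```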

A.2 Proof of Corollary 3.1

First, consider a pair $R_{d_{j+1}}(z)$ and $R_{d_j}(z)$ (for $d_{j+1} \in M_{j+1}$ and $d_j \in M_j$) in $R_{j+1}(z)$ and $R_j(z)$, respectively. From the proof of Proposition 3.1(ii), we know that the elements of $d_{j+1}$ and $d_j$ fall into one of the four types (a′)–(d′) (including $s^*$), and thus the corresponding pairs of intervals fall into one of the four types: (a†) $(0, \nu_j^s(z_s)]$ and $(0, \nu_{j-1}^s(z_s)]$; (b†) $(0, \nu_j^s(z_s)]$ and $(\nu_j^s(z_s), 1]$; (c†) $(\nu_{j+1}^s(z_s), 1]$ and $(\nu_j^s(z_s), 1]$; (d†) $(\nu_{j+1}^s(z_s), 1]$ and $(0, \nu_{j-1}^s(z_s)]$.

Definition A.1. For two Cartesian products $R$ and $Q$ such that $R \sim Q$ and $R \cap Q = \emptyset$, their border $R \parallel Q$ is a set that satisfies $R \parallel Q \equiv cl(R) \cap cl(Q)$.

The border $R \parallel Q$ is a hyper-surface that is common to $cl(R)$ and $cl(Q)$. By Proposition 3.1, $R_{d_{j+1}}(z) \sim R_{d_j}(z)$ and $R_{d_{j+1}}(z) \cap R_{d_j}(z) = \emptyset$, and thus their border is well defined. Given (a†)–(d†), we show that $R_{d_{j+1}}(z) \parallel R_{d_j}(z)$ is a Cartesian product of $(0, \nu_j^s(z_s)] \parallel (\nu_j^s(z_s), 1] = \{\nu_j^s(z_s)\}$ (for some $s$) and other intervals. Specifically, by applying Proposition A.1(iv) and (v) with $R = cl(R_{d_{j+1}}(z))$, $Q = cl(R_{d_j}(z))$, and $r_k$ and $q_k$ being the closures of the intervals in (a†)–(d†), we have

$$R_{d_{j+1}}(z) \parallel R_{d_j}(z) = \{\nu_j^s(z_s)\} \times \prod_{k \ne s} (r_k \cap q_k) \tag{A.5}$$

for some $s$, where each $r_k \cap q_k$ is one of $[0, \nu_j^k(z_k)]$, $\{\nu_j^k(z_k)\}$, $[\nu_j^k(z_k), 1]$, and $[\nu_{j+1}^k(z_k), \nu_{j-1}^k(z_k)]$. Observe that $R_{d_{j+1}}(z) \parallel R_{d_j}(z)$ is therefore a lower-dimensional Cartesian product (of dimension less than $S$), which is consistent with the notion of a border or hyper-surface. Also observe that this hyper-surface is located at $\nu_j^s(z_s)$ in the $s$-th coordinate. Likewise, (A.5) holds for any pair $R_{d_{j+1}}(z)$ and $R_{d_j}(z)$, with a different value of $s$ and a different choice for each $r_k \cap q_k$. But, since $cl(A \cup B) = cl(A) \cup cl(B)$ for any sets $A$ and $B$, $cl(R_{j+1}(z)) = \bigcup_{d \in M_{j+1}} cl(R_d(z))$ and $cl(R_j(z)) = \bigcup_{d \in M_j} cl(R_d(z))$, and thus

$$R_{j+1}(z) \parallel R_j(z) = \bigcup_{d_{j+1} \in M_{j+1}} \bigcup_{d_j \in M_j} \big( R_{d_{j+1}}(z) \parallel R_{d_j}(z) \big). \tag{A.6}$$

Now, let $R^{>j}(z) \equiv \bigcup_{d \in M^{>j}} R_d(z) = \mathcal{U} \backslash R^{\le j}(z)$, where $M^{>j} \equiv \bigcup_{k=j+1}^{S} M_k$. Note that $R^{\le j}(z) \sim R^{>j}(z)$ and $R^{\le j}(z) \cap R^{>j}(z) = \emptyset$ by Proposition 3.1. Then $R^{\le j}(z) \parallel R^{>j}(z) = R_{j+1}(z) \parallel R_j(z)$ by the discussion around (3.23). Since $R^{\le j}(z) \cup R^{>j}(z) = \mathcal{U}$ by definition, $R^{\le j}(z) \parallel R^{>j}(z)$ is the only nontrivial hyper-surface of $cl(R^{\le j}(z))$ (and of $cl(R^{>j}(z))$), i.e., a surface that is not part of the surface of $cl(\mathcal{U})$. Therefore, by (A.5) and (A.6), we can conclude that $cl(R^{\le j}(z))$, and hence $R^{\le j}(z)$, is a function of $z$ only through $\nu_j^s(z_s)$ $\forall s$.

Moreover, in the expression (3.21) of $R_{d_k}(z)$ with $k \le j-1$ (and hence in the expression of $R^{\le j-1}(z)$), no interval has $\nu_j^s(z_s)$ as an endpoint, by definition.²⁶ Also, the interval in the expression (3.21) of $R_{d_j}(z)$ (and hence in the expression of $R_j(z)$) that has $\nu_j^s(z_s)$ among its endpoints is $(\nu_j^s(z_s), 1]$ $\forall s$. Consequently, $R^{\le j}(z) = R^{\le j-1}(z) \cup R_j(z)$ is expressed only with $(\nu_j^s(z_s), 1]$ $\forall s$ and $(0, 1]$. If $R^{\le j}(z)$ were expressed using other intervals whose endpoints are functions of $z_s$, it would contradict the fact that $R^{\le j}(z)$ is a function of $z$ only through $\nu_j^s(z_s)$. This completes the proof.

²⁶ That is, the payoff $\nu_j^s(z_s)$ is not relevant in defining markets with fewer than $j$ entrants.
A.3 Proof of Lemma 3.4
We prove Lemma 3.4 by drawing on the results of Proposition 3.1. As discussed in the text in relation to (3.23), for $z$ and $z'$ such that $\sum_{k=j'}^{S} h_k^D(z,z') = \{1 - \Pr[U \in R^{\le j'-1}(z)]\} - \{1 - \Pr[U \in R^{\le j'-1}(z')]\} > 0$ for $j' = 1, \dots, S$, we have

$$R^{\le j}(z) \subseteq R^{\le j}(z') \tag{A.7}$$

for $j = 0, \dots, S$, including $R^{\le S}(z) = R^{\le S}(z') = \mathcal{U}$ as a trivial case. For such $z$ and $z'$, introduce the notation²⁷

$$\Delta_{j,+}(z,z') \equiv R_j(z) \backslash R_j(z'), \tag{A.8}$$
$$\Delta_{j,-}(z,z') \equiv R_j(z') \backslash R_j(z), \tag{A.9}$$

and

$$\Delta^{j}(z',z) \equiv R^{\le j}(z') \backslash R^{\le j}(z). \tag{A.10}$$

Note that, for $j = 1, \dots, S$,

$$R_j(\cdot) = R^{\le j}(\cdot) \backslash R^{\le j-1}(\cdot), \tag{A.11}$$

since $R^{\le j}(z) \equiv \bigcup_{k=0}^{j} R_k(z)$. Fix $j \in \{1, \dots, S\}$. Consider

$$\begin{aligned}
\Delta_{j,+}(z,z') &= R_j(z) \cap R_j(z')^c \\
&= R^{\le j}(z) \cap R^{\le j-1}(z)^c \cap \big[ R^{\le j}(z') \cap R^{\le j-1}(z')^c \big]^c \\
&= R^{\le j}(z) \cap R^{\le j-1}(z)^c \cap \big[ R^{\le j}(z')^c \cup R^{\le j-1}(z') \big] \\
&= \big[ \big(R^{\le j}(z) \backslash R^{\le j}(z')\big) \cap R^{\le j-1}(z)^c \big] \cup \big[ \big(R^{\le j-1}(z') \backslash R^{\le j-1}(z)\big) \cap R^{\le j}(z) \big] \\
&= \Delta^{j-1}(z',z) \cap R^{\le j}(z),
\end{aligned}$$

where the first equality is obtained by substituting (A.11) into (A.8), the third equality is by the distributive law, and the last equality is by (A.7), which implies $\big(R^{\le j}(z) \backslash R^{\le j}(z')\big) \cap R^{\le j-1}(z)^c = \emptyset$. But

$$\Delta^{j-1}(z',z) \cap R^{\le j}(z) = \Delta^{j-1}(z',z) \backslash \big( \Delta^{j-1}(z',z) \backslash R^{\le j}(z) \big).$$

Symmetrically, interchanging the roles of $z$ and $z'$, consider

$$\begin{aligned}
\Delta_{j,-}(z,z') &= R_j(z') \cap R_j(z)^c \\
&= R^{\le j}(z') \cap R^{\le j-1}(z')^c \cap \big[ R^{\le j}(z)^c \cup R^{\le j-1}(z) \big] \\
&= \big[ \big(R^{\le j}(z') \backslash R^{\le j}(z)\big) \cap R^{\le j-1}(z')^c \big] \cup \big[ \big(R^{\le j-1}(z) \backslash R^{\le j-1}(z')\big) \cap R^{\le j}(z') \big] \\
&= \Delta^{j}(z',z) \cap R^{\le j-1}(z')^c \\
&= \Delta^{j}(z',z) \backslash \big( \Delta^{j}(z',z) \cap R^{\le j-1}(z') \big),
\end{aligned}$$

where the second-to-last equality is by (A.7), which implies $R^{\le j-1}(z) \backslash R^{\le j-1}(z') = \emptyset$. Note that

$$\Delta^{j-1}(z',z) \backslash R^{\le j}(z) = \Delta^{j}(z',z) \cap R^{\le j-1}(z') \equiv A^*,$$

because

$$\begin{aligned}
\Delta^{j-1}(z',z) \backslash R^{\le j}(z) &= R^{\le j-1}(z') \cap R^{\le j-1}(z)^c \cap R^{\le j}(z)^c \\
&= R^{\le j-1}(z') \cap R^{\le j}(z)^c \\
&= R^{\le j}(z') \cap R^{\le j}(z)^c \cap R^{\le j-1}(z') \\
&= \Delta^{j}(z',z) \cap R^{\le j-1}(z'),
\end{aligned}$$

where the second equality is by $R^{\le j-1}(z) \subset R^{\le j}(z)$ and the third equality is by $R^{\le j-1}(z') \subset R^{\le j}(z')$. In sum,

$$\Delta_{j,+}(z,z') = \Delta^{j-1}(z',z) \backslash A^*, \tag{A.12}$$
$$\Delta_{j,-}(z,z') = \Delta^{j}(z',z) \backslash A^*. \tag{A.13}$$

Equations (A.12) and (A.13) show how the outflow $\Delta_{j,+}(z,z')$ and the inflow $\Delta_{j,-}(z,z')$ of $R_j$ can be written in terms of the inflows of $R^{\le j-1}$ and $R^{\le j}$, respectively. Figuratively, $A^*$ adjusts for the "leakage" that occurs when the change from $z$ to $z'$ is relatively large.

Now, with $\vartheta_j(u) \equiv \vartheta_j(x; u) \equiv E[\theta(e_j, x, \epsilon)|U = u]$, (3.25) can be expressed as

$$\begin{aligned}
&\int_{R_j(z)} \vartheta_j(u)du - \int_{R_j(z')} \vartheta_j(u)du \\
&\quad= \int_{R_j(z)\backslash R_j(z')} \vartheta_j(u)du + \int_{R_j(z)\cap R_j(z')} \vartheta_j(u)du - \int_{R_j(z')\backslash R_j(z)} \vartheta_j(u)du - \int_{R_j(z)\cap R_j(z')} \vartheta_j(u)du \\
&\quad= \int_{\Delta_{j,+}(z,z')} \vartheta_j(u)du - \int_{\Delta_{j,-}(z,z')} \vartheta_j(u)du, \tag{A.14}
\end{aligned}$$

where the last equality is derived using Assumptions IN and SY(i). First, for $j = 1, \dots, S$, by (A.12)–(A.13),

$$\begin{aligned}
\int_{\Delta_{j,+}(z,z')} \vartheta_j(u)du - \int_{\Delta_{j,-}(z,z')} \vartheta_j(u)du
&= \int_{\Delta^{j-1}(z',z)\backslash A^*} \vartheta_j(u)du - \int_{\Delta^{j}(z',z)\backslash A^*} \vartheta_j(u)du \\
&= \int_{\Delta^{j-1}(z',z)\backslash A^*} \vartheta_j(u)du + \int_{A^*} \vartheta_j(u)du - \int_{A^*} \vartheta_j(u)du - \int_{\Delta^{j}(z',z)\backslash A^*} \vartheta_j(u)du \\
&= \int_{\Delta^{j-1}(z',z)} \vartheta_j(u)du - \int_{\Delta^{j}(z',z)} \vartheta_j(u)du, \tag{A.15}
\end{aligned}$$

where the last equality holds because $\Delta^{j-1}(z',z) \supseteq A^*$ and $\Delta^{j}(z',z) \supseteq A^*$. For $j = 0$,

$$\int_{\Delta_{0,+}(z,z')} \vartheta_0(u)du - \int_{\Delta_{0,-}(z,z')} \vartheta_0(u)du = -\int_{\Delta^{0}(z',z)} \vartheta_0(u)du, \tag{A.16}$$

since $\Delta_{0,+}(z,z') = \emptyset$ by the choice of $(z,z')$ and $\Delta_{0,-}(z,z') = \Delta^{0}(z',z)$. For $j = S$,

$$\int_{\Delta_{S,+}(z,z')} \vartheta_S(u)du - \int_{\Delta_{S,-}(z,z')} \vartheta_S(u)du = \int_{\Delta^{S-1}(z',z)} \vartheta_S(u)du, \tag{A.17}$$

since $\Delta_{S,-}(z,z') = \emptyset$ by the choice of $(z,z')$ and $\Delta_{S,+}(z,z') = \Delta^{S-1}(z',z)$. Then, combining (3.25) and (A.14)–(A.17) evaluated at $x = x_j$,

$$h(z,z';x) \equiv \sum_{j=0}^{S} h_j(z,z',x_j) = \sum_{j=1}^{S} \int_{\Delta^{j-1}(z',z)} \{\vartheta_j(x_j; u) - \vartheta_{j-1}(x_{j-1}; u)\} \, du.$$

²⁷ Note that $\Delta_+(z,z')$ and $\Delta_-(z,z')$, defined in Section 3.1 for the case $S = 2$, are simplified versions of this notation: $\Delta_+(z,z') = \Delta_{1,+}(z,z')$ and $\Delta_-(z,z') = \Delta_{1,-}(z,z')$.

A.4 Proof of Lemma 3.3(ii)

By Lemma 3.4, $h(z,z';x) = \sum_{j=1}^{S} \int_{\Delta^{j-1}(z',z)} \{\vartheta_j(x_j; u) - \vartheta_{j-1}(x_{j-1}; u)\} du$ with $\Delta^{j-1}(z',z) = R^{\le j-1}(z') \backslash R^{\le j-1}(z)$, which can be rewritten as

$$h(z,z';x) - \sum_{k \ne j} \int_{\Delta^{k-1}(z',z)} \{\vartheta_k(x_k; u) - \vartheta_{k-1}(x_{k-1}; u)\} du = \int_{\Delta^{j-1}(z',z)} \{\vartheta_j(x_j; u) - \vartheta_{j-1}(x_{j-1}; u)\} du. \tag{A.18}$$

We prove the case $\iota = 1$; the proofs for the other cases follow symmetrically. For $k \ne j$, when $\vartheta_{k-1}(x_{k-1}; u) - \vartheta_k(x_k; u) > 0$ a.e. $u$, we have $\int_{\Delta^{k-1}(z',z)} \{\vartheta_k(x_k; u) - \vartheta_{k-1}(x_{k-1}; u)\} du \le 0$. Combining this with $h(z,z';x) > 0$ implies that the left-hand side of (A.18) is positive. This in turn implies that $\vartheta_j(x; u) - \vartheta_{j-1}(x; u) > 0$ a.e. $u$. Suppose not, so that $\vartheta_j(x_j; u) - \vartheta_{j-1}(x_{j-1}; u) \le 0$ with positive probability. Then, by Assumption Y, $\vartheta_j(x; u) - \vartheta_{j-1}(x; u) \le 0$ a.e. $u$, which is a contradiction.
A.5 Proof of Theorem 3.2

Recall $M^{\le j} \equiv \bigcup_{k=0}^{j} M_k$ and $M^{>j} \equiv \bigcup_{k=j+1}^{S} M_k$. Then the bounds (3.36) and (3.37) can be rewritten as
$$U_{d_j}(x) = \inf_{z\in\mathcal{Z}} \left\{\tilde{p}_{M^{>j-1}}(z,x) + p_{M^{\le j-1}}(z)\right\},$$
$$L_{d_j}(x) = \sup_{z\in\mathcal{Z}} \left\{\tilde{p}_{M^{\le j}}(z,x) + p_{M^{>j}}(z)\right\},$$
where for a set $M \subset \mathcal{D}$, $\tilde{p}_M(z,x) \equiv \Pr[Y=1, D \in M|Z=z, X=x]$ and $p_M(z) \equiv \Pr[D \in M|Z=z]$. Since $\mathcal{D} = M^{\le \tilde{j}} \cup M^{>\tilde{j}}$ for some $\tilde{j}$, note that $\tilde{p}_{M^{>\tilde{j}}}(z,x) = \Pr[Y=1|Z=z, X=x] - \tilde{p}_{M^{\le\tilde{j}}}(z,x)$. Using this result, for $z, z'$ such that $\sum_{k=j'+1}^{S} h_k^D(z,z') = p_{M^{>j'}}(z) - p_{M^{>j'}}(z') > 0$ ($j' = 0,\ldots,S-1$), observe that each term in $U_{d_j}(x)$ satisfies
$$\tilde{p}_{M^{>j-1}}(z,x) - \tilde{p}_{M^{>j-1}}(z',x) = \Pr[\epsilon \le \mu_D(x), U \in \Delta_{j-1}(z',z)],$$
$$p_{M^{\le j-1}}(z) - p_{M^{\le j-1}}(z') = -\Pr[U \in \Delta_{j-1}(z',z)]$$
by (A.7) and (A.10), and thus
$$\tilde{p}_{M^{>j-1}}(z,x) + p_{M^{\le j-1}}(z) - \tilde{p}_{M^{>j-1}}(z',x) - p_{M^{\le j-1}}(z') = -\Pr[\epsilon > \mu_D(x), U \in \Delta_{j-1}(z',z)] < 0.$$
Then this relationship creates a partial ordering of $\tilde{p}_{M^{>j-1}}(z,x) + p_{M^{\le j-1}}(z)$ as a function of $z$ in terms of $p_{M^{>j'}}(z)$ (for any $j'$). According to this ordering, $\tilde{p}_{M^{>j-1}}(z,x) + p_{M^{\le j-1}}(z)$ takes its smallest value as $p_{M^{>j'}}(z)$ takes its largest value. Therefore, by (3.39),
$$U_{d_j}(x) = \inf_{z\in\mathcal{Z}} \left\{\tilde{p}_{M^{>j-1}}(z,x) + p_{M^{\le j-1}}(z)\right\} = \tilde{p}_{M^{>j-1}}(\bar{z},x) + p_{M^{\le j-1}}(\bar{z}).$$
By a symmetric argument, $L_{d_j}(x) = \sup_{z\in\mathcal{Z}} \left\{\tilde{p}_{M^{\le j}}(z,x) + p_{M^{>j}}(z)\right\} = \tilde{p}_{M^{\le j}}(\underline{z},x) + p_{M^{>j}}(\underline{z})$.

To prove that these bounds on $E[Y_{d_j}|X=x]$ are sharp, it suffices to show that for any given $x \in \mathcal{X}$ and $s_j \in [L_{d_j}(x), U_{d_j}(x)]$, there exists a density function $f^*_{\epsilon,U}$ such that the following claims hold:

(A) $f^*_{\epsilon|U}$ is strictly positive on $\mathbb{R}$.

(B) The proposed model is consistent with the data: for all $j = 0,\ldots,S$,
$$\Pr[D \in M^{\le j}|X=x, Z=z] = \Pr[U^* \in R^{\le j}(z)],$$
$$\Pr[Y=1|D \in M^{\le j}, X=x, Z=z] = \Pr[\epsilon^* \le \mu_D(x)|U^* \in R^{\le j}(z)],$$
$$\Pr[Y=1|D \in M^{>j}, X=x, Z=z] = \Pr[\epsilon^* \le \mu_D(x)|U^* \in R^{>j}(z)].$$

(C) The proposed model is consistent with the specified values of $E[Y_{d_j}|X=x]$: $\Pr[\epsilon^* \le \mu_{d_j}(x)] = s_j$.

Corollary 3.1 combined with the partial ordering above establishes monotonicity of the event $U \in R^{\le j}(z)$ (and $U \in R^{>j}(z)$) w.r.t. $z$. For example, for $z, z'$ such that $p_{M^{>j}}(z) > p_{M^{>j}}(z')$, Corollary 3.1 implies that $R^{\le j}(z) \subset R^{\le j}(z')$ and hence
$$1[U \in R^{\le j}(z')] - 1[U \in R^{\le j}(z)] = 1[U \in R^{\le j}(z')\backslash R^{\le j}(z)]. \qquad (A.19)$$
Given $1[D \in M^{\le j}] = 1[U \in R^{\le j}(Z)]$, (A.19) is analogous to a scalar treatment decision $\tilde{D} = 1[\tilde{U} \le \tilde{P}]$ with a scalar instrument $\tilde{P}$, where $1[\tilde{U} \le p'] - 1[\tilde{U} \le p] = 1[p < \tilde{U} \le p']$ for $p' > p$. Based on this result and the results for the first part of Theorem 3.2, we can modify the proof of Theorem 2.1(iii) in Shaikh and Vytlacil (2011) to show (A)--(C).

A.6 Proof of Lemma 4.1

For a given $j = 1,\ldots,S-1$, suppose there exist $z', z$ such that $\nu^s_{j-1}(z_s) \le \nu^s_j(z'_s)$ $\forall s$ (Assumption ES*). For any $d^j$ and $\tilde{d}^j$ ($d^j \neq \tilde{d}^j$), the expression for the region of multiple equilibria $R_{d^j}(z) \cap R_{\tilde{d}^j}(z)$ can be inferred as follows. First, there exists $s^*$ such that $d^j_{s^*} = 1$ and $\tilde{d}^j_{s^*} = 0$; otherwise it contradicts $d^j \neq \tilde{d}^j$. That is, $U_{s^*} \in \left(0, \nu^{s^*}_{j-1}(z_{s^*})\right]$ in $R_{d^j}(z)$ and $U_{s^*} \in \left(\nu^{s^*}_j(z_{s^*}), 1\right]$ in $R_{\tilde{d}^j}(z)$. For other $s \neq s^*$, the pair $(d^j_s, \tilde{d}^j_s)$ is realized to be one of four types: (i) $d^j_s=1$ and $\tilde{d}^j_s=0$; (ii) $d^j_s=0$ and $\tilde{d}^j_s=1$; (iii) $d^j_s=1$ and $\tilde{d}^j_s=1$; (iv) $d^j_s=0$ and $\tilde{d}^j_s=0$. Then the corresponding pair of intervals for $R_{d^j}(z)$ and $R_{\tilde{d}^j}(z)$, respectively, falls into one of four types: (i) $\left(0, \nu^s_{j-1}(z_s)\right]$ and $\left(\nu^s_j(z_s), 1\right]$; (ii) $\left(\nu^s_j(z_s), 1\right]$ and $\left(0, \nu^s_{j-1}(z_s)\right]$; (iii) $\left(0, \nu^s_{j-1}(z_s)\right]$ and $\left(0, \nu^s_{j-1}(z_s)\right]$; (iv) $\left(\nu^s_j(z_s), 1\right]$ and $\left(\nu^s_j(z_s), 1\right]$. Then by Proposition A.1(iv), $R_{d^j}(z) \cap R_{\tilde{d}^j}(z)$ is a product of $\left(\nu^{s^*}_j(z_{s^*}), \nu^{s^*}_{j-1}(z_{s^*})\right]$ (by Assumption SS) and some of $\left(\nu^s_j(z_s), \nu^s_{j-1}(z_s)\right]$, $\left(0, \nu^s_{j-1}(z_s)\right]$ and $\left(\nu^s_j(z_s), 1\right]$.

Now we show that $R_{d^j}(z) \cap R_{\tilde{d}^j}(z)$ and $R_{d^{j\dagger}}(z')$ for any $d^{j\dagger}$ have an empty intersection. Note that for $s^*$ such that [$d^j_{s^*}=1$ and $\tilde{d}^j_{s^*}=0$] or [$d^j_{s^*}=0$ and $\tilde{d}^j_{s^*}=1$], it holds that $d^{j\dagger}_{s^*}=0$. If $d^{j\dagger}=d^j$ or $d^{j\dagger}=\tilde{d}^j$, this is trivially true. If $d^{j\dagger}\neq d^j$ and $d^{j\dagger}\neq \tilde{d}^j$, then this is true by contradiction. But $d^{j\dagger}_{s^*}=0$ corresponds to $U_{s^*} \in \left(\nu^{s^*}_j(z'_{s^*}), 1\right]$. Then by Proposition A.1(i), $R_{d^j}(z) \cap R_{\tilde{d}^j}(z) \cap R_{d^{j\dagger}}(z') = \emptyset$ as long as $\left(\nu^{s^*}_j(z_{s^*}), \nu^{s^*}_{j-1}(z_{s^*})\right] \cap \left(\nu^{s^*}_j(z'_{s^*}), 1\right] = \emptyset$, where the latter is implied by $\nu^{s^*}_{j-1}(z_{s^*}) \le \nu^{s^*}_j(z'_{s^*})$.
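The final emptiness argument reduces to a one-dimensional intersection of half-open intervals for coordinate $s^*$. A minimal sketch with hypothetical threshold values (chosen only so that Assumption ES* holds; they are not from the paper):

```python
def half_open_intersects(a, b, c, d):
    """Half-open intervals (a, b] and (c, d] overlap iff max(a, c) < min(b, d)."""
    return max(a, c) < min(b, d)

# Hypothetical thresholds for player s*: nu_{j-1}(z) = 0.45, nu_j(z) = 0.30,
# and nu_j(z') = 0.50, so Assumption ES* holds: nu_{j-1}(z) <= nu_j(z').
mult_region = (0.30, 0.45)   # (nu_j(z_{s*}), nu_{j-1}(z_{s*})], from the multiplicity region
required    = (0.50, 1.00)   # (nu_j(z'_{s*}), 1], required by the s*-coordinate of d^{j, dagger}

assert not half_open_intersects(*mult_region, *required)  # empty, since 0.45 <= 0.50
```

If ES* failed (say $\nu_j(z') = 0.40 < 0.45$), the same check would report a nonempty overlap, which is why the instrument movement from $z$ to $z'$ must be "large enough".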

A.7 Proof of Theorem 5.1

For given $j = 0,\ldots,S-1$, consider
$$E[Y|Z=z] - E[Y|Z=z']$$
$$= E\left[Y_{M^{\le j}} + 1[D(z) \in M^{>j}]\{Y_{M^{>j}} - Y_{M^{\le j}}\}\right] - E\left[Y_{M^{\le j}} + 1[D(z') \in M^{>j}]\{Y_{M^{>j}} - Y_{M^{\le j}}\}\right]$$
$$= E\left[\left(1[D(z) \in M^{>j}] - 1[D(z') \in M^{>j}]\right)\{Y_{M^{>j}} - Y_{M^{\le j}}\}\right]$$
$$= E[Y_{M^{>j}} - Y_{M^{\le j}}|D(z) \in M^{>j}, D(z') \in M^{\le j}]\Pr[D(z) \in M^{>j}, D(z') \in M^{\le j}]$$
$$\quad - E[Y_{M^{>j}} - Y_{M^{\le j}}|D(z) \in M^{\le j}, D(z') \in M^{>j}]\Pr[D(z) \in M^{\le j}, D(z') \in M^{>j}]$$
$$= E[Y_{M^{>j}} - Y_{M^{\le j}}|D(z) \in M^{>j}, D(z') \in M^{\le j}]\Pr[D(z) \in M^{>j}, D(z') \in M^{\le j}], \qquad (A.20)$$
where the first equality plugs in $Y = 1[D \in M^{>j}]Y_{M^{>j}} + \left(1 - 1[D \in M^{>j}]\right)Y_{M^{\le j}}$ and applies Assumption IN, and the last equality is by supposing that the result of Lemma 5.1 is satisfied with
$$\Pr[D(z) \in M^{\le j}, D(z') \in M^{>j}] = 0. \qquad (A.21)$$
But note that
$$\Pr[D(z) \in M^{>j}, D(z') \in M^{\le j}] = \Pr[D(z) \in M^{>j}] - \Pr[D(z) \in M^{>j}, D(z') \in M^{>j}],$$
where $\Pr[D(z) \in M^{>j}, D(z') \in M^{>j}] = \Pr[D(z') \in M^{>j}]$ by (A.21). Combining this result with (A.20) yields the desired result.
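The identity (A.20) together with (A.21) is the multi-treatment analogue of a Wald decomposition under one-sided compliance, and it can be verified draw-by-draw in a simulation. The sketch below is illustrative only: the thresholds, outcome equations, and sample size are arbitrary choices, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Latent selection heterogeneity; the event {D(z) in M^{>j}} is monotone in U.
U = rng.uniform(size=n)

# Nested thresholds deliver (A.21): {D(z') in M^{>j}} implies {D(z) in M^{>j}},
# so Pr[D(z) in M^{<=j}, D(z') in M^{>j}] = 0 (no "defiers").
in_gt_z  = U < 0.7       # D(z)  in M^{>j}
in_gt_zp = U < 0.4       # D(z') in M^{>j}

# Potential outcomes for the two regimes (arbitrary, correlated with U).
Y_gt = (rng.uniform(size=n) < 0.6 + 0.3 * U).astype(float)   # Y_{M^{>j}}
Y_le = (rng.uniform(size=n) < 0.2 * U).astype(float)         # Y_{M^{<=j}}

Y_z  = np.where(in_gt_z,  Y_gt, Y_le)    # observed outcome when Z = z
Y_zp = np.where(in_gt_zp, Y_gt, Y_le)    # observed outcome when Z = z'

lhs = Y_z.mean() - Y_zp.mean()

# (A.20): complier effect times complier share.
compliers = in_gt_z & ~in_gt_zp
rhs = (Y_gt[compliers] - Y_le[compliers]).mean() * compliers.mean()

assert abs(lhs - rhs) < 1e-12   # holds as an exact draw-by-draw identity
# Complier share equals the first-stage difference, as used after (A.21).
assert abs(compliers.mean() - (in_gt_z.mean() - in_gt_zp.mean())) < 1e-12
```

The two assertions hold exactly (up to floating point) because, absent defiers, $Y(z) - Y(z')$ is nonzero only on the complier event, which is the content of the last equality in (A.20).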


[Figure: panels (a)--(h) depict the regions $R_{000}$, $R_{100}$, $R_{010}$, $R_{001}$, $R_{101}$, $R_{011}$, $R_{110}$, $R_{111}$ in the unit cube of $(U_1, U_2, U_3)$, with thresholds $(\nu^1_{00},\nu^2_{00})$, $(\nu^1_{10},\nu^2_{10})$, $(\nu^1_{01},\nu^2_{01})$, $(\nu^1_{11},\nu^2_{11})$, $\nu^3_{00}$, $\nu^3_{11}$, and $\nu^3_{10}=\nu^3_{01}$; panel (i) shows $\bigcup_{0\le j\le 3}\left\{\bigcup_{d\in M_j} R_d\right\} = \mathcal{U} \equiv (0,1]^3$.]

Figure 5: Illustration of Proposition 3.1 for $S = 3$.

[Figure: the unit cube of $(U_1, U_2, U_3)$ with the thresholds $(\nu^1_{00},\nu^2_{00})$, $(\nu^1_{10},\nu^2_{10})$, $(\nu^1_{01},\nu^2_{01})$, $(\nu^1_{11},\nu^2_{11})$ and $\nu^3_{10}=\nu^3_{01}$ marking the regions of multiple equilibria.]

Figure 6: Depicting the Regions of Multiple Equilibria for $S = 3$.

[Figure: the unit square of $(U_1, U_2)$ with thresholds $(\nu^1_1(z_1), \nu^2_0(z_2))$, $(\nu^1_1(z'_1), \nu^2_0(z'_2))$, $(\nu^1_0(z_1), \nu^2_1(z_2))$, $(\nu^1_0(z'_1), \nu^2_1(z'_2))$; the shaded region is labeled $(z', z)$ and depicts the LATE subgroup.]

Figure 7: LATE Subgroup for $S = 2$.

Figure 8: Bounds under Different Strength of $Z$ with $|\mathcal{X}| = 3$.

Figure 9: Bounds under Different Strength of $Z$ with $|\mathcal{X}| = 15$.

Figure 10: Bounds under Different Strength of $X$ with $|\mathcal{X}| = 15$.

Figure 11: Bounds under Different Strength of Interaction with $|\mathcal{X}| = 3$.
