A Weak Law for Moments of Pairwise-Stable Networks


Michael P. Leung*

April 9, 2017

Abstract. We develop asymptotic theory for strategic network-formation models under the assumption that the econometrician observes a single large pairwise-stable network. Drawing on techniques from the literature on random graphs, we derive primitive restrictions on the model that establish a weak law of large numbers for a large class of network moments. Under these restrictions, we show that the set of pairwise-stable networks can be simulated in a computationally feasible manner. Additionally, we characterize an identified set of structural parameters based on a new class of network moments and show consistency of sample moments as an application of the weak law.

JEL Codes: C31, C57, D85

Keywords: social networks, network formation, multiple equilibria, objective method

* USC, Department of Economics and MIT, Institute for Data, Systems, and Society. E-mail: [email protected]. I thank seminar audiences at Boston University, Cleveland Federal Reserve, Duke, Penn State, UCSD, UNC, UT Austin, and many conferences.


1 Introduction

The increasing availability of network data has made it possible to study the formation of social and economic connections in a variety of contexts. An empirical model commonly used for this purpose is the dyadic- or pairwise-regression model, which takes the form

$$G_{ij} = \mathbf{1}\{Z_{ij}'\theta + \varepsilon_{ij} > 0\}. \tag{1}$$

Here $G_{ij}$ is an indicator for whether or not nodes $i$ and $j$ are connected, $Z_{ij}$ is a vector of observed exogenous attributes, and $\varepsilon_{ij}$ is unobserved. Interest often centers on the marginal effects on link formation of measures of dissimilarity included in $Z_{ij}$, e.g. geographic distance, which capture the pervasive tendency for similar individuals to form associations, known as homophily (McPherson et al., 2001). However, in many settings we may be interested in regressors that are functions of the network in order to capture network externalities or strategic interactions. Upon the inclusion of such statistics, (1) can be viewed as a strategic model of network formation, where the solution concept corresponds to pairwise stability. Well-known examples of network externalities include preferential attachment, the tendency for those with many connections to obtain additional connections, and transitivity, the propensity for those with common connections to associate (Jackson, 2008). We can allow for the former by including in $Z_{ij}$ the number of connections to $i$ and $j$ and the latter by including, say, the number of common connections. In applied work, there have been some attempts to estimate models with network externalities,¹ but the inference procedures used are not formally justified. The inclusion of network externalities creates new statistical and computational problems, and their resolution is the subject of this paper. The main problem we address in this paper is a lack of relevant asymptotic theory for network formation.
Estimation requires a large sample of sufficiently uncorrelated observations in order to obtain a law of large numbers for moments of interest such as the average degree $\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{ij}$, where the inner sum is the degree of node $i$. Unfortunately, in practice the econometrician typically observes only a small number of plausibly independent networks, and network externalities generate a potentially strong and nonstandard form of dependence between network links. For example, consider model (1), where $Z_{ij}$ includes $\sum_k G_{ik}$ and $\sum_k G_{jk}$, the degrees of $i$ and $j$. Because the incentives for link formation between nodes $i$ and $j$ depend on the set of links formed by $j$, and likewise, the same incentives for $j$ and $k$ depend on links formed by $k$, it follows that node $i$’s links depend indirectly on those of $k$, and by induction, on links farther in the network as well. This suggests that moments such as average degree are averages of statistics that may not be weakly dependent, at least not without additional restrictions.

¹ E.g. Comola (2010), Hochberg et al. (2012), and Powell et al. (2005).

We derive a set of primitive conditions for weak dependence that establish a weak law of large numbers for network moments under an asymptotic sequence sending the size of the network to infinity. Such moments may be potentially complex nonlinear functions of a single observed network, for example the average clustering coefficient, which is a well-known measure of transitivity, and general subgraph counts, which enumerate the occurrence of particular subnetwork structures in the observed network. Weak dependence in our setting does not appear to map easily onto conventional measures of dependence like α-mixing. Instead, our proof draws on the objective method of Penrose and Yukich (2001) and Penrose and Yukich (2003) for deriving limit theorems for functionals of random point sets.

A second contribution of this paper is a characterization of an identified set of structural parameters based on a tractable class of network moments corresponding to the distribution of polyadic outcomes. The polyadic outcome of a set of $K$ nodes is the tuple consisting of the realization of their subnetwork and network-dependent statistics included in $Z_{ij}$ in (1). We characterize an identified set in terms of moment inequalities that fully exploit the empirical content of polyadic outcome moments, in contrast to inequalities in the existing literature which are generally conservative. As an application of our weak law, we show that sample analogs are consistent estimators.

A final problem we address is the difficulty of computing the set of pairwise-stable networks. This is of interest in order to analyze the effects of counterfactual policy interventions on network structure, often with the aim of altering individual outcomes. For instance, desegregation policies that reallocate peers across classrooms may have little effect on achievement if high-ability peers self-segregate within classrooms (Carrell et al., 2013). This motivates the use of structural models of peer-group formation to simulate the effect of reallocation policies on segregation (see e.g. Mele, 2011).
However, simulating counterfactual networks is difficult because the number of possible networks on a set of $n$ nodes is on the order of $e^{n^2}$, so a naive brute-force search is impossible. Fortunately, our conditions for weak dependence naturally suggest a simple modification to brute-force search that can be completed in $O(n^2)$ time with high probability, which solves the curse of dimensionality problem.

To understand our approach to showing weak dependence, it is useful to distinguish between two key sources of dependence in our setting: “contagion” due to network externalities and “coordination” through equilibrium selection. Contagion refers to the fact that the perturbation of a single link may render other links pairwise unstable, thereby triggering a large set of nodes to alter their links in best response, a reaction that begets further adjustments. Our insight is that there often exist node pairs with sufficiently (un)desirable attributes (such as when $\varepsilon_{ij}$ in (1) is (small) large) such that these pairs (fail to) form links regardless of the state of the network. Consequently, they function as barriers to contagion, and we call such potential links robust. With enough robust potential links, no contagion will spread far from the initial perturbation, which ensures that the set of potential links in the observed network is weakly dependent. Mathematically, the problem of limiting contagion via robustness can be reduced to the problem of ensuring that a certain branching process is “subcritical,” meaning that it reaches extinction eventually. Sufficient conditions for this property have been developed in random-graph theory (Bollobás et al., 2007). In our context, we show that these conditions can be interpreted as restrictions on the strength of network externalities and therefore closely resemble weak-dependence conditions used in time-series and spatial econometrics that bound the effects of temporal or spatial lags. Indeed, we argue that restrictions of this sort can be understood as controlling analogous notions of contagion, highlighting a close connection between our approach and the broader econometric literature on dependent data.

The second source of dependence results from equilibrium selection, the latent process by which nodes coordinate on one of typically many possible pairwise-stable networks. Unrestricted equilibrium selection can generate arbitrary dependence between potential links, since all nodes may coordinate on the basis of a single signal. For instance, in forming friendship networks, students in a Boston school may coordinate with those in a San Francisco school on the basis of any set of student characteristics, for example a single student’s parental income, which seems unrealistic. Consequently, we impose a local-coordination condition, which requires that certain “neighborhoods” of nodes separately coordinate on the realizations of their individual subnetworks. This is the case if nodes in such neighborhoods only have complete information about their own attributes, which is more realistic in a large-network setting. Thus, students in the Boston school only coordinate on the basis of their own attributes and not on those of San Francisco students.

Related Literature. A growing econometric literature studies frequentist inference in network-formation models when the econometrician observes a single network.
Chandrasekhar and Jackson (2015) and Boucher and Mourifié (2015) study the estimation of different random-graph models, which are quite distinct from the models we consider. Dzemski (2014) and Graham (2014) focus on dyadic link-formation models that rule out network externalities but allow for unrestricted unobserved heterogeneity. Leung (2015) studies strategic models with incomplete information. Menzel (2016) considers a class of models with an asymptotically logit form. Exploiting properties of the logistic distribution, he derives sufficient statistics for equilibrium selection such that network links in the limit model are independent across node pairs conditional on these statistics. In contrast, we study models without logit limits that appear to lack such sufficient statistics. Our approach is to impose nonparametric restrictions on selection to obtain unconditional weak dependence.

Miyauchi (2014) and Sheng (2014) develop moment inequalities for network formation games, which may be applied to our setting. Their moments are constructed from necessary conditions for pairwise stability and are therefore conservative. In contrast, our inequalities fully characterize the empirical content of polyadic outcome moments by accounting for the distribution of network-dependent regressors. The importance of accounting for such regressors was first recognized by de Paula et al. (2015) in their notion of “network type.” We show that this idea can be applied to the larger class of subnetwork moments related to those used by Sheng under weaker conditions than those required by de Paula et al.

There is also a literature on estimating dynamic strategic network-formation models, which specify a meeting technology that determines how nodes form links sequentially over time. Much of this literature focuses on inference when the econometrician observes a single network, and most adopt Bayesian estimation techniques for this purpose (Christakis et al., 2010; Hsieh and Lee, 2013; Mele, 2015; Snijders et al., 2010). Lastly, our paper is related to the growing literature on inference for large games (Agarwal and Diamond, 2015; Bisin et al., 2011; Brock and Durlauf, 2001; Fox, 2016; Menzel, 2015a,b; Song, 2014; Xu, 2015). These papers study matching markets and models of social interactions, which do not include strategic models of network formation.

Outline. In the next section, we describe the model, the class of network moments for which we establish a law of large numbers, and the main conditions for weak dependence. In §3, we present the main result, a law of large numbers. Next, §4 discusses partial identification and estimation of polyadic outcome moments. We then detail a new algorithm for simulating counterfactual networks in §5 and analyze its complexity. We conduct a Monte Carlo study evaluating the informativeness of the identified set in §6, finding that our estimator can consistently recover the signs of key parameters of interest. Finally, §7 concludes.

Notation. The abbreviation “a.s.” stands for “almost surely.” For a density $f$, cumulative distribution function (CDF) $F$, and random vector $X$, denote their respective supports by $\mathrm{supp}(f)$, $\mathrm{supp}(F)$, $\mathrm{supp}(X)$. For a set $S$, let $|S|$ be its cardinality. Lastly, $\mathbb{R}_+$ denotes the set of non-negative real numbers.

2 Model

The econometrician observes a set of nodes or agents, the network they form, and a subset of their attributes relevant for network formation. Generalizing the linear setup of (1), link formation will be governed by a potentially nonlinear latent index known up to a finite-dimensional vector of parameters, which is the object of inference. This section formalizes the model and presents key restrictions required for a law of large numbers. Note that, until the discussion of identification in §4, we make no formal distinction between objects observed and unobserved by the econometrician.

We first introduce some standard notation for networks. We represent a network on a set of nodes $\mathcal{N}$ as an array $G = (G_{ij};\ i,j \in \mathcal{N})$, where the potential link $G_{ij}$ is an indicator for whether or not nodes $i,j \in \mathcal{N}$ are connected. Following the usual convention, we require that $G_{ii} = 0$ for all $i \in \mathcal{N}$, meaning that there are no self links. In the main text, we focus on undirected networks, so $G_{ij} = G_{ji}$. Let $G_{-ij} = (G_{kl};\ k,l \in \mathcal{N},\ \{k,l\} \neq \{i,j\})$, the network excluding the potential link between $i$ and $j$.²

Attributes. Each observed node $i$ is endowed with a position $X_i \in \mathbb{R}^d$. Positions correspond to vectors of homophilous attributes, meaning we will later require that the probability of link formation is decreasing in the distance between node positions. Leading examples of homophilous attributes include income and geographic location. We assume that positions are unique with probability one, for example if they are drawn i.i.d. from a continuous distribution. Let $\mathcal{X} \subseteq \mathbb{R}^d$ denote the set of node positions. We arbitrarily label the elements of $\mathcal{X}$ by $1, 2, \ldots$, letting $\mathcal{N}$ denote this set of node labels. Each $i \in \mathcal{N}$ is endowed with a vector of node-level attributes $\alpha_i$, which may include characteristics such as race or gender, and each node pair $(i,j)$ is endowed with a vector of pair-level attributes $\zeta_{ij}$, which typically represents an idiosyncratic random shock. Let $W_{ij} = (\alpha_i, \alpha_j, \zeta_{ij})$, and define the array $W = (W_{ij};\ i,j \in \mathcal{N})$.

Network. The observed network $G$ obeys the following pairwise-stability condition, which is essentially a generalization of (1): for every $i,j \in \mathcal{N}$,

$$G_{ij} = \mathbf{1}\{V(\delta_{ij}, S_{ij}, W_{ij}) > 0\}. \tag{2}$$

The first input of $V$ is the homophily measure $\delta_{ij} = r^{-1}\|X_i - X_j\|$, where $\|\cdot\|$ is a norm on $\mathbb{R}^d$ and $r \in \mathbb{R}_+$. We will later require $V$ to be eventually decreasing in $\delta_{ij}$. Hence, nodes that are dissimilar in terms of their positions are less likely to link, which captures homophily. The sparsity parameter $r$ in the definition of $\delta_{ij}$ will be taken to zero as the network size grows in order to ensure sparsity of $G$. This rate will be discussed in detail in §3. The second input of $V$ is a vector of endogenous statistics

$$S_{ij} = S\big(X_i, X_j, G_{-ij}, (r^{-1}\|X_k - X_l\|;\ k,l \in \mathcal{N}), W\big).$$

The dependence of $S_{ij}$ on $G$ captures network externalities or strategic interactions. We call any $G$ obeying (2) a pairwise-stable network and $V$ the joint surplus in reference to a standard game-theoretic interpretation of the model.³ A leading example of interest is the following.

² We will make use of the following standard terminology for networks. For two networks $G, G'$, we say that $G$ is a subnetwork of $G'$ if every link in $G$ is a link in $G'$. Two nodes $i$ and $j$ are path-connected in a network if there exists a path from $i$ to $j$. A path in a network from node $i$ to $j$ is a sequence of distinct nodes starting with $i$ and ending with $j$ such that each pair of consecutive nodes $k, k'$ in this sequence is directly linked in the network. The length of a path is the number of links it involves. The path distance between two nodes in $G$ is the length of the shortest path that connects them.

³ More precisely, (2) corresponds to the solution concept of pairwise stability with transferable utility (Jackson, 2008).

Example 1. Suppose $G$ represents a friendship network, and positions represent geographic locations. Consider the model

$$V(\delta_{ij}, S_{ij}, W_{ij}) = h(\alpha_i, \alpha_j; \theta_1) + \theta_2 \max_k G_{ik}G_{jk} + \theta_3(\Delta_{ij} + \Delta_{ji}) + \rho(X_i, X_j) + \zeta_{ij},$$

where $\rho(X_i, X_j)$ is a function decreasing in $\delta_{ij}$ and $\Delta_{ij} = \min\{\sum_{k \neq j} G_{ik},\ \bar{\Delta}/2\}$, the degree of node $i$ excluding her potential link with $j$, truncated above at some user-specified level $\bar{\Delta}/2$. In this model, $S_{ij} = (\max_k G_{ik}G_{jk}, \Delta_{ij}, \Delta_{ji})$. An example of the penalty function $\rho(\cdot)$ is

$$\rho(X_i, X_j) = \begin{cases} 0 & \text{if } \|X_i - X_j\| \le r, \\ -\infty & \text{if } \|X_i - X_j\| > r. \end{cases} \tag{3}$$

Here links only form among geographic neighbors, those located at most a distance $r$ apart from each other. Another example is $\rho(X_i, X_j) = -\delta_{ij}$, which allows a link to form with positive probability even if $\delta_{ij}$ is quite large. In either case, $\rho(X_i, X_j)$ captures homophily in geographic distance.

The term $h(\alpha_i, \alpha_j; \theta_1)$ captures the effect of attributes on link formation. This typically includes node-level controls and measures of homophily. For example, in their study of peer effects in smoking, Gilleskie and Zhang (2010) estimate a logit model of friendship formation that includes measures of homophily with respect to age, grade level, gender, and race.

The remaining terms of interest capture network externalities. If $\theta_2$ is positive and $i$ and $j$ have some friend $k$ in common, then $\theta_2 \max_k G_{ik}G_{jk} > 0$, so the propensity of link formation is higher. Hence, $\theta_2$ captures transitivity or clustering, the tendency for individuals with friends in common to become friends. Social networks tend to exhibit high degrees of clustering, and a number of theories purport to explain this fact (Jackson, 2008). Graham (2016) discusses the policy implications of distinguishing between clustering and homophily. The parameter $\theta_3$ represents the importance of popularity or high degree; if $\theta_3 > 0$, then individuals prefer to be friends with those who have many friends. This is analogous to the preferential attachment mechanism of Barabási and Albert (1999).
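To make Example 1 concrete, the following Python sketch evaluates the joint surplus and checks condition (2) pair by pair. It is a minimal illustration under stated assumptions rather than the paper's implementation: $h$ is taken to be the constant $\theta_1$, $\rho$ is the hard penalty in (3), the degree statistic is truncated above at $\bar{\Delta}/2$, and the function names `joint_surplus` and `is_pairwise_stable` as well as the array layout are hypothetical.

```python
import numpy as np

def joint_surplus(G, X, zeta, i, j, theta1, theta2, theta3, r, delta_bar):
    """Joint surplus V for pair (i, j) in Example 1.

    Assumptions (not from the paper): h(.) = theta1 and rho is the hard
    penalty (3), i.e. -infinity when the pair is farther apart than r.
    """
    if np.linalg.norm(X[i] - X[j]) > r:
        return -np.inf
    # max_k G_ik G_jk equals 1 exactly when i and j share a neighbor
    # (k = i and k = j contribute 0 because G has a zero diagonal).
    common = float(np.max(G[i] * G[j]))
    # Degree of each node excluding the potential link ij, truncated above.
    deg_i = min(G[i].sum() - G[i, j], delta_bar / 2)
    deg_j = min(G[j].sum() - G[j, i], delta_bar / 2)
    return theta1 + theta2 * common + theta3 * (deg_i + deg_j) + zeta[i, j]

def is_pairwise_stable(G, X, zeta, **params):
    """Check condition (2): G_ij = 1{V > 0} for every unordered pair."""
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            v = joint_surplus(G, X, zeta, i, j, **params)
            if G[i, j] != int(v > 0):
                return False
    return True
```

With three nearby nodes, zero shocks, $\theta_1 = -0.5$, $\theta_2 = 1$ and $\theta_3 = 0$, both the empty network and the triangle satisfy (2), while the two-link path does not, which is a small instance of the multiplicity of pairwise-stable networks.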
Note that the choice of truncation $\bar{\Delta}$ in the definition of $\Delta_{ij}$ ensures that endogenous statistics are uniformly bounded above in this example, an assumption also imposed by de Paula et al. (2015) and Menzel (2016). In some cases, our conditions hold even if $\bar{\Delta} = \infty$ (see Example 5 below).

The endogenous statistics in Example 1 obey a local externalities restriction that we impose more generally as follows. Below we emphasize the dependence of $S_{ij}$ on $G, \mathcal{X}, W$ by writing $S_{ij}(G, \mathcal{X}, W)$.

Assumption 1 (Local Externalities). For any $i,j \in \mathcal{N}$; position sets $\mathcal{X}, \mathcal{X}'$; attribute arrays $W, W'$; and networks $G, G'$ such that $G_{kl} = G'_{kl}$ and $(X_k, X_l, W_{kl})G_{kl} = (X'_k, X'_l, W'_{kl})G'_{kl}$ for any $k \in \{i,j\}$ and $l \in \mathcal{N}$, it is the case that $S_{ij}(G, \mathcal{X}, W) = S_{ij}(G', \mathcal{X}', W')$.

This condition states that networks, attributes, and positions only enter the joint surplus through the “neighbors” of nodes $i$ and $j$, i.e. nodes $k$ linked to either $i$ or $j$.


Figure 1: Network neighbors.

For instance, in Figure 1 the joint surplus for the node pair connected by a solid line depends only on links, attributes, and positions of those connected by dotted black lines but not gray lines. Most of the models studied in the econometric literature obey this restriction (e.g. Christakis et al., 2010; Goldsmith-Pinkham and Imbens, 2013; Graham, 2016; Mele, 2015; Sheng, 2014). Indeed most focus on statistics capturing transitivity and preferential attachment, as in Example 1. Other examples include covariate-weighted analogs, such as the number of common friends with a particular attribute, $\sum_k G_{ik}G_{jk}\mathbf{1}\{\alpha_k = a\}$.

Distribution. We assume that $\{\alpha_i;\ i \in \mathcal{N}\}$ and $\{\zeta_{ij};\ i,j \in \mathcal{N}\}$ are each independently distributed. In the main text, we will focus on the case where each is also independent of $\mathcal{X}$ only because this simplifies the exposition, leaving the general case to §B.1 in the appendix. We will be concerned with two processes that generate $\mathcal{X}$, each associated with a different value of $r$. In the finite model, $r = (\kappa/n)^{1/d}$ for some constant $\kappa > 0$, and $\mathcal{X}$ is a set of $N_n + 2$ i.i.d. random vectors with density $f$, where $N_n$ is drawn from a Poisson distribution with intensity $n$, independently of all other primitives. Hence, $E[N_n] = n$ and $N_n/n \xrightarrow{p} 1$ as $n \to \infty$. We assume that this model generates the observed network. §3 discusses this model in detail, including the role of $r$. For large $n$, network moments under the finite model approach those of the limit model, introduced in §3, where $r$ is constant, normalized to one, and $\mathcal{X}$ is a Poisson point process on $\mathbb{R}^d$.

Having a random number of nodes in the finite model is sensible in a large-game setting, since the econometrician does not exercise control over the sample size (the number of agents). The assumption of a random sample size has also been used in the networks literature (Sheng, 2014) and spatial statistics (Karr, 1986; Politis et al., 1998; Sherman, 2011), the latter also assuming that $N_n$ is Poisson.
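The finite model can be simulated directly. The sketch below draws $N_n \sim \text{Poisson}(n)$, places $N_n + 2$ positions, and sets the sparsity parameter. It rests on assumptions that are not part of the text: the density $f$ is taken to be uniform on $[0,1]^d$, the norm is Euclidean, $r$ is read as $(\kappa/n)^{1/d}$, and the function names are hypothetical.

```python
import numpy as np

def draw_finite_model(n, kappa, d=2, seed=None):
    """Draw node positions and the sparsity parameter r for the finite model.

    Assumptions for illustration: f is uniform on [0, 1]^d and
    r = (kappa / n) ** (1 / d), so that n * r**d = kappa.
    """
    rng = np.random.default_rng(seed)
    n_nodes = rng.poisson(n) + 2          # N_n + 2 nodes, N_n ~ Poisson(n)
    X = rng.uniform(size=(n_nodes, d))    # i.i.d. positions with density f
    r = (kappa / n) ** (1.0 / d)
    return X, r

def mean_neighbors_within_r(X, r):
    """Average number of other nodes within Euclidean distance r of a node."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within = (dist <= r).sum(axis=1) - 1  # subtract 1 to exclude the node itself
    return within.mean()
```

Because $n r^d = \kappa$, the average number of nodes within distance $r$ stays bounded as $n$ grows. For $d = 2$ it should be close to, and near the boundary of the square somewhat below, $\kappa\pi$, which is the sense in which $r$ keeps the network sparse.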
Equilibrium Selection. Model (2) does not define a complete data-generating process for $G$, since for any $W$ and $\mathcal{X}$, there may be multiple networks satisfying the pairwise-stability condition (see e.g. Sheng, 2014, for examples). As usual, in order to complete the model, we introduce a selection mechanism that maps the set of possible pairwise-stable networks to a single selected network that is observed in the data (de Paula, 2013). This function is a reduced-form representation of the process by which nodes coordinate on a particular pairwise-stable network. Define $\mathcal{G}(\mathcal{X}, W, r)$ as the set of pairwise-stable networks, that is, the set of networks satisfying (2) for all $i,j \in \mathcal{N}$.

Assumption 2 (Selection Mechanism). With probability one, $|\mathcal{G}(\mathcal{X}, W, r)| \ge 1$, and there exists a $\mathcal{G}(\mathcal{X}, W, r)$-valued function $\lambda(\mathcal{X}, W, r) \equiv \lambda((\delta_{kl};\ k,l \in \mathcal{X}), W)$, termed the selection mechanism, such that $G = \lambda(\mathcal{X}, W, r)$.⁴

The first part of the assumption says that a pairwise-stable network exists. Without this assumption, a complete description of the model additionally requires a specification for $G$ when none exist. If $\mathcal{X}$ is a.s. finite, there are sufficient conditions in the literature that guarantee existence (Miyauchi, 2014; Sheng, 2014). These same conditions turn out to be sufficient for existence in the limit model we consider, as discussed in Remark 5 in §3. The second part of the assumption defines the reduced-form mapping $\lambda$ from the model primitives to the outcome, interpreted as the process by which nodes coordinate on an equilibrium. Specific functional forms for $\lambda$ can represent equilibrium refinements (Bajari et al., 2010). Note that for estimation, we will not require knowledge of $\lambda$.

2.1 Network Moments

We establish a law of large numbers for a general class of network moments. Define the node statistic of a node $i \in \mathcal{N}$ as a real-valued functional $\psi(X_i, G, \mathcal{X}, W)$ and a network moment as an average of node statistics $\frac{1}{|\mathcal{N}|}\sum_{i \in \mathcal{N}} \psi(X_i, G, \mathcal{X}, W)$. Note that because positions are a.s. unique, there exists a bijection from $\mathcal{X}$ to $\mathcal{N}$, so $\psi$ can depend on $\mathcal{N}$ through $\mathcal{X}$. The main restriction on $\psi$ is $K$-locality: $i$’s node statistic only depends on its arguments through the positions, attributes, and links formed by nodes in its $K$-neighborhood, the set of nodes whose path distance from $i$ is at most $K$. Before introducing the formal definition, we present three examples.

Example 2. The degree of node $i$ in $G$ is $\sum_{j \in \mathcal{N}} G_{ij}$, which is a node statistic satisfying 1-locality. The average degree $\frac{1}{|\mathcal{N}|}\sum_{i,j \in \mathcal{N}} G_{ij}$ is the network analog of a choice probability, since it is proportional to the empirical frequency of link formation. More generally, the class of network moments includes (scaled) subnetwork counts, of which the average degree is a special case. A subnetwork count simply enumerates the frequency with which a certain subnetwork occurs in the observed network. This class includes, for instance, the number of triplets that form a “transitive triad,” where all nodes in the triplet are connected to each other. Sheng (2014) constructs moment inequalities for network-formation models using subnetwork counts.

⁴ The selection mechanism is more commonly defined as a conditional distribution of pairwise-stable networks, given $W$ and $\mathcal{X}$. We show that this is subsumed by our definition in Remark 6 in §B.2.

Example 3. In §4, we construct moment inequalities based on polyadic outcome moments. The polyadic outcome of a set of nodes $A$ is simply the tuple consisting of the subnetwork of $G$ formed by $A$ and the set of endogenous statistics $(S_{ij};\ i,j \in A)$. An example of a polyadic outcome moment is the distribution of dyadic outcomes, $\frac{1}{|\mathcal{N}|}\sum_{i,j \in \mathcal{N}} \mathbf{1}\{(G_{ij}, S_{ij}) = (\ell, s)\}$. In this case, the node statistic is given by the inner sum $\sum_{j \in \mathcal{N}} \mathbf{1}\{(G_{ij}, S_{ij}) = (\ell, s)\}$, and under Assumption 1, this statistic satisfies 2-locality, since it depends on nodes $j$ connected to $i$ and potentially on those connected to these nodes $j$ through $S_{ij}$.

Example 4. The individual clustering for a node $i$ is

$$Cl_i(G) = \frac{\sum_{j \neq i;\, k \neq j;\, k \neq i} G_{ij}G_{ik}G_{jk}}{\sum_{j \neq i;\, k \neq j;\, k \neq i} G_{ij}G_{ik}},$$

with $Cl_i(G) \equiv 0$ if $i$ has at most one link. The numerator counts the number of pairs $(j,k)$ linked to $i$ that are themselves linked, while the denominator counts the number of pairs linked to $i$. Hence, $Cl_i(G)$ is a node statistic satisfying 1-locality. The average clustering coefficient of $G$ is $\frac{1}{|\mathcal{N}_n|}\sum_{i \in \mathcal{N}_n} Cl_i(G)$, a well-known measure of transitivity in network science. Thus, its population analog is likely informative for $\theta_2$ in Example 1.

We now formally define $K$-locality. Let $L_G(i,j)$ be the length of the shortest path between $i,j \in \mathcal{N}$ in the network $G$. For $K \in \mathbb{N}$, we say $\psi$ is $K$-local if $\psi(X_i, G, \mathcal{X}, W) = \psi(X_i, G', \mathcal{X}', W')$ for any node $i \in \mathcal{N}$, positions $\mathcal{X}, \mathcal{X}'$, attributes $W, W'$, and networks $G, G'$ such that $G_{jk} = G'_{jk}$ and $(X_j, X_k, W_{jk})G_{jk} = (X'_j, X'_k, W'_{jk})G'_{jk}$ for all $j,k \in \mathcal{N}$ such that $L_G(i,j) \le K - 1$.
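The node statistics in Examples 2 and 4 are straightforward to compute from an adjacency matrix. The sketch below is illustrative only: it assumes $G$ is stored as a symmetric 0/1 NumPy array with a zero diagonal, and the function names are hypothetical.

```python
import numpy as np

def degrees(G):
    """Degree of each node: sum_j G_ij."""
    return G.sum(axis=1)

def average_degree(G):
    """Network moment (1/|N|) * sum_{i,j} G_ij."""
    return G.sum() / len(G)

def individual_clustering(G, i):
    """Cl_i(G): share of ordered pairs of i's neighbors that are linked."""
    nbrs = np.flatnonzero(G[i])
    if len(nbrs) < 2:
        return 0.0                        # convention: zero with at most one link
    sub = G[np.ix_(nbrs, nbrs)]           # links among i's neighbors
    linked_pairs = sub.sum()              # ordered pairs (j, k) with G_jk = 1
    all_pairs = len(nbrs) * (len(nbrs) - 1)
    return linked_pairs / all_pairs

def average_clustering(G):
    """Average clustering coefficient: mean of Cl_i(G) over nodes."""
    return float(np.mean([individual_clustering(G, i) for i in range(len(G))]))
```

Both the numerator and the denominator count ordered pairs $(j,k)$, matching the sums in Example 4, so the ratio is the same as when counting unordered pairs.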

2.2 Robustness to Contagion

An unconditional law of large numbers for network moments requires node statistics to be weakly dependent. This is challenging to establish in our context due to two main sources of dependence. First, due to the presence of network externalities, the perturbation of a potential link or attribute may directly affect the pairwise stability of other potential links. This can be conceptualized in terms of contagion, where nodes best respond to the initial perturbation by deviating from their existing links, and these deviations induce further deviations by other nodes. Second, the existence of multiple equilibria can induce dependence through equilibrium selection, or coordination. We discuss the second issue in §2.3, focusing here on contagion.

Our main idea is that node pairs that draw sufficiently (un)desirable values of $W_{ij}$ (fail to) form links regardless of the state of the network. Potential links for such pairs are therefore robust in the sense that their pairwise stability is unaffected by changes to links in the ambient network. Consequently, robust potential links function as barriers to contagion. We next impose an assumption essentially requiring that robust links occur sufficiently often. We show that this assumption can also be interpreted as a restriction on the effect of network externalities on link formation. As we will see, this condition is analogous to standard weak-dependence conditions used in spatial econometrics, time series, and the estimation of large games on networks that restrict the magnitude of autoregressive coefficients.

We now formally define the concepts alluded to above. Construct a network $D(r)$, where for any $i,j \in \mathcal{N}$,

$$D_{ij}(r) = \mathbf{1}\Big\{ \inf_s V(r^{-1}\|X_i - X_j\|, s, W_{ij}) \le 0 \ \cap\ \sup_s V(r^{-1}\|X_i - X_j\|, s, W_{ij}) > 0 \Big\}. \tag{4}$$

If $D_{ij}(r) = 0$, we say that $G_{ij}$ is a robust potential link. The motivation behind this terminology is that if $\inf_s V(\delta_{ij}, s, W_{ij}) > 0$, then $G_{ij} = 1$ for all values of $S_{ij}$ and therefore for all states of the network $G$. Consequently, we say that $i$ and $j$ form a robust link. Likewise, if $\sup_s V(\delta_{ij}, s, W_{ij}) \le 0$, then $G_{ij} = 0$ for all values of $S_{ij}$, and we say that a link between $i$ and $j$ is robustly absent. Thus, if $D_{ij}(r) = 0$, this means that $G_{ij}$ remains pairwise stable under arbitrary changes to the ambient network, whereas if $D_{ij}(r) = 1$, then the pairwise stability of $G_{ij}$ is sensitive to changes in the network. Hence, links in $D(r)$ mark potential links in $G$ that are susceptible to contagion.⁵

We now state our first key weak-dependence condition. Let $d_\alpha$ be the dimension of node-level attributes $\alpha_i$. Recall from the end of §2 that $\kappa > 0$ is a constant such that in the finite model $r = (\kappa/n)^{1/d}$, and $f$ is the density of node positions.⁶ As discussed in the example below, $\kappa$ and $f$ control the limiting expected degree of a node.

⁵ The definition of $D(r)$ can be easily extended to the case of nontransferable utility by instead defining

$$D_{ij}(r) = \mathbf{1}\Big\{ \Big( \inf_s V(\delta_{ij}, s, W_{ij}) \le 0 \ \cap\ \sup_s V(\delta_{ij}, s, W_{ij}) > 0 \Big) \ \cup\ \Big( \inf_s V(\delta_{ij}, s, W_{ji}) \le 0 \ \cap\ \sup_s V(\delta_{ij}, s, W_{ji}) > 0 \Big) \Big\}.$$

That is, $D_{ij}(r) = 0$ (a potential link is robust) if the marginal utility of adding the link for both $i$ and $j$ is positive or negative regardless of the state of the network $G$. The intuition for weak dependence will remain exactly the same.

⁶ These are formally defined in Assumption 6 below.

Assumption 3 (Subcriticality). For $x, y \in \mathbb{R}^d$ and $\alpha, \alpha' \in \mathbb{R}^{d_\alpha}$, let $\Lambda(x, y, \alpha, \alpha') = E[D_{12}(1) \mid X_1 = x, X_2 = y, \alpha_1 = \alpha, \alpha_2 = \alpha']$. We have

$$\sup_{x,w \in \mathbb{R}^d} \kappa f(w) \left( \int_{\mathbb{R}^{d_\alpha}} \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^{d_\alpha}} \Lambda(x, y, \alpha, \alpha')^2 \, d\Phi(\alpha' \mid y) \right)^{1/2} dy \right)^{2} d\Phi(\alpha \mid x) \right)^{1/2} < 1, \tag{5}$$

where $\Phi(\cdot \mid x)$ is the conditional distribution of $\alpha_i$ given $X_i = x$.

Notice that we set $r = 1$ in the definition of $\Lambda(\cdot)$. This is due to a rescaling of the model discussed in §3. The next example shows that Assumption 3 can be viewed as a restriction on the marginal effect of network externalities.

Example 5. Consider the model in Example 1 with $\rho(\cdot)$ defined in (3) and $\theta_2, \theta_3 \ge 0$. Similar derivations for the case $\rho(X_i, X_j) = -\delta_{ij}$ can be found in §C.1 in the appendix. Suppose positions are uniformly distributed on $[0,1]^2$ and $h(\alpha_i, \alpha_j; \theta_1) = \theta_1$. Then (5) reduces to

$$\gamma \kappa \pi < 1, \tag{6}$$

where $\kappa\pi$ is an upper bound on the limiting expected degree of $G$ due to the specification of $\rho(\cdot)$, and

$$\gamma = P(D_{ij}(1) = 1 \mid \delta_{ij} \le 1) = P\big(\theta_1 + \theta_2 + \theta_3\bar{\Delta} + \zeta_{ij} > 0\big) - P\big(\theta_1 + \zeta_{ij} > 0\big).$$

This is the partial equilibrium marginal effect on link formation of changing the endogenous statistics from their smallest value to their largest value, or for short, the marginal effect of externalities.

To get a sense of magnitudes, suppose $\zeta_{ij} \overset{iid}{\sim} N(0,1)$, and $\kappa\pi = 10$, so the expected degree is at most ten, as is the case for the Harvard Dorm network studied by Leider et al. (2009). Then (6) reduces to $\gamma < 0.1$. If $\theta_1 = 0$, this implies $\theta_2 + \theta_3\bar{\Delta} \le 0.25$. On the other hand, if $\theta_1 \ge 1.3$, then the model generates robust links sufficiently often, and any value of $\theta_2 + \theta_3\bar{\Delta}$ satisfies $\gamma < 0.1$. This includes $\bar{\Delta} = \infty$, so a uniform upper bound on the endogenous statistics is unnecessary to impose. Suppose instead $\kappa\pi = 3.5$, so the expected degree is at most 3.5, as is the case for the widely studied AddHealth high-school friendship network. Then $\gamma$ can be as high as 0.285, and (5) is satisfied for any value of $\theta_2 + \theta_3\bar{\Delta}$ when $\theta_1 \ge 0.57$.

Notice the inverse relationship between expected degree and the marginal effect of externalities required by (6). A smaller upper bound on degree (sparser network) allows for a higher marginal effect.
The inverse relationship arises because a higher value of $\gamma$ promotes contagion, while a smaller value of $\kappa\pi$ inhibits it: a smaller $\kappa\pi$ implies a higher proportion of robustly absent links and therefore fewer “pathways” through which contagion may spread. Bounds on marginal effects similar to (6) appear often in econometrics and in contexts seemingly unrelated to network formation.

1. (Games on Networks) Xu and Lee (2015) study discrete games on networks and impose a high-level condition that, for the case of bounded-degree graphs, reduces to $\gamma \bar\Delta < 1$, where $\bar\Delta$ uniformly bounds node degrees and $\gamma$ is their analog of the marginal effect of externalities. They require this condition in order to apply a central limit theorem for near-epoch dependent spatial processes. Xu (2015) imposes a similar “network decaying dependence condition.” A variant of the well-known linear-in-means model of peer effects assumes that outcomes are linear in the total action of their peers: $Y_i = \beta_1 + \beta_2 S_i + \varepsilon_i$, where $S_i = \sum_j G_{ij} Y_j$ and $\beta_2$ is the endogenous effect. If $|\beta_2| \max_i n_i < 1$, where $n_i$ is the degree of node $i$, then an equilibrium exists (Bramoullé et al., 2009), and a central limit theorem holds (Lee, 2003). Note that $\beta_2 n_i$ is the expected partial-equilibrium marginal effect of increasing the actions of all of $i$'s peers by one unit.

2. (Spatial Econometrics) Jenish and Prucha (2012) derive a general Lipschitz condition for functionals of random fields to satisfy near-epoch dependence. For the special case of the $k$-nearest neighbor spatial autoregressive model $Z_i = \sum_{l=1}^{k} a_l Z_{i - v_l} + \varepsilon_i$, where $v_1, \dots, v_k$ are the $k$ nearest neighbors of $i$, the Lipschitz condition reduces to $\sum_{l=1}^{k} |a_l| < 1$, which restricts the expected marginal effect of increasing the outcomes of $i$'s $k$-nearest neighbors by one unit.

3. (Time Series) The usual AR(1) model is a special case of a spatial autoregressive model in which each observation's “neighbor” is the observation in the preceding time period. The Lipschitz condition above reduces to the usual restriction that the autoregressive coefficient satisfies $|\theta| < 1$. de Jong and Woutersen (2011) require an analogous condition for the binary-outcome analog of the AR($p$) model: the marginal effect of lagged dependent variables must be less than one.
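To make the role of the $< 1$ bound in item 1 concrete, the following minimal sketch solves the linear-in-means system $Y_i = \beta_1 + \beta_2 \sum_j G_{ij} Y_j + \varepsilon_i$ by fixed-point iteration. It is an illustration only: the function name, the small path graph, and the parameter values are hypothetical. When $|\beta_2| \max_i n_i < 1$, the update map is a contraction in the sup norm, so the iteration converges to the unique equilibrium.

```python
def solve_linear_in_means(G, beta1, beta2, eps, tol=1e-12, max_iter=10_000):
    """Iterate Y <- beta1 + beta2 * G @ Y + eps until the update is below tol.

    G is a 0/1 adjacency matrix given as a list of lists. The check below
    is the sufficient condition |beta2| * max_i n_i < 1 from the text.
    """
    n = len(G)
    max_degree = max(sum(row) for row in G)
    if abs(beta2) * max_degree >= 1:
        raise ValueError("|beta2| * max degree < 1 fails; no contraction")
    y = [0.0] * n
    for _ in range(max_iter):
        y_new = [
            beta1 + beta2 * sum(G[i][j] * y[j] for j in range(n)) + eps[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(y, y_new)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical three-node path graph 0 - 1 - 2 (maximum degree 2).
PATH = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

if __name__ == "__main__":
    print(solve_linear_in_means(PATH, beta1=1.0, beta2=0.3, eps=[0.0] * 3))
```

With $\beta_2 = 0.3$ the bound is $0.6 < 1$ and the iteration converges; with $\beta_2 = 0.6$ the sufficient condition fails and the sketch refuses to iterate.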
The repeated occurrence of the $< 1$ threshold can be explained heuristically in terms of branching processes, which in our context provide a formalism for the notion of contagion (see Lemma 3 in §D). Consider exogenously changing by one unit the outcome of a node / potential link / observation. Due to strategic interactions / lagged dependent outcomes, this may induce network / spatial / temporal neighbors to change their outcomes. These changes in turn induce the neighbors of neighbors to change, etc. At each step of this contagion, interpret the total change in outcomes across all neighbors of a given unit as its “offspring” count and the total offspring at each step as a “generation” of a branching process. If the expected number of offspring of each unit is strictly less than one, then each new generation is smaller than the previous generation on average. Consequently, a Galton-Watson branching process will reach extinction in finite time. In terms of contagion, this means the initial perturbation triggers contagion that only affects a small set of observations. Assumption 3 and the weak-dependence conditions above bound the total expected change in outcomes below one, therefore guaranteeing extinction of

the associated branching processes. Hence, the conditions can be understood as controlling the spread of contagion in their respective contexts.

Remark 1 (Testability). Assumption 3 is testable in the sense that the left-hand side of (5) is known up to structural parameters $\theta_0$ of $V$ and the distribution of $(X_i, X_j, W_{ij})$, which is typically assumed to be known up to $\theta_0$. §4 below provides methods for estimating an identified set for $\theta_0$, which in principle implies bounds for (5).$^{7}$ While the left-hand side of (5) appears complicated, simpler testable sufficient conditions often exist, as illustrated in Example 5 and §C.1.
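The branching-process heuristic from this subsection can be illustrated by simulation. The sketch below is not the construction used in the proofs: it assumes Poisson offspring (an assumption made only for this illustration), and all function names are hypothetical. For a subcritical Galton-Watson process with mean offspring $m < 1$, the expected total progeny is $1/(1 - m)$, so contagion started by one unit touches only a handful of units on average.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def total_progeny(rng: random.Random, mean_offspring: float, cap: int = 100_000) -> int:
    """Total number of units ever reached, starting from a single unit.

    `cap` is a safety valve for supercritical inputs, where the process
    may never die out.
    """
    alive, total = 1, 1
    while alive > 0 and total < cap:
        children = sum(poisson(rng, mean_offspring) for _ in range(alive))
        total += children
        alive = children
    return total


def average_total_progeny(mean_offspring: float, reps: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected total progeny."""
    rng = random.Random(seed)
    return sum(total_progeny(rng, mean_offspring) for _ in range(reps)) / reps


if __name__ == "__main__":
    for m in (0.25, 0.5, 0.75):
        print(m, average_total_progeny(m), 1.0 / (1.0 - m))
```

As the mean offspring approaches one, the simulated average grows quickly, which is the sense in which conditions such as (6) keep contagion contained.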

2.3 Local Coordination

As noted at the start of the previous subsection, there are two sources of dependence in this model: “contagion” due to network externalities and “coordination” due to equilibrium selection. We now discuss the latter. Formally, dependence arises from equilibrium selection because $\lambda$ may depend on the entirety of $W$ and $\mathcal{X}$, so the realization of any potential link may depend on the attributes of any node in the network.

We first need several definitions. For any network $G$, we can partition the set of nodes $\mathcal{N}$ into connected subnetworks that are disconnected from the rest of the network. Following standard terminology, we call each element of the finest such partition a component of the network. For $i \in \mathcal{N}$, let $C(X_i, \mathcal{X}, D, r) \subseteq \mathcal{N}$ be the component of $D(r)$ that contains $i$, or for short, $i$'s $D(r)$-component. Define $i$'s strategic neighborhood as
$$C_i^+ = C(X_i, \mathcal{X}, D, r) \cup \Big\{ j \in \mathcal{N} : \exists\, k \in C(X_i, \mathcal{X}, D, r) \text{ such that } \inf_s V(r^{-1}\|X_j - X_k\|, s, W_{jk}) > 0 \Big\}. \tag{7}$$

That is, $C_i^+$ adds to $i$'s $D(r)$-component all nodes that are connected to this component through a robust link.

Example 6. Suppose the realizations of $W$ and $\mathcal{X}$ induce a network $D(r)$ that is given in Figure 2 and an observed network $G$ for which the only robust link is $G_{34}$. Then the $D(r)$-components are $\{1, 2, 3\}$ and $\{4, 5, 6\}$, and besides $G_{34}$, all other potential links that bridge the two components must by construction be robustly absent. There are two strategic neighborhoods: $\{1, 2, 3, 4\}$ and $\{3, 4, 5, 6\}$. If we suppose instead that $G_{13}$, $G_{23}$, $G_{34}$ are all robust links, then the $D(r)$-components are $\{1, 2\}$, $\{3\}$, $\{4, 5, 6\}$, and the associated strategic neighborhoods are $\{1, 2, 3\}$, $\{1, 2, 3, 4\}$, $\{3, 4, 5, 6\}$.

$^{7}$ Of course, assuming a central limit theorem exists in our setting, we can only assess the power of a test against alternatives for which a central limit theorem also holds.

Figure 2: $D(r)$ with two components.
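The construction in (7) and the two scenarios of Example 6 can be reproduced with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the specific edges of $D(r)$ within each component are assumed for illustration, since only the component structure matters for the example.

```python
def components(nodes, edges):
    """Map each node to the frozenset of nodes in its connected component."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    comp = {}
    for v in nodes:
        if v in comp:
            continue
        stack, seen = [v], {v}
        while stack:  # depth-first search from v
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        block = frozenset(seen)
        for u in seen:
            comp[u] = block
    return comp


def strategic_neighborhoods(nodes, d_edges, robust_links):
    """Each D(r)-component plus every node robustly linked to it, as in (7)."""
    comp = components(nodes, d_edges)
    hoods = set()
    for block in set(comp.values()):
        extra = set()
        for a, b in robust_links:
            if a in block:
                extra.add(b)
            if b in block:
                extra.add(a)
        hoods.add(frozenset(block | extra))
    return hoods


NODES = [1, 2, 3, 4, 5, 6]

if __name__ == "__main__":
    # Scenario 1: components {1,2,3} and {4,5,6}; only G_34 is robust.
    # The within-component edges are assumed, not taken from Figure 2.
    print(strategic_neighborhoods(NODES, [(1, 2), (2, 3), (4, 5), (5, 6)], [(3, 4)]))
    # Scenario 2: G_13, G_23, G_34 robust, so node 3 is its own component.
    print(strategic_neighborhoods(NODES, [(1, 2), (4, 5), (5, 6)], [(1, 3), (2, 3), (3, 4)]))
```

The first call returns the two neighborhoods listed in Example 6 and the second returns the three neighborhoods of the alternative scenario.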

For any strategic neighborhood $C$, let $G_C$ be the subnetwork of $G$ on $C$. Any such subnetwork has the following stability property.

Lemma 1. Under Assumption 1, for any strategic neighborhood $C$ induced by $\mathcal{X}, W, r$, the set of pairwise-stable subnetworks on $C$ equals $\{G_C : G \in \mathcal{G}(\mathcal{X}, W, r)\}$.

The proof can be found in §F. This lemma implies that for any pairwise-stable network $G$, the pairwise stability of potential links between nodes in $C$ is entirely determined by the attributes and positions of nodes in $C$. Hence, for any pairwise-stable network $G$, if we remove all nodes in $\mathcal{N} \setminus C$ from the network, the remaining subnetwork is still pairwise stable, which typically fails to hold for subnetworks on arbitrary sets of nodes.

Dependence through equilibrium selection originates from the fact that the realization of potential links in $G_C$ may still depend on the attributes and positions of nodes in $\mathcal{N} \setminus C$, even though the pairwise stability of these links does not. To see this, suppose for some realization of $W$ and $\mathcal{X}$ that $D(r)$ is given by Figure 2, and there are no robust links. Then there are two strategic neighborhoods: $\{1, 2, 3\}$ and $\{4, 5, 6\}$. Further suppose that for this realization of $W$ and $\mathcal{X}$, for both strategic neighborhoods, there exist two pairwise-stable subnetworks, labeled A and B. Unrestricted selection allows the true $\lambda$ to be such that, for example, both neighborhoods coordinate on A if $\alpha_6$ is realized in a certain set and on B otherwise. More generally, potential links among $\{1, 2, 3\}$ can depend arbitrarily on the attributes of $\{4, 5, 6\}$, despite the two being strategically disjoint.

We next specify a “local coordination” condition that rules out this type of dependence. Let $\mathcal{C}^+(\mathcal{N}, r) = \{C_i^+ ; i \in \mathcal{N}\}$ be the collection of strategic neighborhoods, $W_C = (W_{ij}; i, j \in C)$, and $\mathcal{X}_C = (X_i; i \in C)$. For $C \in \mathcal{C}^+(\mathcal{N}, r)$ and $\lambda$ defined in Assumption 2, let $\lambda(\mathcal{X}, W, r)|_C$ be the restriction of the range of $\lambda$ to $\mathcal{G}(\mathcal{X}_C, W_C, r)$, the set of pairwise-stable subnetworks on $C$.
Assumption 4 (Local Coordination). There exists a function $\lambda'(\mathcal{X}_C, W_C, r)$, defined for all $C \in \mathcal{C}^+(\mathcal{N}, r)$, with range $\mathcal{G}(\mathcal{X}_C, W_C, r)$, such that for any strategic neighborhood $C$, we have $\lambda(\mathcal{X}, W, r)|_C = \lambda'(\mathcal{X}_C, W_C, r)$.$^{8}$

This assumption states that nodes coordinate only on the attributes of their respective strategic neighborhoods to form links. Lemma 1 ensures that this is a coherent requirement. The idea is that the pairwise stability of strategic neighborhoods does not depend on attributes of external nodes by Lemma 1, so nodes in distinct strategic neighborhoods have no incentive to coordinate on each other's attributes. The assumption can also be viewed as a restriction on the information sets of agents, since nodes are only allowed to coordinate on the attributes of those in their neighborhood. In the large-network context, this is a more realistic assumption about agents' knowledge than what is allowed by fully unrestricted selection, since anecdotally, most individuals are unaware of their higher-order network neighbors. Note that while the related literature discussed in §2.2 does not explicitly require Assumption 4, this is only because it studies models with unique equilibria or assumes specific selection mechanisms that implicitly satisfy local coordination.

Example 7. Equilibrium selection via myopic best-response dynamics satisfies local coordination. These dynamics select a pairwise-stable network by starting at a random network and then repeating the following process: randomly choose a node pair and have them form or sever their link, best responding to the state of the existing network myopically. This process is guaranteed to converge to a pairwise-stable network, assuming one exists (Jackson and Watts, 2002). It satisfies local coordination because it is equivalent to running these dynamics separately on each strategic neighborhood. Myopic best-response dynamics are well studied in the theoretical literature and used to justify pairwise stability as a decentralized solution concept (Jackson and Watts, 2002). Most papers in the econometric literature on dynamic network formation assume equilibrium selection through these dynamics (Christakis et al., 2010; Hsieh and Lee, 2013; Mele, 2015).

Example 8. A selection mechanism that only depends on $\mathcal{X}, W$ through $\mathcal{G}(\mathcal{X}, W, r)$ trivially satisfies local coordination. Such selection mechanisms may arise after equilibrium refinements. For instance, under the strategic complementarity and convexity conditions in Hellmann (2013), Miyauchi (2014) shows that the set of pairwise-stable networks is a nonempty, complete lattice with a partial order, and thus there exists a “maximum” pairwise-stable network. Then a selection mechanism that chooses the maximum of $\mathcal{G}(\mathcal{X}, W, r)$ is equivalent to one that chooses the maximum of $\mathcal{G}(\mathcal{X}_C, W_C, r)$ for each strategic neighborhood $C$ and therefore satisfies local coordination. Such a selection mechanism is often used to ensure that the likelihood is feasible to simulate (Boucher, 2016; Xu and Lee, 2015).

Remark 2. Since economic theory provides no guidance for equilibrium selection, a typical view in the empirical games literature is that the econometrician ought to leave selection completely unrestricted. However, as argued by Epstein et al. (forthcoming), this literature in fact implicitly imposes restrictions on selection through the sampling assumption that the econometrician observes i.i.d. markets. In particular, markets cannot be independent without an analog of Assumption 4 ensuring independent selection across markets.

$^{8}$ The proof of the weak law does not make use of the fact that the range of $\lambda'$ is the set of pairwise-stable subnetworks on $C$. Hence, the result holds even if $\lambda'$ selects a pairwise-unstable subnetwork. However, the identification results in §4 rely on pairwise stability.
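The myopic best-response dynamics of Example 7 can be sketched as follows. This is an illustration under stated assumptions rather than the selection mechanism of any cited paper: the joint surplus $\theta_1 + \theta_2 \cdot 1\{\text{common neighbor}\} + \zeta_{ij}$ is a hypothetical specification, all function names are invented for the example, and the dynamics start from the empty network and sweep over pairs in random order instead of starting at a random network. With $\theta_2 \geq 0$ the surplus is nondecreasing in the network, so from the empty network links are only ever added and the loop must terminate.

```python
import itertools
import random


def draw_shocks(n: int, seed: int = 0):
    """Symmetric matrix of N(0, 1) pair-level shocks zeta_ij."""
    rng = random.Random(seed)
    zeta = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        zeta[i][j] = zeta[j][i] = rng.gauss(0.0, 1.0)
    return zeta


def surplus(G, i, j, zeta, theta1, theta2):
    """Hypothetical joint surplus; it does not depend on G[i][j] itself."""
    common = any(G[i][k] and G[j][k] for k in range(len(G)) if k not in (i, j))
    return theta1 + theta2 * common + zeta[i][j]


def is_pairwise_stable(G, zeta, theta1, theta2):
    """Every link has positive surplus and every absent link does not."""
    n = len(G)
    return all(
        G[i][j] == (surplus(G, i, j, zeta, theta1, theta2) > 0)
        for i, j in itertools.combinations(range(n), 2)
    )


def best_response_dynamics(n, zeta, theta1, theta2, seed=0):
    """Sweep pairs in random order until no pair wants to change its link."""
    assert theta2 >= 0, "monotonicity argument requires complementarity"
    rng = random.Random(seed)
    G = [[False] * n for _ in range(n)]
    pairs = list(itertools.combinations(range(n), 2))
    changed = True
    while changed:
        changed = False
        rng.shuffle(pairs)
        for i, j in pairs:
            want = surplus(G, i, j, zeta, theta1, theta2) > 0
            if G[i][j] != want:
                G[i][j] = G[j][i] = want
                changed = True
    return G


if __name__ == "__main__":
    shocks = draw_shocks(8, seed=1)
    net = best_response_dynamics(8, shocks, theta1=-0.5, theta2=1.0)
    print(is_pairwise_stable(net, shocks, -0.5, 1.0))
```

Because the surplus of a pair depends only on the links of its own members, running the same loop separately on disjoint strategic neighborhoods would select the same links, which is the intuition behind local coordination in Example 7.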

2.4 Sparsity

The next condition, together with Assumption 3, ensures that $G$ is sparse. Empirically, sparsity is the observation that the typical number of connections formed by a node is small relative to the number of nodes in the network, a widely documented stylized fact about real-world social networks (Barabási, 2015; Chandrasekhar, 2016). This motivates the standard asymptotic notion of sparsity, which is that the expected degree of any node is finite in the limit.

Assumption 5 (Sparsity). $\sup_{x \in \mathbb{R}^d} \int_{\mathbb{R}^d} P\big(\inf_s V(\|x - y\|, s, W_{12}) > 0\big) \, dy < \infty$.

This is a mild condition analogous to (6.2) of Meester and Roy (1996). Consider the model in Example 1, with $h(\alpha_i, \alpha_j; \theta_1) = \theta_1$ and $\rho(\cdot)$ equal to the hard-threshold penalty (3) (and $r = 1$). Then $\inf_s V(\delta_{ij}, s, W_{ij}) > 0$ only if $\|X_i - X_j\| \leq 1$, so Assumption 5 holds if
$$\sup_{x \in \mathbb{R}^d} \int_{\mathbb{R}^d} 1\{\|x - y\| \leq 1\} \, dy < \infty,$$
which is true by inspection. For the soft-threshold penalty $\rho(X_i, X_j) = -\delta_{ij}$, we also need the distribution of $\zeta_{ij}$ to have sufficiently thin tails. For instance, if this distribution is normal, then Assumption 5 holds, as we verify in §C.1 in the appendix.

Assumptions 3 and 5 imply that $G$ is sparse. To see this, note that the expected number of robust links formed by a node $i$, conditional on her position, equals
$$E\left[ \sum_{j=1, j \neq i}^{N_n} 1\Big\{ \inf_s V(r_n^{-1}\|X_i - X_j\|, s, W_{ij}) > 0 \Big\} \,\Big|\, X_i \right] \leq \kappa \sup_x f(x) \int_{\mathbb{R}^d} P\Big( \inf_s V(\|X_i - y\|, s, W_{12}) > 0 \,\Big|\, X_i \Big) \, dy + 1, \tag{8}$$
where the inequality follows from calculations in §C.2. The rate condition $r_n = (\kappa/n)^{1/d}$ is the reason for the appearance of $\kappa$ in the right-hand side. Since $\kappa \sup_x f(x) < \infty$ by assumption, finiteness of the right-hand side holds under Assumption 5. A similar calculation shows that under Assumption 3, $E[\sum_{j=1, j \neq i}^{N_n} D_{ij}(r_n) \mid X_i]$ is uniformly bounded over $n$, so the expected number of nonrobust potential links formed by any node $i$ is also finite in the limit. Since any potential link in $G$ is either robust or nonrobust, it follows that $G$ is sparse.
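The effect of the rate $r_n = (\kappa/n)^{1/d}$ on degree can be seen in a small simulation. The sketch below makes several simplifying assumptions that are not in the text: positions are uniform on $[0, 1]^d$, the number of nodes is fixed at $n$ rather than Poisson, and it counts all pairs within distance $r_n$, which under the hard-threshold penalty is an upper bound on the number of robust links. The function name is hypothetical. For $d = 2$ the average count per node should stay near $\kappa\pi$ (slightly below, because of boundary effects) as $n$ grows.

```python
import math
import random


def average_close_pairs_per_node(n: int, kappa: float, d: int = 2, seed: int = 0) -> float:
    """Average number of other nodes within distance r_n = (kappa / n)^(1/d)."""
    rng = random.Random(seed)
    r_n = (kappa / n) ** (1.0 / d)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= r_n:
                close_pairs += 1
    # Each close pair contributes one neighbor to each of its two nodes.
    return 2.0 * close_pairs / n


if __name__ == "__main__":
    for n in (200, 400, 800, 1600):
        print(n, average_close_pairs_per_node(n, kappa=1.0), math.pi)
```

Holding $r_n$ fixed instead would make the same count grow linearly in $n$, which is the dense regime that the scaling is designed to avoid.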

Remark 3 (Interpretation of $r_n$). There are many different ways to model sparsity. Mele (2015) suggests sending the intercept (say $h(\alpha_i, \alpha_j; \theta_1)$ in Example 1) to $-\infty$ as $n \to \infty$. Menzel (2016) imposes a marginal cost to link formation, equal to the maximal element of $J$ i.i.d. random variables, where $J$ diverges at a $\sqrt{n}$ rate. Graham (2014) allows for sparsity through a high-level rate condition on the frequency with which “tetrads” form. He does not discuss primitive conditions, but an approach similar to that of Mele (2015) can generate sparsity in his model. All of these approaches increase the marginal cost of link formation as the network size grows in order to reduce the rate of link formation. In our setting, homophily plays the role of imposing a marginal cost on link formation, which is why sending $r_n^{-1} \to \infty$ ensures sparsity. We emphasize that the particular rate chosen for $r_n$ has no economic content. Its sole purpose is to obtain a realistic limit model (11) that captures qualitative features of real-world networks.

3 Law of Large Numbers

To establish a law of large numbers, we embed the data-generating process for the observed network in a sequence of network-formation models indexed by mean network size $n$. Formally, define a network-formation model as the tuple $(V, \lambda, \mathcal{X}, W, r)$, where $V$ is the joint surplus, $\lambda$ the selection mechanism, $\mathcal{X}$ the set of node positions, $W$ the associated array of attributes, and $r$ the sparsity parameter.

Assumption 6. Let $N_n$ be a Poisson random variable with intensity $n$ that is drawn independently of $W$, and let $r_n = (\kappa/n)^{1/d}$ for some $\kappa \in \mathbb{R}_+$. Let $\mathcal{N}_n = \{1, \dots, N_n + 2\}$ and $\mathcal{X}_n$ be a set of $N_n + 2$ draws from a density $f$ on $\mathbb{R}^d$ that are i.i.d. conditional on $N_n$. The observed network is realized according to Assumption 2 under the $n$th element of a sequence of finite network-formation models $\{(V, \lambda, \mathcal{X}_n, W, r_n); n \in \mathbb{N}\}$.

All asymptotic statements in this paper hold under this sequence, sending $n \to \infty$. As emphasized by Menzel (2015b), the asymptotic sequence should be chosen to preserve qualitative features of the finite model in order to obtain a realistic limit. The first aspect of realism is a nondegenerate limit model in which the joint surplus remains random, with network externalities playing a nontrivial role in link formation. This aspect will be evident from the discussion of the limit model immediately following the statement of the theorem below. The second aspect is a limit model that generates sparse networks, which holds under Assumption 5.

Our main result is a law of large numbers for network moments defined in §2.1. We first clarify some notation. The attribute array and network can be written as functionals $W(\mathcal{X})$ and $G(\mathcal{X}, W(\mathcal{X}), r)$. This is because $W$ can be viewed as a mapping from pairs of node positions $(X_i, X_j)$ to a random vector of attributes $W_{ij}$, and the network $G$ is a deterministic functional by Assumption 2. To simplify notation, we let $\psi(X_i, G, \mathcal{X}', W(\mathcal{X})) \equiv \psi(X_i, G(\mathcal{X}', W(\mathcal{X}), 1), \mathcal{X}', W(\mathcal{X}))$ for any $\mathcal{X}, \mathcal{X}' \subseteq \mathbb{R}^d$.

Theorem 1 (Weak Law). For $K \in \mathbb{N}$, let $\psi$ be a $K$-local node statistic satisfying uniform square-integrability: for some $\epsilon > 0$ and any $X \in \mathcal{X}_n$,
$$\sup_{n \in \mathbb{N}} E\Big[ \big| \psi\big(X, G, X + r_n^{-1}(\mathcal{X}_n - X), W(\mathcal{X}_n)\big) \big|^{2+\epsilon} \Big] < \infty.{}^{9} \tag{9}$$
Then under Assumptions 1-5, as $n \to \infty$,
$$\frac{1}{|\mathcal{N}_n|} \sum_{i \in \mathcal{N}_n} \psi\big(X_i, G, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W(\mathcal{X}_n)\big) \overset{L_2}{\longrightarrow} \int E\Big[ \psi\big(x, G, \mathcal{P}_{\kappa f(x)} \cup \{x\}, W(\mathcal{P}_{\kappa f(x)} \cup \{x\})\big) \Big] f(x) \, dx, \tag{10}$$
where $\mathcal{P}_{\kappa f(x)}$ is a Poisson point process on $\mathbb{R}^d$ with intensity $\kappa f(x)$, and $W(\mathcal{P}_{\kappa f(x)} \cup \{x\}) = (W_{ij}; i, j \in \mathcal{P}_{\kappa f(x)} \cup \{x\})$. That is, $W(\mathcal{P}_{\kappa f(x)} \cup \{x\})$ associates each pair of positions in $\mathcal{P}_{\kappa f(x)} \cup \{x\}$ with a vector of attributes.

Theorem 1 states that network moments converge to the expectations of their analogs applied to a limit model defined below. Condition (9) is a common regularity condition used in triangular array limit theorems. It is obviously satisfied by bounded node statistics, such as the clustering coefficient. However, some moments, such as average degree, contain sums over $\mathcal{N}_n$, in which case additional restrictions are required. In §4, we discuss primitive sufficient conditions for (9) for a class of moments that define an identified set.

The left-hand side of (10) differs from the original definition of network moments in §2.1, which corresponds to $\frac{1}{|\mathcal{N}_n|} \sum_{i \in \mathcal{N}_n} \psi\big(X_i, G(\mathcal{X}_n, W(\mathcal{X}_n), r_n), \mathcal{X}_n, W(\mathcal{X}_n)\big)$. To understand why these two expressions are equivalent and the form of the limit (10), we define the following limit model of network formation:
$$\big(V, \lambda, \mathcal{P}_{\kappa f(X)} \cup \{X\}, W(\mathcal{P}_{\kappa f(X)} \cup \{X\}), 1\big), \tag{11}$$

where $X$ is a random vector with density $f$. In this model, there exists a countably infinite number of nodes with positions drawn from a Poisson point process with random intensity; the sparsity parameter is fixed at unity; and $V$ and $\lambda$ remain the same as in the finite model defined in Assumption 6.

$^{9}$ For $a \in \mathbb{R}$ and $b \in \mathbb{R}^d$, let $a(\mathcal{X}_n - b) = \{a(i - b); i \in \mathcal{X}_n\}$.

The limit model is obtained from a Poisson approximation that lies at the heart of the “objective method” for proving limit results in stochastic geometry (Penrose and Yukich, 2003). Specifically, in our setting the Poisson process arises from the following observation. For $i, j \in \mathcal{N}_n$, we can rewrite $r_n^{-1}\|X_i - X_j\|$ as $\|X_i - \tilde X_j\|$, where $\tilde X_j \in X_i + r_n^{-1}(\mathcal{X}_n - X_i)$. Then we can equivalently represent a finite network-formation model as the tuple
$$\big(V, \lambda, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W(\mathcal{X}_n), 1\big), \tag{12}$$
since $\mathcal{X}_n$ enters $V$ and $\lambda$ only through differences $r_n^{-1}\|X_i - X_j\|$. The significance of this representation is the following Poisson approximation, which holds under an appropriate coupling:
$$\lim_{n \to \infty} P\Big( \big(X_i + r_n^{-1}(\mathcal{X}_n - X_i)\big) \cap B(X_i, R) = \big(\mathcal{P}_{\kappa f(X_i)} \cup \{X_i\}\big) \cap B(X_i, R) \Big) = 1 \tag{13}$$

for any constant $R > 0$, where $B(X_i, R) \subseteq \mathbb{R}^d$ is the ball of radius $R$ centered at $X_i$.$^{10}$ This suggests we might obtain a limit model by replacing, roughly speaking, the process $X_i + r_n^{-1}(\mathcal{X}_n - X_i)$ in (12) with $\mathcal{P}_{\kappa f(X_i)} \cup \{X_i\}$ to obtain something similar to the limit model (11). This still leaves a discrepancy between $W(\mathcal{X}_n)$ and $W(\mathcal{P}_{\kappa f(X)} \cup \{X\})$, but note that $W(\mathcal{X}_n)$ and $W(X + r_n^{-1}(\mathcal{X}_n - X))$ are identically distributed conditional on $\mathcal{X}_n$, since attributes are i.i.d. and independent of $\mathcal{X}_n$, and (13) suggests $W(X + r_n^{-1}(\mathcal{X}_n - X)) \approx W(\mathcal{P}_{\kappa f(X)} \cup \{X\})$.$^{11}$

Proof Idea. Under Assumption 4, a pairwise-stable subnetwork formed on any strategic neighborhood only depends on the attributes of nodes within that neighborhood. In this sense, these subnetworks can be interpreted as weakly dependent subunits of $G$. They are weakly dependent because they depend on $\mathcal{X}_n, W$ in a complicated way through the components of $D(r_n)$. Nonetheless, we can show that this dependence is essentially negligible between disjoint strategic neighborhoods asymptotically, provided that each neighborhood contains a finite number of nodes in the limit. If this is the case, then $G$ is composed of a large set of weakly dependent strategic neighborhoods, which provides the basis for a law of large numbers. Using a branching process argument explained heuristically in §2.2, we can show that under Assumption 3, $D(r_n)$-components have asymptotically finite sizes (see Lemma 3 in §D). Since a strategic neighborhood consists of a $D(r_n)$-component plus all robust links emanating from each node in this component, asymptotic finiteness of strategic neighborhoods follows from sparsity (Assumption 5). Theorem 1 is a corollary of Theorem 4 in §B.1, which generalizes the setup to allow for correlation between positions and attributes. A more detailed sketch of the proof can be found in §A.

Remark 4 (Technical Role of Homophily).
Conceptually, our approach to generating weak dependence does not rely on the existence of homophily in position, as is clear from the proof sketch above; our assumptions could be restated for models without homophily. Remark 3, for instance, discusses alternative ways to model sparsity. We impose homophily for the technical reason that the only approach available in the random graphs literature (of which we are aware) for deriving a sufficiently general law of large numbers is the objective method of Penrose and Yukich (2003), which only appears applicable for spatial graphs. Given a comparable result for random graphs without a spatial structure, such as the Erdős-Rényi model, we conjecture that our approach could be replicated without requiring homophily.

$^{10}$ This is Lemma 3.1 of Penrose and Yukich (2003) and Lemma 6 in the appendix.

$^{11}$ In §B.1, we consider the case where attributes may be correlated with $\mathcal{X}_n$.

Remark 5 (Equilibrium Existence). While conditions for existence are available for finite models, they do not appear to exist for models with an infinite number of agents. Fortunately, under Assumption 4, in the limit model, $G$ is composed of pairwise-stable subnetworks formed on strategic neighborhoods, and as we show in the proof of Theorem 1, sparsity and subcriticality imply that any strategic neighborhood contains a finite number of nodes in the limit. Hence, the conditions for existence of pairwise-stable networks in any finite model guarantee existence in the limit model. Hellmann (2013) establishes existence of pairwise-stable networks under convexity and strategic complementarity conditions. These amount to sign restrictions on coefficients that multiply endogenous statistics. Sparsity imposes no restrictions on these coefficients, while subcriticality imposes restrictions on the magnitudes of these coefficients. This can be seen in Example 5, which satisfies Hellmann's conditions. Hence, our assumptions are compatible with conditions for existence.

4 Identification and Estimation

In this section, we construct moment inequalities that define an identified set of parameters. In contrast to the existing literature, these inequalities fully exploit the empirical content of a certain set of network moments, while remaining computationally feasible. We then apply Theorem 1 to derive consistent estimators of the moments. Our identification analysis makes use of the following assumptions.

Assumption 7 (Analyst's Information). (a) The selection mechanism $\lambda$ and $d_\varepsilon$ components of $(X_i, X_j, W_{ij})$ are unobserved by the econometrician. We denote the unobserved subvector by $\varepsilon_{ij}$ and the observed subvector by $Z_{ij}$. Also, the number of nodes $N_n$ is observed. (b) The distribution of $\varepsilon_{ij} \mid Z_{ij}$ and the joint surplus $V$ are known up to a parameter $\theta_0 \in \Theta \subseteq \mathbb{R}^{d_\theta}$. We may emphasize this dependence by writing $V(\delta_{ij}, S_{ij}, W_{ij}; \theta_0)$.

Assumption 8. (a) The range of $S$, denoted by $\Psi$, is finite. (b) There exists a component of $\varepsilon_{ij}$ whose conditional distribution given $Z_{ij}$ is continuous.

Assumptions 7 and 8(b) are standard. Assumption 8(a) is convenient to impose for computational reasons, and analogous assumptions are used in de Paula et al. (2015), Menzel (2016), and Sheng (2014). Example 1 satisfies this condition, and it is often simple to modify a surplus function to ensure finiteness by truncating in the manner of $\Delta_{ij}$. Note that this assumption is not necessary for deriving a characterization of the identified set in general.$^{12}$ We employ it here to make direct use of a theorem due to Beresteanu et al. (2011) to characterize the empirical content of the model in terms of a finite set of conditional moment inequalities. However, continuously distributed endogenous statistics can easily be accommodated by following our approach below but applying, for example, Theorem 1 of Galichon and Henry (2011) to derive a characterization in terms of an infinite set of conditional moment inequalities.$^{13}$

4.1 Moment Inequalities

In order to define the moments, we will need several definitions. For $A \subseteq \mathcal{N}_n$, let $G_A = (G_{ij}; i, j \in A)$, $Z_A = (Z_{ij}; i, j \in A)$, $W_A = (W_{ij}; i, j \in A)$, and $S_A \equiv S_A(G, \mathcal{X}_n, W) = (S_{ij}(G, \mathcal{X}_n, W); i, j \in A)$. A polyadic outcome of degree $K$ is a tuple $Y_A = (G_A, S_A)$ for $|A| = K$. If $G$ is generated according to the finite model in Assumption 6, we let $Y_A(r_n)$ denote the polyadic outcome on $A$ induced by $G$. Notice that Assumption 8 ensures that $\mathcal{Y}_A = (\{0, 1\} \times \Psi)^{K(K-1)/2}$, the range of $Y_A$, is finite. Lastly, we will write $P_{N_n}$, $E_{N_n}$, $\mathrm{Var}_{N_n}$ to mean the probability, expectation, and variance conditional on $N_n$, respectively.

Under Assumption 2, the selection mechanism generates a distribution over pairwise-stable networks, and marginalizing gives us the distribution of polyadic outcomes of degree $K$:
$$P_{N_n}\big(Y_A(r_n) = y_A \mid Z_A\big) = E_{N_n}\left[ \sum_{g \in \mathcal{G}_A(y_A; W)} P_{N_n}\big( \lambda(\mathcal{X}_n, W, r_n) = g \mid \mathcal{X}_n, W \big) \,\Big|\, Z_A \right], \tag{14}$$
for all $y_A \in \mathcal{Y}_A$, where $\mathcal{G}_A(y_A; W)$ is the set of networks $g$ on $\mathcal{N}_n$ such that $y_A = (g_A, S_A(g, \mathcal{X}_n, W))$.

A polyadic outcome $y_A = (g_A, s_A) \in \mathcal{Y}_A$ is stable with respect to $\mathcal{X}_A$, $W_A$, and $\theta \in \Theta$ if for all $i, j \in A$,
$$V(r_n^{-1}\|X_i - X_j\|, s_{ij}, W_{ij}; \theta) > 0 \quad \text{if } g_{ij} = 1, \tag{15}$$
$$V(r_n^{-1}\|X_i - X_j\|, s_{ij}, W_{ij}; \theta) \leq 0 \quad \text{if } g_{ij} = 0, \tag{16}$$
where $g_{ij}$ and $s_{ij}$ are respectively the $ij$th elements of $g_A$ and $s_A$. The stable set $\mathcal{S}_\theta(A, r_n)$ is the set of polyadic outcomes $Y_A$ on $A$ such that $Y_A$ is stable with respect to $\mathcal{X}_A$, $W_A$, and $\theta$. Following Beresteanu et al. (2011), we then define the random set
$$\mathcal{Q}_\theta(A, r_n) = \big\{ I(Y_A); Y_A \in \mathcal{S}_\theta(A, r_n) \big\},$$
where $I(Y_A) = (1\{Y_A = y_A\}; y_A \in \mathcal{Y}_A)$.

$^{12}$ See §B.2 in the appendix for a characterization that does not use this assumption.

$^{13}$ Andrews and Shi (2015), Chernozhukov et al. (2014), and Menzel (2014) develop inference procedures for models with many moment inequalities.

Theorem 2. Let $K \in \mathbb{N} \setminus \{1\}$ and $\mathcal{U}_A = \{0, 1\}^{|\mathcal{Y}_A|}$ for any $A \subseteq \mathcal{N}_n$ with $|A| = K$. Under Assumptions 7 and 8, there exists $\lambda$ satisfying Assumption 2 such that (14) holds for all $y_A \in \mathcal{Y}_A$ if and only if for any such $A$, with probability one,
$$u' E_{N_n}\big[ I(Y_A(r_n)) \mid Z_A \big] \leq E_{N_n}\Big[ \sup_{q \in \mathcal{Q}_{\theta_0}(A, r_n)} u'q \,\Big|\, Z_A \Big] \quad \forall u \in \mathcal{U}_A. \tag{17}$$

Theorem 2 characterizes the empirical content of polyadic outcome moments under unrestricted equilibrium selection. This is in contrast to moment inequalities proposed by Miyauchi (2014) and Sheng (2014), which are based on implications of the model and therefore do not fully exploit information contained in the moments they consider.$^{14}$

Proof Idea. In the conventional cross-sectional setting in which the econometrician observes a large number of independent networks, we typically conceptualize the model as a mapping from the primitives $\mathcal{X}, W$ to the set of pairwise-stable networks, since the data reveals the joint distribution of links. This is not the case if only a single network is observed. Our first observation is that in a single-network setting, the econometrician instead observes a large number of polyadic outcomes. By marginalizing over the joint distribution of links as in (14), we can view the model as a mapping from the attributes of a polyad $\mathcal{X}_A, W_A$ to the set of stable polyadic outcomes $Y_A$. Theorem 2 derives moment inequalities for this characterization using Theorem 1 of Beresteanu et al. (2011). Our second observation is that, rather than using subnetwork moments as in Sheng (2014), it is important to use polyadic outcome moments to obtain a sharp characterization of the empirical content of these moments in order to fully capture link-formation incentives through (16). These ideas can be easily applied to derive similar moment inequalities for other large games, such as models of social interactions.

Connected Polyadic Outcomes. In order for $E_{N_n}[I(Y_A(r_n)) \mid Z_A]$ to be considered observed, it must be consistently estimable. To use Theorem 1, analog estimators of components of this vector must be written as averages of $K$-local node statistics. However, a problem arises if $Y_A = (G_A, S_A)$, where $G_A$ is not a connected subnetwork. For instance, if $|A| = 2$, and $S_A$ is degenerate and always equal to a constant $s$, then the analog estimator of $P_{N_n}(Y_A(r_n) = (0, s))$ is $\frac{1}{|\mathcal{N}_n|} \sum_{i,j} (1 - G_{ij})$, which is not a $K$-local node statistic for any $K$. We restrict attention to polyadic outcomes associated with connected subnetworks, since their sample analogs can always be written as averages of $K$-local node statistics.

$^{14}$ If Theorem 1 is applied to construct consistent moment estimators, one must also maintain Assumption 4, which imposes nonparametric restrictions on equilibrium selection. §B.2 in the appendix derives moment inequalities that additionally exploit identifying information contained in this assumption. However, these moments are more computationally expensive and require consistent estimates of the distribution of observables, so we prefer (17) in practice.

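Let $\tilde{\mathcal{Y}}_A$ be the set of polyadic outcomes on $A$ such that for each $(G_A, S_A) \in \tilde{\mathcal{Y}}_A$, $G_A$ is a connected subnetwork. That is, any $i, j \in A$ is path-connected in $G_A$, so we say that $\tilde{\mathcal{Y}}_A$ is the set of connected polyadic outcomes on $A$. Define the random set
$$\tilde{\mathcal{Q}}_\theta(A, r_n) = \big\{ \tilde I(Y_A); Y_A \in \mathcal{S}_\theta(A, r_n) \big\},$$
where $\tilde I(Y_A) = (1\{Y_A = y_A\}; y_A \in \tilde{\mathcal{Y}}_A)$. We consider moment inequalities
$$u' E_{N_n}\big[ \tilde I(Y_A(r_n)) \mid Z_A \big] \leq E_{N_n}\Big[ \sup_{q \in \tilde{\mathcal{Q}}_{\theta_0}(A, r_n)} u'q \,\Big|\, Z_A \Big] \quad \forall u \in \tilde{\mathcal{U}}_A, \tag{18}$$
where $\tilde{\mathcal{U}}_A = \{0, 1\}^{|\tilde{\mathcal{Y}}_A|}$.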

Example 9 (Dyadic Outcomes). Consider the specification in Example 1 with $\theta_3 = 0$, $\rho(\cdot)$ defined in (3), and $h(\alpha_i, \alpha_j; \theta_1) = \theta_1$. In this specification, when $K = 2$, the range of dyadic outcomes is $\mathcal{Y}_{ij} = \{0, 1\}^2$. Notice that for any $s \in \{0, 1\}$, exactly one of $(1, s)$ or $(0, s)$ is in $\mathcal{S}_\theta(\{i, j\}, r_n)$; we refer to this as observation ($\star$). Abusing notation, for $y, z \in \mathcal{Y}_{ij}$, let $\{y, z\}$ be the event that $y$ and $z$ are stable with respect to $W_{ij}$, $X_i$, $X_j$ and $\theta$. For example, $\{(1, 1), (0, 0)\}$ equals $\{\theta_1 + \theta_2 + \zeta_{ij} \ge 0\} \cap \{\theta_1 + \zeta_{ij} < 0\}$. Since $(0, 0)$ and $(0, 1)$ correspond to an unlinked dyad, which is not a connected subnetwork, $\tilde{\mathcal{Y}}_{ij} = \mathcal{Y}_{ij} \setminus \{(0, 0), (0, 1)\} = \{(1, 0), (1, 1)\}$. Let $u = (u_1, u_2) \in \tilde{U}_{ij}$, where we associate $u_1$ with $Y_{ij} = (1, 1)$ and $u_2$ with $(1, 0)$. From observation ($\star$), it then follows that

$$E_{N_n}\Big[\sup_{q \in \tilde{Q}_\theta(\{i,j\}, r_n)} u'q \,\Big|\, Z_{ij}\Big] = \max\{u_1, u_2\}\, P_{N_n}\big(\{(1,1), (1,0)\} \mid Z_{ij}\big) + u_1 P_{N_n}\big(\{(1,1), (0,0)\} \mid Z_{ij}\big) + u_2 P_{N_n}\big(\{(0,1), (1,0)\} \mid Z_{ij}\big). \tag{19}$$

Note that the conditional probabilities can be easily simulated, as the events $\{y, z\}$ simply define a partition of $\zeta$-space, which is one-dimensional. Moreover, if $\zeta_{ij}$ is independent of observables and, say, normally distributed, then the conditional probabilities can be computed in closed form.
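As a rough illustration of the closed-form case, the sketch below computes the probabilities of the four events $\{y, z\}$ and the right-hand side of (19) when $\zeta_{ij} \sim N(0, \sigma^2)$. It assumes, based only on the example event above, that $(1, s)$ is stable exactly when $\theta_1 + \theta_2 s + \zeta_{ij} \ge 0$, and it abstracts from the distance penalty $\rho(\cdot)$. The function names and the parameter `sigma` are illustrative and do not come from the paper.

```python
import math

def normal_cdf(x, sigma=1.0):
    """CDF of N(0, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def dyadic_event_probs(theta1, theta2, sigma=1.0):
    """Probabilities of the four stability events {y, z} for zeta ~ N(0, sigma^2).

    Assumption (not from the paper): (1, s) is stable iff
    theta1 + theta2 * s + zeta >= 0, and (0, s) is stable otherwise.
    """
    c0 = -theta1            # (1, 0) is stable iff zeta >= c0
    c1 = -theta1 - theta2   # (1, 1) is stable iff zeta >= c1
    lo, hi = min(c0, c1), max(c0, c1)
    p_below = normal_cdf(lo, sigma)                      # zeta below both cutoffs
    p_above = 1.0 - normal_cdf(hi, sigma)                # zeta above both cutoffs
    p_mid = normal_cdf(hi, sigma) - normal_cdf(lo, sigma)
    return {
        ((1, 1), (1, 0)): p_above,
        ((0, 1), (0, 0)): p_below,
        ((1, 1), (0, 0)): p_mid if c1 <= c0 else 0.0,    # arises when theta2 >= 0
        ((0, 1), (1, 0)): p_mid if c0 < c1 else 0.0,     # arises when theta2 < 0
    }

def upper_bound(u1, u2, probs):
    """Right-hand side of (19) for weights u = (u1, u2)."""
    return (max(u1, u2) * probs[((1, 1), (1, 0))]
            + u1 * probs[((1, 1), (0, 0))]
            + u2 * probs[((0, 1), (1, 0))])
```

The four probabilities sum to one because the two cutoffs split the real line into at most three intervals, which is the one-dimensional partition of $\zeta$-space mentioned in the example.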

4.2 Estimation

We next show consistency of analog estimators of the moments in Theorem 2. This will enable us to define an identified set based on polyadic outcome moments. Equation (18) implies that for any instrument $h(Z_A)$,

$$u' E_{N_n}\big[\tilde{I}(Y_A(r_n))\, h(Z_A)\big] \le E_{N_n}\Big[\sup_{q \in \tilde{Q}_{\theta_0}(A, r_n)} u'q\; h(Z_A)\Big] \quad \forall u \in \tilde{U}_A. \tag{20}$$


Converting to unconditional moments in such a manner is without loss of generality if we include all instrument functions within the classes constructed in Andrews and Shi (2013). Several of these classes are uniformly bounded, so we will restrict attention to a class $H$ of uniformly bounded instrument functions. We require consistent estimates of scaled limits of the expectations in (20), i.e. $\lim_{n \to \infty} |N_n|^{K-1} E_{N_n}[u'\tilde{I}(Y_A(r_n))\, h(Z_A)]$ and $\lim_{n \to \infty} |N_n|^{K-1} E_{N_n}[\sup_{q \in \tilde{Q}_\theta(A, r_n)} u'q\; h(Z_A)]$ for any $h \in H$. The scaling by $|N_n|^{K-1}$ is necessary to obtain nondegenerate limits due to sparsity. The following proposition demonstrates consistency of analog estimators and establishes the existence of these limits. Recall the definition of the network $M(r)$ from (30). If $G$ is generated from the limit model (11), then we write $Y_A(1)$ to denote a polyadic outcome on $A$ induced by $G$.

Proposition 1. Let $K \in \mathbb{N} \setminus \{1\}$ and $\theta \in \Theta$. Suppose there exists $\epsilon > 0$ such that for any $X_{i_1}$ distributed according to $f$,

$$\sup_{N_n \in \mathbb{N}} E_{N_n}\Bigg[\Bigg(\sum_{i_2 \in N_n} \cdots \sum_{i_K \in N_n} M_{i_1, i_2}(r_n) \cdot \ldots \cdot M_{i_{K-1}, i_K}(r_n)\Bigg)^{2+\epsilon}\Bigg] < \infty. \tag{21}$$

Let $A_K = \{i_1, \ldots, i_K\} \subset \mathcal{N}$ and $x_{A_K} = (x_{i_1}, \ldots, x_{i_K})$, a vector of elements in $\mathbb{R}^d$. Then under Assumptions 1-8, for all $u \in \tilde{U}_A$ and $h \in H$,

$$\frac{1}{|N_n|} \sum_{i_1, \ldots, i_K \in N_n} u'\tilde{I}(Y_{A_K}(r_n))\, h(Z_{A_K}) \xrightarrow{p} \kappa^K \int_{\mathbb{R}^d} \cdots \int_{\mathbb{R}^d} u' E_{N_n}\big[\tilde{I}(Y_{A_K}(1))\, h(Z_{A_K}) \mid X_{A_K} = x_{A_K}\big] f(x_1)^K \, dx_1 \ldots dx_K, \tag{22}$$

and

$$\frac{1}{|N_n|} \sum_{i_1, \ldots, i_K \in N_n} E_{N_n}\Big[\sup_{q \in \tilde{Q}_\theta(A_K, r_n)} u'q \,\Big|\, Z_{A_K}\Big] h(Z_{A_K}) \xrightarrow{p} \kappa^K \int_{\mathbb{R}^d} \cdots \int_{\mathbb{R}^d} E_{N_n}\Big[\sup_{q \in \tilde{Q}_\theta(A_K, 1)} u'q\; h(Z_{A_K}) \,\Big|\, X_{A_K} = x_{A_K}\Big] f(x_1)^K \, dx_1 \ldots dx_K. \tag{23}$$

The left-hand side of (23) is a feasible estimator because the summands can always be computed via simulation. In some cases a closed form exists, as discussed in Example 9. Condition (21) is sufficient for uniform square-integrability. We discuss primitive conditions in §C.3.

Identified Set. With Proposition 1, we can now explicitly define an identified set based on polyadic outcome moments. Let $\mu$ be a distribution over $H$. Then (17) holds if and only if for any $n$, $K \in \mathbb{N} \setminus \{1\}$ and $A \subseteq N_n$, $|A| = K$,

$$\int_H \max_{u \in \tilde{U}_A} E\Big[\Big(u'\tilde{I}(Y_A(r_n)) - E_{N_n}\Big[\sup_{q \in \tilde{Q}_\theta(A, r_n)} u'q \,\Big|\, Z_A\Big]\Big) h(Z_A)\Big] \, d\mu(h) = 0$$

(cf. Beresteanu et al., 2011). Scaling up the expectations by $|N_n|^{K-1}$ and taking limits inside the integral yields the following identified set by Proposition 1:

$$\Theta_I = \Bigg\{\theta \in \Theta : \sup_{K \in \mathbb{N}} \int_H \max_{u \in \tilde{U}_A} \kappa^K \int_{\mathbb{R}^{dK}} E\Big[\Big(u'\tilde{I}(Y_{A_K}(1)) - E_{N_n}\Big[\sup_{q \in \tilde{Q}_\theta(A_K, 1)} u'q \,\Big|\, Z_{A_K}\Big]\Big) h(Z_{A_K}) \,\Big|\, X_{A_K} = x_{A_K}\Big] f(x_1)^K \, dx_{A_K} \, d\mu(h) = 0\Bigg\}. \tag{24}$$

In practice, K needs to be chosen small to keep the number of moments manageable and also because larger values of K correspond to using high-order moments of the network, which have higher variance. If Zij has finite support, then for any finite K, the set of instrument functions is finite, so we have a finite set of unconditional moments. Then an estimator for ΘI can be constructed using Proposition 1 and results in Chernozhukov et al. (2007).

5 Simulating Networks

In this section, we present an algorithm for generating the set of pairwise-stable networks and analyze its complexity. To the best of our knowledge, this is the first model of network formation for which it is computationally feasible to simulate the set of counterfactual networks. The algorithm proceeds by generating pairwise-stable subnetworks on strategic neighborhoods using brute-force search and then essentially taking a “Cartesian product” of these subnetworks. Algorithm 1 provides a more formal statement of the procedure.

We will make use of the following notation. Let $\Gamma(C)$ represent a network on $C \subseteq N_n$. For any strategic neighborhood $C^+ \in \mathcal{C}^+(N_n, r_n)$ (recall definition (7)), let $\mathcal{G}(C^+)$ be the set of networks on $C^+$ that are pairwise stable. Next, for any $C, C' \subseteq N_n$, define $\Gamma(C) \times \Gamma(C')$ to be the network $\tilde{\Gamma}$ on $C \cup (C' \setminus C)$ such that

$$\tilde{\Gamma}_{ij} = \begin{cases} \Gamma_{ij}(C) & \text{if } i, j \in C \text{ and either } i \text{ or } j \notin C' \\ \Gamma_{ij}(C') & \text{if } i, j \in C' \text{ and either } i \text{ or } j \notin C \\ \max\{\Gamma_{ij}(C), \Gamma_{ij}(C')\} & \text{if } i, j \in C \cap C' \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$

That is, $\tilde{\Gamma}$ defines a “Cartesian product” between two subnetworks on potentially nondisjoint sets of nodes $C$ and $C'$. Lastly, for $C_1^+, C_2^+ \in \mathcal{C}^+(N_n, r_n)$, we define a “Cartesian product” between sets of pairwise-stable networks $\mathcal{G}(C_1^+)$ and $\mathcal{G}(C_2^+)$ in the natural way:

$$\mathcal{G}(C_1^+) \times \mathcal{G}(C_2^+) = \big\{\Gamma_1 \times \Gamma_2;\ \Gamma_1 \in \mathcal{G}(C_1^+),\ \Gamma_2 \in \mathcal{G}(C_2^+)\big\}.$$

The procedure is formally stated in Algorithm 1.

Algorithm 1: Procedure for simulating the set of pairwise-stable networks.
1. Given $W$, $\mathcal{X}_n$, and $V$, construct $D(r_n)$ according to (4).
2. For each $C^+ \in \mathcal{C}^+(N_n, r_n)$, compute $\mathcal{G}(C^+)$ using brute-force search, keeping fixed robust potential links.
3. The set of pairwise-stable networks on $N_n$ is $\times_{C^+ \in \mathcal{C}^+(N_n, r_n)} \mathcal{G}(C^+)$.
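The case analysis in (25) can be sketched in a few lines. The representation below (a network stored as a dictionary from unordered node pairs to 0/1, with missing pairs read as 0) and the function name are illustrative choices, not part of the paper.

```python
from itertools import combinations

def cartesian_product(gamma_c, nodes_c, gamma_cp, nodes_cp):
    """Combine subnetworks on node sets C and C' following the cases in (25).

    gamma_c, gamma_cp: dicts mapping frozenset({i, j}) to 0 or 1
    (a hypothetical encoding; absent pairs are treated as unlinked).
    """
    nodes_c, nodes_cp = set(nodes_c), set(nodes_cp)
    combined = {}
    for i, j in combinations(sorted(nodes_c | nodes_cp), 2):
        pair = frozenset((i, j))
        in_c = i in nodes_c and j in nodes_c
        in_cp = i in nodes_cp and j in nodes_cp
        if in_c and in_cp:      # i, j both in the intersection of C and C'
            combined[pair] = max(gamma_c.get(pair, 0), gamma_cp.get(pair, 0))
        elif in_c:              # i, j in C, at least one outside C'
            combined[pair] = gamma_c.get(pair, 0)
        elif in_cp:             # i, j in C', at least one outside C
            combined[pair] = gamma_cp.get(pair, 0)
        else:                   # pair straddles the two sets: no link
            combined[pair] = 0
    return combined
```

Applying this function to every combination of one stable subnetwork per strategic neighborhood would give the product set in step 3 of Algorithm 1.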

Note that in step 2 one should not naively search over the set of all possible networks on $C^+$ because this is computationally infeasible for components of size greater than thirty. Instead, one should always keep fixed robust potential links and only search over pairs $(i, j)$ for which $D_{ij}(r_n) = 1$.15 To understand the definition of the Cartesian product (25) (and hence step 3), note that if $i$ and $j$ do not lie in a common strategic neighborhood, then by definition, $G_{ij}$ is robustly absent, so $G_{ij}$ must be zero, as required by (25). If instead $i, j \in C_1^+ \cap C_2^+$, then $G_{ij}$ must be a robust potential link if $C_1^+$ and $C_2^+$ are distinct. Hence, $\max\{\Gamma_{1,ij}, \Gamma_{2,ij}\} = \Gamma_{1,ij} = \Gamma_{2,ij}$ for any $\Gamma_1 \in \mathcal{G}(C_1^+)$ and $\Gamma_2 \in \mathcal{G}(C_2^+)$. The next theorem provides conditions under which the complexity of Algorithm 1 is of order $n^2$ with high probability.

Theorem 3. Define

$$\mu(x, \alpha) = \kappa \sup_w f(w) \int_{\mathbb{R}^d} \Big(\int_{\mathbb{R}^{d_\alpha}} \Lambda(x, y, \alpha, \alpha')^2 \, d\Phi(\alpha' \mid y)\Big)^{1/2} dy.$$

Suppose

$$\inf_{(x, \alpha) \in \mathrm{supp}(X_i, \alpha_i)} \mu(x, \alpha) > 0 \quad \text{and} \quad \sup_{(x, \alpha) \in \mathrm{supp}(X_i, \alpha_i)} \mu(x, \alpha) < \infty. \tag{26}$$

Then under Assumption 3, Algorithm 1 has computational complexity of order $n^2$ with probability approaching one as $n \to \infty$.

15

To generate a single pairwise-stable network, for example for the purposes of Monte Carlo, we suggest running myopic best-response dynamics, as in standard practice, but keeping robust potential links fixed in every iteration. This amounts to running Algorithm 1 but replacing brute-force search in step 2 with myopic best-response dynamics starting from an arbitrary initial subnetwork. In our simulations, we find that for a network of 10000 nodes, this procedure takes only ten seconds on a modern laptop.


Proof. This result is a corollary of Lemma 4 in §D. We give a sketch of the proof here. First, note that step 1 of the algorithm has complexity $O(n^2)$ surely, since $D_{ij}(r_n)$ is uniquely determined by $(X_i, X_j, W_{ij})$. Second, for any strategic neighborhood $C^+$, to compute $\mathcal{G}(C^+)$ in step 2, one only needs to search over configurations of the nonrobust links in $C$, the $D(r_n)$-component associated with $C^+$. There are at most $\exp\{\sum_{i,j \in C} D_{ij}(r_n)\}$ such configurations. Assumption 3 implies that $D(r_n)$ is a sparse network, so the number of nonrobust potential links between nodes in a set of size $m$ is of order $m$. Furthermore, under the assumptions of this theorem, we show that the largest component of $D(r_n)$ has size $O_p(\log n)$. Consequently, one only needs to search over $O_p(e^{\log n})$ elements. Since there are at most $n$ components, this gives an overall complexity of $O_p(n^2)$. The last step of the algorithm is just accounting.

When $\alpha_i$ has bounded support, condition (26) typically holds, for instance for models with $\zeta_{ij}$ additively separable with full support such as Example 1. Formal verification of these conditions can proceed using the same calculations in Example 5 and §C.1 for the models considered there.
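Step 2 of Algorithm 1 can be sketched as an exhaustive search over the nonrobust pairs of a single component, holding robust links fixed. The sketch below assumes a link is stable exactly when its joint surplus, evaluated given the other links, is strictly positive, mirroring the threshold form of (1). The callback `surplus` and the set-of-pairs encoding are hypothetical stand-ins for the model primitives.

```python
from itertools import product

def stable_subnetworks(nonrobust_pairs, robust_links, surplus):
    """Enumerate pairwise-stable networks on one component by brute force.

    nonrobust_pairs: pairs (i, j) with D_ij = 1, the only links searched over.
    robust_links:    pairs linked in every stable network (held fixed).
    surplus:         hypothetical callback surplus(i, j, other_links) -> float.
    Assumption: link ij is present in a stable network iff its surplus is > 0.
    """
    nonrobust_pairs = [frozenset(p) for p in nonrobust_pairs]
    fixed = {frozenset(p) for p in robust_links}
    stable = []
    # 2^m candidate configurations for m nonrobust pairs.
    for config in product((0, 1), repeat=len(nonrobust_pairs)):
        links = fixed | {p for p, g in zip(nonrobust_pairs, config) if g}
        if all(
            (surplus(*sorted(p), links - {p}) > 0) == bool(g)
            for p, g in zip(nonrobust_pairs, config)
        ):
            stable.append(links)
    return stable
```

The loop visits every configuration of the nonrobust pairs, so its cost is exponential in their number; the sketch is only practical because components of $D(r_n)$ are small under the conditions of Theorem 3.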

6 Monte Carlo

DGP. We conduct a simulation study to illustrate the informativeness of the identified set based on polyadic outcome moments. The joint surplus is

$$\theta_1 + \theta_2 Z_{ij} + \theta_3 \max_k G_{ik} G_{jk} + \rho(X_i, X_j) + \varepsilon_{ij},$$

where $\rho(\cdot)$ is given by (3). Positions are uniformly distributed on $[0, 1]^2$, $Z_{ij} = \alpha_i + \alpha_j$ with $\alpha_i \overset{iid}{\sim} \mathrm{Ber}(0.5)$, $\varepsilon_{ij} \overset{iid}{\sim} N(0, \theta_4^2)$, and $\{\alpha_i\} \perp \{\varepsilon_{ij}\}$. To satisfy Assumption 3, we choose

$$\kappa = \Big(\pi \, \big\| P(-\theta_3 < \theta_1 + \theta_2 Z_{ij} + \varepsilon_{ij} \le 0 \mid Z_{ij}) \big\|_2\Big)^{-1} - 0.1,$$

where $\|\cdot\|_2$ is the $L_2$ norm. The parameter space is $\Theta = [-1, 1]^3 \times [0, 1]$, and we choose $\theta_0 = (\theta_1, \theta_2, \theta_3, \theta_4) = (-0.2, 0.5, 0.3, 0.9)$.16 We consider estimation both when the sparsity parameter $r_n$ in (3) is known and unknown. In the unknown case, we use an initial plug-in estimator given in §B.3. The results are virtually identical, since the estimate is extremely close to the actual value. Thus, below we present results only for the unknown case. We simulate networks by computing a pairwise-stable subnetwork on each strategic neighborhood using myopic best-response dynamics and taking the “Cartesian product” as in Algorithm 1.17 We estimate models with $n = 5000$.

16 To arrive at our choice of $\Theta$, we can assume that the parameter space is compact, and without loss of generality, a cube centered at the origin. We can then normalize this to the cube of unit length and drop the negative portion of the parameter space for $\theta_4$, since it is a standard deviation.

17 We set the initial network to be the network of geographic neighbors $\mathbf{1}\{\|X_i - X_j\| \le r_n\}$.
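The choice of $\kappa$ in the DGP can be evaluated numerically. The sketch below assumes that the $L_2$ norm is taken with respect to the distribution of $Z_{ij}$, which puts mass 0.25, 0.5, 0.25 on 0, 1, 2 because $Z_{ij}$ is a sum of two independent Ber(0.5) draws; this reading of the norm is an interpretation rather than something stated in the text, and the function names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kappa(theta, slack=0.1):
    """kappa = (pi * ||P(-theta3 < theta1 + theta2*Z + eps <= 0 | Z)||_2)^(-1) - slack.

    Assumption: the L2 norm is the root mean square over the distribution of Z.
    """
    theta1, theta2, theta3, theta4 = theta
    z_dist = {0: 0.25, 1: 0.5, 2: 0.25}  # Z = alpha_i + alpha_j, alpha ~ Ber(0.5)
    second_moment = 0.0
    for z, weight in z_dist.items():
        mean = theta1 + theta2 * z
        upper = normal_cdf((0.0 - mean) / theta4)      # P(mean + eps <= 0)
        lower = normal_cdf((-theta3 - mean) / theta4)  # P(mean + eps <= -theta3)
        second_moment += weight * (upper - lower) ** 2
    return 1.0 / (math.pi * math.sqrt(second_moment)) - slack
```

Calling `kappa((-0.2, 0.5, 0.3, 0.9))` evaluates the expression at $\theta_0$; the subtraction of 0.1 keeps the value strictly below the boundary implied by the norm.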


Moments. We estimate an identified set defined by dyadic and triadic outcome moments, i.e. polyadic outcome moments of degree $K \in \{2, 3\}$. Since $\rho(\cdot)$ is the hard-threshold penalty, additional moments become available (see §B.3), which we use in estimation. To simplify the computation, we use only a subset of the available triadic outcome moments, specifically those for which the left-hand side of (17) equals $P(G_{ij} = G_{jk} = G_{ik} = 1, S_{ij} = S_{jk} = S_{ik} = 1 \mid Z_{ijk})$ (transitive triads) and $P(G_{ij} = G_{jk} = 1 - G_{ik} = 1, S_{ij} = s_1, S_{jk} = s_2, S_{ik} = 1 \mid Z_{ijk})$ for $s_1, s_2 \in \{0, 1\}$ (intransitive triads). For dyadic outcome moments, we use instrument functions $\mathbf{1}\{Z_{ij} = k\}$ for $k \in \{0, 1, 2\}$, and for triadic outcome moments, $\mathbf{1}\{Z_{ij} = k_1, Z_{jk} = k_2, Z_{ik} = k_3\}$ for $k_1, k_2, k_3 \in \{0, 1, 2\}$. Since there are a finite number of instruments, this results in a finite number of unconditional moment inequalities. We estimate the moments using the analog estimators given by Proposition 1. We compute an estimate of the identified set by performing a grid search on $\Theta$ and including all parameters $\theta$ that satisfy

$$\sum_{j=1}^{J} \max\{m_{jn}(\theta), 0\}^2 \le \frac{\log n}{n},$$

where $(m_{jn}(\theta);\ j = 1, \ldots, J)$ is the vector of empirical moments. The threshold choice follows Chernozhukov et al. (2007). The step size for the grid is 0.1.

Network Statistics. Tables 1 and 2 display summary statistics for the networks $G$ and $D(r_n)$ when $n = 5000$, averaged over 100 simulations. The average degree is approximated by multiplying the number of links by $2/n$. From this quantity, it is clear that the networks are sparse. “Giant Size” is the number of nodes lying in the giant component. We can see that $G$ contains a giant component and virtually all nodes lie within this component. As Lemma 3 in §D predicts, $D(r_n)$ does not contain a giant component and contains over 2600 components on average. “Clustering” is the clustering coefficient, a standard measure of transitivity, and we see that $G$ exhibits nontrivial clustering, as expected since $\theta_3 > 0$ and there exists geographic homophily.18

18 By comparison, a network generated by an Erdős-Rényi null model has clustering coefficient equal to the fraction of linked pairs, which is essentially zero due to sparsity.

Results. We simulate the model and estimate the identified set 100 times. Table 3 displays summaries of the projections of the estimated sets onto each coordinate of $\theta$, aggregated across the simulations for $n = 5000$. The third and fourth columns display, respectively, the narrowest and widest projections across all simulations. To give a sense of the variation across simulations, the “Mean Endpoints” column displays a set $[L, U]$ where $L$ is the smallest value of the parameter in the given dimension, averaged across the simulations, and $U$ is the largest value. From Table 3, we can see that the signs of $\theta_2$ and $\theta_3$ are correctly estimated across all simulations. Thus, we find that all parameters but the intercept are distinguished from zero, and their signs are correctly estimated. Also, projections of the estimated set onto individual coordinates of $\theta$ always contain the corresponding projection of $\theta_0$ in all simulations.

Table 1: Summary statistics for $G$.

                    Mean        SD        Min        Max
# Nodes          5015.13     65.99    4863.00    5249.00
# Links         15207.05    257.33   14483.00   15858.00
Clustering          0.22      0.00       0.21       0.22
Giant Size       4940.47     68.79    4785.00    5093.00
# Components       46.03      6.60      30.00      64.00

$n = 5000$, 100 simulations.
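The grid-search step described under “Moments” and the coordinate projections summarized under “Results” can be sketched as follows. The callback `moments`, which returns the sample moments $m_{jn}(\theta)$ signed so that nonpositive values satisfy the inequalities, is a hypothetical placeholder for the analog estimators of Proposition 1.

```python
import math
from itertools import product

def estimate_identified_set(moments, grid_axes, n):
    """Keep grid points whose criterion is at most log(n) / n.

    moments:   hypothetical callback theta -> iterable of m_jn(theta),
               signed so that m_jn(theta) <= 0 when inequality j holds.
    grid_axes: one list of candidate values per coordinate of theta.
    """
    threshold = math.log(n) / n
    kept = []
    for theta in product(*grid_axes):
        criterion = sum(max(m, 0.0) ** 2 for m in moments(theta))
        if criterion <= threshold:
            kept.append(theta)
    return kept

def projections(kept):
    """Smallest and largest retained value along each coordinate."""
    return [(min(coord), max(coord)) for coord in zip(*kept)]
```

With a step size of 0.1 on $\Theta = [-1, 1]^3 \times [0, 1]$, `grid_axes` would hold three axes of 21 points and one of 11 points, so the search visits $21^3 \times 11$ candidate parameters.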

Table 2: Summary statistics for $D(r_n)$.

                    Mean        SD        Min        Max
# Nodes          5015.13     65.99    4863.00    5249.00
# Links          2328.56     63.07    2185.00    2524.00
Clustering          0.02      0.00       0.02       0.03
Giant Size         37.37      8.02      21.00      63.00
# Components     2774.31     54.07    2637.00    2905.00

$n = 5000$, 100 simulations.

Table 3: Projections of estimated identified set, $n = 5000$.

             True Value    Mean Endpoints    Narrowest         Widest
$\theta_1$     -0.20       [-0.90, 0.07]     [-0.90, 0.00]     [-0.90, 0.10]
$\theta_2$      0.50       [0.20, 0.60]      [0.20, 0.50]      [0.20, 0.60]
$\theta_3$      0.30       [0.10, 1.00]      [0.10, 1.00]      [0.10, 1.00]
$\theta_4$      0.90       [0.40, 1.00]      [0.40, 1.00]      [0.40, 1.00]

7 Conclusion

This paper develops asymptotic theory for strategic network-formation models when the econometrician observes a single large network. We observe that for typical random-utility models of network formation, in any realization of the network, there exist potential links whose pairwise stability is invariant to changes to the network. If strategic interactions are local, such links necessarily partition the network into “strategic neighborhoods,” where the pairwise stability of a subnetwork formed on any such neighborhood does not depend on links or attributes of nodes in the ambient network. These neighborhoods form weakly dependent subnetworks if they are asymptotically finite. We establish finiteness by imposing a subcriticality condition that restricts the size of strategic neighborhoods and a local coordination condition ensuring that equilibrium selection occurs separately across these neighborhoods. Under these conditions, we derive a weak law for a large class of network moments. The proof draws on results in random-graph theory on component sizes of inhomogeneous random graphs (Bollobás et al., 2007) and the objective method for deriving limit theorems for functionals of random point sets (Penrose and Yukich, 2003).

We also characterize an identified set of parameters based on a new class of network moments with sharpness advantages over the existing literature and apply our law of large numbers to construct consistent estimators. Furthermore, we show that under our model, the set of pairwise-stable networks can be feasibly simulated using a simple modification to brute-force search and show that the runtime is of order $n^2$ with high probability, where $n$ is the mean network size. Our arguments for deriving the identified set and simulating equilibrium outcomes are also easily applied to other large games.

The asymptotic theory in this paper focuses on establishing a weak law. A central limit theorem may be attainable by building on the ideas of Penrose and Yukich (2001). We are currently studying this issue in a separate project.

A Theorem 1 Proof Sketch

From the discussion of (12) and Assumption 2, notice that $G$ can be written as a deterministic functional of $W$ and $x + r_n^{-1}(\mathcal{X}_n - x)$ for any $x \in \mathbb{R}^d$. Thus we can define a deterministic functional $\xi$ such that $\xi(X, X + r_n^{-1}(\mathcal{X}_n - X), W(\mathcal{X}_n)) = \psi(X, G, X + r_n^{-1}(\mathcal{X}_n - X), W(\mathcal{X}_n))$. The proof then proceeds in two broad steps. We first prove a law of large numbers

$$\frac{1}{|N_n|} \sum_{i \in N_n} \tilde{\xi}\big(X_i, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W(\mathcal{X}_n)\big) \xrightarrow{L_2} \int E\Big[\tilde{\xi}\big(x, \mathcal{P}_{\kappa f(x)} \cup \{x\}, W(\mathcal{P}_{\kappa f(x)} \cup \{x\})\big)\Big] f(x) \, dx \tag{27}$$

for general functionals $\tilde{\xi}$ satisfying a certain stability property.19 We then verify this stability property for $\xi$ to establish Theorem 1.

19

This is Theorem 6 in §E.


Step 1. Penrose and Yukich (2003) prove (27) for the case in which $\tilde{\xi}$ does not depend on $W$. Key to their argument is the assumption that $\tilde{\xi}$ satisfies a certain stability property, a version of which we can state as follows: for any $X \sim f$, there exists an almost surely finite radius $R$ such that

$$\tilde{\xi}\big(X, \mathcal{P}_{\kappa f(X)} \cup \{X\}\big) = \tilde{\xi}\big(X, (\mathcal{P}_{\kappa f(X)} \cup \{X\}) \cap B(X, R) \cup A\big) \tag{28}$$

for any $A \subseteq \mathbb{R}^d$. This property states that $\tilde{\xi}$ is invariant to perturbations of the point set $\mathcal{P}_{\kappa f(X)}$ outside some neighborhood of $X$. One can then apply approximation (13) to replace the process $X + r_n^{-1}(\mathcal{X}_n - X)$ in the expectation of the left-hand side of (27) with $\mathcal{P}_{\kappa f(X)}$ to establish convergence in mean. This argument can likewise be used to show concentration of the variance by exploiting a spatial independence property of the Poisson point process, which thus establishes $L_2$ convergence.

We extend this argument to allow for dependence on $W$ under the assumption that $\tilde{\xi}$ satisfies a $W$-strong stability property analogous to (28) (Definition 1 in the appendix), a simplified version of which can be stated as follows: there exists an almost surely finite radius $R$ such that for $n$ sufficiently large,

$$\tilde{\xi}\big(X, X + r_n^{-1}(\mathcal{X}_n - X), W\big) = \tilde{\xi}\big(X, X + r_n^{-1}(\mathcal{X}_n \cap B(X, R r_n) - X), W\big). \tag{29}$$

That is, $\tilde{\xi}$ is invariant to only the removal of points outside of some neighborhood, whereas (28) also demands invariance to the addition of points. Furthermore, this needs to hold “uniformly” in $n$ across finite models where $\mathcal{X} = X + r_n^{-1}(\mathcal{X}_n - X)$.

Step 2. To prove Theorem 1, it remains to verify $W$-strong stability for $\xi$. Define the network $M(r)$, where for all $i, j \in \mathcal{N}$,

$$M_{ij}(r) = \mathbf{1}\Big\{\sup_s V(\delta_{ij}, s, W_{ij}) > 0\Big\}. \tag{30}$$

Also recall (7), the definition of $i$'s strategic neighborhood $C_i^+$ from §2.3, and the definition of path distance $L_G(i, j)$ from §2.1. Define the path distance between two sets of nodes $A, A' \subseteq \mathcal{N}$ in the network $G$ as $L_G(A, A') = \min\{L_G(k, \ell);\ k \in A, \ell \in A'\}$. A crucial object for what follows is

$$J_i \equiv J_i(\mathcal{X}, W, r) = \bigcup \big\{C_j^+ : j \in \mathcal{N},\ L_{M(r)}(C_i^+, C_j^+) \le K\big\}. \tag{31}$$

Thus, $J_i$ is the union of strategic neighborhoods $C^+$ such that the $M(r)$-path distance between $C_i^+$ and $C^+$ is at most $K$.

Example 10. Consider Figure 3, which depicts robust links in black and nonrobust links in gray. Robustly absent links are unmarked. The strategic neighborhoods are $C_1^+ = C_2^+ = C_3^+ = \{1, 2, 3, 4\}$, $C_4^+ = \{3, 4, 5, 6, 7\}$, $C_7^+ = \{6, 7, 8, 9, 10\}$, and $C_{10}^+ = \{9, 10\}$. We consider node statistics satisfying 2-locality. Then node 3's statistic can depend on node 7's attribute. This accords with our definition of $J_3$, since $4 \in C_3^+$, $6 \in C_7^+$, and $L_{M(r)}(4, 6) \le 2$, which implies $7 \in J_3$ when $K = 2$. On the other hand, node 1's statistic does not depend on the attributes of node 10, so accordingly, $C_{10}^+ = \{9, 10\}$ and $L_{M(r)}(k, l) > 2$ for all $k \in C_1^+$ and $l \in \{9, 10\}$, which implies $10 \notin J_1$.

Figure 3: Robust and nonrobust links.

The significance of $J_i$ is the following lemma.

Lemma 2. Let $\tilde{J}_i = J_i(X_i + r_n^{-1}(\mathcal{X}_n - X_i), W(\mathcal{X}_n), 1)$. Under Assumptions 2 and 4,

$$\xi\big(X_i, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W(\mathcal{X}_n)\big) = \xi\big(X_i, X_i + r_n^{-1}(\mathcal{X}_{\tilde{J}_i} - X_i), W(\mathcal{X}_{\tilde{J}_i})\big) \tag{32}$$

if $\xi$ is a $K$-local node statistic.

Proof. Let $G$ be the network defined by (2), generated from the network-formation model with $r = 1$, positions drawn from $X_i + r_n^{-1}(\mathcal{X}_n - X_i)$, and attributes given by $W(\mathcal{X}_n)$. Let $M(1)$ be the network defined by (30) induced by this model. By $K$-locality, $\xi(X_i, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W(\mathcal{X}_n))$ depends on its arguments only through the positions, attributes, and potential links of nodes in $i$'s $K$-neighborhood in $G$. Since $G_{ij} \le M_{ij}(1)$, the same statement is true for $i$'s $K$-neighborhood in $M(1)$. The realizations of potential links formed by all pairs within the latter neighborhood are entirely determined by their respective strategic neighborhoods by Assumptions 2 and 4. Thus the claim follows by definition of $\tilde{J}_i$.

We can then verify (29) as follows. Under Assumption 3, the size of $C_i$ is finite in the limit model for any $i$ by a branching process argument discussed in §2.2.20 By sparsity (Assumption 5), each node forms a finite number of robust links in the limit model. Consequently, the size of $C_i^+$ is finite for all $i$ and therefore so is the size of $\tilde{J}_i$. Hence, we can find some finite radius $R$ such that $\mathcal{X}_{\tilde{J}_i} \subseteq B(X_i, R)$, and (29) follows by Lemma 2.

20

For a formal argument, see Lemma 3 in §D.
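The set $J_i$ in (31) can be computed with a breadth-first search that measures the $M(r)$-path distance between two node sets. The sketch below takes the strategic neighborhoods and the adjacency lists of $M(r)$ as given inputs; both data structures and the function names are illustrative.

```python
from collections import deque

def set_distance(adjacency, sources, targets):
    """Shortest path length between two node sets; inf if they are disconnected."""
    targets = set(targets)
    frontier = deque((node, 0) for node in sources)
    seen = set(sources)
    while frontier:
        node, dist = frontier.popleft()
        if node in targets:
            return dist
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return float("inf")

def relevant_set(i, neighborhoods, adjacency, K):
    """Union of strategic neighborhoods within M(r)-path distance K of C_i^+.

    neighborhoods: dict mapping each node j to its strategic neighborhood C_j^+.
    adjacency:     dict mapping each node to its neighbors in M(r).
    """
    result = set()
    for c_j in neighborhoods.values():
        if set_distance(adjacency, neighborhoods[i], c_j) <= K:
            result |= set(c_j)
    return result
```

Because all nodes of $C_i^+$ start the search at distance zero, the first target reached gives the minimum over pairs $(k, \ell)$ in the definition of $L_G(A, A')$.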


B Extensions

B.1 General Model with Position-Dependent Attributes

In this section, we define the general model that allows attributes to be correlated with positions. All proofs in the appendix are derived under this model. We first define attributes and positions in terms of objects constructed on a common probability space. On a suitable such space, let $X, Y$ be i.i.d. random vectors drawn from a density $f$ bounded above on $\mathbb{R}^d$; $\mathcal{P}_1$ a Poisson process of rate one on $\mathbb{R}^d \times [0, \infty)$ independent of $X, Y$; and $\{\tilde{\alpha}_x;\ x \in \mathbb{N}\}$ and $\{\tilde{\zeta}_{xy};\ x, y \in \mathbb{N}\}$ respectively i.i.d. random vectors, both independent of $\mathcal{P}_1, X, Y$.

Positions. We define a coupling for the processes $\mathcal{X}_n$ and $\mathcal{P}_{\kappa f(X)}$ using a construction due to Penrose and Yukich (2003).

• Finite Model. Let $\mathcal{P}^*_{nf}$ be the restriction of $\mathcal{P}_1$ to $\{(w, t) \in \mathbb{R}^d \times [0, \infty) : t \le n f(w)\}$ and $\mathcal{P}_{nf}$ the image of $\mathcal{P}^*_{nf}$ under the projection $(w, t) \mapsto w$ for $w \in \mathbb{R}^d$ and $t \in [0, \infty)$. Then $\mathcal{P}_{nf}$ is an inhomogeneous Poisson point process with intensity function $n f(\cdot)$ by the mapping theorem for Poisson processes (e.g. Kingman, 1993). We take $\mathcal{X}_n = \mathcal{P}_{nf} \cup \{X, Y\}$. Our original definition of $\mathcal{X}_n$ has the same distribution by Proposition 1.5 of Penrose (2003). Let $N_n = \{1, \ldots, |\mathcal{X}_n|\}$.

• Limit Model. For any $x \in \mathbb{R}^d$, let $\mathcal{P}^*_{nf(x)}$ be the restriction of $\mathcal{P}_1$ to $\{(w, t) : t \le n f(x)\}$ and $\mathcal{P}^n_{\kappa f(x)}$ the image of $\mathcal{P}^*_{nf(x)}$ under $(w, t) \mapsto x + r_n^{-1}(w - x)$. Then for all $n$, $\mathcal{P}^n_{\kappa f(x)}$ is a Poisson point process on $\mathbb{R}^d$ with intensity $\kappa f(x)$ by the mapping theorem. We take $\mathcal{P}_{\kappa f(x)} = \mathcal{P}^{\kappa}_{\kappa f(x)}$.
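The finite-model construction can be mimicked numerically: draw a unit-rate Poisson process on a box, keep the points $(w, t)$ with $t \le n f(w)$, and project onto $w$. The sketch below assumes that $f$ is supported on the unit cube and bounded by `density_max`, which are simplifications made here so that the box is finite; the function names are illustrative, and the two extra points $X, Y$ of the paper's construction are omitted.

```python
import random

def poisson_count(mean, rng):
    """Draw a Poisson(mean) count as the number of unit-rate arrivals in [0, mean]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival <= mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count

def thinned_positions(n, density, density_max, dim=2, seed=0):
    """Positions of a Poisson process with intensity n * density on [0, 1]^dim.

    Assumptions: density(w) <= density_max and density vanishes off the unit cube.
    """
    rng = random.Random(seed)
    height = n * density_max
    # A unit-rate process on [0, 1]^dim x [0, height] has Poisson(height) points.
    total = poisson_count(height, rng)
    positions = []
    for _ in range(total):
        w = tuple(rng.random() for _ in range(dim))
        t = rng.uniform(0.0, height)
        if t <= n * density(w):  # restriction to {(w, t): t <= n f(w)}
            positions.append(w)  # projection (w, t) -> w
    return positions
```

For example, `thinned_positions(500, lambda w: 2.0 * w[0], 2.0)` produces roughly 500 points that concentrate where the first coordinate is large.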

Attributes. Order the elements of $\mathcal{P}_1$ lexicographically and label them $1, 2, \ldots$. Let $\mathcal{N}$ be the set of labels. We associate with each label $i \in \mathcal{N}$ a node-level pseudo-attribute vector $\tilde{\alpha}_i$ and each pair of labels $i, j \in \mathcal{N}$ a pair-level pseudo-attribute vector $\tilde{\zeta}_{ij}$. Each $i \in \mathcal{N}$ labels an element $(X_i, \tau_i) \in \mathcal{P}_1$, where $X_i \in \mathbb{R}^d$ and $\tau_i \in [0, \infty)$. For each label $i$, we will let $X_i$ denote the position of $i$ in $\mathbb{R}^d$. For $i, j \in \mathcal{N}$, define node-level attributes $\alpha_i = H_\alpha(X_i, \tilde{\alpha}_i)$ and pair-level attributes $\zeta_{ij} = H_\zeta(X_i, X_j, \tilde{\zeta}_{ij})$, where $H_\alpha$ and $H_\zeta$ are functions with respective ranges $\mathbb{R}^{d_\alpha}$ and $\mathbb{R}^{d_\zeta}$. Note that this construction allows for arbitrary dependence between a node's attributes and her position. Let $\Phi_\alpha(\cdot \mid x)$ be the conditional CDF of $\alpha_i$ given $X_i = x$ and $\Phi_\zeta(\cdot \mid x, y)$ that of $\zeta_{ij}$ given $X_i = x$, $X_j = y$.

Projection. We often work with transformations of the finite model $x + r_n^{-1}(\mathcal{X}_n - x)$, for example (13). However, we are usually concerned with attributes evaluated at the untransformed set of node positions $W(\mathcal{X}_n)$, for example (12). For this reason, we define the projection $p_{x,r} : y \mapsto (y - x) r + x$ for $r \in \mathbb{R}_+$ and $x \in \mathbb{R}^d$. Then $\mathcal{X} = p_{x,r}(x + r^{-1}(\mathcal{X} - x))$ for any $\mathcal{X} \subseteq \mathbb{R}^d$. We abuse notation and define $p_{x,r}(\mathcal{X}) = \{p_{x,r}(y);\ y \in \mathcal{X}\}$. Note this is not the image of $\mathcal{X}$ under the projection $p_{x,r}$; there may be identical elements in this set, in particular when $r = 0$.

Now let $\tilde{\mathcal{X}} \in \{\mathcal{P}_{nf}\} \cup \{\mathcal{P}^n_{nf(x)};\ x \in \mathrm{supp}(f)\}$. Consider the transformed set of positions $x + r^{-1}(\tilde{\mathcal{X}} - x)$. To recover the attributes associated with the untransformed set $\tilde{\mathcal{X}}$, we define $W\big(p_{x,r}(x + r^{-1}(\tilde{\mathcal{X}} - x))\big)$ as the attribute array

$$\Big(\big(H_\alpha(p_{x,r}(X), \tilde{\alpha}_{\ell(X)}),\ H_\alpha(p_{x,r}(Y), \tilde{\alpha}_{\ell(Y)}),\ H_\zeta(p_{x,r}(X), p_{x,r}(Y), \tilde{\zeta}_{\ell(X)\ell(Y)})\big);\ X, Y \in x + r^{-1}(\tilde{\mathcal{X}} - x)\Big), \tag{33}$$

where $\ell(X)$ is the label in $\mathcal{N}$ associated with $p_{x,r}(X)$. We will write $W \circ p_{x,r}$ when the argument of $W(p_{x,r}(\cdot))$ is left implicit.

Finite and Limit Models. We next modify the definitions of the finite and limit models in §3 to allow for dependence between attributes and positions. The (scaled and translated) finite model is $(V, \lambda, X + r_n^{-1}(\mathcal{X}_n - X), W \circ p_{X, r_n}, 1)$, where $X \sim f$, and the argument of $W \circ p_{X, r_n}$ is implicitly given by the third element of the tuple. By construction, this is equivalent to the finite model in Assumption 6 for reasons identical to (12). The limit model corresponds to the tuple

$$(V, \lambda, \mathcal{P}_{\kappa f(X)} \cup \{X\}, W \circ p_{X, 0}, 1). \tag{34}$$

The projection $p_{X,0}$ means we set the positions of all nodes in $\mathcal{P}_{\kappa f(X)} \cup \{X\}$ equal to $X$ when constructing attributes. There will still be variation in attributes to the extent that $H_\alpha$ and $H_\zeta$ depend on their respective last components. An alternate view of the limit model is that there exists a continuum of nodes with positions $X$ distributed according to $f$ and each fixed node $X$ forms links under its own “local” limit model (34). Formally, this localization originates from approximation (13), which states that within a small radius $R r_n$ of $X$, the set of node positions is approximately distributed according to a homogeneous Poisson point process.

Assumptions. We next state assumptions generalizing those in §2 to allow for dependence between attributes and positions. The first assumption, however, is new, and imposes some regularity conditions on the distribution of attributes and the smoothness of various primitives with respect to attributes.

Assumption 9 (Regularity). Let $\mathcal{X} \in \{\mathcal{P}_{\kappa f(x)};\ x \in \mathrm{supp}(f)\}$ and $\mathcal{N}$ the associated set of node labels.
(a) $V(\delta_{ij}, S(X_i, X_j, g_{-ij}, (\|X - Y\|;\ X, Y \in \mathcal{X}), W), W)$, $\lambda(\mathcal{X}, W, r)$, and $\psi(X_i, g, \mathcal{X}, W)$ are continuous in $W$ for any $X_i, X_j \in \mathcal{X}$ and network $g$ defined on $\mathcal{X}$.21
(b) $\Phi_\alpha(\alpha' \mid x)$ and $\Phi_\zeta(\zeta' \mid x, y)$ are continuous in $x, y$ for any $\alpha', \zeta'$.
(c) $\{W_{ij};\ i, j \in \mathcal{N}\}$ is bounded in probability.

21

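For instance, $\lambda(\mathcal{X}, \cdot, r)$ is continuous at $W$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that if $W'$ satisfies $\sup_{i,j \in \mathcal{N}} |W_{ij} - W'_{ij}| < \delta$, then $|\lambda(\mathcal{X}, W(\mathcal{X}), r) - \lambda(\mathcal{X}, W'(\mathcal{X}), r)| < \epsilon$.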


Parts (b) and (c) hold automatically when attributes are identically distributed across node positions. Part (a) is new and not required in the identically distributed case. Let $\mathcal{N}$ be the set of node labels associated with $\tilde{\mathcal{X}}$, the latter defined prior to (33). For $x \in \mathbb{R}^d$, $r \in \mathbb{R}_+$, and $i, j \in \mathcal{N}$, let $W_{ij}^{x,r} = (\alpha_i^{x,r}, \alpha_j^{x,r}, \zeta_{ij}^{x,r})$ be the entry of (33) corresponding to nodes $i, j \in \mathcal{N}$, and let $D^{x,t}(r)$ be the network for which

$$D_{ij}^{x,t}(r) = \mathbf{1}\Big\{\inf_s V(r^{-1}\|X_i - X_j\|, s, W_{ij}^{x,t}) \le 0 \ \cap\ \sup_s V(r^{-1}\|X_i - X_j\|, s, W_{ij}^{x,t}) > 0\Big\}$$

for all $i, j \in \mathcal{N}$, where $X_i, X_j$ are their associated positions in $\tilde{\mathcal{X}}$. Evidently, these coincide with $W_{ij}$ and $D_{ij}(r)$ defined in the main text when $t = 1$. The next assumptions generalize Assumptions 3 and 5.

Assumption 10 (Subcriticality). Let $\Phi^{w,t}(\alpha \mid x)$ be the conditional distribution of $\alpha_i^{w,t}$ given $X_i = x$ and $\Lambda^{w,t}(x, y, \alpha, \alpha') = E[D_{12}^{w,t}(1) \mid X_1 = x, X_2 = y, \alpha_i^{w,t} = \alpha, \alpha_j^{w,t} = \alpha']$. There exists $t_1 \in \mathbb{R}_+$ such that

$$\sup \kappa f(z) \Bigg(\int_{\mathbb{R}^{d_\alpha}} \Bigg(\int_{\mathbb{R}^d} \Big(\int_{\mathbb{R}^{d_\alpha}} \Lambda^{w,t}(x, y, \alpha, \alpha')^2 \, d\Phi^{w,t}(\alpha' \mid y)\Big)^{1/2} dy\Bigg)^2 d\Phi^{w,t}(\alpha \mid x)\Bigg)^{1/2} < 1,$$

where the supremum is taken over all $w \in \mathrm{supp}(f)$, $x, z \in \mathbb{R}^d$, and $0 \le t \le t_1$.

Note that compared to $D_{ij}(1)$, $D_{ij}^{w,t}(1)$ simply replaces $W_{ij}$ with $W_{ij}^{w,t}$, which is the vector of pair-level attributes when $X_1$ and $X_2$ are positioned “locally” to $w$, where we think of the localization parameter $t$ as small.

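Assumption 11 (Sparsity). There exists $t_2 \in \mathbb{R}_+$ such that for any $w \in \mathrm{supp}(f)$,

$$\sup_{x \in \mathbb{R}^d} \sup_{0 \le t \le t_2} \int_{\mathbb{R}^d} P\Big(\inf_s V(\|X_1 - X_2\|, s, W_{12}^{w,t}) > 0 \,\Big|\, X_1 = x, X_2 = y\Big) \, dy < \infty.$$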

If attributes are identically distributed, meaning that $H_\alpha$ and $H_\zeta$ do not depend on positions, then both of Assumptions 10 and 11 hold for any $t_1, t_2$, since $W_{ij}^{w,t}$ does not depend on $t$.

LLN. We can now state our main result, the proof of which can be found in §F. It generalizes Theorem 1 to allow dependence between attributes and positions. In our notation for $\psi$, when we write $W \circ p_{x,r}$, the argument for this attribute array is implicitly given by the third argument of $\psi$. Also, as in the statement of Theorem 1, we omit the arguments of $G(\mathcal{X}, W, r)$ in $\psi$ with the understanding that the first two arguments of $G$ are given respectively by the third and fourth arguments of $\psi$, and the argument $r$ is set to unity.

Theorem 4. For $K \in \mathbb{N}$, let $\psi$ be a $K$-local node statistic satisfying uniform square-integrability: for some $\epsilon > 0$,

$$\sup_{n \in \mathbb{N}} E\Big[\big\|\psi\big(X, G, X + r_n^{-1}(\mathcal{X}_n - X), W \circ p_{X, r_n}\big)\big\|^{2+\epsilon}\Big] < \infty. \tag{35}$$

Then under Assumptions 1, 2, 4, 6, and 9-11,

$$\frac{1}{|N_n|} \sum_{i \in N_n} \psi\big(X_i, G, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W \circ p_{X_i, r_n}\big) \xrightarrow{L_2} \int E\big[\psi\big(x, G, \mathcal{P}_{\kappa f(x)} \cup \{x\}, W \circ p_{x, 0}\big)\big] f(x) \, dx.$$

B.2 Identified Set Under Local Coordination

The main result in this subsection is a characterization of the empirical content of polyadic outcome moments. Unlike Theorem 2, the result exploits additional identifying information contained in Assumption 4, which we require for a law of large numbers. The moment inequalities we will derive differ from those of Sheng (2014) in two respects: (i) they incorporate the distribution of endogenous statistics, and (ii) they do not require “truncating” at the subnetwork level for computational feasibility. Truncation is unnecessary in our setting because strategic neighborhoods produce a natural sharp truncation point.22 These differences enable us to sharply characterize the empirical content of polyadic outcome moments when the selection mechanism satisfies local coordination.

To derive the result, it will be more useful to define a selection mechanism according to the standard definition, as a conditional distribution $\sigma(g \mid \mathcal{X}, W)$ with support contained in $\mathcal{G}(\mathcal{X}, W, r)$, rather than a function with range $\mathcal{G}(\mathcal{X}, W, r)$ as in Assumption 2. In that case, Assumption 2(b) can be restated as saying that there exists some such conditional distribution $\sigma$ such that $P(G = g \mid \mathcal{X}, W) = \sigma(g \mid \mathcal{X}, W)$. Remark 6 below shows that our original definition subsumes this definition.

Let $A \subseteq N_n$ with $|A| = K$. Recall that $\tilde{\mathcal{Y}}_A$ is the range of polyadic outcomes $(g_A, s_A)$ such that $g_A$ is a connected subnetwork. Recalling the definition of strategic neighborhoods from §2.3, let $\mathcal{C}_A^+$ be the subset of $\{C_i^+;\ i \in A\}$ consisting of all distinct elements in that set (multiple nodes can have the same strategic neighborhood). Let $\Pi(r) = \{\{i, j\} \subseteq \cup_{C \in \mathcal{C}_A^+} C : j \notin C_i^+\}$. This is the set of pairs $(i, j)$ that lie in the union of strategic neighborhoods in $\mathcal{C}_A^+$ such that $i$ and $j$ are not members of the same strategic neighborhood. In order to economize on notation, we define $W_A^+ = (W_{ij};\ i, j \in \mathcal{C}_A^+)$ and likewise $X_A^+$ and $g_A^+ = (g_{ij};\ i, j \in \mathcal{C}_A^+)$ for any network $g$ on $N_n$. Let $S \subseteq \tilde{\mathcal{Y}}_A$. By Assumptions 1, 2, and 4,

$$P_{N_n}\big(Y_A(r_n) \in S \mid Z_A\big) = E_{N_n}\Bigg[\sum_{g_A^+ : (g_A, S_A(g_A^+, X_A^+, W_A^+)) \in S} \ \prod_{\{i,j\} \in \Pi(r_n)} (1 - g_{ij}) \prod_{C \in \mathcal{C}_A^+} \sigma\big(g_C \mid X_C, W_C\big) \,\Bigg|\, Z_A\Bigg] \quad \forall S \subseteq \tilde{\mathcal{Y}}_A,\ A \subseteq N_n,\ |A| = K, \tag{36}$$

where the sum is over networks $g_A^+$ on $\mathcal{C}_A^+$ for which $(g_A, S_A(g_A^+, X_A^+, W_A^+))$ is a polyadic outcome in $S$, $g_A$ is the subnetwork on $A$ induced by $g_A^+$, and $g_{ij}$ is the potential link between $i$ and $j$ induced by $g_A^+$. Note that the set of endogenous statistics $S_A$ only depends on $g$ and $W$ through $g_A^+$ and $W_A^+$ due to Assumption 1. Assumption 4 is responsible for the product over $C \in \mathcal{C}_A^+$. To understand the product over $\{i, j\} \in \Pi(r)$, note that for any $i, j \in \cup_{C \in \mathcal{C}_A^+} C$, either $j$ lies in $C_i^+$, $i$'s strategic neighborhood, or their potential link is robustly absent. Thus, the product sets potential links to zero in $g_A^+$ between pairs $(i, j)$ for which $j \notin C_i^+$.

22 I thank a referee for pointing this out.

Following Ciliberto and Tamer (2009) and Sheng (2014), we note that for any strategic neighborhood $C \subseteq N_n$,

$$\mathbf{1}\big\{\mathcal{G}(X_C, W_C, r_n) = \{g_C\}\big\} \le \sigma\big(g_C \mid X_C, W_C\big) \le \mathbf{1}\big\{g_C \in \mathcal{G}(X_C, W_C, r_n)\big\},$$

and these bounds sharply characterize the available information about $\sigma$. We therefore obtain sharp bounds on $\mathbf{P}_{N_n}(Y_A \in S \mid Z_A)$ by replacing the selection mechanism in (36) with these upper and lower bounds.

Theorem 5. Let $K \in \mathbb{N} \setminus \{1\}$ and $A \subseteq N_n$ with $|A| = K$. Under Assumptions 1 and 7, there exists $\lambda$ satisfying Assumptions 2 and 4 such that (36) holds if and only if for any such $A$ and all $S \subseteq \tilde{\mathcal{Y}}_A$, with probability one, $\mathbf{P}_{N_n}(Y_A(r_n) \in S \mid Z_A) \in [L(S), U(S)]$, where

$$L(S) = \mathbf{E}_{N_n}\Bigg[ \sum_{g_A^+ : (g_A, S_A(g_A^+, X_A^+, W_A^+)) \in S} \; \prod_{\{i,j\} \in \Pi(r_n)} (1 - g_{ij}) \prod_{C \in \mathcal{C}_A^+} \mathbf{1}\big\{\mathcal{G}(X_C, W_C, r) = \{g_C\}\big\} \,\Bigg|\, Z_A \Bigg],$$

$$U(S) = \mathbf{E}_{N_n}\Bigg[ \sum_{g_A^+ : (g_A, S_A(g_A^+, X_A^+, W_A^+)) \in S} \; \prod_{\{i,j\} \in \Pi(r_n)} (1 - g_{ij}) \prod_{C \in \mathcal{C}_A^+} \mathbf{1}\big\{g_C \in \mathcal{G}(X_C, W_C, r)\big\} \,\Bigg|\, Z_A \Bigg].$$

Note that the upper (lower) bound is the probability that there exists a (unique) pairwise-stable network $g_A^+$ on $\mathcal{C}_A^+$ such that the implied polyadic outcome $(g_A, S_A(g_A^+, X_A^+, W_A^+))$ lies in $S$.

Proof of Theorem 5. The "only if" direction is already proven above. For the "if" direction, suppose to the contrary that Assumptions 2 and/or 4 do not hold. In the case where Assumption 2 fails to hold, this follows from the proof of the "if" direction of Theorem 2. So suppose Assumption 2 holds but Assumption 4 does not. Then by construction (36) fails to hold, so the bounds on $\mathbf{P}_{N_n}(Y_A(r_n) \in S \mid Z_A)$ are invalid, which establishes the contrapositive.

With this result, we can define the identified set of parameters in a manner similar to (24). Note that unlike Theorem 2, we do not maintain Assumption 8 for this characterization. In principle, the bounds can be computed via simulation for any candidate parameter value, although this requires a consistent estimate of the distribution of observed attributes. With such an estimate, the upper bound $U(S)$ is straightforward to simulate by repeatedly drawing $\mathcal{X}_n, W$, constructing $\mathcal{C}_A^+$, computing the sum in the definition of $U(S)$, and then averaging over simulation draws. The lower bound $L(S)$ is more computationally intensive, since for each strategic neighborhood $C$, one must check whether $g_C$ is the unique equilibrium. However, this can be done using Algorithm 1, which is fast under the conditions of Lemma 4.
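To make the two indicator bounds concrete, the following is a minimal sketch that enumerates pairwise-stable networks by brute force on a three-node toy neighborhood and evaluates $\mathbf{1}\{\mathcal{G} = \{g_C\}\}$ and $\mathbf{1}\{g_C \in \mathcal{G}\}$. It is not Algorithm 1. The surplus function (a constant plus a common-neighbor externality) and all function names are hypothetical choices made only for illustration.

```python
import itertools


def neighbors(i, g):
    """Nodes linked to i in network g (a frozenset of frozenset pairs)."""
    return {k for pair in g if i in pair for k in pair if k != i}


def pairwise_stable_networks(nodes, surplus):
    """Brute-force enumeration of pairwise-stable networks on `nodes`.

    `surplus(i, j, g)` is a hypothetical joint surplus of link {i, j} given
    the other links g. A network is kept if every present link has positive
    surplus and every absent link has nonpositive surplus.
    """
    pairs = [frozenset(p) for p in itertools.combinations(nodes, 2)]
    stable = []
    for mask in itertools.product([0, 1], repeat=len(pairs)):
        g = frozenset(p for p, m in zip(pairs, mask) if m)
        if all((p in g) == (surplus(*sorted(p), g - {p}) > 0) for p in pairs):
            stable.append(g)
    return stable


def selection_bounds(g, stable):
    """Lower and upper indicator bounds on sigma(g | X_C, W_C)."""
    lower = int(len(stable) == 1 and stable[0] == g)  # g is the unique equilibrium
    upper = int(g in stable)                          # g is some equilibrium
    return lower, upper


def make_surplus(base, theta):
    """Toy surplus: constant `base` plus `theta` per common neighbor (assumed form)."""
    def surplus(i, j, g):
        return base + theta * len(neighbors(i, g) & neighbors(j, g))
    return surplus


nodes = [0, 1, 2]
triangle = frozenset(frozenset(p) for p in itertools.combinations(nodes, 2))
multi = pairwise_stable_networks(nodes, make_surplus(-0.5, 1.0))   # empty network and triangle
unique = pairwise_stable_networks(nodes, make_surplus(0.5, 1.0))   # triangle only

print(selection_bounds(triangle, multi))   # (0, 1)
print(selection_bounds(triangle, unique))  # (1, 1)
```

With multiple equilibria the bounds on the selection probability are uninformative for that draw, while under uniqueness they coincide. Averaging products of such indicators over simulated draws is what the simulation of $L(S)$ and $U(S)$ described above would do.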

Remark 6. We justify a claim made above and in footnote 4 that our definition of the selection mechanism in Assumption 2 contains the conventional definition as a special case. A selection mechanism is more commonly represented as a conditional distribution $\sigma(G \mid \mathcal{X}, W)$ with support contained in $\mathcal{G}(\mathcal{X}, W, r)$. To show that our definition is equivalent, let $G_1, G_2, \dots$ enumerate the elements of $\mathcal{G}(\mathcal{X}, W, r)$. Let $p_1, p_2, \dots$ be the corresponding probabilities assigned to the realization of each such network, according to $\sigma(\cdot \mid \mathcal{X}, W)$. Partition the unit interval $[0,1]$ into subintervals $S_1, S_2, \dots$, where $S_i = [p_{i-1}, p_{i-1} + p_i]$, with $p_0 \equiv 0$. This is well defined since $\sum_{i=1}^{\infty} p_i = 1$. Endow each node $i \in \mathcal{X}$ with an "auxiliary type" (in the terminology of Menzel, 2015a) $\nu_i \in \mathbb{R}$, where, conditional on $\mathcal{X}, W$, $\{\nu_i ; i \in N\} \overset{iid}{\sim} U[0,1]$. Let $\bar{\nu}$ be the auxiliary type of the node closest to the origin, arbitrarily breaking ties, and let $\nu = (\nu_i ; i \in N)$. Define $\lambda(\nu, \mathcal{X}, W, r)$ as $G_i$ if $\bar{\nu} \in [p_{i-1}, p_{i-1} + p_i]$. Then by construction, $\mathbf{P}(\lambda(\nu, \mathcal{X}, W, r) = g \mid \mathcal{X}, W) = \sigma(g \mid \mathcal{X}, W)$ for any $g \in \mathcal{G}(\mathcal{X}, W, r)$. This is a special case of our original definition because we can let $\nu_i$ be the first component of $\alpha_i$ and rewrite $\lambda(\nu, \mathcal{X}, W, r)$ as $\lambda(\mathcal{X}, W, r)$.
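The construction in Remark 6 amounts to selecting an equilibrium by locating a single uniform draw within a partition of $[0,1]$. The sketch below illustrates that idea for a finite list of equilibria. It uses cumulative sums of the $p_i$ as interval endpoints, which is an assumption of the sketch (one way to make the subintervals tile the unit interval), and the function name `select_network` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def select_network(networks, probs, nu_bar):
    """Return the network whose subinterval of [0, 1] contains nu_bar.

    networks : list of candidate pairwise-stable networks G_1, G_2, ...
    probs    : selection probabilities p_1, p_2, ... (assumed to sum to one)
    nu_bar   : auxiliary uniform draw in [0, 1]
    Subinterval i is taken to be [p_1 + ... + p_{i-1}, p_1 + ... + p_i).
    """
    cumulative = list(accumulate(probs))
    idx = bisect_right(cumulative, nu_bar)
    # Clamp so that nu_bar == 1.0 (or rounding in the last sum) maps to the last network.
    return networks[min(idx, len(networks) - 1)]


networks = ["G1", "G2", "G3"]
probs = [0.2, 0.5, 0.3]
print(select_network(networks, probs, 0.10))  # G1
print(select_network(networks, probs, 0.50))  # G2
print(select_network(networks, probs, 0.95))  # G3
```

If `nu_bar` is uniform on $[0,1]$, network $G_i$ is returned with probability $p_i$, which mirrors the "by construction" step in the remark.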

B.3 Estimation with Hard-Threshold Penalty

Suppose $G_{ij} = 1$ only if $\|X_i - X_j\| \le r_n$, as is the case in Example 1 with $\rho(\cdot)$ given by (3), and suppose node positions are observed. Then additional moments can be consistently estimated. Define the network of geographic neighbors $R(r_n)$ with $R_{ij}(r_n) = \mathbf{1}\{\|X_i - X_j\| \le r_n\}$. We define the augmented polyadic outcome of degree $K$ for $A \subseteq N_n$, $|A| = K$, as a tuple $Y_A^o = (R_A(r_n), G_A, S_A)$ satisfying $G_A \subseteq R_A(r_n)$. Define the set of such outcomes $\mathcal{Y}_K^o$, and let $\tilde{\mathcal{Y}}_K^o$ be the set of polyadic outcomes for which $R_A(r_n)$ is a connected subnetwork.23 Then it is easy to show that Theorem 2 remains valid after replacing $Y_A$ with $Y_A^o$ and $\mathcal{Y}_K$ with $\mathcal{Y}_K^o$. We can use moment inequalities (18) in estimation if we replace $\tilde{\mathcal{Y}}_K$ in the definition with $\tilde{\mathcal{Y}}_K^o$.

Example 11. To illustrate how this changes the right-hand side of (17), consider the example of dyadic outcomes (Example 9). We have $\mathcal{Y}_2^o = \{0,1\}^3$ and $\tilde{\mathcal{Y}}_2^o = \{1\} \times \{0,1\}^2$, the latter because augmented polyadic outcomes with $R_{ij}(r_n) = 0$ violate the requirement that $R_A(r_n)$ is connected. Let $u = (u_1, u_2, u_3, u_4) \in U_{ij}$, where we associate $u_1$ with $Y_{ij} = (1,1,1)$, $u_2$ with $(1,0,1)$, $u_3$ with $(1,1,0)$, and $u_4$ with $(1,0,0)$. For $y, z \in \mathcal{Y}_2$, let $\{y, z\}$ be the event that $y$ and $z$ are stable with respect to $W_{ij}$ and $\theta$ and that $R_{ij}(r_n) = 1$. For example, $\{(1,1,1), (1,0,0)\}$ equals

$$\{R_{ij}(r_n) = 1\} \cap \{\theta_1 + \theta_2 + \zeta_{ij} \ge 0\} \cap \{\theta_1 + \zeta_{ij} < 0\}.$$

23 Similar to the definition of $\tilde{\mathcal{Y}}_K$, the requirement that $R_A(r_n)$ is connected ensures that analog estimators of the implied moments are valid node statistics, so Theorem 1 is applicable.

Then using observation (⋆) in Example 9,

$$\mathbf{E}_{N_n}\Big[ \sup_{q \in Q_\theta(W_{ij}, r_n)} u'q \,\Big|\, Z_{ij} \Big] = \max\{u_1, u_3\}\, \mathbf{P}_{N_n}\big((1,1,1), (1,1,0) \mid Z_{ij}\big) + \max\{u_1, u_4\}\, \mathbf{P}_{N_n}\big((1,1,1), (1,0,0) \mid Z_{ij}\big) + \max\{u_2, u_3\}\, \mathbf{P}_{N_n}\big((1,0,1), (1,1,0) \mid Z_{ij}\big) + \max\{u_2, u_4\}\, \mathbf{P}_{N_n}\big((1,0,1), (1,0,0) \mid Z_{ij}\big).$$

Theorem 1 cannot be applied to establish consistency of analog estimators of augmented polyadic outcome moments, since they cannot be written as averages of node statistics as currently defined. However, it is a simple matter to generalize this definition. Say that $\psi$ is $K$-local$^*$ if $\psi(X_i, G, \mathcal{X}, W) = \psi(X_i, G', \mathcal{X}', W')$ for any node $i$, networks $G, G', R, R'$, attributes $W, W'$, and positions $\mathcal{X}, \mathcal{X}'$ for which $R_{jk} = R'_{jk}$, $G_{jk} R_{jk} = G'_{jk} R'_{jk}$, and $W_{jk} R_{jk} = W'_{jk} R'_{jk}$ for all $j, k \in N$ such that $L_R(i,j) \le K - 1$. Analog estimators for augmented polyadic outcome moments can be written as averages of $K$-local$^*$ node statistics, and Theorem 1 continues to hold for this larger class of moments under virtually the same proof. The only differences are (1) we replace $M(r_n)$ with $R(r_n)$ and (2) we use sparsity of $R(r_n)$, which follows from Assumption 5.

We conjecture that $\hat{r} = \max\{\|X_i - X_j\| : G_{ij} = 1\}$ is a consistent estimator for $r_n$.24 In our simulations, we find that the difference between $\hat{r}$ and $r_n$ is negligible for moderate sample sizes, and the results in §6 are virtually identical when $r_n$ is treated as known and when $r_n$ is replaced by $\hat{r}$.
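As an illustration of the conjectured estimator $\hat{r} = \max\{\|X_i - X_j\| : G_{ij} = 1\}$, the following sketch computes the largest distance between linked nodes. The data layout (a dictionary of positions and a list of linked pairs) and the function name `estimate_radius` are assumptions made for this example.

```python
import math


def estimate_radius(positions, links):
    """Largest Euclidean distance between linked pairs.

    positions : dict mapping node id -> coordinate tuple
    links     : iterable of (i, j) pairs with G_ij = 1
    Returns 0.0 for an empty network.
    """
    return max((math.dist(positions[i], positions[j]) for i, j in links), default=0.0)


positions = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (1.0, 0.0)}
links = [(0, 1), (0, 2)]
print(estimate_radius(positions, links))  # 5.0
```

Because every observed link satisfies $\|X_i - X_j\| \le r_n$ under the hard-threshold penalty, this quantity never exceeds $r_n$ in that model.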

C Calculations and Examples

C.1 Continuous Penalty Example

Consider the model in Example 1 with $h(\alpha_i, \alpha_j; \theta_1) = \theta_1$, $\theta_2, \theta_3 \ge 0$, $\rho(X_i, X_j) = -\theta_4 \|X_i - X_j\|$, and positions uniformly distributed on $[0,1]^2$. We first verify Assumption 5:

$$\kappa \int_{\mathbb{R}^d} \mathbf{P}\Big( \inf_s V(\|X_1 - X_2\|, s, W_{12}) > 0 \,\Big|\, X_2 = y \Big)\, dy = \kappa \int_{\mathbb{R}^d} \mathbf{P}\big( \theta_1 - \theta_4 \|X_1 - X_2\| + \zeta_{12} > 0 \mid X_2 = y \big)\, dy. \tag{37}$$

Sparsity requires this to be finite, which amounts to a restriction on the tails of the distribution of $\zeta_{ij}$. In fact, (37) is finite if $\zeta_{ij} \sim N(0,1)$. To see this, note that after a change of variables to polar coordinates, (37) equals $2\pi\kappa \int_0^\infty x\, \Phi(\theta_1 - \theta_4 x)\, dx$, where $\Phi$ is the CDF of the standard normal distribution. Some calculus shows that for constants $a, b$ with $b < 0$,

$$\int_0^\infty x\, \Phi(a + bx)\, dx = \frac{1}{2b^2} \big( (a^2 + 1)\Phi(a) + a\phi(a) \big), \tag{38}$$

where $\phi$ is the density of the standard normal distribution. Therefore, setting $a = \theta_1$ and $b = -\theta_4$, (37) equals

$$\frac{\kappa\pi}{\theta_4^2} \big[ (\theta_1^2 + 1)\Phi(\theta_1) + \theta_1 \phi(\theta_1) \big], \tag{39}$$

which is finite when $\theta_4 > 0$, as desired.

24 It is trivial to see that $|\hat{r}_n - r_n| \overset{p}{\longrightarrow} 0$. The stronger notion of ratio-consistency ($\hat{r}_n / r_n \overset{p}{\longrightarrow} 1$) appears harder to prove.

Turning to Assumption 3, the left-hand side of (5) equals

$$\sup_{x \in \mathbb{R}^d} \kappa \int_{\mathbb{R}^d} \mathbf{P}\big( -\theta_2 - \theta_3 \bar{\Delta} < \theta_1 - \theta_4 \|x - y\| + \zeta_{12} \le 0 \big)\, dy = \kappa \int_{\mathbb{R}^d} \big[ \Phi(\theta_1 - \theta_4 \|y\| + \theta_2 + \theta_3 \bar{\Delta}) - \Phi(\theta_1 - \theta_4 \|y\|) \big]\, dy = 2\pi\kappa \int_0^\infty z \big[ \Phi(\theta_1 - \theta_4 z + \theta_2 + \theta_3 \bar{\Delta}) - \Phi(\theta_1 - \theta_4 z) \big]\, dz.$$

Applying (38), we obtain the following closed-form expression for the left-hand side of (5):

$$\frac{\kappa\pi}{\theta_4^2} \Big[ \big( (\theta_1 + \theta_2 + \theta_3 \bar{\Delta})^2 + 1 \big) \Phi(\theta_1 + \theta_2 + \theta_3 \bar{\Delta}) + (\theta_1 + \theta_2 + \theta_3 \bar{\Delta})\, \phi(\theta_1 + \theta_2 + \theta_3 \bar{\Delta}) - \big( \theta_1^2 + 1 \big) \Phi(\theta_1) - \theta_1 \phi(\theta_1) \Big]. \tag{40}$$

Note that unlike Example 5, which uses the hard-threshold penalty, here we do not have a simple multiplicatively separable form between a sparsity parameter and a marginal effect of externalities. To get a sense of magnitudes, consider $\theta_4 = \kappa^{1/2}$. Then when $\theta_1 = 0$, we have (40) $< 1$ if $\theta_2 + \theta_3 \bar{\Delta} \le 0.32$, which when $\|X_i - X_j\| = 0$ yields a marginal effect of externalities of 0.125. When $\theta_1 = -1$, so that there is a higher exogenous cost to link formation, then the same inequality holds if $\theta_2 + \theta_3 \bar{\Delta} \le 0.85$, which when $\|X_i - X_j\| = 0$ yields a marginal effect of 0.282.
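The closed form (38) and the magnitudes reported for (40) can be checked numerically. The sketch below compares (38) with a trapezoid-rule approximation of the integral and then evaluates (40) at $\theta_4 = \kappa^{1/2}$. The function names, the integration cutoff, and the value $\kappa = 2$ are illustrative assumptions; with $\theta_4^2 = \kappa$ the expression does not depend on $\kappa$.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def closed_form_38(a, b):
    """Right-hand side of (38); requires b < 0."""
    return ((a * a + 1.0) * Phi(a) + a * phi(a)) / (2.0 * b * b)


def numeric_38(a, b, upper=20.0, steps=100_000):
    """Trapezoid approximation of the integral of x * Phi(a + b x) over [0, upper]."""
    h = upper / steps
    total = 0.5 * upper * Phi(a + b * upper)  # endpoint terms; the x = 0 term is zero
    for k in range(1, steps):
        x = k * h
        total += x * Phi(a + b * x)
    return total * h


def expression_40(theta1, theta4, kappa, externality):
    """Expression (40), with externality = theta2 + theta3 * Delta_bar."""
    c = theta1 + externality
    return 2.0 * math.pi * kappa * (closed_form_38(c, -theta4) - closed_form_38(theta1, -theta4))


def marginal_effect(theta1, externality):
    """Change in link probability at zero distance from switching on the externality."""
    return Phi(theta1 + externality) - Phi(theta1)


kappa = 2.0
theta4 = math.sqrt(kappa)
print(closed_form_38(0.3, -1.2), numeric_38(0.3, -1.2))
print(expression_40(0.0, theta4, kappa, 0.32), marginal_effect(0.0, 0.32))
print(expression_40(-1.0, theta4, kappa, 0.85), marginal_effect(-1.0, 0.85))
```

Both evaluations of (40) should come out slightly below one, and the two marginal effects should round to the values 0.125 and 0.282 quoted above.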

C.2 Proof of Equation (8)

The expected number of robust links formed by a node $i$, conditional on her position, equals

$$\mathbf{E}\Bigg[ \sum_{j=1, j \ne i}^{N_n + 2} \mathbf{1}\Big\{ \inf_s V(r_n^{-1}\|X_i - X_j\|, s, W_{ij}) > 0 \Big\} \,\Bigg|\, X_i \Bigg] = \mathbf{E}\Bigg[ \sum_{X_j \in X_i + r_n^{-1}(\mathcal{X}_n - X_i)} \mathbf{P}\Big( \inf_s V(\|X_i - X_j\|, s, W_{ij}) > 0 \,\Big|\, X_i, X_j \Big) \,\Bigg|\, X_i \Bigg]. \tag{41}$$

Since $N_n \sim \text{Poisson}(n)$, by Proposition 1.5 of Penrose (2003), $\mathcal{X}_n \overset{d}{=} \mathcal{P}_{nf} \cup \{X_i, X_j\}$, where $\mathcal{P}_{nf}$ is an inhomogeneous Poisson point process on $\mathbb{R}^d$ with intensity $nf(\cdot)$, and $\{X_i, X_j\} \overset{iid}{\sim} f$, independent of $\mathcal{P}_{nf}$. By the mapping theorem (Kingman, 1993), $X_i + r_n^{-1}(\mathcal{P}_{nf} - X_i)$ has the same distribution as a Poisson point process with intensity $nf((\cdot - X_i) r_n + X_i)\, r_n^d$. Thus, by Campbell's theorem (Kingman, 1993),

$$(41) = \int_{\mathbb{R}^d} \mathbf{P}\Big( \inf_s V(\|X_1 - X_2\|, s, W_{12}) > 0 \,\Big|\, X_1, X_2 = y \Big)\, nf((y - X_1) r_n + X_1)\, r_n^d \, dy + \mathbf{P}\Big( \inf_s V(r_n^{-1}\|X_i - X_j\|, s, W_{ij}) > 0 \,\Big|\, X_i \Big)$$

$$\le \kappa \sup_x f(x) \int_{\mathbb{R}^d} \mathbf{P}\Big( \inf_s V(\|X_1 - X_2\|, s, W_{12}) > 0 \,\Big|\, X_1, X_2 = y \Big)\, dy + 1.$$

C.3 Primitive Conditions for (21)

Condition (21) is sufficient for uniform square-integrability of polyadic outcome moments, since $G_{ij} \le M_{ij}(r_n)$. The condition is reasonable to impose because $M(1)$ in the limit model is sparse by Assumptions 3 and 5. Hence, we should expect $M_{ij}(r_n) = O_p(n^{-1})$.25 Thus the summands of (21) asymptotically contribute $O_p(n^{-K+1})$, so we would expect the sum to be finite on average. As we prove below, a simple sufficient condition for general $M(r_n)$ and $K$ is

$$\sup_{N_n \in \mathbb{N}} \sup_{x, \alpha} N_n\, \mathbf{E}\big[ M_{ij}(r_n) \mid X_i = x, \alpha_i = \alpha \big] < \infty. \tag{42}$$

25 More precisely, this follows from Lemma 7 in the appendix, provided the degree of any node in $M(r_n)$ under the finite model is uniformly integrable.

This holds automatically when the joint surplus contains a penalty of the form (3), since $G$ is then a subnetwork of $R(r_n)$, the network where $R_{ij}(r_n) = \mathbf{1}\{\|X_i - X_j\| \le r_n\}$. To take another example, consider the model in Example 1 with $h(\alpha_i, \alpha_j; \theta_1) = \theta_1$ and $\rho(X_i, X_j) = -r_n^{-1}\|X_i - X_j\|$. Suppose the random-utility shock has polynomial tails: there exists a constant $c$ such that $\mathbf{P}(\zeta_{ij} > z) \le c z^{-d}$ for all sufficiently large $z \ge 0$. Then for $n$ sufficiently large,

$$n\, \mathbf{E}\big[ M_{ij}(r_n) \mid X_i, \alpha_i \big] \le c\, \mathbf{E}\Big[ n \big( r_n^{-1}\|X_i - X_j\| - \theta_1 - \theta_2 - \theta_3 \bar{\Delta} \big)^{-d} \,\Big|\, X_i \Big].$$

Hence, (42) holds under the rate condition on $r_n$ in Assumption 6.

We next show that (42) implies (21) for $\epsilon = 1$. We consider the case $K = 2$ and discuss how the argument generalizes. We need to show that $\mathbf{E}_{N_n}[(\sum_{j \in N_n} \sum_{k \in N_n} M_{ik} M_{kj})^3]$ is uniformly bounded over $N_n$. By the AM-GM inequality, the term in the expectation is bounded above by

$$\frac{1}{3} \sum_{j,k,l} M_{ij} M_{ik} M_{il} \Bigg[ \Big( \sum_a M_{ja} \Big)^3 + \Big( \sum_b M_{kb} \Big)^3 + \Big( \sum_c M_{lc} \Big)^3 \Bigg].$$

The others being similar, consider the term

$$\frac{1}{3} \sum_{j,k,l} \sum_{a_1, a_2, a_3} M_{ij} M_{ik} M_{il} M_{ja_1} M_{ja_2} M_{ja_3}. \tag{43}$$

Consider the part of the sum where all the indices are different ($i \ne j \ne k \ne l \ne a_1 \ne a_2 \ne a_3$). Then this equals $\frac{1}{3N_n}$ times a count of the number of septuplets that form a subnetwork containing a particular minimally connected subnetwork in $M$, specifically, the subnetwork where $i$ and $j$ are connected, $i$ and $k$ are connected, etc.26 In parts of the sum where any indices coincide, this may result in a count for a subnetwork that is not minimally connected. Such a count can always be bounded above by an appropriate count for a minimally connected subnetwork. For example, if $a_1 = k$ but all other indices differ, then the corresponding term of (43) is given by

$$\frac{1}{3} \sum_{j,k,l} \sum_{a_2, a_3} M_{ij} M_{ik} M_{il} M_{jk} M_{ja_2} M_{ja_3} \le \frac{1}{3} \sum_{j,k,l} \sum_{a_2, a_3} M_{ij} M_{ik} M_{il} M_{ja_2} M_{ja_3}.$$

The left-hand side counts the occurrence of a subnetwork that is not minimally connected (it has a cycle between $i, j, k$), unlike the right-hand side. Thus, consider the part of the sum (43) where all the indices are different, as the argument for the other terms is the same once we bound them above by an appropriate minimally-connected-subnetwork count. Then the expectation of the summand equals

$$\mathbf{E}\big[ M_{ij} M_{ik} M_{il} M_{ja_1} M_{ja_2} M_{ja_3} \big] \le \mathbf{E}[M_{ij}] \sup_{X_k, \alpha_k} \mathbf{E}[M_{ik} \mid X_k, \alpha_k] \sup_{X_l, \alpha_l} \mathbf{E}[M_{il} \mid X_l, \alpha_l] \sup_{X_{a_1}, \alpha_{a_1}} \mathbf{E}[M_{ja_1} \mid X_{a_1}, \alpha_{a_1}] \sup_{X_{a_2}, \alpha_{a_2}} \mathbf{E}[M_{ja_2} \mid X_{a_2}, \alpha_{a_2}] \sup_{X_{a_3}, \alpha_{a_3}} \mathbf{E}[M_{ja_3} \mid X_{a_3}, \alpha_{a_3}] \le \Big( \sup_{X_i, \alpha_i} \mathbf{E}[M_{ij} \mid X_i, \alpha_i] \Big)^6.$$

The first inequality above is attained by iteratively conditioning on and taking the supremum over attributes of leaf nodes (nodes with only one connection), i.e. nodes $k, l, a_1, a_2, a_3$, and then using the assumption that $\{(X_i, \alpha_i)\}$ is i.i.d. The previous derivation yields

$$\mathbf{E}_{N_n}\Bigg[ \frac{1}{3} \sum_{j \ne k \ne l \ne a_1 \ne a_2 \ne a_3} M_{ij} M_{ik} M_{il} M_{ja_1} M_{ja_2} M_{ja_3} \Bigg] \le \frac{1}{3} \Big( \sup_{X_i, \alpha_i} N_n\, \mathbf{E}[M_{ij} \mid X_i, \alpha_i] \Big)^6,$$

which is finite by (42). This proves the claim for $K = 2$. For general $K$, we can follow the same line of reasoning, bounding

$$\Bigg( \sum_{i_2 \in N_n} \cdots \sum_{i_K \in N_n} M_{i_1, i_2}(r_n) \cdots M_{i_{K-1}, i_K}(r_n) \Bigg)^3$$

from above by the sum of minimally-connected-subnetwork counts and showing that these are $O_p(1)$ using (42).

26 By minimally connected we mean the removal of any link results in a disconnected subnetwork.
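The AM-GM step used above for $K = 2$ can be checked numerically on any 0/1 adjacency matrix. The sketch below computes the cubed two-step count $(\sum_j \sum_k M_{ik} M_{kj})^3$ for a node $i$ and the AM-GM upper bound, on a randomly generated symmetric graph. The graph-generation parameters and the function name are arbitrary choices for illustration.

```python
import random


def cubed_two_step_count_and_bound(M, i):
    """Return ((sum_k M_ik * deg_k)^3, AM-GM upper bound) for node i.

    The bound is (1/3) * sum_{j,k,l} M_ij M_ik M_il (deg_j^3 + deg_k^3 + deg_l^3),
    which follows from x*y*z <= (x^3 + y^3 + z^3) / 3 for nonnegative x, y, z.
    """
    n = len(M)
    deg = [sum(row) for row in M]
    lhs = sum(M[i][k] * deg[k] for k in range(n)) ** 3
    total = 0
    for j in range(n):
        for k in range(n):
            for l in range(n):
                weight = M[i][j] * M[i][k] * M[i][l]
                if weight:
                    total += weight * (deg[j] ** 3 + deg[k] ** 3 + deg[l] ** 3)
    return lhs, total / 3.0


rng = random.Random(0)
n = 12
M = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        if rng.random() < 0.3:
            M[a][b] = M[b][a] = 1

for i in range(n):
    lhs, bound = cubed_two_step_count_and_bound(M, i)
    print(i, lhs, bound)
```

The bound holds deterministically for every graph; the substantive work in the argument above is controlling the expectation of the bounding terms via (42).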


D Branching Process Lemmas

The proof of Theorem 4 relies on the following result, which shows that under Assumption 3, $D_{w,r_n}(1)$-components are small, the latter defined in §B.1.

Lemma 3. Let $F_n = \{\mathbb{R}^d \mapsto \{nf(x)\} ; x \in \text{supp}(f)\}$. Under Assumption 10, for any $x \in \mathbb{R}^d$, there exists a finite random variable $\mathbf{C}$ and $\tilde{n} \in \mathbb{N}$ such that

$$|C(X, x + r_n^{-1}(\mathcal{P}_{\tau_n} - x) \cup \{x\}, D_{x,r_n}(1), 1)| < \mathbf{C},$$

for any $n > \tilde{n}$, $\tau_n \in \{nf(\cdot)\} \cup F_n$, and $X \in x + r_n^{-1}(\mathcal{P}_{\tau_n} - x) \cup \{x\}$, under some coupling of $\{\mathcal{P}_{\tau_n} ; \tau_n \in \{nf(\cdot)\} \cup F_n, n > \tilde{n}\}$ and $\mathbf{C}$.

This is proven using a branching process argument that makes precise the intuition outlined in §2.2. The formal argument stochastically bounds $D_{w,r_n}(1)$-component sizes by the total offspring of a certain branching process constructed following Meester and Roy (1996), Chapter 6. We then show subcriticality of the latter process using results for multi-type branching processes due to Bollobás et al. (2007).

Lemma 3 bounds component sizes of $D_{w,r_n}(1)$ under a set of models with position sets given by $\mathcal{X} \in \{\mathcal{P}_{\tau_n} ; \tau_n \in \{nf(\cdot)\} \cup F_n, n > \tilde{n}\}$. For $\tau_n \in F_n$, we obtain node positions distributed according to $\mathcal{P}_{nf(x)}$ for $x \in \text{supp}(f)$, which is of interest, since this corresponds to the limit model (11). For $\tau_n = nf(\cdot)$, the process $\mathcal{P}_{\tau_n}$ is of interest due to the following relationship with $\mathcal{X}_n$. By Proposition 1.5 of Penrose (2003), $x + r_n^{-1}(\mathcal{X}_n - x) \overset{d}{=} x + r_n^{-1}(\mathcal{P}_{nf} \cup \{x, Y\} - x)$, where $Y$ is a random vector with density $f$. Lemma 3 bounds component sizes when $\mathcal{X} = x + r_n^{-1}(\mathcal{P}_{nf} \cup \{x\} - x)$, which drops $Y$ from the process. However, it turns out that dropping $Y$ is innocuous, since $x + r_n^{-1}(\mathcal{P}_{nf} \cup \{x\} - x)$ closely approximates $x + r_n^{-1}(\mathcal{P}_{nf} \cup \{x, Y\} - x)$ on any finite ball centered at $x$ (see Lemma 6).

The next lemma expands on Lemma 3, showing that under the finite model, the size of the largest component of $D_{w,r_n}(1)$ grows at most logarithmically in $n$. This is used to assess the computational complexity of Algorithm 1.

Lemma 4.
Define

$$\mu_{w,r_n}(x, \alpha) = \kappa \sup_z f(z) \Big( \int_{\mathbb{R}^d} \int_{\mathbb{R}^{d_\alpha}} \Lambda_{w,r_n}(x, y, \alpha, \alpha')^2 \, d\Phi_{w,r_n}(\alpha' \mid y)\, dy \Big)^{1/2}.$$

Suppose for some $a > 0$ and $\tilde{n} \in \mathbb{N}$,

$$\inf_{w \in \text{supp}(f),\, n > \tilde{n}} \; \inf_{(x,\alpha) \in \text{supp}(X_i, \alpha_i^{w,r_n})} \mu_{w,r_n}(x, \alpha) > 0, \tag{44}$$

$$\sup_{w \in \text{supp}(f),\, n > \tilde{n}} \; \sup_{x \in \text{supp}(f)} \int_{\mathbb{R}^d} e^{a \mu_{w,r_n}(x, \alpha)} \, d\Phi_{w,r_n}(\alpha \mid x) < \infty. \tag{45}$$

Then under Assumption 10, for any $X \in \mathcal{X}_n$,

$$\max_{Y \in X + r_n^{-1}(\mathcal{X}_n - X)} |C(Y, X + r_n^{-1}(\mathcal{X}_n - X), D_{X,r_n}, 1)| = O_p(\log n).$$

The proof builds on that of Lemma 3, which, for any $x \in \text{supp}(f)$ and $n > \tilde{n}$, constructs a multi-type branching process $\{B_m(x, n)\}_{m=1}^\infty$ such that the total offspring $\sum_{m=1}^\infty B_m(x, n)$ stochastically dominates $|C(Y, x + r_n^{-1}(\mathcal{X}_n - x), D_{x,r_n}, 1)|$ for any $Y \in x + r_n^{-1}(\mathcal{X}_n - x)$. In order to obtain a logarithmic bound on the maximum component size, it makes use of the following result showing that the distribution of the total offspring produced by this process has exponential tails.

Lemma 5. Under the assumptions of Lemma 4, there exists $b > 1$ and $\tilde{n} \in \mathbb{N}$ such that

$$\sup_{x \in \text{supp}(f)} \sup_{n > \tilde{n}} \mathbf{E}\Big[ b^{\sum_{m=1}^\infty B_m(x, n)} \Big] < \infty.$$

The proof of this result is based on an argument due to Turova (2012).
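The role of subcriticality can be illustrated with a much simpler object than the multi-type process used in the proofs. The sketch below simulates a single-type Galton-Watson process with Poisson offspring, which is an assumed simplification, and records the total progeny. For mean offspring $m < 1$ the expected total progeny is $1/(1-m)$ and large totals are exponentially rare, which is the qualitative behavior Lemma 5 establishes for the process $\{B_m(x,n)\}$.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def total_progeny(rng, mean_offspring, cap=10**6):
    """Total number of individuals ever born, including the root.

    The cap only guards against runaway loops if a supercritical mean is passed.
    """
    alive, total = 1, 1
    while alive and total < cap:
        born = sum(poisson(rng, mean_offspring) for _ in range(alive))
        total += born
        alive = born
    return total


rng = random.Random(0)
mean_offspring = 0.5  # subcritical: expected total progeny is 1 / (1 - 0.5) = 2
draws = [total_progeny(rng, mean_offspring) for _ in range(20000)]
print(sum(draws) / len(draws), max(draws))
```

The sample mean should be close to 2, and the largest simulated total stays small relative to the number of draws, in line with an exponential tail.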

E Weak Law for Stabilizing Functionals

We prove a generalization of (27) that allows attributes to be correlated with node positions. This result extends Theorem 2.1 of Penrose and Yukich (2003) (PY) to allow $\xi$ to depend on a collection of random vectors $W$ that are non-identically distributed conditional on $\mathcal{X}$ and relaxes their translation-invariance assumption on $\xi$. These differences require nontrivial modifications of several steps of PY's argument, most notably the definition of strong stability and the proof of convergence of means. See §A for an informal discussion of some of these differences.

We maintain the coupling construction in §B.1. Recall that $F_n = \{\mathbb{R}^d \mapsto \{nf(x)\} ; x \in \text{supp}(f)\}$. For any $x \in \mathbb{R}^d$, $n \in \mathbb{N}$, and $r \in \mathbb{R}_+$, let $\xi(x, \mathcal{X}, W \circ p_{x,r})$ be a real-valued functional, defined for all $\mathcal{X} \subseteq \mathbb{R}^d$ locally finite.27 We omit the argument of $W \circ p_{x,r}$ with the understanding that it equals the second argument of $\xi$. Additionally, if $x \notin \mathcal{X}$, we let $\xi(x, \mathcal{X}, W \circ p_{x,r}) \equiv \xi(x, \mathcal{X} \cup \{x\}, W \circ p_{x,r})$. Let $r_n = (\kappa / n)^{1/d}$. We prove a law of large numbers for functionals $\xi$ that satisfy the following stability property.

27 A set $\mathcal{X} \subseteq \mathbb{R}^d$ is locally finite if $|S \cap \mathcal{X}| < \infty$ for any bounded $S \subseteq \mathbb{R}^d$.

Definition 1. The functional $\xi$ is $W$-strongly stabilizing if for any $X$ drawn from $f$, there exists $\tilde{n} \in \mathbb{N}$ and a positive random variable $\mathbf{R}$ such that for all $\tau_n \in \{nf(\cdot)\} \cup F_n$, $n > \tilde{n}$, and $R \ge \mathbf{R}$,

$$\xi\big( X, X + r_n^{-1}(\mathcal{P}_{\tau_n} - X), W \circ p_{X,r_n} \big) = \xi\big( X, (X + r_n^{-1}(\mathcal{P}_{\tau_n} - X)) \cap B(X, R), W \circ p_{X,r_n} \big)$$

with probability one.

Theorem 6. Suppose $\xi$ is $W$-strongly stabilizing; $\xi(x, \mathcal{X}, \cdot)$ is continuous for any $x \in \mathbb{R}^d$ and $\mathcal{X} \subseteq \mathbb{R}^d$ locally finite; and for $X \in \mathcal{X}_n$ we have

$$\sup_{n \in \mathbb{N}} \mathbf{E}\big[ \xi(X, X + r_n^{-1}(\mathcal{X}_n - X), W \circ p_{X,r_n})^p \big] < \infty \tag{46}$$

for some $p > 2$. Then under Assumptions 9(b) and (c), as $n \to \infty$,

$$\frac{1}{|N_n|} \sum_{i \in N_n} \xi(X_i, X_i + r_n^{-1}(\mathcal{X}_n - X_i), W \circ p_{X_i,r_n}) \overset{L^2}{\longrightarrow} \int \mathbf{E}\big[ \xi\big( x, \mathcal{P}_{\kappa f(x)}, W \circ p_{x,0} \big) \big] f(x)\, dx. \tag{47}$$

The proof of this theorem requires the following two lemmas.

Lemma 6 (Coupling). For any $R \in \mathbb{R}_+$,

$$\lim_{n \to \infty} \mathbf{P}\Big( \big( X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X) \big) \cap B(X, R) = \big( X + r_n^{-1}(\mathcal{P}_{nf} - X) \big) \cap B(X, R) \Big) = 1,$$

$$\lim_{n \to \infty} \mathbf{P}\Big( \big( X + r_n^{-1}(\mathcal{P}_{nf} - X) \big) \cap B(X, R) = \mathcal{P}^n_{\kappa f(X)} \cap B(X, R) \Big) = 1.$$

Proof. The first equation follows because $Y \notin B(X, R r_n)$ with probability approaching one. The second equation is Lemma 3.1 of PY.

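Lemma 7 (Convergence of Means). Under the assumptions of Theorem 6,

$$\lim_{n \to \infty} \mathbf{E}\Bigg[ \frac{1}{|\mathcal{X}_n|} \sum_{X \in \mathcal{X}_n} \xi\big( X, X + r_n^{-1}(\mathcal{X}_n - X), W \circ p_{X,r_n} \big) \Bigg] = \int \mathbf{E}\big[ \xi\big( x, \mathcal{P}_{\kappa f(x)}, W \circ p_{x,0} \big) \big] f(x)\, dx.$$

Proof. We first show that

$$\xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n}) \overset{d}{\longrightarrow} \xi\big( X, \mathcal{P}_{\kappa f(X)}, W \circ p_{X,0} \big). \tag{48}$$

For any $R > 0$, define the event

$$E^n_X(R) = \Big\{ (X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X)) \cap B(X, R) = (X + r_n^{-1}(\mathcal{P}_{nf} - X)) \cap B(X, R) = \mathcal{P}^n_{\kappa f(X)} \cap B(X, R) \Big\}.$$

For states of the world in the event $E^n_X(R) \cap \{R > \mathbf{R}\}$ with $n > \tilde{n}$, where $\mathbf{R}$ and $\tilde{n}$ are defined in Definition 1, by $W$-strong stability,

$$\xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n}) = \xi(X, (X + r_n^{-1}(\mathcal{P}_{nf} - X)) \cap B(X, R), W \circ p_{X,r_n}) = \xi(X, \mathcal{P}^n_{\kappa f(X)} \cap B(X, R), W \circ p_{X,r_n}) = \xi(X, \mathcal{P}^n_{\kappa f(X)}, W \circ p_{X,r_n}).$$

Therefore, for any $\varepsilon > 0$,

$$\mathbf{P}\Big( \big| \xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n}) - \xi(X, \mathcal{P}^n_{\kappa f(X)}, W \circ p_{X,r_n}) \big| > \varepsilon \Big) \le \mathbf{P}(E^n_X(R)^c) + \mathbf{P}(\mathbf{R} > R), \tag{49}$$

where $E^n_X(R)^c$ is the complement of $E^n_X(R)$. Since $\mathbf{R}$ is a.s. finite by $W$-strong stability, we can take $R$ sufficiently large such that $\mathbf{P}(\mathbf{R} > R) < \varepsilon / 2$. For such an $R$, by Lemma 6, we can find $n > \tilde{n}$ such that $\mathbf{P}(E^n_X(R)^c) < \varepsilon / 2$.

By Assumption 9(b) and (c) and Theorem 1.5.4 of van der Vaart and Wellner (1996), $W \circ p_{X,r_n} \rightsquigarrow W \circ p_{X,0}$, where $\rightsquigarrow$ denotes weak convergence of the stochastic process.28 Then by continuity of $\xi$ in $W$ and the fact that $\mathcal{P}^n_{\kappa f(X)} \overset{d}{=} \mathcal{P}_{\kappa f(X)}$, we have

$$\xi(X, \mathcal{P}^n_{\kappa f(X)}, W \circ p_{X,r_n}) \overset{d}{\longrightarrow} \xi(X, \mathcal{P}_{\kappa f(X)}, W \circ p_{X,0}). \tag{50}$$

28 Here we view $W$ as a random function that for any fixed $\mathcal{X} \subseteq \mathbb{R}^d$ maps $x, y \in \mathcal{X}$ to an attribute vector $W_{xy}$.

Combining (49) and (50) yields (48) by Slutsky's theorem. By uniform square-integrability (46) and the Vitali convergence theorem, (48) implies

$$\mathbf{E}\big[ \xi\big( X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n} \big) \big] \to \mathbf{E}\big[ \xi\big( X, \mathcal{P}_{\kappa f(X)}, W \circ p_{X,0} \big) \big].$$

The left-hand side of this expression equals

$$\mathbf{E}\Bigg[ \frac{1}{|\mathcal{X}_n|} \sum_{X \in \mathcal{X}_n} \xi\big( X, X + r_n^{-1}(\mathcal{X}_n - X), W \circ p_{X,r_n} \big) \Bigg].$$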
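A simple special case gives a feel for the limit in Theorem 6 and Lemma 7. Take $d = 2$, $f$ uniform on the unit square, no attributes, and let $\xi$ count the other points within rescaled distance one, that is, within distance $r_n = (\kappa/n)^{1/2}$ before rescaling. The limit is then the expected number of points of a homogeneous Poisson process with intensity $\kappa$ in a unit ball, $\kappa\pi$. The sketch below simulates the sample average. The functional, the parameter values, and the function names are all assumptions chosen for illustration, and boundary effects make the finite-sample mean slightly smaller than $\kappa\pi$.

```python
import math
import random


def poisson_count(rng, mean):
    """Poisson draw obtained by counting unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def mean_degree(n, kappa, rng):
    """Average number of neighbors within r_n = sqrt(kappa / n) on the unit square."""
    radius_sq = kappa / n                      # r_n squared, with d = 2
    m = poisson_count(rng, n)                  # Poisson(n) number of nodes
    pts = [(rng.random(), rng.random()) for _ in range(m)]
    links = 0
    for a in range(m):
        xa, ya = pts[a]
        for b in range(a + 1, m):
            xb, yb = pts[b]
            if (xa - xb) ** 2 + (ya - yb) ** 2 <= radius_sq:
                links += 1
    return 2.0 * links / m


rng = random.Random(1)
print(mean_degree(1000, 1.0, rng), math.pi)
```

Increasing `n` while holding `kappa` fixed keeps the average degree near $\kappa\pi$ rather than letting it grow, which is the sparsity that the scaling $r_n = (\kappa/n)^{1/d}$ is designed to produce.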

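To prove Theorem 6, we need a different coupling construction to establish concentration of the variance. Let $X$ and $Y$ be independently drawn from $f$, and let $\mathcal{P}_2$ be a Poisson process with intensity two on $\mathbb{R}^d \times [0, \infty)$, with $\mathcal{P}_2 \perp\!\!\!\perp X, Y$. Independently mark points of $\mathcal{P}_2$ "blue" with probability 0.5 and "gold" otherwise. Let $\mathcal{P}_1$ be the blue points and $\mathcal{Q}_1$ the gold points. Then by thinning, these are independent Poisson processes with unit intensity. The remainder of this construction follows PY.

• Finite Model. Derive $\mathcal{P}_{nf}$ from $\mathcal{P}_1$ in the same manner as the coupling construction in §B.1, and similarly derive $\mathcal{Q}_{nf}$ from $\mathcal{Q}_1$.

• $X$'s Limit Model. Let $F_X$ be the half-space of points in $\mathbb{R}^d$ closer to $X$ than to $Y$ and $F_Y$ its complement. Let $\mathcal{P}^X_{nf(X)}$ be the restriction of $\mathcal{P}_1$ to $F_X \times [0, nf(X)]$ and $\mathcal{Q}^Y_{nf(X)}$ the restriction of $\mathcal{Q}_1$ to $F_Y \times [0, nf(X)]$. Let $\mathcal{P}^{n,+}_{\kappa f(X)}$ be the image of $\mathcal{P}^X_{nf(X)} \cup \mathcal{Q}^Y_{nf(X)}$ under $(w, t) \mapsto X + r_n^{-1}(w - X)$ for $w \in \mathbb{R}^d$ and $t \in \mathbb{R}_+$. Then $\mathcal{P}^{n,+}_{\kappa f(X)} \overset{d}{=} \mathcal{P}_{\kappa f(X)}$ for all $n$.

• $Y$'s Limit Model. Analogously, let $\mathcal{P}^Y_{nf(Y)}$ be the restriction of $\mathcal{P}_1$ to $F_Y \times [0, nf(Y)]$ and $\mathcal{Q}^X_{nf(Y)}$ the restriction of $\mathcal{Q}_1$ to $F_X \times [0, nf(Y)]$. Let $\mathcal{P}^{n,+}_{\kappa f(Y)}$ be the image of $\mathcal{P}^Y_{nf(Y)} \cup \mathcal{Q}^X_{nf(Y)}$ under $(w, t) \mapsto Y + r_n^{-1}(w - Y)$. Then

$$\mathcal{P}^{n,+}_{\kappa f(X)} \perp\!\!\!\perp \mathcal{P}^{n,+}_{\kappa f(Y)} \tag{51}$$

for any $n$ by the spatial independence property of Poisson processes.

Construct attributes in the same manner as §B.1, with the only difference that we replace mentions of $\mathcal{P}_1$ with $\mathcal{P}_2$. By construction,

$$W\big( p_{X,0}\big( \mathcal{P}^{n,+}_{\kappa f(X)} \big) \big) \perp\!\!\!\perp W\big( p_{Y,0}\big( \mathcal{P}^{n,+}_{\kappa f(Y)} \big) \big). \tag{52}$$

Proof of Theorem 6. Given Lemma 7, it remains to show that the variance converges to zero. Using (49), $W$-strong stability, and Lemma 6,

$$\big| \xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n}) - \xi(X, \mathcal{P}^{n,+}_{\kappa f(X)}, W \circ p_{X,r_n}) \big| \overset{p}{\longrightarrow} 0,$$

$$\big| \xi(Y, Y + r_n^{-1}(\mathcal{P}_{nf} \cup \{X\} - Y), W \circ p_{Y,r_n}) - \xi(Y, \mathcal{P}^{n,+}_{\kappa f(Y)}, W \circ p_{Y,r_n}) \big| \overset{p}{\longrightarrow} 0.$$

Also, (50) yields

$$\xi(\ell, \mathcal{P}^{n,+}_{\kappa f(\ell)}, W \circ p_{\ell,r_n}) \overset{d}{\longrightarrow} \xi(\ell, \mathcal{P}^{\kappa,+}_{\kappa f(\ell)}, W \circ p_{\ell,0}), \tag{53}$$

for $\ell \in \{X, Y\}$. By uniform square-integrability and the Vitali convergence theorem,

$$\mathbf{E}\big[ \xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n})\, \xi(Y, Y + r_n^{-1}(\mathcal{P}_{nf} \cup \{X\} - Y), W \circ p_{Y,r_n}) \big] \to \mathbf{E}\big[ \xi(X, \mathcal{P}^{\kappa,+}_{\kappa f(X)}, W \circ p_{X,0})\, \xi(Y, \mathcal{P}^{\kappa,+}_{\kappa f(Y)}, W \circ p_{Y,0}) \big]. \tag{54}$$

Now, by (51) and (52), $\xi(X, \mathcal{P}^{\kappa,+}_{\kappa f(X)}, W \circ p_{X,0}) \perp\!\!\!\perp \xi(Y, \mathcal{P}^{\kappa,+}_{\kappa f(Y)}, W \circ p_{Y,0})$. Thus, the right-hand side of (54) equals $\mu^2$, where $\mu$ denotes the right-hand side of (47). Finally, by the coupling construction,

$$\mathbf{E}\Bigg[ \Bigg( \frac{1}{|\mathcal{X}_n|} \sum_{X \in \mathcal{X}_n} \xi(X, X + r_n^{-1}(\mathcal{X}_n - X), W \circ p_{X,r_n}) \Bigg)^2 \Bigg] = \mathbf{E}\big[ |\mathcal{X}_n|^{-1} \xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n})^2 \big] + \mathbf{E}\big[ (1 - |\mathcal{X}_n|^{-1})\, \xi(X, X + r_n^{-1}(\mathcal{P}_{nf} \cup \{Y\} - X), W \circ p_{X,r_n}) \times \xi(Y, Y + r_n^{-1}(\mathcal{P}_{nf} \cup \{X\} - Y), W \circ p_{Y,r_n}) \big].$$

Since $\frac{1}{|\mathcal{X}_n|} = \frac{1}{N_n + 2}$ is uniformly bounded and converges in probability to zero, the right-hand side tends to $\mu^2$ by the arguments above, and hence, the variance converges to zero.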

F Proofs

Proof of Lemma 1. Let $C$ be a strategic neighborhood. We show $\mathcal{G}(X_C, W_C, r) = \{G_C : G \in \mathcal{G}(\mathcal{X}, W, r)\}$. Consider any $i, j \in C$. There are two cases to consider. First, suppose $G_{ij}$ is a robust potential link. Then its pairwise stability only depends on $(X_i, X_j, W_{ij})$. Second, suppose $G_{ij}$ is a nonrobust potential link. Then by local externalities, the pairwise stability of $G_{ij}$ only depends on links formed by $i$ and $j$ and the attributes and positions of those connected to $i$ or $j$. By the first case, the set of nodes connected to either $i$ or $j$ through a robust link is contained in $C$. For (b), by the definition of strategic neighborhoods, the set of nodes connected to either $i$ or $j$ through a nonrobust link is also necessarily contained in $C$. Since this argument is true for all pairs $i, j \in C$, it follows that the pairwise stability of $G_{ij}$ only depends on the attributes and positions of nodes in $C$.

Proof of Theorem 4. We apply Theorem 6. Since $\psi$ is uniformly square-integrable by (35), it remains to verify $W$-strong stability (Definition 1). For ease of notation, let $\mathcal{X} = w + r_n^{-1}(\mathcal{P}_{nf} \cup \{w\} - w)$ and $J_i(\mathcal{X}, r_n) = J_i(\mathcal{X}, W \circ p_{w,r_n}, 1)$ (see definition (31)) for $w \in \mathbb{R}^d$. Also let $X_i \in w + r_n^{-1}(\mathcal{P}_{nf} - w)$ be node $i$'s position.

We claim that there exists $\tilde{n} \in \mathbb{N}$ and a positive random variable $\mathbf{R}$ such that for any $n > \tilde{n}$ and $w \in \mathbb{R}^d$, we have $X_{J_i(w + r_n^{-1}(\mathcal{P}_{nf} - w), r_n)} \subseteq B(X_i, \mathbf{R})$. First, choose $\tilde{n}$ such that $r_{\tilde{n}} < \min\{t_1, t_2\}$ defined in Assumptions 10 and 11. We have

$$\sup_{x, w \in \mathbb{R}^d} \sup_{n > \tilde{n}} \mathbf{E}\Bigg[ \sum_{X_j \in \mathcal{X}} M^{w,r_n}_{ij}(1) \,\Bigg|\, X_i = x \Bigg] \le \sup_{x, w \in \mathbb{R}^d} \sup_{n > \tilde{n}} \int_{\mathbb{R}^d} \mathbf{E}\big[ M^{w,r_n}_{ij}(1) \mid X_i = x, X_j = y \big]\, n r_n^d f((y - w) r_n + w)\, dy + 1$$

$$\le \sup_{w, x \in \mathbb{R}^d} \sup_{n > \tilde{n}} \kappa f(w) \int_{\mathbb{R}^d} \mathbf{E}\big[ M^{w,r_n}_{ij}(1) \mid X_i = x, X_j = y \big]\, dy + 1,$$

which is finite by Assumptions 10 and 11. Hence, the $M^{w,r_n}(1)$-degree of any node $i$ under the limit model $\mathcal{X}$ is bounded in probability over $n > \tilde{n}$ and $w \in \mathbb{R}^d$, from which it follows that the $K$-neighborhood size of any node $i$ in $M^{w,r_n}(1)$ under this limit model is also bounded in probability. Second, by Lemma 3, for any $n > \tilde{n}$ we have $|C(X_i, \mathcal{X}, D_{w,r_n}(1), 1)| < \mathbf{C}$. This and the fact that $K$-neighborhoods are bounded in probability imply that for any $w \in \text{supp}(f)$, there exists a finite random variable $\mathbf{J}$ such that $|J_i(\mathcal{X}, r_n)| < \mathbf{J}$ for all $n > \tilde{n}$. The existence of $\mathbf{R}$ follows.

By a virtually identical argument, the same claims also hold for $\mathcal{X} = w + r_n^{-1}(\mathcal{P}_{nf(x)} \cup \{w\} - w)$ for any $x \in \text{supp}(f)$. Since $i$'s node statistic only depends on its arguments through the positions and attributes of nodes in $J_i$ by Lemma 2, the proof is complete.

Proof of Theorem 2. Let $A \subseteq N_n$ with $|A| = K$. We first prove that the theorem holds if we replace $U_A$ with $\hat{U}_A = \{u \in \mathbb{R}^{|\tilde{\mathcal{Y}}_A|} : \|u\| \le 1\}$ by verifying the conditions of Theorem 2.1 in Beresteanu et al. (2011) (henceforth BMM). Then the result follows from Theorem D.1 of Beresteanu et al. (2011). Assumptions 7 and 8 imply BMM's Assumption 2.1, and their Assumptions 2.2-2.3 hold by construction of $Q_\theta$. Turning to their Assumption 2.4, for $y_A \in \mathcal{Y}_A$, define selection mechanisms for polyadic outcomes of degree $K$ (polyadic SMs), denoted $\sigma_{\theta_0, N_n}(y_A \mid X_A, W_A)$, to be conditional distributions with support $S_{\theta_0}(A, r_n)$. The selection mechanism $\lambda$ in Assumption 2 induces one such distribution:

$$\sigma^*_{\theta_0, N_n}(y \mid X_A, W_A) = \mathbf{E}_{N_n}\Bigg[ \sum_{g \in \mathcal{G}_A(y; \mathcal{X}_n, W)} \mathbf{P}_{N_n}\big( \lambda(\mathcal{X}_n, W, r_n) = g \mid \mathcal{X}_n, W \big) \,\Bigg|\, X_A, W_A \Bigg], \tag{55}$$

where $\mathcal{G}_A(y; \mathcal{X}_n, W)$ is the set of networks $g$ such that $(g_A, S_A(g, \mathcal{X}_n, W)) = y$. The support of this distribution is $S_{\theta_0}(A, r_n)$, as claimed, by Assumption 2. This establishes BMM's Assumption 2.4.

We next verify their Assumption 2.5, that there exists $\lambda$ satisfying Assumption 2 such that (14) holds if and only if there exists a polyadic SM $\sigma_{\theta_0, N_n}$ such that for any $y_A \in \mathcal{Y}_A$,

$$\mathbf{P}_{N_n}(Y_A(r_n) = y_A \mid Z_A) = \mathbf{E}_{N_n}\big[ \sigma_{\theta_0, N_n}(y_A \mid X_A, W_A) \mid Z_A \big]. \tag{56}$$

The "only if" direction holds by construction (55). To see the "if" direction, suppose that there is no $\sigma_{\theta_0, N_n}$ for which (56) holds. Then with positive probability, there exists $X_A, W_A$ such that the true polyadic SM,

$$\mathbf{E}_{N_n}\Bigg[ \sum_{g \in \mathcal{G}_A(y; \mathcal{X}_n, W)} \mathbf{P}_{N_n}(G = g \mid \mathcal{X}_n, W) \,\Bigg|\, X_A, W_A \Bigg],$$

places positive probability on an unstable polyadic outcome. Hence, with positive probability there exists a set of $\mathcal{X}_n, W$ such that the true selection mechanism $\mathbf{P}_{N_n}(G = g \mid \mathcal{X}_n, W)$ places positive probability on some network $g$ such that for some $i, j \in A$, either

• $V(r_n^{-1}\|X_i - X_j\|, S_{ij}(g, \mathcal{X}_n, W), W_{ij}; \theta_0) \le 0$, but $g_{ij} = 1$, or

• $V(r_n^{-1}\|X_i - X_j\|, S_{ij}(g, \mathcal{X}_n, W), W_{ij}; \theta_0) > 0$, but $g_{ij} = 0$.

Such a network cannot be pairwise stable, which implies that the support of the true selection mechanism is not contained in $\mathcal{G}(\mathcal{X}_n, W, r_n)$. Thus, Assumption 2 fails to hold, which completes the proof.

Proof of Proposition 1. Part (a). Let $A = \{i_1, \dots, i_K\} \subseteq N_n$. Define

$$\psi^*(X_{i_1}, G, \mathcal{X}_n, W) = \sum_{i_2, \dots, i_K \in N_n} u' I(Y_A(r_n))\, h(Z_A),$$

so the left-hand side of (22) equals $\frac{1}{|N_n|} \sum_{i \in N_n} \psi^*(X_i, G, \mathcal{X}_n, W)$. Because by definition $u' I(Y_A)$ is nonzero only when $Y_A = (G_A, S_A)$ is such that $G_A$ is a connected subnetwork, it follows that $\psi^*$ satisfies $K$-locality. We next verify that $\psi^*$ satisfies (9). First note that $u' I(Y_A(r_n))\, h(Z_A)$ is uniformly bounded. Then because $u' I(Y_A)$ is nonzero only for polyadic outcomes corresponding to connected subnetworks, and $G$ is a subnetwork of $M(r_n)$, there exists a constant $C$ such that

$$\mathbf{E}\big[ \psi^*(X_{i_1}, G, \mathcal{X}_n, W)^{2+\epsilon} \big] \le C\, \mathbf{E}\Bigg[ \Bigg( \sum_{i_2 = 1}^{N_n + 2} \cdots \sum_{i_K = 1}^{N_n + 2} M_{i_1, i_2}(r_n) \cdots M_{i_{K-1}, i_K}(r_n) \Bigg)^{2+\epsilon} \Bigg].$$

Hence, uniform square-integrability follows from (21). Therefore, (22) follows from Theorem 1. The expression for the limit uses Campbell's theorem (see e.g. Kingman, 1993).

50

A Weak Law for Network Moments ˜n “ Nn ` 2. The expectation of the left-hand side of (23) equals (b). Let N ı ” Part ˜n 1 řN ξpX , X , W q for E N˜ i n i“1 n

ξpXi , Xn , W q “

˜n N ÿ

˜n N ÿ

¨¨¨ i2 “1

sup

u1 q hpZA q.

iK “1 qPQθ pA,rn q

This converges to the right-hand side of (23) by Lemma 7. (The regularity conditions are verified using the same arguments for (22).) It thus remains to show that the distance between the left-hand side of (23) and its expectation tends to zero. T.he left-hand side of (23) has the structure of a U-statistic. We will then first show that the left-hand side of (23) is well approximated by its Hajek projection and conclude by showing that its Hajek projection is close to the desired expectation. Let Un equal the ˜n and Hn be its (conditional on N ˜n ) Hajek projection, where left-hand side of (23) times N we project ξpXi , Xn , W q on Zi , the observed subvector of pXi , αi q. Define Hn “

˜n N ÿ

ENn rUn ´ ENn rUn s | Zi s ` ErUn s.

i“1

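Note that the last expectation does not condition on $N_n$. Then by the Hajek projection lemma,
\[
E_{N_n}\Big[\big(\tilde N_n^{-1/2} U_n - \tilde N_n^{-1/2} H_n\big)^2\Big] = \frac{1}{\tilde N_n}\big(\mathrm{Var}_{N_n}(U_n) - \mathrm{Var}_{N_n}(H_n)\big) = \frac{1}{\tilde N_n}\mathrm{Var}_{N_n}(U_n)\Big(1 - \frac{\mathrm{Var}_{N_n}(H_n)}{\mathrm{Var}_{N_n}(U_n)}\Big). \tag{57}
\]
Since the summands of the left-hand side of (23) are uniformly bounded, by the arguments in, e.g., Theorem 1, section 3.7.2 of Lee (1990),
\[
\binom{\tilde N_n}{K}^{-2} \mathrm{Var}_{N_n}(H_n) \Big/ \binom{\tilde N_n}{K}^{-2} \mathrm{Var}_{N_n}(U_n) \xrightarrow{p} 1. \tag{58}
\]
Then for $U_n(Z_A) = E_{N_n}\big[\sup_q u' q \, h(Z_A) \mid Z_A\big]$,
\[
\frac{1}{\tilde N_n}\mathrm{Var}_{N_n}(U_n) = \frac{1}{\tilde N_n}\sum_{i_1=1}^{\tilde N_n} \cdots \sum_{i_K=1}^{\tilde N_n} \mathrm{Var}_{N_n}(U_n(Z_A)) + \frac{2}{\tilde N_n}\sum_{c=1}^{K} \sum\sum \mathrm{Cov}_{N_n}\big(U_n(Z_{\{i_1,\dots,i_K\}}), U_n(Z_{\{j_1,\dots,j_K\}})\big),
\]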

where the double sum in the second term is over index sets $\{i_1,\dots,i_K\} \times \{j_1,\dots,j_K\}$ that have $c$ elements in common. Because $u' I(Y_A)$ is nonzero only for polyadic outcomes corresponding to connected subnetworks, and because $G$ is a subnetwork of $M(r_n)$, we can bound the right-hand side by a constant times
\[
\tilde N_n^{K-1} E_{N_n}\Big[E_{N_n}\big[M_{i_1,i_2} \cdots M_{i_{K-1},i_K} \mid Z_A\big]^2\Big]
+ \sum_{c=1}^{K} \tilde N_n^{2K-c-1} \Big( E_{N_n}\Big[E_{N_n}\big[M_{i_1,i_2} \cdots M_{i_{K-1},i_K} \mid Z_{\{i_1,\dots,i_K\}}\big] \times E_{N_n}\big[M_{j_1,j_2} \cdots M_{j_{K-1},j_K} \mid Z_{\{j_1,\dots,j_K\}}\big]\Big]
+ \Big|E_{N_n}\big[M_{i_1,i_2} \cdots M_{i_{K-1},i_K}\big]\, E_{N_n}\big[M_{j_1,j_2} \cdots M_{j_{K-1},j_K}\big]\Big| \Big),
\]

where $\{i_1,\dots,i_K\} \times \{j_1,\dots,j_K\}$ are sets of node indices with $c$ elements in common. By Jensen's inequality and the Cauchy-Schwarz inequality, the right-hand side is bounded above by
\[
E_{N_n}\Bigg[\Bigg(\sum_{i_2=1}^{\tilde N_n} \cdots \sum_{i_K=1}^{\tilde N_n} M_{i_1,i_2}(r_n) \cdots M_{i_{K-1},i_K}(r_n)\Bigg)^2\Bigg]
+ 2\sum_{c=1}^{K} \tilde N_n^{1-c}\, E_{N_n}\Bigg[\Bigg(\sum_{i_2=1}^{\tilde N_n} \cdots \sum_{i_K=1}^{\tilde N_n} M_{i_1,i_2}(r_n) \cdots M_{i_{K-1},i_K}(r_n)\Bigg)^2\Bigg],
\]
which is $O_p(1)$ by (21). This and (58) establish (57) $\xrightarrow{p} 0$. Then by (21) and the Vitali convergence theorem, $E[(57)] \to 0$.

It remains to show that $\frac{1}{\tilde N_n} H_n - E\big[\frac{1}{\tilde N_n} U_n\big] \xrightarrow{p} 0$. It is clear that the mean of the left-hand side is zero. Its variance equals
\[
\mathrm{Var}\Big(\frac{1}{\tilde N_n} H_n\Big) \le E\Big[\frac{1}{\tilde N_n} E_{N_n}\big[\xi(X_i, \mathbf{X}_n, W)^2\big]\Big]
\le E\Bigg[\frac{1}{\tilde N_n} E_{N_n}\Bigg[\Bigg(\sum_{i_2=1}^{\tilde N_n} \cdots \sum_{i_K=1}^{\tilde N_n} M_{i_1,i_2}(r_n) \cdots M_{i_{K-1},i_K}(r_n)\Bigg)^2\Bigg]\Bigg].
\]

The last line is $o_p(1)$ by (21).

Proof of Theorem 3. We only need to formalize the argument made in the second paragraph of the proof sketch. We do so for the case in which attributes are correlated with positions, as in §B.1, which entails replacing the assumptions of this theorem with those of Lemma 4. (The assumptions of this theorem imply those of the lemma.) Let $C$ be the largest $D(r_n)$-component and $C^+$ its associated strategic neighborhood. The number of possible networks $\Gamma(C^+)$ satisfying the properties in step 2 of Algorithm 1 is at most $\exp\big\{\sum_{i,j \in C} D_{ij}(r_n)\big\}$, since $C^+$ only adds nodes to $C$ that form robust potential

links with all nodes in $C$. We study the tails of the distribution of this quantity:
\[
P\Bigg(\exp\Bigg\{\sum_{i,j \in C} D_{ij}(r_n)\Bigg\} > n^{\epsilon}\Bigg)
\le P\Bigg(\sum_{i \in \mathcal{N}_n} \sum_{j \in C} D_{ij}(r_n) > \epsilon \log n \,\Bigg|\, |C| < \beta_n\Bigg) + P(|C| > \beta_n),
\]
where $\epsilon > 0$, and $\beta_n = O(\log n)$ is given in the proof of Lemma 4. Label the two elements on the right-hand side [I] and [II], respectively. By Lemma 4, [II] $= o(1)$. Also,
\[
\begin{aligned}
[\mathrm{I}] &\le \frac{1}{\epsilon \log n} E\Bigg[\sum_{i \in \mathcal{N}_n} \sum_{j \in C} D_{ij}(r_n) \,\Bigg|\, |C| < \beta_n\Bigg] \\
&\le \frac{1}{\epsilon \log n} E\Big[(N_n + 2)|C|\, E\big[D_{ij}(r_n) \mid N_n, C, j \in C\big] \,\Big|\, |C| < \beta_n\Big] \\
&\le \frac{\beta_n E\big[(N_n + 2) D_{ij}(r_n) \mid |C| < \beta_n\big]}{\epsilon \log n}
\le \frac{\beta_n}{\epsilon \log n} \frac{E[(N_n + 2) D_{ij}(r_n)]}{P(|C| < \beta_n)},
\end{aligned} \tag{59}
\]
where $N_n$ is a Poisson random variable with intensity $n$ that is independent of $(X_i, X_j, W_{ij})$. By Lemma 4, $P(|C| < \beta_n) = 1 + o(1)$. Furthermore,
\[
E[(N_n + 2) D_{ij}(r_n)] = E\Bigg[\sum_{j \in \mathcal{N}_n,\, j \ne i} D_{ij}(r_n)\Bigg]
\le \kappa \sup_x f(x) \int_{\mathbb{R}^d} E[\Lambda(X_1, X_2, \alpha_1, \alpha_2) \mid X_2 = y] \, dy + 1
\]
by the calculation in §C.2. The latter is finite by (26). Thus, taking $\epsilon \to \infty$, we have that (59) tends to zero, as desired.

Proof of Lemma 3. Fix $w \in \mathrm{supp}(f)$, and choose $\tilde n$ such that $r_{\tilde n} < t_1$, where $t_1$ is defined in Assumption 10. Throughout the proof, the set of positions is realized according to $\mathcal{X} = w + r_n^{-1}(P_{\tau_n} - w) \cup \{w\}$. We first focus on the case $\tau_n = n f(\cdot)$. For $x \in \mathcal{X}$, we can explore $C(x, w + r_n^{-1}(P_{nf} - w) \cup \{w\}, D^{w,r_n}, 1)$ using a breadth-first search on $D^{w,r_n}(1)$ starting at $x$. That is, we branch out to $x$'s 1-neighborhood ("offspring"), recording the number of such neighbors as $B_1^*(n)$. Then we branch out to the 1-neighborhood of each neighbor, not including nodes already explored, to obtain her 2-neighborhood, recording the total over all neighbors as $B_2^*(n)$. We repeat this process indefinitely. Then $\sum_{m=1}^{\infty} B_m^*(n) + 1 = |C(x, w + r_n^{-1}(P_{\tau_n} - w) \cup \{w\}, D^{w,r_n}, 1)|$. We will show that for all $n > \tilde n$, $\sum_{m=1}^{\infty} B_m^*(n)$ is stochastically dominated by the total offspring of a certain branching process and then show that the latter process is subcritical. This is a common argument used to estimate the size of giant components of random graphs. The branching-process construction in step 1 below follows Meester and Roy (1996), Chapter 6.
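The breadth-first exploration used in this proof can be illustrated with a short Python sketch. It is not taken from the paper: the adjacency dictionary `toy_network` and the function `generation_sizes` are hypothetical names chosen for illustration, and the graph is a fixed toy example rather than a draw from the model. The returned list plays the role of the generation counts recorded during the search, so that their sum plus one is the size of the explored component.

```python
def generation_sizes(adjacency, start):
    """Breadth-first search from `start`.

    Returns a list whose m-th entry is the number of previously
    unexplored nodes reached in the m-th round of branching.
    """
    explored = {start}
    frontier = [start]
    sizes = []
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in explored:  # skip nodes already explored
                    explored.add(neighbor)
                    next_frontier.append(neighbor)
        if next_frontier:
            sizes.append(len(next_frontier))
        frontier = next_frontier
    return sizes


# Hypothetical toy network with two components: {0, 1, 2, 3, 4} and {5, 6}.
toy_network = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3], 5: [6], 6: [5]}

sizes = generation_sizes(toy_network, 0)
component_size = sum(sizes) + 1  # generation counts plus the starting node
print(sizes, component_size)
```

Counting only newly explored nodes in each round is what makes the generation counts add up to the component size, which is the identity the proof relies on before passing to the dominating branching process.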


Step 1. Given an initial node of type $(x, \alpha)$, where $x$ represents her position and $\alpha$ her attribute, let $B_1(n)$ be the realization of an independent inhomogeneous Poisson process on $\mathbb{R}^{d+d_\alpha}$ with intensity
\[
\varphi_1(y, \alpha') = n r_n^d f((y - x) r_n + x)\, E\big[D_{ij}^{w,r_n}(1) \mid X_i = x, X_j = y, \alpha_i^{w,r_n} = \alpha, \alpha_j^{w,r_n} = \alpha'\big]. \tag{60}
\]
This represents the set of nodes $(y, \alpha')$ linked to a node with type $(x, \alpha)$ in the network $D^{w,r_n}(1)$. By the mapping theorem (Kingman, 1993), $B_1(n) \overset{d}{=} B_1^*(n)$, and by Assumption 10, $|B_1(n)| < \infty$ a.s. Thus, we can label the points of $B_1(n)$ as $(y_1, \alpha_1'), \dots, (y_\ell, \alpha_\ell')$. Next, given $(x, \alpha)$ and $B_1(n)$, for node $(y_1, \alpha_1')$, let $B_{21}(n)$ be the realization of an independent inhomogeneous Poisson process on $\mathbb{R}^{d+d_\alpha}$ with intensity
\[
\varphi_2(z, \alpha'') = n r_n^d f((z - x) r_n + x)\, E\big[D_{jk}^{w,r_n}(1)(1 - D_{ik}^{w,r_n}(1)) \mid X_i = x, X_j = y_1, X_k = z, \alpha_i^{w,r_n} = \alpha, \alpha_j^{w,r_n} = \alpha_1', \alpha_k^{w,r_n} = \alpha''\big]. \tag{61}
\]
This generates the set of nodes $(z, \alpha'')$ linked to a node with type $(y_1, \alpha_1')$, excluding those linked to $(x, \alpha)$ (hence the appearance of $1 - D_{ik}^{w,r_n}(1)$). Likewise, given $(x, \alpha)$, $B_1(n)$, and $B_{21}(n)$, for a node of type $(y_2, \alpha_2')$, let $B_{22}(n)$ be the realization of an independent inhomogeneous Poisson process on $\mathbb{R}^{d+d_\alpha}$ with intensity
\[
\varphi_3(z, \alpha'') = n r_n^d f((z - x) r_n + x) \times E\big[D_{j_2 k}^{w,r_n}(1)(1 - D_{j_1 k}^{w,r_n}(1))(1 - D_{ik}^{w,r_n}(1)) \mid X_i = x, X_{j_1} = y_1, X_{j_2} = y_2, X_k = z, \alpha_i^{w,r_n} = \alpha, \alpha_{j_1}^{w,r_n} = \alpha_1', \alpha_{j_2}^{w,r_n} = \alpha_2', \alpha_k^{w,r_n} = \alpha''\big]. \tag{62}
\]
This generates the set of nodes $(z, \alpha'')$ linked to a node with type $(y_2, \alpha_2')$, excluding those linked to $(x, \alpha)$ and $(y_1, \alpha_1')$. We can likewise generate $B_{23}(n), \dots, B_{2\ell}(n)$ and define $B_2(n) = \sum_{p=1}^{\ell} B_{2p}(n)$. Then $B_2(n) \overset{d}{=} B_2^*(n)$. Similarly construct $B_3(n), B_4(n), \dots$ in a completely analogous manner. Then $(B_1(n), \dots, B_m(n)) \overset{d}{=} (B_1^*(n), \dots, B_m^*(n))$ for any $m$.

Step 2. We bound $\sum_{m=1}^{\infty} B_m(n)$ by the total offspring count of another branching process. Let $\bar f = \sup_x f(x)$. Notice that
\[
(60) \le \kappa \bar f \Lambda_{w,r_n}(x, y, \alpha, \alpha'), \qquad
(61) \le \kappa \bar f \Lambda_{w,r_n}(y_1, z, \alpha_1', \alpha''), \qquad
(62) \le \kappa \bar f \Lambda_{w,r_n}(y_2, z, \alpha_2', \alpha'').
\]
Hence we can couple $B_1(n)$ to $B_1$, an inhomogeneous Poisson process on $\mathbb{R}^{d+d_\alpha}$ with intensity $\kappa \bar f \Lambda_{w,r_n}(x, y, \alpha, \alpha')$, such that $B_1$ stochastically dominates $B_1(n)$. We can likewise couple $B_{2p}(n)$ to a process $B_{2p}$ with intensity $\kappa \bar f \Lambda_{w,r_n}(y_p, z, \alpha_p', \alpha'')$ for $p = 1, \dots, \ell$ such that $B_2 \equiv \sum_{p=1}^{\infty} B_{2p}$ stochastically dominates $B_2^*(n)$. Likewise construct $B_3, B_4, \dots$, so ultimately we have that $\sum_{m=1}^{\infty} B_m$ stochastically dominates $\sum_{m=1}^{\infty} B_m(n)$ and therefore also $\sum_{m=1}^{\infty} B_m^*(n)$.

Step 3. We show that $\{B_m\}$ is subcritical for all $n > \tilde n$. Note that $\{B_m\}$ is a multi-type Galton-Watson branching process starting at $(x, \alpha)$, where the offspring distribution of a node $(y, \alpha')$ is an inhomogeneous Poisson point process on $\mathbb{R}^{d+d_\alpha}$ with intensity $\kappa \bar f \Lambda_{w,r_n}(x, y, \alpha, \alpha')$.


Let $\rho(x, \alpha)$ be the probability that the process $\{B_m\}$ survives indefinitely, conditional on the type of the starting node $(x, \alpha)$. Let $H$ be the set of functions $h : \mathbb{R}^{d+d_\alpha} \to \mathbb{R}_+$ such that $\sup_{x \in \mathbb{R}^d} \big[\int h(x, \alpha)^2 \, d\Phi_{w,r_n}(\alpha \mid x)\big]^{1/2} \le 1$. Define the operator $T : H \to \mathbb{R}_+$ such that
\[
T h(x, \alpha) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^{d_\alpha}} \kappa \bar f \Lambda_{w,r_n}(x, y, \alpha, \alpha') h(y, \alpha') \, d\Phi_{w,r_n}(\alpha' \mid y) \, dy.
\]
Lastly, let $\|\cdot\|$ be the norm on $H$ such that $\|h\| = \sup_{x \in \mathbb{R}^d} \big[\int h(x, \alpha)^2 \, d\Phi_{w,r_n}(\alpha \mid x)\big]^{1/2}$. We want to show that under Assumption 10, $\rho(x, \alpha) = 0$ a.s. The argument is a slight modification of Lemma 5.11 of Bollobás et al. (2007). Consider the functional equation
\[
h = 1 - e^{-Th}. \tag{63}
\]
By Lemma 5.6 of Bollobás et al. (2007), $\rho(\cdot, \cdot)$ is a solution of this equation.$^{29}$ Let $h(\cdot, \cdot)$ be any solution of (63), and suppose, to get a contradiction, that $h \ne 0$ for some set $S \subseteq \mathbb{R}^{d+d_\alpha}$ with positive measure. Then by Lemma 5.8(ii) of Bollobás et al. (2007), $Th \ge h$, with strict inequality on $S$. Hence, $\|Th\| > \|h\|$. But by the Cauchy-Schwarz inequality, $\|Th\|$ is bounded above by
\[
\|h\| \sup_{x \in \mathbb{R}^d} \Bigg( \int_{\mathbb{R}^{d_\alpha}} \Bigg( \kappa \bar f \int_{\mathbb{R}^d} \Big( \int_{\mathbb{R}^{d_\alpha}} \Lambda_{w,r_n}(x, y, \alpha, \alpha')^2 \, d\Phi_{w,r_n}(\alpha' \mid y) \Big)^{1/2} dy \Bigg)^2 d\Phi_{w,r_n}(\alpha \mid x) \Bigg)^{1/2},
\]
which contradicts $\|Th\| > \|h\|$ for all $n > \tilde n$ by Assumption 10. Thus, any solution of (63) is zero for almost all $(x, \alpha)$, so $\rho(x, \alpha) = 0$ a.s. This completes the proof for the case $\tau_n = n f(\cdot)$. When $\tau_n = \kappa f(x)$ for any $x \in \mathrm{supp}(f)$, we can construct branching processes in the exact same manner, which are still stochastically dominated by the process constructed in step 3, simply because $\kappa f(x) \le \kappa \bar f$.
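To make the role of the functional equation (63) concrete, the following Python sketch looks at the simplest special case, in which the operator is replaced by multiplication by a scalar mean offspring count. This single-type reduction is an assumption made purely for illustration and is not the multi-type setting of the proof; the function name `survival_probability` and the two example means are hypothetical. In this scalar case the equation becomes h = 1 - exp(-mean * h), and iterating it from h = 1 shows the survival probability collapsing to zero when the mean is below one.

```python
import math


def survival_probability(mean_offspring, iterations=500):
    """Iterate h <- 1 - exp(-mean_offspring * h), starting from h = 1.

    For a single-type Poisson Galton-Watson process this converges to
    the largest solution of the scalar analogue of equation (63), which
    is the probability that the process survives indefinitely.
    """
    h = 1.0
    for _ in range(iterations):
        h = 1.0 - math.exp(-mean_offspring * h)
    return h


# Hypothetical example values: one subcritical mean, one supercritical mean.
subcritical = survival_probability(0.5)
supercritical = survival_probability(2.0)
print(subcritical, supercritical)
```

In the subcritical case the only solution is zero, which mirrors the conclusion of step 3; in the supercritical case a strictly positive solution appears, which is the situation the norm bound from Assumption 10 rules out.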

$^{29}$ Bollobás et al. (2007) make the assumption that $\int_{\mathbb{R}^d} \int_{\mathbb{R}^{d_\alpha}} d\Phi_{w,r_n}(\alpha \mid x) \, d\mu(x) < \infty$, where $\mu$ is a measure on $\mathbb{R}^d$. This is violated in our setting, since $\mu$ is the Lebesgue measure. However, the Lebesgue measure is $\sigma$-finite, and Lemmas 5.6 and 5.8(ii) still hold for $\sigma$-finite measures.

Proof of Lemma 4. Label the components of $D^{X,r_n}(1)$ under $\mathcal{X} = X + r_n^{-1}(\mathbf{X}_n - X)$ by $C_1, \dots, C_{N_n+2}$. We seek to show that $P(\max_{1 \le k \le N_n+2} |C_k| > \beta_n) \to 0$ for some sequence $\beta_n = O(\log n)$. We have
\[
P\Big(\max_{1 \le k \le N_n+2} |C_k| > \beta_n\Big)
\le P\Big(\max_{1 \le k \le N_n+2} |C_k| > \beta_n \,\Big|\, N_n \le n\Big) + P(N_n > n)
\le \frac{P\big(\max_{1 \le k \le n+2} |C_k| > \beta_n\big)}{1 - P(N_n > n)} + P(N_n > n). \tag{64}
\]
Since the Poisson distribution has exponential tails, $P(N_n > n) = O(e^{-n})$ (see e.g. Penrose, 2003, Lemma 1.2). This fact and the union bound yield
\[
(64) \le (n + 2) P(|C_1| > \beta_n) + o(1), \tag{65}
\]
where for some $Z \in X + r_n^{-1}(P_{nf} \cup \{X, Y\} - X)$, we have $|C_1| \overset{d}{=} |C(Z, r_n)| \equiv |C(Z, X + r_n^{-1}(P_{nf} \cup \{X, Y\} - X), D^{X,r_n}, 1)|$, where $X$ and $Y$ are independent draws from $f$ and $P_{nf}$ is an inhomogeneous Poisson process on $\mathbb{R}^d$ with intensity $n f(\cdot)$. This is due to the coupling construction in §B.1. Now let $\{Z \nleftrightarrow Y\}$ be the event that the two nodes with positions $Y$ and $Z$ are not connected in $D^{X,r_n}(1)$ and $\{Z \leftrightarrow Y\}$ its complement. Then by the law of total probability,
\[
\begin{aligned}
P(|C_1| > \beta_n) &= P(|C(Z, r_n)| > \beta_n \cap Z \nleftrightarrow Y) + P(|C(Z, r_n)| > \beta_n \cap Z \leftrightarrow Y) \\
&\le P(|C(Z, r_n)| > \beta_n \cap Z \nleftrightarrow Y) + P\big(|C(Z, X + r_n^{-1}(P_{nf} - X) \cup \{X\}, D^{X,r_n}, 1)| > \beta_n - 1\big).
\end{aligned} \tag{66}
\]
Define the branching process $\{B_m\}_{m=1}^{\infty}$ as in the proof of Lemma 3. A coupling argument identical to the one in that proof establishes that $|C(Z, X + r_n^{-1}(P_{nf} - X) \cup \{X\}, D^{X,r_n}, 1)|$ is stochastically dominated by $\sum_{m=1}^{\infty} B_m$. A similar argument also shows that $|C(Z, r_n)|$ is stochastically dominated by the same sum in all states of the world where $Z \nleftrightarrow Y$. Therefore, $(66) \le 2 P\big(\sum_{m=1}^{\infty} B_m > \beta_n - 1\big)$. Combined with (65), this yields
\[
(64) \le 2(n + 2) P\Bigg(\sum_{m=1}^{\infty} B_m > \beta_n - 1\Bigg) + o(1)
\le (n + 2) E\Big[b^{\sum_{m=1}^{\infty} B_m}\Big] b^{-\beta_n} + o(1), \tag{67}
\]
for any $b > 1$, by Markov's inequality. By Lemma 5, there exists $b > 1$ such that $E\big[b^{\sum_{m=1}^{\infty} B_m}\big]$ is uniformly bounded above over $n > \tilde n$ for some $\tilde n \in \mathbb{N}$.$^{30}$ Set $\beta_n = c \log n$ for some constant $c$. Then (67) is of asymptotic order
\[
(n + 2) b^{-c \log n} = \frac{n + 2}{n^{c \log b}},
\]
since $b > 1$. This tends to zero for $c = \frac{2}{\log b}$, establishing the result.
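The final rate calculation in this proof can be checked numerically with a small Python sketch. The value of b used below is a hypothetical choice for illustration only, since the proof only asserts that some b greater than one exists, and the function name `tail_bound` is likewise not from the paper. With c set to 2 / log b, the leading term reduces to (n + 2) / n^2, which shrinks as n grows.

```python
import math


def tail_bound(n, b, c):
    """Leading term of the bound: (n + 2) * b**(-c * log n)."""
    return (n + 2) * b ** (-c * math.log(n))


b = 1.5               # hypothetical value; the proof only needs some b > 1
c = 2 / math.log(b)   # the constant chosen at the end of the proof

# With this c, b**(-c * log n) equals n**(-2), so each entry is (n + 2) / n**2.
bounds = [tail_bound(n, b, c) for n in (10, 100, 1000, 10000)]
print(bounds)
```

Any constant c strictly larger than 1 / log b would also drive the term to zero; the choice 2 / log b simply gives a clean polynomial rate.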

$^{30}$ Note that $B_m$ depends on $n$ because the intensity of the Poisson process is a function of $\Lambda_{i,r_n}(\cdot)$.

Proof of Lemma 5. Choose $\tilde n$ such that $r_{\tilde n} < t_1$, where $t_1$ is defined in Assumption 10. Fix $w \in \mathrm{supp}(f)$ and $n > \tilde n$. Define the functional $T$ as in the proof of Lemma 3, and consider the functional equation
\[
h = b \exp\{T(h - 1)\}, \tag{68}
\]
for $h : \mathbb{R}^{d+d_\alpha} \to \mathbb{R}_+$. We will show that under the assumptions of the lemma, there exists $b > 1$ such that (68) has a solution $h \ge 1$. The result then follows from Theorem 1.2 of Turova (2012). Our argument follows closely the proof of Theorem 2.5 in Turova (2012). It is enough to construct a function $h \ge 1$ satisfying the conditions of Lemma 2.4 in Turova (2012). For $x, y \in \mathbb{R}^d$ and $\alpha, \alpha' \in \mathbb{R}^{d_\alpha}$, define
\[
\gamma(x, \alpha; \epsilon) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^{d_\alpha}} \kappa \bar f \Lambda_{w,r_n}(x, y, \alpha, \alpha') \big(e^{\epsilon \mu_{w,r_n}(y, \alpha')} - 1\big) \, d\Phi_{w,r_n}(\alpha' \mid y) \, dy.
\]


By the Cauchy-Schwarz inequality,
\[
\gamma(x, \alpha; \epsilon) \le \underbrace{\kappa \bar f \int_{\mathbb{R}^d} \Big( \int_{\mathbb{R}^{d_\alpha}} \Lambda_{w,r_n}(x, y, \alpha, \alpha')^2 \, d\Phi_{w,r_n}(\alpha' \mid y) \Big)^{1/2} dy}_{\mu_{w,r_n}(x, \alpha)}
\times \underbrace{\sup_{y \in \mathbb{R}^d} \Big( \int_{\mathbb{R}^{d_\alpha}} \big(e^{\epsilon \mu_{w,r_n}(y, \alpha')} - 1\big)^2 \, d\Phi_{w,r_n}(\alpha' \mid y) \Big)^{1/2}}_{A(\epsilon)}. \tag{69}
\]
Note that $A(\epsilon)$ is increasing and continuous on $[0, a/4]$ by dominated convergence and (45). For $\epsilon$ small, by the Cauchy-Schwarz inequality,
\[
A'(\epsilon) \le \sup_{y \in \mathbb{R}^d} \Big( \int_{\mathbb{R}^{d_\alpha}} \mu_{w,r_n}(y, \alpha')^2 \exp\{2 \epsilon \mu_{w,r_n}(y, \alpha')\} \, d\Phi_{w,r_n}(\alpha' \mid y) \Big)^{1/2}.
\]
For $\epsilon \in [0, a/4]$, this is finite by (45). The previous equation yields
\[
\lim_{\epsilon \downarrow 0} A'(\epsilon) \le \sup_{y \in \mathbb{R}^d} \Big( \int_{\mathbb{R}^{d_\alpha}} \mu_{w,r_n}(y, \alpha')^2 \, d\Phi_{w,r_n}(\alpha' \mid y) \Big)^{1/2} < \bar c < 1,
\]
for some $\bar c$, where the inequalities follow from Assumption 10. Then by the mean value theorem, for $\epsilon$ sufficiently small, $A(\epsilon) < \bar c \epsilon$, which, by (69), implies
\[
\gamma(x, \alpha; \epsilon) < \bar c \epsilon \mu_{w,r_n}(x, \alpha). \tag{70}
\]

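Fix $b > 1$ and define $\tilde\gamma(x, \alpha; \epsilon) = b T(\exp\{b \epsilon \mu_{w,r_n}(x, \alpha)\} - 1)$. By (70), for $\epsilon$ sufficiently small,
\[
\tilde\gamma(x, \alpha; \epsilon) = b \gamma(x, \alpha; b\epsilon) \le b \bar c \, b \epsilon \mu_{w,r_n}(x, \alpha). \tag{71}
\]
Now define the function $h_b = b(e^{b \epsilon t} - 1) + 1$. To complete the proof, it suffices to show that
\[
b \exp\{T(h_b - 1)\} \le h_b. \tag{72}
\]
Using (71),
\[
b \exp\{T(h_b - 1)\} = b \exp\{b T(e^{b \epsilon t} - 1)\} = b \exp\{\tilde\gamma(\cdot; \epsilon)\} \le b \exp\{b \bar c \, b \epsilon t\}. \tag{73}
\]
Suppose $\delta \in (\bar c, 1)$ and $b \in (1, \delta / \bar c)$. Then $b \exp\{b \bar c \, b \epsilon t\} \le b \exp\{\delta b \epsilon t\}$. Using this fact and (44), there exists $b \in (1, \delta / \bar c)$ such that for all $x \in \mathrm{supp}(f)$ and $\alpha \in \mathrm{supp}(\alpha_i^{w,r_n} \mid X_i = x)$,
\[
b \exp\{b \bar c \, b \epsilon \mu_{w,r_n}(x, \alpha)\} \le b \Big( \exp\{\epsilon b \mu_{w,r_n}(x, \alpha)\} - \frac{b - 1}{b} \Big) = b \big( \exp\{\epsilon b \mu_{w,r_n}(x, \alpha)\} - 1 \big) + 1 = h_b(x, \alpha).
\]
This and (73) establish (72), which verifies the conditions of Lemma 2.4 in Turova (2012).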

References

Agarwal, N. and W. Diamond, “Latent Indices in Assortative Matching Models,” working paper, 2015.
Andrews, D. and X. Shi, “Inference Based on Conditional Moment Inequalities,” Econometrica, 2013, 81 (2), 609–666.
Andrews, D. and X. Shi, “Inference Based on Many Conditional Moment Inequalities,” working paper, 2015.
Bajari, P., H. Hong, and S. Ryan, “Identification and Estimation of a Discrete Game of Complete Information,” Econometrica, 2010, 78 (5), 1529–1568.
Barabási, A., Network Science, Cambridge University Press, 2015.
Barabási, A. and R. Albert, “Emergence of Scaling in Random Networks,” Science, 1999, 286 (5439), 509–512.
Beresteanu, A., I. Molchanov, and F. Molinari, “Sharp Identification Regions in Models with Convex Moment Inequalities,” Econometrica, 2011, 79 (6), 1785–1821.
Bisin, A., A. Moro, and G. Topa, “The Empirical Content of Models with Multiple Equilibria in Economies with Social Interactions,” working paper, 2011.
Bollobás, B., S. Janson, and O. Riordan, “The Phase Transition in Inhomogeneous Random Graphs,” Random Structures and Algorithms, 2007, 31 (1), 3–122.
Boucher, V., “The Estimation of Network Formation Games with Positive Spillovers,” working paper, 2016.
Boucher, V. and I. Mourifié, “My Friend Far Far Away: Asymptotic Properties of Pairwise Stable Networks,” working paper, 2015.
Bramoullé, Y., H. Djebbari, and B. Fortin, “Identification of Peer Effects through Social Networks,” Journal of Econometrics, 2009, 150 (1), 41–55.
Brock, W. and S. Durlauf, “Discrete Choice with Social Interactions,” Review of Economic Studies, 2001, 68, 235–260.
Carrell, S., B. Sacerdote, and J. West, “From Natural Variation to Optimal Policy? The Importance of Endogenous Peer Group Formation,” Econometrica, 2013, 81 (3), 855–882.
Chandrasekhar, A., “Econometrics of Network Formation,” in Y. Bramoullé, A. Galeotti, and B. Rogers, eds., Oxford Handbook on the Econometrics of Networks, 2016.
Chandrasekhar, A. and M. Jackson, “A Network Formation Model Based on Subgraphs,” working paper, 2015.

Chernozhukov, V., D. Chetverikov, and K. Kato, “Testing Many Moment Inequalities,” working paper, 2014.
Chernozhukov, V., H. Hong, and E. Tamer, “Estimation and Confidence Regions for Parameter Sets in Econometric Models,” Econometrica, 2007, 75 (5), 1243–1284.
Christakis, N., J. Fowler, G. Imbens, and K. Kalyanaraman, “An Empirical Model for Strategic Network Formation,” working paper, 2010.
Ciliberto, F. and E. Tamer, “Market Structure and Multiple Equilibria in Airline Markets,” Econometrica, 2009, 77 (6), 1791–1828.
Comola, M., “The Network Structure of Mutual Support: Evidence from Rural Tanzania,” working paper, 2010.
de Jong, Robert M. and Tiemen Woutersen, “Dynamic Time Series Binary Choice,” Econometric Theory, 2011, 27, 673–702.
de Paula, A., “Econometric Analysis of Games with Multiple Equilibria,” Annu. Rev. Econ., 2013, 5 (1), 107–131.
de Paula, A., S. Richards-Shubik, and E. Tamer, “Identification of Preferences in Network Formation Games,” working paper, 2015.
Dzemski, A., “An Empirical Model of Dyadic Link Formation in a Network with Unobserved Heterogeneity,” working paper, 2014.
Epstein, L., H. Kaido, and K. Seo, “Robust Confidence Regions for Incomplete Models,” Econometrica, forthcoming.
Fox, J., “Estimating Matching Games with Transfers,” working paper, 2016.
Galichon, A. and M. Henry, “Set Identification in Models with Multiple Equilibria,” Review of Economic Studies, 2011, 78 (4), 1264–1298.
Gilleskie, D. and Y. Zhang, “Friendship Formation and Smoking Initiation Among Teens,” working paper, 2010.
Goldsmith-Pinkham, P. and G. Imbens, “Social Networks and the Identification of Peer Effects,” Journal of Business & Economic Statistics, 2013, 31 (3), 253–264.
Graham, B., “An Empirical Model of Network Formation: Detecting Homophily when Agents Are Heterogeneous,” working paper, 2014.
Graham, B., “Homophily and Transitivity in Dynamic Network Formation,” working paper, 2016.
Hellmann, T., “On the Existence and Uniqueness of Pairwise Stable Networks,” International Journal of Game Theory, 2013, 42 (1), 211–237.

Hochberg, Y., L. Lindsey, and M. Westerfield, “Partner Selection in Co-investment Networks: Evidence from Venture Capital,” working paper, 2012.
Hsieh, C. and L. Lee, “Specification and Estimation of Network Formation and Network Interaction Models with the Exponential Probability Distribution,” working paper, 2013.
Jackson, M., Social and Economic Networks, Princeton University Press, 2008.
Jackson, M. and A. Watts, “The Evolution of Social and Economic Networks,” Journal of Economic Theory, October 2002, 106 (2), 265–295.
Jenish, N. and I. Prucha, “On Spatial Processes and Asymptotic Inference under Near-Epoch Dependence,” Journal of Econometrics, 2012, 170 (1), 178–190.
Karr, A., “Inference for Stationary Random Fields Given Poisson Samples,” Advances in Applied Probability, 1986, pp. 406–422.
Kingman, J., Poisson Processes, Oxford University Press, 1993.
Lee, A., U-Statistics: Theory and Practice, CRC Press, 1990.
Lee, L., “Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances,” Econometric Reviews, 2003, 22 (4), 307–335.
Leider, S., M. Möbius, T. Rosenblat, and Q. Do, “Directed Altruism and Enforced Reciprocity in Social Networks,” The Quarterly Journal of Economics, 2009, 124 (4), 1815–1851.
Leung, M., “Two-Step Estimation of Network-Formation Models with Incomplete Information,” Journal of Econometrics, 2015, 188 (1), 182–195.
McPherson, M., L. Smith-Lovin, and J. Cook, “Birds of a Feather: Homophily in Social Networks,” Annual Review of Sociology, 2001, 27, 415–444.
Meester, R. and R. Roy, Continuum Percolation number 119, Cambridge University Press, 1996.
Mele, A., “A Structural Model of Segregation in Social Networks,” working paper, 2011.
Mele, A., “A Structural Model of Segregation in Social Networks,” working paper, 2015.
Menzel, K., “Consistent Estimation with Many Moment Inequalities,” Journal of Econometrics, 2014, 182, 329–350.
Menzel, K., “Inference for Games with Many Players,” The Review of Economic Studies, 2015, 83 (1), 306–337.
Menzel, K., “Large Matching Markets as Two-Sided Demand Systems,” Econometrica, 2015, 83 (3), 897–941.

Menzel, K., “Strategic Network Formation with Many Agents,” working paper, 2016.
Miyauchi, Y., “Structural Estimation of a Pairwise Stable Network Formation with Nonnegative Externality,” working paper, 2014.
Penrose, M., Random Geometric Graphs, Oxford University Press, 2003.
Penrose, M. and J. Yukich, “Central Limit Theorems for Some Graphs in Computational Geometry,” Annals of Applied Probability, 2001, pp. 1005–1041.
Penrose, M. and J. Yukich, “Weak Laws of Large Numbers in Geometric Probability,” Annals of Applied Probability, 2003, pp. 277–303.
Politis, D., E. Paparoditis, and J. Romano, “Large Sample Inference for Irregularly Spaced Dependent Observations Based on Subsampling,” Sankhyā: The Indian Journal of Statistics, Series A, 1998, pp. 274–292.
Powell, W., D. White, K. Koput, and J. Owen-Smith, “Network Dynamics and Field Evolution: The Growth of Interorganizational Collaboration in the Life Sciences,” American Journal of Sociology, 2005, 110 (4), 1132–1205.
Sheng, S., “Identification and Estimation of Network Formation Games,” working paper, 2014.
Sherman, M., Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties, John Wiley & Sons, 2011.
Snijders, T., J. Koskinen, and M. Schweinberger, “Maximum Likelihood Estimation for Social Network Dynamics,” The Annals of Applied Statistics, 2010, 4 (2), 567–588.
Song, K., “Econometric Inference on a Large Bayesian Game,” working paper, 2014.
Turova, T., “Asymptotics for the Size of the Largest Component Scaled to “log n” in Inhomogeneous Random Graphs,” Arkiv för Matematik, 2012, 51 (2), 371–403.
van der Vaart, A. and J. Wellner, Weak Convergence and Empirical Processes: With Applications to Statistics, Springer Series in Statistics, Springer, 1996.
Xu, H., “Social Interactions in Large Networks: A Game Theoretic Approach,” working paper, 2015.
Xu, X. and L. Lee, “Estimation of a Binary Choice Game Model with Network Links,” working paper, 2015.
