Dynamic Aspects of Teenage Friendships and Educational Attainment
Eleonora Patacchini, Edoardo Rainone, Yves Zenou
Oxford, December 2010
Purpose
Theoretical and empirical investigation of the role of peers in explaining educational behavior, using a network perspective
We assess whether, and to what extent, peers during the teenage years play a role in individual educational attainment in adulthood
Outline of the presentation • Motivation • Theoretical model • Related empirical literature • Empirical model and estimation strategy • Data • Empirical results and concluding remarks
Motivation
Growing awareness that social context matters for individual outcomes
Many individual outcomes vary much more between social groups than within them
Theoretically: models of social interactions are widely used
Empirically: convincing tests of such models are still quite limited
• Identifying and measuring such peer effects is difficult • Appropriate data sets are hard to find
Crucial issues in the literature on peer effects
1. Definition of the peers
Peer effects: an average intra-group externality that affects all members of a given group identically
Group boundaries: drawn arbitrarily and at a rather aggregate level
Peer effects in crime: neighborhood level using local crime rates
Peer effects at school: classroom or school level using average school achievements
2. Identification of the effects
• Reflection problem (Manski, 1993) • Endogenous network formation • Correlated individual unobservables
Does the "social multiplier" really exist?
3. Mechanism
Theoretical model of individual behaviour with social interactions
Bridge theory and empirics
Identification of Peer Effects through Social Networks
Models of Social Interactions
Focus: effect of the average level of activity of the group
$$y = \phi \bar{y}_g + X\beta + u$$
Models of Social Networks
Focus: effect of the structure of such a group
$$y = \phi G y + X\beta + u$$
G: the n × n adjacency matrix of a network g, which formalizes the structure of interactions among the agents in the social space
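To make the contrast concrete, here is a minimal Python/NumPy sketch on a hypothetical four-pupil network (the data are invented for illustration). In the linear-in-means model every member faces the same group average, whereas with a non row-normalized G the peer term (Gy)_i is the sum of the outcomes of the pupils that i nominated, so it varies across members of the same group and is zero for a pupil who nominates nobody.

```python
import numpy as np

# Hypothetical directed network of 4 pupils: G[i, j] = 1 if pupil i nominates pupil j.
G = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],   # pupil 3 nominates nobody
], dtype=float)
y = np.array([12.0, 16.0, 14.0, 18.0])   # hypothetical outcomes

# Linear-in-means: the same group average enters every member's equation.
group_mean = np.full_like(y, y.mean())   # 15.0 for everyone

# Social network model: (G y)_i sums the outcomes of i's nominated friends.
peer_sum = G @ y                         # 30.0, 12.0, 18.0, 0.0
```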
Empirical approach: assume a particular structure for the social interactions, use high-quality data on social groups, and draw inferences based on that assumption (Bramoullé, Djebbari, Fortin, J. Econometrics, 2009; Calvó-Armengol, Patacchini, Zenou, Rev. Econ. Stud., 2009)
Theoretical approach: assume a particular functional form for social effects (linear in the efforts of other friends in the network, with a quadratic cost function), use a Nash equilibrium concept, and conduct a full-fledged equilibrium analysis that relates network topology to outcomes (Calvó-Armengol, Patacchini, Zenou, Rev. Econ. Stud., 2009)
Theoretical model
Network model of peer effects in education with ex-ante heterogeneous agents
Each agent i in network r selects an effort $y_{i,r} \geq 0$ and obtains a payoff given by the utility function
$$u_{i,r}(y_r, g_r) = \underbrace{(a_{i,r} + \eta_r + \varepsilon_{i,r})\, y_{i,r}}_{\text{benefits from own effort}} - \underbrace{\tfrac{1}{2}\, y_{i,r}^2}_{\text{costs}} + \underbrace{\phi \sum_{j=1}^{n} g_{ij,r}\, y_{i,r}\, y_{j,r}}_{\text{benefits from own and friends' effort}}$$
where $\phi > 0$
Two individuals i and j are directly connected (i.e., best friends) in $G = \{g_{ij,r}\}$ if and only if $g_{ij,r} = 1$, and $g_{ij,r} = 0$ otherwise
Individual outcomes result from both idiosyncratic characteristics and peer effects
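Payoffs are interdependent, and agents choose their levels of activity simultaneously
If $\phi \mu_1(G) < 1$, where $\mu_1(G)$ is the largest eigenvalue (spectral radius) of G, this peer-effect game has a unique Nash equilibrium:
$$y_{i,r}^* = \phi \sum_{j=1}^{n} g_{ij,r}\, y_{j,r}^* + a_{i,r} + \eta_r + \varepsilon_{i,r}$$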
Each individual's effort increases with the aggregate effort of her/his reference group of best friends and with her/his idiosyncratic characteristics
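In matrix form the equilibrium condition reads $y^* = \phi G y^* + \alpha$ with $\alpha_i = a_{i,r} + \eta_r + \varepsilon_{i,r}$, so $y^* = (I - \phi G)^{-1}\alpha$; the condition $\phi\mu_1(G) < 1$ guarantees that $I - \phi G$ is invertible. A minimal NumPy sketch, assuming $\mu_1(G)$ is computed as the largest eigenvalue in absolute value (for a nonnegative G this coincides with the largest eigenvalue); the function name and the toy inputs are ours, not the authors'.

```python
import numpy as np

def nash_equilibrium(G, alpha, phi):
    """Solve y* = phi * G @ y* + alpha for the peer-effect game (sketch).

    alpha[i] plays the role of a_{i,r} + eta_r + eps_{i,r}.
    """
    mu1 = np.max(np.abs(np.linalg.eigvals(G)))    # spectral radius of G
    if phi * mu1 >= 1:
        raise ValueError("phi * mu1(G) >= 1: uniqueness condition fails")
    n = G.shape[0]
    return np.linalg.solve(np.eye(n) - phi * G, alpha)

# Hypothetical 4-pupil network; mu1(G) = 1 because pupils 0 and 1 nominate each other.
G = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
alpha = np.array([1.0, 2.0, 0.5, 1.5])   # hypothetical a_i + eta + eps
y_star = nash_equilibrium(G, alpha, phi=0.3)
```

The solved vector satisfies the best-response condition for every pupil at once; calling the function with phi = 1.5 on the same network raises an error because the uniqueness condition fails.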
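Empirical counterpart
Theoretical model: behavioral foundation for the so-called spatial lag model
$$\underbrace{y_{i,r}}_{\text{outcome}} = \phi \sum_{j=1}^{n_r} g_{ij,r}\, y_{j,r} + \underbrace{\sum_{m=1}^{M} \beta_m x_{i,r}^m + \frac{1}{g_{i,r}} \sum_{m=1}^{M} \sum_{j=1}^{n_r} \gamma_m\, g_{ij,r}\, x_{j,r}^m}_{a_{i,r}} + \eta_r + \varepsilon_{i,r}$$
where $g_{i,r} = \sum_j g_{ij,r}$ is the number of friends of i, and $G = \{g_{ij}\}$ is the n × n non row-normalized adjacency matrix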
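The term $a_{i,r}$ combines own characteristics with contextual effects, i.e. the average characteristics of i's friends, $\frac{1}{g_{i,r}}\sum_j g_{ij,r} x_{j,r}^m$. A sketch of how these regressors can be built with NumPy, assuming pupils without nominations get a zero contextual term (one common convention, not necessarily the one used in the paper); the toy covariates are invented. Stacking own and contextual columns gives the matrix $X^* = (X, G^*X)$ used in the matrix form of the model.

```python
import numpy as np

def row_normalize(G):
    """G*: divide each row of G by the number of friends g_i = sum_j g_ij.

    Assumption: pupils who nominate nobody keep a row of zeros.
    """
    degree = G.sum(axis=1, keepdims=True)
    return G / np.where(degree > 0, degree, 1.0)

G = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0],      # hypothetical covariates x^1, x^2 for 4 pupils
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])

G_star = row_normalize(G)
contextual = G_star @ X                 # friends' average characteristics
X_star = np.hstack([X, contextual])     # X* = (X, G* X)
```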
Related literature: empirics of social networks
Peer effects can be separately identified from contextual effects using the variation in reference groups across individuals (Calvó-Armengol et al. 2009, Bramoullé et al. 2009, Lin 2010, Liu et al. 2010 and Patacchini and Zenou 2012)
Out-group effects have also been used to identify the endogenous group effect in the linear-in-means model by Cohen-Cole 2006, De Giorgi et al. 2009, Weinberg et al. 2004 and Laschever 2009
Most related papers: Lin 2010 and Liu, Patacchini, Zenou and Lee 2010
Using data from the first wave of the AddHealth survey, Lin 2010 and Liu et al. 2010 provide an assessment of peer effects in student academic performance (GPA) and in crime, respectively
Lin 2010: no theoretical model, row-standardized social interaction matrix, maximum likelihood estimation
Liu et al. 2010: theoretical model, non row-standardized social interaction matrix, 2SLS and generalized method of moments (GMM) estimation
Liu et al. 2010 derive identification conditions and show that information on individual centrality in the network improves the statistical performance of the estimators
Both approaches (Lin 2010 and Liu et al. 2010) for the estimation of peer effects, however, are grounded on the same identification strategy
Assumption: link formation is correlated with observed individual characteristics and contextual effects, and any remaining (troubling) source of unobserved heterogeneity can be captured at the network level through the inclusion of network fixed effects
They cannot deal with unobservable within-group individual characteristics, such as unobserved individual preferences, that drive both group choice and individual outcomes
Contribution of this paper
We exploit the longitudinal structure of AddHealth, which provides a time interval of more than 10 years between when the group choice is made and when the outcome is realized
Unobserved student characteristics driving the choice of friends at school (e.g., common interests in sports or other activities, cheap talk) are unlikely to remain important determinants of individual decisions later in life
Our identification strategy exploits two unique features of the AddHealth data: (i) the nomination-based friendship information, which allows us to reconstruct the precise geometry of social contacts, and (ii) the longitudinal dimension, which provides a time interval between when friends are chosen and when the outcome is realized
To the best of our knowledge, our paper is the first to exploit this comprehensive set of information to assess peer effects in education in this dynamic perspective.
Are peer effects persistent over time?
Empirical model and estimation strategy Network model with network fixed effects and centrality
$$Y_r = \phi G_r Y_r + X_r \delta + G_r^* X_r \gamma + \eta_r^* l_{n_r} + \varepsilon_r$$
where $G_r^*$ is the row-normalized $G_r$
We can eliminate the network fixed effect by the network-mean transformation, that is, by premultiplying the equation by $J_r = I_{n_r} - \frac{1}{n_r} l_{n_r} l_{n_r}'$ ($I_{n_r}$: identity matrix; $l_{n_r}$: vector of ones). The model becomes:
$$J_r Y_r = \phi J_r G_r Y_r + J_r X_r \delta + J_r G_r^* X_r \gamma + J_r \varepsilon_r$$
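A quick numerical check of why premultiplying by $J_r$ removes the network fixed effect: $J_r$ maps any constant vector such as $\eta_r^* l_{n_r}$ to zero and otherwise just subtracts the network mean. NumPy sketch; the network size and the value of the fixed effect are arbitrary. With several networks, J is block diagonal with one $J_r$ block per network, so in practice the transformation can be applied network by network.

```python
import numpy as np

def within_network_transform(n):
    """J_r = I_{n_r} - (1/n_r) l l': subtracts the network mean."""
    ones = np.ones((n, 1))
    return np.eye(n) - (ones @ ones.T) / n

n_r = 5
J = within_network_transform(n_r)
eta_r = 3.7                                    # hypothetical network fixed effect
rng = np.random.default_rng(0)
y = rng.normal(size=n_r)

fixed_effect_removed = J @ (eta_r * np.ones(n_r))   # all zeros
demeaned = J @ y                                    # equals y - y.mean()
```

Because J is idempotent (J J = J), applying it twice changes nothing, which is why the transformed equation can be estimated directly.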
Identification
The model is identified if and only if $E(J_r G_r Y_r \mid J_r X_r)$ is not perfectly collinear with the regressors $(J_r X_r, J_r G_r^* X_r)$
Identification is slightly different from, and more demanding than, in Bramoullé et al. (2009)
Liu et al. (2010) give precise conditions for the identification of a network model with network fixed effects and a non row-normalized adjacency matrix
These conditions imply that $I_r$, $G_r$, $G_r^2$ and $G_r^3$ are linearly independent
If $G_r$ is block diagonal with complete, row-normalized blocks (everyone in a group interacts with everyone, including her/himself, as in the linear-in-means model), then $G_r^2 = G_r$ and we are back to the reflection problem
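The implication that $I_r, G_r, G_r^2, G_r^3$ must be linearly independent can be checked numerically by stacking the vectorized matrices and computing a rank. The sketch below (NumPy; function name ours) checks only this necessary implication, not the full set of conditions in Liu et al. (2010). A complete group with row-normalized weights, where everyone's reference group is the whole group, fails the check because $G^2 = G$, while the small directed network used here passes.

```python
import numpy as np

def powers_independent(G):
    """True if I, G, G^2, G^3 are linearly independent as n*n vectors."""
    n = G.shape[0]
    G2 = G @ G
    powers = [np.eye(n), G, G2, G2 @ G]
    M = np.column_stack([P.ravel() for P in powers])
    return np.linalg.matrix_rank(M) == 4

# Complete group, row-normalized, self included: G^2 = G (reflection problem).
complete = np.full((4, 4), 0.25)

# Hypothetical directed friendship network.
directed = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, 0]], dtype=float)

print(powers_independent(complete))   # False
print(powers_independent(directed))   # True
```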
Estimation
Matrix form:
$$Y = \phi G Y + X^* \beta + \eta + u$$
$G = [g_{ij,r}]$: n × n sociomatrix; $G^*$: row-normalized G; $X^* = (X, G^* X)$; $\beta = (\delta', \gamma')'$
Liu and Lee (2010) basic instrumental matrix for the estimation of the model above (finite-IV 2SLS):
$Q_1 = J(G X^*, X^*)$, the counterpart of the conventional instrument set $(X, GX, G^2 X)$ used for a row-standardized matrix without network fixed effects
Liu and Lee (2010) enlarged instrumental matrix (many-IV 2SLS):
$Q_2 = (Q_1, J G \iota)$
The many-IV 2SLS estimator can be asymptotically biased when the number of networks grows too fast relative to their size
Liu and Lee (2010) also propose a bias-correction procedure based on the estimated leading-order many-IV bias (bias-corrected 2SLS): $Q_3$
Assumption: $Q_1$, $Q_2$, $Q_3$ orthogonal to u
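To illustrate the finite-IV 2SLS idea, the sketch below simulates data from the model with network fixed effects, removes the fixed effects network by network (the J transformation), and instruments $JGY$ with $Q_1 = J(GX^*, X^*)$. Everything here is an assumption made for illustration: the number and size of networks, the random nomination process, the parameter values, and the use of NumPy's least squares. It is not the authors' Matlab code, and the many-IV and bias-corrected estimators are not shown.

```python
import numpy as np

rng = np.random.default_rng(42)
R, n, k = 300, 10, 2                  # hypothetical: 300 networks of 10 pupils, 2 covariates
phi = 0.1                             # true peer effect used in the simulation
delta = np.array([1.0, -0.5])         # own-characteristic effects
gamma = np.array([0.5, 0.3])          # contextual effects
beta = np.concatenate([delta, gamma])

def random_network(n, rng, max_friends=3):
    """Directed 0/1 sociomatrix: each pupil nominates 0..max_friends other pupils."""
    G = np.zeros((n, n))
    for i in range(n):
        k_i = rng.integers(0, max_friends + 1)
        others = np.delete(np.arange(n), i)
        G[i, rng.choice(others, size=k_i, replace=False)] = 1.0
    return G

def row_normalize(G):
    d = G.sum(axis=1, keepdims=True)
    return G / np.where(d > 0, d, 1.0)

def demean(A):
    """Apply J_r = I - (1/n) l l' (column-wise network demeaning)."""
    return A - A.mean(axis=0, keepdims=True)

ys, Zs, Qs = [], [], []
for _ in range(R):
    G = random_network(n, rng)
    X = rng.normal(size=(n, k))
    X_star = np.hstack([X, row_normalize(G) @ X])            # X* = (X, G*X)
    eta = rng.normal()                                        # network fixed effect
    eps = rng.normal(size=n)
    y = np.linalg.solve(np.eye(n) - phi * G, X_star @ beta + eta + eps)  # equilibrium
    ys.append(demean(y))
    Zs.append(demean(np.column_stack([G @ y, X_star])))      # regressors (JGY, JX*)
    Qs.append(demean(np.hstack([G @ X_star, X_star])))       # instruments Q1 = J(GX*, X*)

y_all, Z, Q = np.concatenate(ys), np.vstack(Zs), np.vstack(Qs)

# 2SLS: first stage projects Z on Q, second stage regresses Jy on the fitted values.
Z_hat = Q @ np.linalg.lstsq(Q, Z, rcond=None)[0]
theta = np.linalg.lstsq(Z_hat, y_all, rcond=None)[0]
phi_hat = theta[0]                    # estimate of phi; theta[1:] estimates (delta, gamma)
```

Regressing on the first-stage fitted values is algebraically the same as the usual $(Z'PZ)^{-1}Z'Py$ formula, since the projection matrix P is idempotent.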
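"Dynamic" network model with network fixed effects and centrality
$$Y_t = \phi G_{t-1} Y_t + X_t^* \beta_1 + X_{t-1}^* \beta_2 + \eta_{t-1} + u_t$$
Modified instrumental matrices:
$Q_1 = J(G X^*, X^*) \Longrightarrow Q_1^{t-1} = J(G_{t-1} X_{t-1}^*, X_{t-1}^*)$
$Q_2 = (Q_1, J G \iota) \Longrightarrow Q_2^{t-1} = (Q_1^{t-1}, J G_{t-1} \iota)$
$Q_3 \Longrightarrow Q_3^{t-1}$
Assumption: $Q_1^{t-1}$, $Q_2^{t-1}$, $Q_3^{t-1}$ orthogonal to $u_t$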
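The only change in the dynamic specification is that the instruments are built from wave-I (t−1) information only, so they are not driven by wave-IV shocks. A per-network NumPy sketch; the function name and the toy inputs are hypothetical, and we assume ι is a vector of ones within a network, so that $JG_{t-1}\iota$ is the demeaned number of nominations.

```python
import numpy as np

def lagged_instruments(G_lag, X_star_lag):
    """Instruments for one network in the dynamic model (sketch).

    G_lag:      wave-I sociomatrix G_{t-1}, shape (n, n)
    X_star_lag: X*_{t-1} = (X_{t-1}, G*_{t-1} X_{t-1}), shape (n, 2k)
    Returns Q1^{t-1} = J(G_{t-1} X*_{t-1}, X*_{t-1}) and Q2^{t-1} = (Q1^{t-1}, J G_{t-1} iota).
    """
    n = G_lag.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # network-mean transformation
    Q1 = J @ np.hstack([G_lag @ X_star_lag, X_star_lag])
    centrality = J @ G_lag @ np.ones(n)              # demeaned out-degree (iota assumed = ones)
    Q2 = np.column_stack([Q1, centrality])
    return Q1, Q2

# Hypothetical wave-I data for one network of 4 pupils with a single covariate.
G_lag = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0]], dtype=float)
x_lag = np.array([[1.0], [0.0], [2.0], [3.0]])
deg = G_lag.sum(axis=1, keepdims=True)
X_star_lag = np.hstack([x_lag, (G_lag @ x_lag) / np.where(deg > 0, deg, 1.0)])
Q1, Q2 = lagged_instruments(G_lag, X_star_lag)
```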
Data
Dataset of friendship networks in the United States from the National Longitudinal Study of Adolescent Health (AddHealth): roughly 90,000 students in grades 7-12 from roughly 130 private and public schools in 1994-95 (wave I)
The AddHealth data provide unusually rich information:
Pupils were asked to identify their best friends from a school roster
Information on the characteristics of nominated friends is available
A subset of roughly 20,000 adolescents was interviewed again in 1995-96 (wave II), in 2001-02 (wave III), and in 2007-08 (wave IV)
Final sample: 1,319 pupils distributed over 138 networks, followed from wave I to wave IV
Friendship networks
Friendship information: friends nominations at wave I
Pupils were asked to identify their best friends from a school roster (up to five males and five females)
The limit on the number of nominations is not binding (even by gender)
Fewer than 1% of the students in our sample list ten best friends
Friendship relationships are not always reciprocal
Peer group definition
Knowing exactly who nominates whom in a network, we exploit the directed nature of the nominations data
We focus on choices made: a link from i to j is $g_{ij,r} = 1$ if i has nominated j as her/his friend, and $g_{ij,r} = 0$ otherwise (outdegree) → directed, unweighted networks
On average, students in our sample nominate 1.46 friends (standard deviation 1.4)
The average network size is about 9.6 pupils (1,319 pupils over 138 networks; standard deviation 14); the minimum is 4 and the maximum is 100
We also exploit the nomination order → directed, weighted networks
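A sketch of how ordered nomination lists can be turned into the two kinds of networks. The out-degree definition follows the slide; the order-based weight 1/rank is purely a hypothetical choice for illustration, since the weighting scheme is not specified here. The nomination lists are invented.

```python
import numpy as np

def build_networks(nominations, n):
    """Directed sociomatrices from ordered friend nominations (sketch).

    nominations: dict mapping pupil i to the list of pupils i nominated, in order.
    Returns (G, W): G[i, j] = 1 if i nominated j (out-degree links);
    W uses a hypothetical order-based weight 1/rank (first friend = 1, second = 1/2, ...).
    """
    G = np.zeros((n, n))
    W = np.zeros((n, n))
    for i, friends in nominations.items():
        for rank, j in enumerate(friends, start=1):
            G[i, j] = 1.0
            W[i, j] = 1.0 / rank
    return G, W

nominations = {0: [2, 1], 1: [0], 2: [], 3: [0, 2, 1]}   # hypothetical nominations
G, W = build_networks(nominations, n=4)
out_degree = G.sum(axis=1)        # number of friends each pupil nominated
reciprocal = G * G.T              # 1 where i and j nominated each other
```

Here only pupils 0 and 1 nominate each other, illustrating that friendship links need not be reciprocal, so G is not symmetric.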
Target variable and controls
The wave IV questionnaire contains detailed information on the highest educational qualification achieved; respondents with a high school or higher qualification are also asked to report the exact year in which it was achieved
We measure education attainment in completed years of full time education
Range: 9-26; average 16.31, standard deviation 3.19
Proxies for typically unobserved individual characteristics
To control for differences in individual ability and leadership propensity: mathematics score, indicator of self esteem and the level of physical development compared to the peers (wave I)
School inputs are accounted for by the inclusion of network fixed effects
Table 1: Description of data (1,319 individuals, 138 networks)
Variable: definition. Mean; st.dev.

Wave I (grades 7-12)
Individual socio-demographic variables
Female: dummy variable taking value one if the respondent is female. Mean 0.53; st.dev. 0.50
Black or African American: race dummy; "White" is the reference group. Mean 0.17; st.dev. 0.37
Other races: race dummy; "White" is the reference group. Mean 0.05; st.dev. 0.23
Religion practice: response to the question "In the past 12 months, how often did you attend religious services", coded as 4 = never, 3 = less than once a month, 2 = once a month or more, but less than once a week, 1 = once a week or more; coded as 5 if the question is skipped because of the response "none" to "What is your religion?". Mean 2.16; st.dev. 1.38
Mathematics score A: dummy for the score in mathematics at the most recent grading period (A, B, C, D or lower, missing). Mean 0.29; st.dev. 0.45
Mathematics score B: as above. Mean 0.34; st.dev. 0.47
Mathematics score C: as above. Mean 0.20; st.dev. 0.40
Mathematics score D or lower: as above. Mean 0.11; st.dev. 0.32
Mathematics score missing: as above. Mean 0.05; st.dev. 0.21
Self esteem: response to the question "Compared with other people your age, how intelligent are you", coded as 1 = moderately below average, 2 = slightly below average, 3 = about average, 4 = slightly above average, 5 = moderately above average, 6 = extremely above average. Mean 4.01; st.dev. 1.08
Physical development: response to the question "How advanced is your physical development compared to other boys/girls your age", coded as 1 = I look younger than most, 2 = I look younger than some, 3 = I look about average, 4 = I look older than some, 5 = I look older than most. Mean 3.34; st.dev. 1.11

Family background variables
Household size: number of people living in the household. Mean 4.39; st.dev. 1.35
Two married parent family: dummy taking value one if the respondent lives in a household with two married parents (biological or non-biological). Mean 0.73; st.dev. 0.44
Parent education: schooling level of the (biological or non-biological) parent living with the child, distinguishing between "never went to school", "did not graduate from high school", "high school graduate", "graduated from college or a university", "professional training beyond a four-year college", coded as 1 to 5; only the father's education is considered if both parents are in the household. Mean 3.18; st.dev. 1.08
Parent occupation manager: parent occupation dummy; the closest description of the job of the (biological or non-biological) parent living with the child is manager; if both parents are in the household, the father's occupation is considered; "none" is the reference group. Mean 0.11; st.dev. 0.31
Parent occupation professional/technical: as above. Mean 0.20; st.dev. 0.40
Parent occupation office or sales worker: as above. Mean 0.11; st.dev. 0.31
Parent occupation manual: as above. Mean 0.31; st.dev. 0.46
Parent occupation military or security: as above. Mean 0.02; st.dev. 0.15
Parent occupation farm or fishery: as above. Mean 0.03; st.dev. 0.16
Parent occupation other: as above. Mean 0.13; st.dev. 0.34

Protective factors
School attachment: response to the question "You feel like you are part of your school", coded as 1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree, 5 = strongly disagree. Mean 1.90; st.dev. 0.92
Relationship with teachers: response to the question "How often have you had trouble getting along with your teachers?", coded as 0 = never, 1 = just a few times, 2 = about once a week, 3 = almost every day, 4 = every day. Mean 0.89; st.dev. 0.91
Social inclusion: response to the question "How much do you feel that adults care about you", coded as 5 = very much, 4 = quite a bit, 3 = somewhat, 2 = very little, 1 = not at all. Mean 4.47; st.dev. 0.74
Parental care: dummy taking value one if the respondent reports that the (biological or non-biological) parent living with her/him, or at least one of the parents if both are in the household, cares very much about her/him. Mean 0.92; st.dev. 0.28

Residential neighborhood variables
Residential building quality: interviewer response to the question "How well kept is the building in which the respondent lives", coded as 4 = very poorly kept (needs major repairs), 3 = poorly kept (needs minor repairs), 2 = fairly well kept (needs cosmetic work), 1 = very well kept. Mean 1.52; st.dev. 0.79
Residential area suburban: residential area type dummy based on the interviewer's description of the immediate area or street (one block, both sides) where the respondent lives; "rural area" is the reference group. Mean 0.30; st.dev. 0.46
Residential area urban, residential only: as above. Mean 0.23; st.dev. 0.42
Residential area other type: as above. Mean 0.02; st.dev. 0.12

Wave IV (aged 24-32)
Years in education: completed years of full-time education. Mean 16.31; st.dev. 3.19
Years in education of peers: aggregate value of years in education over nominated direct friends. Mean 23.90; st.dev. 23.40
Married: dummy variable taking value one if the respondent is married. Mean 0.45; st.dev. 0.50
Age: respondent's age. Mean 28.5; st.dev. 1.72
Son or daughter: dummy variable taking value one if the respondent has a son or daughter. Mean 0.47; st.dev. 0.50
Religion practice: response to the question "How often have you attended religious services in the past 12 months?", coded as 0 = never, 1 = a few times, 2 = once a month, 3 = 2 or 3 times a month, 4 = once a week, 5 = more than once a week. Mean 1.72; st.dev. 1.63
Results
We estimate different model specifications, gradually adding sets of controls:
(i) standard individual characteristics and behavioral factors (e.g., socio-demographic factors, family background, self-esteem)
(ii) protective factors (e.g., relationship with teachers, social inclusion, school attachment, parental care)
(iii) proxies aimed at capturing the quality of social interactions (derived characteristics of direct friends)
Table 2: Estimation results, peer effects, unweighted networks
                         Total IV           Lagged IV
2SLS finite IVs          0.011** (0.005)    0.015** (0.006)
2SLS many IVs            0.008* (0.005)     0.010** (0.005)
Bias-corrected 2SLS      0.009** (0.005)    0.011** (0.005)
Included in both columns: individual socio-demographic variables, family background variables, protective factors, residential neighborhood variables, contextual effects, network fixed effects
1,319 individuals over 138 networks. Notes: Estimation has been performed using Matlab. Standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Table 3: Estimation results, peer effects, weighted networks
                         Total IV           Lagged IV
2SLS finite IVs          0.020** (0.009)    0.027** (0.011)
2SLS many IVs            0.013* (0.008)     0.016** (0.008)
Bias-corrected 2SLS      0.014* (0.008)     0.018** (0.008)
Included in both columns: individual socio-demographic variables, family background variables, protective factors, residential neighborhood variables, contextual effects, network fixed effects
1,319 individuals over 138 networks. Notes: Estimation has been performed using Matlab. Standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
A one-standard-deviation increase in the educational attainment of an individual's teenage peers translates into an increase of roughly 8 percent of a standard deviation in the individual's educational attainment at wave IV (roughly 3 more months of education) if networks are unweighted
A one-standard-deviation increase in the educational attainment of an individual's teenage peers translates into an increase of roughly 10 percent of a standard deviation in the individual's educational attainment at wave IV (roughly 4 more months of education) if networks are weighted
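These magnitudes can be reproduced with back-of-the-envelope arithmetic: because G is not row-normalized, φ multiplies the aggregate education of friends, so a one-standard-deviation increase in that aggregate raises own education by φ × 23.40 years (Table 1). The coefficients plugged in below (0.011 and 0.014, both reported in Tables 2 and 3) are our assumption about which estimates underlie the figures, and the same peer standard deviation is used for both network types; the slides do not state either choice.

```python
def standardized_peer_effect(phi, sd_peers=23.40, sd_own=3.19):
    """Effect of a one-st.dev. increase in peers' aggregate education (sketch).

    Returns (effect as a share of one st.dev. of own education, effect in months).
    Defaults are the Table 1 standard deviations of "Years in Education of peers"
    and "Years in Education".
    """
    extra_years = phi * sd_peers
    return extra_years / sd_own, 12 * extra_years

for label, phi in [("unweighted", 0.011), ("weighted", 0.014)]:
    share, months = standardized_peer_effect(phi)
    print(f"{label}: {share:.1%} of a st.dev., about {months:.1f} months")
```

With these inputs the unweighted case gives about 8% of a standard deviation (about 3 months) and the weighted case about 10% (about 4 months), in line with the figures above.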
The imprint left by peers at school appears to persist over time
Possible mechanisms: • better study habits • more intellectual curiosity • the choice to go to college is affected by the choices of peers
Results - sample splits by grade
Table 4: Estimation results, peer effects, unweighted networks, grades 7-9
                         Total IV           Lagged IV
2SLS finite IVs          0.007 (0.006)      0.010 (0.007)
2SLS many IVs            0.008 (0.006)      0.009 (0.006)
Bias-corrected 2SLS      0.009 (0.006)      0.011* (0.006)
Included in both columns: individual socio-demographic variables, family background variables, protective factors, residential neighborhood variables, contextual effects, network fixed effects
713 individuals over 80 networks. Notes: Estimation has been performed using Matlab. Standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Table 5: Estimation results, peer effects, weighted networks, grades 7-9
                         Total IV           Lagged IV
2SLS finite IVs          0.010 (0.010)      0.011 (0.012)
2SLS many IVs            0.008 (0.009)      0.009 (0.009)
Bias-corrected 2SLS      0.010 (0.009)      0.011 (0.009)
Included in both columns: individual socio-demographic variables, family background variables, protective factors, residential neighborhood variables, contextual effects, network fixed effects
713 individuals over 80 networks. Notes: Estimation has been performed using Matlab. Standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Table 6: Estimation results, peer effects, unweighted networks, grades 10-12
                         Total IV           Lagged IV
2SLS finite IVs          0.021* (0.011)     0.024** (0.011)
2SLS many IVs            0.016* (0.009)     0.016* (0.010)
Bias-corrected 2SLS      0.018* (0.009)     0.018* (0.010)
Included in both columns: individual socio-demographic variables, family background variables, protective factors, residential neighborhood variables, contextual effects, network fixed effects
492 individuals over 55 networks. Notes: Estimation has been performed using Matlab. Standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Table 7: Estimation results, peer effects, weighted networks, grades 10-12
                         Total IV           Lagged IV
2SLS finite IVs          0.039** (0.018)    0.044** (0.019)
2SLS many IVs            0.029** (0.014)    0.031** (0.015)
Bias-corrected 2SLS      0.033** (0.009)    0.035** (0.015)
Included in both columns: individual socio-demographic variables, family background variables, protective factors, residential neighborhood variables, contextual effects, network fixed effects
492 individuals over 55 networks. Notes: Estimation has been performed using Matlab. Standard errors are reported in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Main message of the paper
Peer effects at school are important and persistent over time
The information content of the nomination order may be non-negligible
Relevant peers: friends in grades 10-12
THANK YOU FOR YOUR ATTENTION