Phonological categories in infant-directed speech: Supplementary Materials
Alejandrina Cristia∗
August 18, 2011
Abstract
I report on supplementary analyses to the manuscript entitled “Phonological categories in infant-directed speech”, which documented that tenseness contrasts are not enhanced in infant-directed speech (IDS) as compared to adult-directed speech (ADS). That tenseness is not enhanced (but on the contrary sometimes deteriorated) is also true for a non-parametric measure of separation and for Mahalanobis distances (Section 1). In addition, similar results ensue when each acoustic dimension is analyzed separately (Section 2). Lack of enhancement in our sample appears to be partially due to an increase in within-category variance, which is documented in Section 3.
1 Additional distance measures
Of the measures reported in the main paper, the raw distance is basic but connects current findings with those documented in previous research on IDS, whereas D(a) is more elaborate and possibly a better predictor of adults' performance [2]. However, means and variances may not be the best way to capture the present acoustic cue distributions, which need not be normal. Therefore, I recalculated distances using a non-parametric version of D(a), namely the difference between medians divided by the average of the interquartile ranges of the two categories. In addition, most previous work was based on 1 or 2 dimensions, whereas here we used all of the dimensions that have been shown to be perceptually relevant, although they are on different scales and are not all independent. In the main paper, we addressed the scaling problem by z-scoring, but covariance across dimensions remains a potential issue; Mahalanobis distances, which take this covariance into account, were therefore also calculated.

Results for these two distances are shown in Figure 1. In no case is the red box (IDS distances) higher than the blue box (ADS distances). In contrast, significantly higher distances for ADS than IDS are found for /i-ɪ/ in both age groups, with a trend in the same direction for /e-ɛ/ and /æ-æ̃/, for the non-parametric distance.
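The two measures discussed in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the reported analyses: the function names are hypothetical, the absolute value of the median difference is taken so that the result is a distance, quartiles are computed with the "inclusive" method, the Mahalanobis version is restricted to two dimensions for brevity, and the covariance matrix is assumed to be the average of the two categories' covariance matrices.

```python
from math import sqrt
from statistics import mean, median, quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def nonparametric_distance(cat_a, cat_b):
    """Difference between medians divided by the average of the two IQRs.

    Assumption: the absolute difference is used so the value is non-negative.
    """
    average_spread = (iqr(cat_a) + iqr(cat_b)) / 2
    return abs(median(cat_a) - median(cat_b)) / average_spread


def covariance_2d(points):
    """Sample variances and covariance (sxx, sxy, syy) of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    dof = len(points) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / dof
    syy = sum((y - my) ** 2 for y in ys) / dof
    sxy = sum((x - mx) * (y - my) for x, y in points) / dof
    return sxx, sxy, syy


def mahalanobis_2d(cat_a, cat_b):
    """Mahalanobis distance between two category means in two dimensions.

    Assumption: the covariance matrix is the average of the two
    within-category covariance matrices.
    """
    cov_a, cov_b = covariance_2d(cat_a), covariance_2d(cat_b)
    sxx, sxy, syy = [(a + b) / 2 for a, b in zip(cov_a, cov_b)]
    dx = mean([x for x, _ in cat_a]) - mean([x for x, _ in cat_b])
    dy = mean([y for _, y in cat_a]) - mean([y for _, y in cat_b])
    det = sxx * syy - sxy ** 2
    # d' S^-1 d, written out with the closed-form inverse of a 2x2 matrix
    return sqrt((syy * dx ** 2 - 2 * sxy * dx * dy + sxx * dy ** 2) / det)
```

For example, `nonparametric_distance(ads_tense, ads_lax)` would be compared with `nonparametric_distance(ids_tense, ids_lax)` for one talker and one dimension, where the four lists (hypothetical names) hold that talker's measurements in each register and category.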
∗ This work was supported by funds from NSF 0843959 to Amanda Seidl. Amanda Seidl and Kristine Onishi designed the elicitation material, collected the recordings, and supervised the coders. I thank Titia Benders for stimulating comments on the main manuscript, which inspired me to expand the analyses here, and Amanda for comments on this document. Please email comments to [email protected].
[Figure 1 image: two boxplot panels with y-axes "Non-parametric distance" and "Mahalanobis distance"; x-axis: contrasts e, i, an, en, each for the 4mo and 11mo groups.]
Figure 1: Non-parametric (top) and Mahalanobis (bottom) distance by contrast, age group, and register (blue=ADS, red=IDS).
2 Distances in each acoustic dimension separately
Listeners usually do not treat all cues equally; rather, one subset of the acoustic dimensions is perceptually more important than another. For example, /sa/ and /ʃa/ differ both in the frequency distribution of the frication noise and in the F2 found early in the following vowel, but the former may be more important for adult listeners [3]. The cues used to represent tenseness in the present study were chosen because previous research shows that English listeners use them (see the main paper for citations). Nonetheless, if talkers do weight some cues more heavily than others, it is likely that they will primarily enhance these highly weighted cues [1]. In this case, the distances measured will still reflect the expansion, since expansion is additive (not averaged). However, one can postulate a variant of the hyperspeech hypothesis by which talkers enhance the heavily weighted cues and deteriorate cues that have low weighting, in order to bias infants' attention away from the latter. In this case, averages would not show enhancement, since distances will increase along some dimensions and decrease along others.

To deal with this possibility, I recalculated distances over each dimension separately. An additional advantage of this analysis is that there is no need for z-scoring, since the dimensions are not combined. As reported in the remainder of this section, results are basically the same. All of the boxplots for these analyses are provided online in a separate pdf.

More specifically, if this interpretation is correct, one should find compensation, such that the average distance for IDS is higher than that for ADS in one dimension, but the opposite holds for one or more of the other dimensions. For the raw measure in /e-ɛ/, F1 and F2 at both points in the vowel show higher distances in IDS than ADS in the 11mo group; no change is clear in the 4mo group, or in duration for the 11mo group. Also for the raw measure, /i-ɪ/ are further apart in IDS along F1 at both points and F2 at 40% for the 11mo group; the other age group and the other dimensions do not exhibit large differences between the registers. Therefore, there is no compensation in terms of the raw distances.

As for the normalized distance D(a), the average distance between /e-ɛ/ is significantly lower in IDS than in ADS for F2 at the 40% measurement point in both age groups, and for duration in the 4mo group; other dimensions show no significant changes. For /i-ɪ/, F1 at both points and F2 at 40% show significantly lower distances for IDS than ADS; the other cases do not show any significant changes. Here as well, there is no evidence of compensation: changes along different dimensions always pattern in the same direction, which is the direction evidenced by the composite measures in the main paper.
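The compensation check described in this section amounts to asking whether the IDS minus ADS difference has opposite signs in different dimensions. A minimal sketch follows, assuming per-dimension distances have already been computed; the function names, the dimension labels, the tolerance parameter, and the numbers in the usage note are hypothetical and are not taken from the corpus.

```python
def register_differences(ids_distances, ads_distances):
    """IDS minus ADS distance for every dimension present in both dicts."""
    return {
        dim: ids_distances[dim] - ads_distances[dim]
        for dim in ids_distances
        if dim in ads_distances
    }


def shows_compensation(differences, tolerance=0.0):
    """True if some dimension increases in IDS while another decreases.

    `tolerance` (an assumption of this sketch) lets small differences
    count as "no change" instead of as an increase or a decrease.
    """
    increases = any(d > tolerance for d in differences.values())
    decreases = any(d < -tolerance for d in differences.values())
    return increases and decreases
```

With made-up values such as `ids = {"F1_40": 1.2, "F2_40": 0.9, "duration": 0.5}` and `ads = {"F1_40": 1.6, "F2_40": 1.4, "duration": 0.5}`, `shows_compensation(register_differences(ids, ads))` is `False`, because every non-zero difference points in the same direction. A significance test per dimension, as used for the results reported here, would replace the simple tolerance.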
3 Possible increase in variability
To directly assess whether vowels are acoustically more variable in speech addressed to infants, I calculated the average standard deviation along the z-scored identifying acoustic dimensions within each talker, for each of the 8 vowel types elicited (2 tense, 2 lax, 2 nasal, 2 oral). The number of talkers who could be included is listed in Table 1; they all had at least 8 tokens of the vowel in IDS and another 8 or more in ADS. As shown in Figure 2, for most of the vowels, more than half of the talkers had larger within-category variability in IDS than ADS, with the overall average being 70%. The proportion is somewhat higher in Tenseness (78%) than Nasality (62%) categories, but this difference was driven by /ɛ-ɛ̃/. Thus, in these analyses, /æ-æ̃/ seems to pattern with the tenseness contrasts.¹

Finally, to ensure that these effects were not due to more tokens, or more diverse types, having been uttered during the IDS than the ADS portion, caregivers who produced the exact same number of tokens in IDS and ADS for a given type were selected, and the variability was calculated for each caregiver, type, and register. One hundred and four caregiver-words could thus be matched, and greater variability was found in infant- than in adult-directed speech both in binary classifications [66 out of 104 cases had greater variability in IDS, p = .004] and continuous measures [average variability was greater in IDS, t(103) = 2.85, p = .005]. Figure 3 illustrates this with the instantiations of the word “beetle” by 5 different caregivers who spoke this word 3 or more times.

Table 1: Number of caregivers that could be included in the calculation of the variability of each vowel.

Contrast  Tenseness                 Nasality
Vowel     /e/   /ɛ/   /i/   /ɪ/     /ɛ/   /ɛ̃/   /æ/   /æ̃/
4mo       23    19    21    23      17    19    22    24
11mo      15    13    15    12      13    14    15    15
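The variability analysis in this section can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the reported statistics: the function names are hypothetical, each token is assumed to be a tuple of z-scored values (one per acoustic dimension), the binary comparison is assumed to be an exact binomial (sign) test against chance, and the text does not state whether the reported p values are one- or two-sided, so only the one-sided binomial probability is computed.

```python
from math import comb, sqrt
from statistics import mean, stdev


def within_category_sd(tokens):
    """Average, across dimensions, of the standard deviation of the tokens.

    `tokens` is a list of equal-length tuples of z-scored measurements
    for one talker, one vowel type, and one register.
    """
    dimensions = list(zip(*tokens))
    return mean(stdev(values) for values in dimensions)


def sign_test_p(successes, n):
    """One-sided exact binomial probability P(X >= successes) with p = 0.5."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


def paired_t(ids_values, ads_values):
    """Paired t statistic for IDS minus ADS; degrees of freedom are n - 1."""
    diffs = [i - a for i, a in zip(ids_values, ads_values)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

Here `sign_test_p(66, 104)` would correspond to the binary comparison of matched caregiver-words, and `paired_t` would be applied to the 104 pairs of `within_category_sd` values, one pair per matched caregiver-word.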
References
[1] Kyoung-Ho Kang and Susan G. Guion. Clear speech production of Korean stops: Changing phonetic targets and enhancement strategies. The Journal of the Acoustical Society of America, 124:3909–3917, 2008.
[2] Rochelle S. Newman, S. A. Clouse, and J. Burnham. The perceptual consequences of acoustic variability in fricative production within and across talkers. The Journal of the Acoustical Society of America, 109:3697–3709, 2001.
[3] Susan Nittrouer. Learning to perceive speech: How fricative perception changes, and how it stays the same. The Journal of the Acoustical Society of America, 112:711–719, 2002.
¹ I find this particularly interesting because, perceptually and acoustically, /æ/ undergoes a much larger quality change when nasalized: many non-native listeners (e.g., French) report hearing a nasalized /ɛ/, and this quality change was also evidenced in this corpus; see Supplementary analyses to the English and French comparison on the project website.
Figure 2: Proportion of caregivers for whom a given category was more variable in IDS than ADS.
[Figure 3 image: two scatterplot panels, "ADS beetle" and "IDS beetle"; axes: F2 (Hz) and F1 (Hz).]
Figure 3: Illustration of the increase in variability of acoustic instantiation found in IDS as compared to ADS. Each shape and color represents an individual caregiver, and each point a token spoken by that caregiver in the relevant register.