SPEECH SOUND CATEGORIES IN LANGUAGE ACQUISITION AND LEARNING

A Thesis Submitted to the Faculty of Purdue University
by Alejandrina Cristia

In Partial Fulfillment of the Requirements for the Degree of Master of Arts

December 2006
Purdue University
West Lafayette, Indiana


To the memory of José María and Rodrigo Cristiá



My deepest appreciation to the members of my Committee, Diane Brentari, Alexander Francis and Amanda Seidl, who provided me with three outstanding models as researchers, which I can only hope to have merged consistently in this and future work. I would also like to thank those who have kindly lent me their (and their students’) ears: Amanda Seidl, Donny Vigil, Elizabeth Strong, Ivana Hrbud, Jennifer Doyle, Najeong Kim, and Ricard Vinas de Puig; or their voices: Amanda Seidl, Charles Smith, Heather Rushton, John Spartz, and Sarah Pope. Finally, my recognition to the staff of Purdue Infant Research Labs and the many parents and babies who help us unravel the mystery of language acquisition.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1
  1.1 Preliminary typological and psycholinguistic evidence of features
  1.2 Features: linguistic or phonetic?
  1.3 Considerations on the relationship between phonology and phonetics
    1.3.1 Phonological is cognitive, phonetic is physical
    1.3.2 Phonology is cognitive but influenced by phonetics
    1.3.3 The distinction between phonology and phonetics is superfluous
    1.3.4 Phonetics and phonology overlap
  1.4 Overview
CHAPTER 2: THE ORIGIN OF FEATURES
  2.1 Features are linguistic
  2.2 Features are originally phonetic
    2.2.1 Diachronic features
    2.2.2 Emergent features
  2.3 Conclusions on the origin of features
CHAPTER 3
  3.1 Development of production
  3.2 Infants’ and children’s perception and memory
  3.3 Learnability, complexity and groundedness
  3.4 Conclusions on infants’ representation of speech sounds
CHAPTER 4
  4.1 Features in production
  4.2 Features in perception
  4.3 Phonological learning in adults
  4.4 Conclusions on adults’ representations of speech sounds
CHAPTER 5
  5.1 Experiment 1: Adults’ learning
    5.1.1 Methods
    5.1.2 Results
    5.1.3 Discussion
  5.2 Experiment 2: Young infants’ learning
    5.2.1 Methods
    5.2.2 Results
    5.2.3 Discussion
  5.3 Experiment 3: Learning in older infants
    5.3.2 Preliminary results
    5.3.3 Discussion
  5.4 General discussion








LIST OF TABLES

Table 1: Initial consonants in test items for Experiment 1
Table 2: Means and Standard Deviations of ‘heard’ responses to test item types for Experiment 1
Table 3: Initial consonants in familiarization and testing in Experiment 2
Table 4: Means and Standard Deviations (in parentheses) of looking times to test item types in Experiment 2
Table 5: Pair-wise comparisons between orders and conditions in Experiment 3
Table 6: t and p values for each condition and order in Experiment 3
Table 7: Means and Standard Deviations by test item type in Experiment 3





LIST OF FIGURES

Figure 1: Phonology distinct from phonetics
Figure 2: Phonology distinct from phonetics, but influenced by it
Figure 3: Phonology and phonetics overlap
Figure 4: Systems mediating language acquisition and change
Figure 5: Proportion of ‘heard’ responses by test item type in Experiment 1
Figure 6: Looking times by condition and test item type in Experiment 2 (error bars represent standard error)
Figure 7: Looking times to legal and illegal items in Experiment 3



ABSTRACT

Cristia, Alejandrina, M.A., Purdue University, December, 2006. Speech Sound Categories in Language Acquisition and Learning. Major Professor: Amanda Seidl.

Phonological processes and patterns in spoken languages often involve groups of sounds. It has been assumed that this is because the primitives on which phonology operates are abstract features shared by a group of sounds. There are two opposing views as to how features arise. One view ascribes them to a Universal Grammar that is part of the genetic endowment of humans (the linguistic hypothesis), while the other proposes that language users induce features from their linguistic experience (the phonetic hypothesis). This thesis tests the predictions that these views make about language learning, using an artificial grammar paradigm to assess learning of a linguistic set of sounds and a phonetic set of sounds by three groups of subjects: seven-month-old infants, fourteen-month-old infants and adults. The results of these experiments suggest that learning of phonological patterns involving groups of sounds through a perceptual task is possible only in the youngest age group. These results seem to support the linguistic hypothesis, since only the linguistically-based grouping was learned. Further, a developmental trend is suggested, such that younger infants group sounds into phonological classes, while fourteen-month-old infants, like adults, tend not to rely on phonological classes when learning phonotactic constraints.



Most children master the key aspects of the phonology of their language in the first few years of life. By twelve months of age, most hearing infants have become attuned to the speech sound contrasts that are present in their language and have lost the ability to perceive most non-native contrasts (Werker & Tees, 1984; Vihman, 1996). In addition, infants at 12 months exhibit a preference for sequences of sounds that constitute possible words in the input language (Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993; Jusczyk, 1997). Even earlier, infants are able to pick up on distributional frequencies in both of these domains (phonetic tuning and phonotactic learning) after a short exposure to an artificial language (Chambers, Onishi & Fisher, 2003; Maye, Werker & Gerken, 2002; Seidl & Buckley, 2005; among others). How do they do it?

It has been an assumption of much phonological work that infants are able to learn the phonology of their language so quickly because they are aided by Universal Grammar (Chomsky & Halle, 1968), a component in the mind/brain of all human beings which is typically taken to include the set of primitives and the principles of organization of each linguistic module, including the phonological module. Thus, Universal Grammar guides the infant by providing him or her with the basic units on which all languages build their phonological system: the universal set of linguistic features. These features serve two functions: they provide a basis for distinguishing all units (in the case of spoken languages, speech sounds) that may be used contrastively in some language, and they allow the language learner to know which units may group together in phonological patterns.
As an alternative to the Universal Grammar proposal, it has been hypothesized that first-language learners rely on general cognitive tools (perhaps in conjunction with their perceptual and articulatory systems) in order to establish which bits of the signal should be relied upon to (re)construct their phonological system. Infants would thus develop categories that fit the distributions present in their input language, for which reason these may be called emergent features. These emergent features would also allow the classification of sounds (or signs, in signed languages) into groups. Within the latter type of explanation, some researchers (Fleischhacker, 2005; Steriade, 2000, 2001; among others) propose that similarity plays a key role in phonology, directly affecting the grammar even in adulthood.

Both explanations assume that there is a level of phonological representation which deals with features, relatively abstract units that encode characteristics of speech sounds or signs. However, psycholinguistic studies on speech perception and production in adults do not overwhelmingly support such units (see below). Thus, some researchers propose other units of representation for speech: the diphone (e.g. Nearey, 1997), the syllable (for example, Eimas, 1975), or the segment (for instance, Roelofs, 1997), in an attempt to better capture the behavior of language users. Others further propose that features are a historical residue, merely a construct that helps capture how languages evolve, but that they do not have a status in the synchronic, adult grammar (Ohala, passim).

In this thesis, I focus on the predictions that these views make about the cognitive status of features in both the developed and the developing (infant) grammars of users of spoken languages. These predictions are laid out in Chapter 2 below, and they are tested against data from experimental research with infants in Chapters 3 and 5, and adults in Chapters 4 and 5.

A close inspection of previous psycholinguistic and neurolinguistic research (for example, Studdert-Kennedy, Shankweiler, & Pisoni, 1972; Wickelgren, 1965, 1966; Eulitz & Lahiri, 2004; Phillips et al., 2000; but see Phillips, 2001) reveals that the definition of features in this work is sometimes vague.
It is argued in Chapters 3 and 4 below that, when results are used to contend for abstract phonological features, one must ensure that they cannot be explained using alternative levels of representation. For example, Phillips et al. (2000) used an oddball paradigm, in which subjects heard varied tokens of syllables with voiced stops from different places of articulation as onsets and, infrequently, a syllable with a voiceless stop or a novel syllable with a voiced stop in the onset. They found that only the former were detected and concluded that the abstract feature of voicing was at play. However, voicing in initial stops relies on a few phonetic cues, which are reliable across places of articulation, so that the response could be attributed simply to the detection of a change at the phonetic level of the distribution of Voice Onset Time, for example, without necessitating reference to the abstract feature [voice].

Further, certain characteristics of the tasks used may have a bearing on the results achieved. For instance, phonological learning experiments in which subjects are asked to produce novel forms consistently show an effect of phonetic groundedness (e.g. Wilson, to appear). If these kinds of experiments are used to argue that phonetically grounded patterns are favored in the cognitive grammar, then the same should apply to a hypothetical experiment in which subjects are presented with glottal stops that have been manipulated so as to present the characteristics of voiced stops, and then asked to produce words containing this (impossible) segment. In this hypothetical experiment, subjects would undoubtedly fail to produce these words, though this would not follow from a cognitive constraint against voiced glottal stops, but rather from this sound being anatomically impossible to produce (cf. Lindblom, 1990).

Furthermore, learning experiments arguing against features commonly rely on stimuli that exhibit very little variability: for example, Onishi, Chambers, and Fisher (2002) use a small set of pseudo-words. Increasing variability may aid generalization, as suggested by research on second language learning and training (Hardison, 2003; Kingston, 2003; Wang, Jongman & Sereno, 2003). Additionally, the constraints in these experiments typically affect a set of phonemes that cannot be viewed as either a linguistic or phonetic class (for example, /p,z/ occur in word-initial position and /b,t/ in word-final position).
Alternatively, constraints may affect a group of sounds that is both a phonological and a phonetic class by virtue of having reliable common perceptual characteristics, such as voiced stops (Endress & Mehler, under review). In Chapter 5, the results of an experiment that attempts to overcome these shortcomings are presented. In particular, a more varied set of words was used. In addition, the sounds involved in the phonotactic constraint could only be expressed in terms of abstract phonological features, thus avoiding a confound between phonetic and phonological units. Finally, a perception rather than a production task was used.

The next section presents the main theoretical arguments in favor of features from the phonological literature. Section 1.2 establishes three main hypotheses that would explain the relevance of features to language. Section 1.3 elaborates on four models of the relationship between phonology and phonetics, which is crucial for defining features. Finally, Section 1.4 presents an overview of the rest of the thesis.


1.1 Preliminary typological and psycholinguistic evidence of features

Features are conceptually useful to phonology for two reasons. First, they allow easy encoding of the sound contrasts within an inventory. Jakobson, Fant and Halle (1963) point out that two types of features should be distinguished in a given language: distinctive features are those used to encode a difference in meaning in some language, whereas redundant features of speech sounds may be systematically present but will never be linguistically relevant. For example, many languages contrast oral and nasal vowels. On the other hand, numerous languages display vowel-to-vowel coarticulation, a process by which the actual realization of a given vowel varies depending on the vowels in the preceding and following syllables. The former variation, but not the latter, can be encoded using features, correctly predicting which variation can be linguistically relevant.[1]

Second, features provide a straightforward explanation for the fact that many (if not most) phonological distributions, alternations and processes seem to target groups of phones rather than isolated segments. For example, in English, /p,t,k/ are aspirated syllable-initially (when not preceded by /s/). Furthermore, all vowels are shorter before these phones than when placed before voiced obstruents like /b,d,g/. Both of these processes are peculiar to English and are not universal phonetic effects. In order to describe these processes using only the level of the segment, we would have to assume that the native speaker has encoded a variety of rules or generalizations like the following:

1. aspirate [p] syllable-initially,
2. aspirate [t] syllable-initially,
3. aspirate [k] syllable-initially,
4. clip [o] before a homosyllabic [p],
5. clip [e] before a homosyllabic [p],

and so on for each vowel and stop. Alternatively, we could propose that /p,t,k/ participate in these two alternations because they constitute a class, the class of voiceless stops that is defined by the abstract features [-cont,-voice].[2] The example shows that the same segments /p,t,k/ trigger clipping of a preceding vowel and undergo aspiration. These two processes are unrelated to each other, and yet they seem to delineate the same grouping of oral stops. This would provide a theoretical motivation for features, since positing a feature in common allows a more parsimonious account of why these three segments are targets or triggers of those processes. If language learners had access to these features, their task of establishing the above-mentioned generalizations would also be greatly simplified, since they would have to learn only two processes that target or are triggered by abstract features (e.g., [-cont,-voice]) instead of many more specific processes.

Thus, given that features would allow a more parsimonious definition of phonologically active classes, the next question is whether they are active in the speaker/hearer’s grammar, that is, whether they are part of the conceptual system that language users possess. Early proponents of the distinctive feature theory argued that this indeed was the case. For example, Kenstowicz and Kisseberth (1979:31) point to evidence from second language learning. Specifically, they show that this account correctly predicts that native English speakers project the kinds of rules mentioned above, such as aspiration of voiceless stops, onto their pronunciation of plosives in a foreign language like French, in spite of the fact that in this language stops are unaspirated. Converging evidence comes from loanword adaptation. Borrowed words containing non-native phones seem to respond to the same constraints as native phones. For example, native English speakers use the voiceless alternant of the genitive in a phrase like ‘Bach’s music’, in spite of the fact that ‘Bach’ is pronounced with the non-English voiceless velar fricative (Halle, 1964). Hence, features appear to be active even for sounds that do not belong to the speaker’s language.

[1] We will return to the issue of which features are or can be linguistically relevant in Section 2.1 below. It will be shown, first, that the line between them is not so clear (Kiparsky, 1995; Clements, 2001), and, second, that the features which are phonologically and phonetically active need not coincide with the features used to encode lexical distinctions (Goldsmith, 1995; Keating, 1988). Also, vowel harmony has been analyzed as the result of phonologization of this vowel-to-vowel coarticulation phenomenon. In fact, Flemming (2001), summarized below, borrows from such parallels to argue against the separation between phonetics and phonology.

[2] These two facts would not be independent of each other if they are both correlated with the same articulatory gesture (compare Lisker and Abramson, 1971).
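
The parsimony argument made in this section, that one featural class can replace a long list of segment-by-segment rules, can be made concrete with a small sketch. The feature table below is a deliberately simplified assumption (two binary features and eight consonants), the five-vowel set is hypothetical, and `natural_class` is an illustrative helper rather than part of any established analysis.

```python
# Toy feature table. The values are simplified assumptions for illustration,
# not a complete featural analysis of English.
FEATURES = {
    "p": {"cont": False, "voice": False},
    "t": {"cont": False, "voice": False},
    "k": {"cont": False, "voice": False},
    "b": {"cont": False, "voice": True},
    "d": {"cont": False, "voice": True},
    "g": {"cont": False, "voice": True},
    "s": {"cont": True, "voice": False},
    "z": {"cont": True, "voice": True},
}

# Hypothetical vowel set, used only to count segment-level clipping rules.
VOWELS = ["i", "e", "a", "o", "u"]


def natural_class(spec):
    """Return the segments whose features match every value in `spec`."""
    return sorted(
        seg
        for seg, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    )


# [-cont, -voice] picks out the voiceless stops.
voiceless_stops = natural_class({"cont": False, "voice": False})

# Segment-level grammar: one rule per stop and one per vowel-stop pair.
segment_rules = [f"aspirate [{c}] syllable-initially" for c in voiceless_stops]
segment_rules += [
    f"clip [{v}] before a homosyllabic [{c}]"
    for v in VOWELS
    for c in voiceless_stops
]

# Feature-level grammar: one rule per process.
feature_rules = [
    "aspirate [-cont,-voice] syllable-initially",
    "clip a vowel before a homosyllabic [-cont,-voice]",
]

print(voiceless_stops)
print(len(segment_rules), "segment-level rules vs.", len(feature_rules), "feature-level rules")
```

Under these assumptions the segment-level description grows with the size of the inventory, while the featural description stays at one rule per process. Whether learners actually represent the class in this way is the empirical question the thesis goes on to address.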


1.2 Features: linguistic or phonetic?

The last section concluded that phonological processes target groups of segments and that these groupings recur. In the example above, unrelated processes such as clipping and aspiration demarcated the same grouping of /p,t,k/ as the trigger or target, respectively. Two different hypotheses could explain how these coinciding groupings arise: the same segments would participate in a given process because they share linguistic features, or groupings could be determined by the phonetic grounding of speech.

Linguistic features would determine sound groupings from a perspective that assumes that these features are part of the Universal Grammar. From this point of view, groupings of sounds will be either natural or arbitrary, depending on whether they share a common linguistic feature or not. For example, /p,t,k/ would be a natural class, but /p,l,a/ would be an arbitrary class, since there is no abstract feature in common. This perspective predicts a difference between the two kinds of classes: phonological processes should target natural classes and not arbitrary ones, and arbitrary groupings should occur only as a result of random historical changes. The first section of Chapter 2 summarizes the arguments and predictions of this view.

Alternatively, if features arise from the phonetic grounding of speech, they need not be part of the initial grammar of the language learner. This viewpoint predicts that the ease with which sounds will be grouped together is related to phonetic similarity.[3] According to this account, there is no clear-cut difference between natural and arbitrary classes, and ambiguous sounds are predicted to be classified differently in diverse languages.

There are two distinct hypotheses that rely on features being grounded in the phonetics of speech. One of them suggests that the language learner posits features that account for the patterns present in the input language. Once this is achieved, these features are part of the learner’s phonological system. In other words, although initially grounded in the phonetics of speech, features become cognitive categories in the developed grammar. I will call this the emergent hypothesis. Section 2.2.2 elaborates on this account. A second hypothesis also assumes that features are grounded in the physics of speech. Unlike the emergent account, it does not assume that features are cognitive categories that form part of the phonology. I will call this hypothesis the diachronic account, because it assumes that features become evident over time, as they are only present in the perception and production of speech, but not in the phonology. I discuss the diachronic view in more detail in Section 2.2.1.

[3] It is an open question whether phonologically ambiguous sounds are also those which are variable across languages. For example, the phonological specification of rhotics is debated (Mielke, 2004a) and they are also very different in different languages (for instance, /r/ is an approximant in English, but a trill in Spanish). A counterexample comes from the classes focused on in this thesis: nasals seem phonetically straightforward, but their phonological classification as continuant or interrupted is questionable according to Mielke (2004b).
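
The distinction between natural and arbitrary classes drawn under the linguistic hypothesis can be stated as a simple check: a set of segments is natural if the feature values its members share pick out exactly that set and nothing else in the inventory. The sketch below assumes a hypothetical ten-segment inventory with four simplified features; the values, including the treatment of /l/ as [+cont] and of /m/ as [-cont], are illustrative assumptions, and `shared_features` and `is_natural_class` are names introduced here.

```python
# Hypothetical inventory with simplified, assumed feature values.
INVENTORY = {
    "p": {"cont": "-", "voice": "-", "son": "-", "syll": "-"},
    "t": {"cont": "-", "voice": "-", "son": "-", "syll": "-"},
    "k": {"cont": "-", "voice": "-", "son": "-", "syll": "-"},
    "b": {"cont": "-", "voice": "+", "son": "-", "syll": "-"},
    "d": {"cont": "-", "voice": "+", "son": "-", "syll": "-"},
    "g": {"cont": "-", "voice": "+", "son": "-", "syll": "-"},
    "s": {"cont": "+", "voice": "-", "son": "-", "syll": "-"},
    "m": {"cont": "-", "voice": "+", "son": "+", "syll": "-"},
    "l": {"cont": "+", "voice": "+", "son": "+", "syll": "-"},
    "a": {"cont": "+", "voice": "+", "son": "+", "syll": "+"},
}


def shared_features(segments):
    """Feature values common to every segment in a non-empty collection."""
    specs = [INVENTORY[seg] for seg in segments]
    first = specs[0]
    return {
        name: value
        for name, value in first.items()
        if all(spec[name] == value for spec in specs)
    }


def is_natural_class(segments):
    """True if the shared features select exactly `segments` from the inventory."""
    target = set(segments)
    shared = shared_features(sorted(target))
    selected = {
        seg
        for seg, spec in INVENTORY.items()
        if all(spec[name] == value for name, value in shared.items())
    }
    return selected == target


print(is_natural_class({"p", "t", "k"}))  # shares [-cont,-voice,-son,-syll]
print(is_natural_class({"p", "l", "a"}))  # shares no feature value
```

This captures only the categorical prediction of the linguistic hypothesis. Under the phonetic hypotheses described above, class membership would instead depend on graded similarity, so a binary test of this kind would not apply.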


1.3 Considerations on the relationship between phonology and phonetics

These conceptions of the origin of features are necessarily couched in theories on what ‘phonology’ and ‘phonetics’ are and how they are similar to or different from each other. In this section, I summarize four main positions on this question. Traditional phonological theories (e.g. Chomsky & Halle, 1968; Troubetzkoy, 1969) assume that phonology and phonetics are distinct modules in the grammar and only interact insofar as phonology transmits a set of phonological representations to be implemented by the phonetic module. More recent developments (e.g. Archangeli & Pulleyblank, 1994; Calabrese, 2005) argue that phonetics also constrains the primitives of phonology and influences the shape of phonological processes. Yet others argue against this classical separation of phonetics and phonology and propose a unified framework for dealing with both (Flemming, 2001a, b; Hayes, Kirchner & Steriade, 2004; Steriade, 2000, 2001). A fourth theoretical stance is that phonology and phonetics are distinct but that they are more like the two endpoints of a continuum, so that the processes and representations of phonetics and phonology can be prototypically different and yet much variation can be found along the way (Cohn, 2003; Pierrehumbert, 2000, 2001; Scobbie, 2005). These four positions are explored in more detail below.

1.3.1 Phonological is cognitive, phonetic is physical

According to a traditional account (Chomsky & Halle, 1968; more recently, Hale, Kissock & Reiss, in press), phonology and phonetics belong to different modules, study different phenomena and propose different representations. While phonetics belongs to the realm of physics, phonology is better viewed as a study of cognition. Phenomena addressed by phonetics are often gradient, measured with concrete, continuous parameters, whereas phonology only deals with categorical phenomena, for which reason phonological representations are symbolic. Finally, phonetics concerns itself with the details of speech production and perception, whereas phonology is only interested in contrasts that are linguistically meaningful and do not depend on individual variation or variation produced by rate of speech or style, so that phonology, unlike phonetics, is systematic.[4]

Since phonology and phonetics are distinct, interfaces must be postulated between them. In particular, phonologists have often been concerned with the implementation of phonological representations in the phonetics of speech. For example, Halle (1964) argued that a universal interpretation module was part of the Universal Grammar, which predicts that a given feature would be implemented in the same way across all languages. Also along these lines is the model of transduction proposed by Hale, Kissock and Reiss (in press), in which phonology feeds the symbolic features it handles to a pair of transducers, one for perception and one for production. These transducers transform phonological features into, for example, a gestural score that has a completely different nature from those features and serves only to encode the instructions that the motor system will receive. According to this dualist view, phonology should only be concerned with how the phonological grammar is organized. The only way in which phonology and phonetics relate is in the phonetic implementation, and this is done through an innate transducer, so that phonetics and phonology never interact. Finally, the strict division between phonology and phonetics is deemed to coincide with the conceptual divorce between competence and performance. Figure 1 presents a representative diagram of this position.

[4] A view that draws the line between phonology and phonetics based on systematicity would be hard put to explain language-specific phonetics that are wholly systematic, but do not depend on varying featural specifications; for example, speakers of languages having voiced and voiceless stops may use different cues for this distinction.

Figure 1: Phonology distinct from phonetics

1.3.2 Phonology is cognitive but influenced by phonetics

Other researchers, however, argue that phonetics indeed interfaces with phonology in the “substantive grounding” of language (e.g., Archangeli & Pulleyblank, 1994; Calabrese, 2005; Cohn, 2003). First, phonetics would contribute to defining the set of features that are available for a language to encode contrasts. For example, the modality that a given language uses (spoken or signed) would influence the hierarchy of features that language will use (Brentari, 1998; Mielke, 2004a; Sandler & Lillo-Martin, 2006). Second, factors related to ease of perception and production may also account for the shape of inventories and the relative markedness of some featural combinations (Archangeli & Pulleyblank, 1994; Ohala, 1983; Padgett, 1995). Third, it may be that phonetic factors influence phonological processes as well. For example, perceptual factors could explain why vowel harmony commonly targets back and round vowels at once, since these two feature values both contribute to lowering the second formant (e.g. in Turkish, where vowels agree in backness and, if they are [+back], they must also agree in roundness: Kenstowicz, 1994; see Ohala & Ohala, 1993, for more arguments on the phonetic grounding of phonological processes). Figure 2 represents this alternative viewpoint.

Figure 2: Phonology distinct from phonetics, but influenced by it.

An example of sound change that illustrates this interaction is discussed in Kiparsky (1995:662-666). Kiparsky points out that, in general, tense vowels tend to be raised, while lax vowels tend to fall over time. However, this does not occur in all languages equally, as one would expect if only articulatory and acoustic factors were at play. On the contrary, these processes have occurred frequently in the history of English but rarely in Japanese. In fact, they are persistent in languages that have both tense and lax vowels in their inventory, as if the presence of the feature [tense] at some level of phonology was necessary to ‘activate’ the change. In other words, only specified feature values can feed phonological rules, and although a sound will of necessity be assigned a value on a phonetic continuum, this value will only be targeted by a phonological rule if it is part of the phonological representation of the sound.[5]

1.3.3 The distinction between phonological and phonetics is superfluous Other current perspectives argue against the strict separation of phonology and phonetics. For example, Flemming (2001b) presents several arguments in favor of

Two phonetic explanations to this fact may be put forward. First, it is possible that the vowel sets in these languages differ also in phonetic characteristics, such that the cues used to identify ‘tense’ vowels are actually different in the two languages. Second, a perceptually-oriented process that maximizes dissimilarity between neighboring sounds would be more liable to act in a more crowded acoustic space (with both tense and lax vowels) than in a vowel space where not as many distinctions are made. 5

11 knocking down the barrier between phonology and phonetics. He points out that implementation is not straightforward, given that the particular phonetic realization of phonological representations varies cross-linguistically, a phenomenon referred to as language-specific phonetics. A conspicuous example comes from coarticulation, a process that has been assumed to be straightforwardly phonetic and yet variable across languages (Flemming, 2001b). Flemming (2001a, 2001b, 2004) proposes that neither the representations nor the principles governing phonology and phonetics are distinct enough to justify a separation of the components. Instead, parallels between phonetic and phonological phenomena will be better explained within a unified model, in which the precise realization of a representation will respond to a specific language weighting of three goals: maximization of contrasts, minimization of effort, and maximization of distinctiveness. He argues that the distinction between phonetics and phonology based on one being gradient and the other categorical is conceptually weak –after all, a categorical representation can approximate a gradient one by defining smaller steps.6 He also alleges that the intuition behind this gross characterization is that phonological ‘categorical’ phenomena are those which refer to distinctive categories; for example, vowel nasalization is phonetic in English because oral and nasal vowels are not contrastive, but phonological in French because they are. In this case, then, inspection of the inventory of distinctive features would reveal whether a given process is phonological or phonetic, with which this distinction becomes derived rather than primitive. Steriade (2000) also takes this unifying point of view, presenting arguments against the common division between phonology as systematic and phonetics as fortuitous. 
She shows that phonetics can also be rule-governed, speaker-controlled and language-specific, even when involving features that are not even potentially contrastive. As noted above, phonologists’ interest has been focused on distinctive or potentially distinctive features,

6 Interestingly enough, the most symbolist of researchers currently addressing the question of features in acquisition (Hale, Kissock and Reiss, 2006) propose to use a finer set of categorical features, much finer than the one currently in use. They argue that features should be able to describe the precise realization of any sound in any language, regardless of whether or not that sound contrasts with some other sound. It certainly seems that the authors are covertly accepting phonetic definitions of sounds. For arguments against the innateness of such fine-grained features, see Ladefoged (2005).

that is, features that might be redundant in a number of languages but which are distinctive in some language. Steriade (2000) argues that systematicity will also affect features that never are contrastive. Here she examined paradigm uniformity (the tendency of language to make all the derivatives of a word more similar to each other), and, in particular, how certain phonetic details were leveled or transferred to other words within a paradigm.7 Steriade (2000) points out that in phrases like “d’rôle”, where a schwa is lost, the realization of the coronal stop and the uvular rhotic is the same as it would be if the schwa were still there. For example, dental closure in “de rôle” and “d’rôle” is much longer than that in “drôle”. If we were to assume strict ordering, then the rule that generates the variant of a phoneme should occur before the one that deletes the schwa, in which case phonetic details should be assigned before the output of phonology, a prediction that contradicts the strict separation of phonology and phonetics. A seemingly different approach that nonetheless reduces phonology to phonetics can be found in gestural theories such as that proposed in Goldstein and Fowler (2003). These approaches propose that both speech perception and speech production rely on gestural representations. In both the perceptually based and the articulatorily based theories, the basic units of representation are not features in the sense mentioned above. These units may be much ‘smaller’ (for example, Flemming, 2001, deals with formant frequency) and much more complex (including information as to the exact perceptibility of a certain segment in a given context, as in the Perceptual Map proposed in Steriade, 2001). Ohala (1995 and other work) could also be seen to support this view, in which phonology is wholly constrained by the phonetic grounding of speech. 
Unlike Flemming (2001a, b) and Steriade (2001), however, Ohala does not support the idea that the individual grammar will encode these phonetic facts. This position is further detailed below, in the section dedicated to diachronic features.

7 I summarize here only the second case she discusses, since the case of flapping is arguable. According to Steriade, the difference between the coronal flap and coronal stops is a function of closure duration, and the extra-short duration of the flap makes it unlikely that it will ever be used contrastively. This claim is not easy to interpret, since [t] and [ɾ] are contrastive in Spanish: cf. /pato/ ‘duck’ and [paɾo] ‘strike’.

1.3.4 Phonetics and phonology overlap

Finally, another perspective (e.g. Scobbie, 2005; Pierrehumbert, 2000, 2001) also rejects the dualist perspective on phonetics and phonology, arguing that such a rigid model leaves many questions unanswered. Its proponents concede that phonology and phonetics target prototypically distinct phenomena lying at opposite ends of a continuum, but hold that the line between these two ends cannot easily be drawn. An example that illustrates the ambivalent character of some phenomena comes from Cohn (1993), an article that discusses nasal deletion in English using experimental data. Cohn tentatively proposes that nasal deletion is phonological, given that one of the speakers she recorded exhibited ‘categorical’ deletion of the nasal as well as ‘categorical’ nasalization of the preceding vowel (and here categorical stands for non-gradient). However, she notes that the second speaker recorded did not display the same patterns, exhibiting short but present nasal stops. Cohn (1993) attempts to explain this other piece of evidence as stylistic or rate-of-speech variation, but this is not a viable option, unless we relax the definition of phonological to include phenomena that might depend on such ‘merely phonetic’ factors. From the phonology-phonetics overlap point of view, the same phenomenon may be phonological in one speaker but phonetic in another, even within the same dialect. An extension of this example would be the case of ‘phonologization’ or ‘reanalysis’ (Kiparsky, 1995; see also Labov, Karen & Miller, 1991). This is a phenomenon by which a redundant feature is reinterpreted and becomes an active part of the phonology of the language. The example of nasal deletion just discussed would illustrate the intermediate stage where two speakers interpret the same input as either phonetic or phonological. 
In this English example, a vowel and a following nasal (itself followed by a homorganic stop) coalesce into a nasalized vowel, and the nasal consonant disappears. A similar process would have occurred in the history of French nasalized vowels, which currently contrast with oral vowels in the absence of a conditioning environment. This perspective also suggests that phonemic and allophonic status are simply two extremes of a cline (Goldsmith, 1995), and that phonetic consistency and phonological categorization need not go hand in hand. For example, Keating (1988) shows that sometimes segments appear to be specified for a

non-contrastive feature: /s/ in English blocks vowel-to-vowel coarticulation for the feature [high], even though there is no [-high] coronal fricative to contrast with it. Moreover, the assumption that phonology and phonetics overlap but do not coincide completely may explain why distinctive features are not easily ‘dephonologized’. Although fine-grained and phonetic-based in language acquisition, representations become more abstract categories that are not so easily influenced by phonetics (Pierrehumbert, 2000; Scobbie, 2005:22). This last point of view on the relationship between phonetics and phonology is represented in figure 3 below.8

Figure 3: Phonology and phonetics overlap. This view proposes that phonological categories are constructed from the signal rather than specified in an innate module. It is, therefore, most compatible with the emergent view of features.



It is apparent from the evidence presented in section 1.1 that features would be theoretically and empirically desirable in phonology. However, we do not know how it is that features shape the groupings that occur in natural languages. While the linguistic hypothesis places them in Universal Grammar, the diachronic hypothesis ascribes them primarily to the phonetic conditioning of language. Finally, the emergent-features hypothesis proposes that features are not part of the Universal Grammar but induced

8 Although the distinction is clear conceptually, in practice it is difficult to foresee what predictions this position makes, and how they can be tested.

through language exposure. Unlike both the emergent and the linguistic views, the diachronic explanation of features does not assume that language users have cognitive categories in their grammars that correspond to featural categories. Chapter 2 lays out the arguments and predictions that these views on features make. Chapter 3 contrasts them with results in research in language acquisition and chapter 4 with results in adult studies. It will be shown that there is some evidence in favor of features being active in infants. For adults, however, the picture is more complex. In fact, both production and perception tasks fail to reveal conclusive evidence for abstract features. On the other hand, tasks involving learning seem to provide a way to settle this question. Chapter 5 presents further evidence bearing on this question. First, we carried out a perceptual task in which the stimuli exhibited a phonotactic pattern. Second, based on a crosslinguistic study that suggested that some sounds may be grouped in more than one way, we selected two sound groupings that should be matched in terms of phonetic similarity or dissimilarity. Third, these two groupings could only be construed as classes using abstract features. It was hypothesized that by controlling all these factors, we would be able to better ascertain whether features were part of the adult synchronic grammar. Finally, chapter 6 discusses the results of this experiment and outlines directions for further research.



In this chapter, I will summarize the main details of each of the aforementioned hypotheses. Their predictions will be laid out and they will be exemplified with some of their current proponents in the literature. Each explanation can be related to a step in the following figure 4, where sound change and language acquisition are schematized:

[Figure 4 diagram: a Phonological System, an Articulatory System and a Perceptual System for each of two language users (speaker and hearer), connected by arrows representing transmission.]

Figure 4: Systems mediating language acquisition and change. The linguistic view of features places them in the phonological system of both the speaker and the hearer. The diachronic view ascribes them to the arrows that represent the act of transmission to the perceptual and articulatory systems. The emergent-features account grounds features in the individual’s perception and articulation, but proposes that they are then developed into abstract categories that belong to the phonological system.

Before proceeding, however, it is imperative to define some terminology. The linguistic hypothesis assumes that phonological categories are abstract in the sense that variation within a category is judged irrelevant in two ways.9 First, variation beyond that necessary for encoding lexical contrasts will not be considered. For example, the fact that the [t] in ‘tip’ has a different burst and formant transition from the [t] in ‘tap’ is linguistically irrelevant, given that this variation is not used to signal a different category. Second, the fact that voicing is (primarily) signaled through the length of the lag in (word-initial) stops, but by the presence of a low frequency voicing bar in (word-initial) fricatives is also irrelevant to the phonological category of [voice]. I will call this abstract kind of feature linguistic.10 Theories under the umbrella of the emergent hypothesis are not so clear about the abstractness they attribute to features. However, they do contend that these features may evolve from being phonetic, low-level and tied to their particular realizations, to being part of a module distinct from the articulatory and perceptual systems.11 For this reason, the emergent account is compatible with cognitive/phonological features, although emergent features need not be symbolic.


Features are linguistic

The linguistic view of features places them in the innate grammatical system of the speaker, as part of Universal Grammar (to mention a few, Chomsky & Halle, 1968;

9 Another sense of abstractness comes from the debated necessity of postulating more than one representation. For example, the German devoicing rule may cause two words with different underlying representations to sound the same. This type of abstractness will plague any account of features, since it speaks to the need for rules. In this thesis, I am only dealing with surface-true representations (for example, phonotactic constraints) and not processes which require derivations. The necessity for the latter should be assessed independently of the way in which the phonology acquires its primitives, although it bears on the question of higher order categories such as allophonic variation.
10 In fact, these two examples show two different levels of abstractness. The different burst and formant transition in ‘tip’ versus ‘tap’ is irrelevant both according to the phonetic and linguistic hypotheses, but only the linguistic hypothesis assumes an a priori similarity between voiced stops and voiced fricatives.
11 It is possible to view Jusczyk’s WRAPSA model (Jusczyk, 1992, 1994, 1997) as an emergentist account insofar as it proposes that infants start from highly specified representations, and develop more abstract ones. I take up this question again in Section 3.4.

Clements & Hume, 1995; Durand & Laks, 2002; Halle, 1964; Hale, Kissock & Reiss, in press). All human language learners acquire phonological processes and distributions by interpreting the linguistic input they are exposed to in terms of these features, and thus languages across the world do not vary widely in their phonological patterns and sound inventories (e.g., Clements & Hume, 1995: 245).12 Since features are part of the language learner’s grammar, learning will necessarily be constrained to those sounds or groups of sounds that can be expressed in terms of features, the primitives that allow the language learner to interpret sound patterns. Likewise, in those cases where more than one sound is involved in a phonological pattern, this view of features predicts that learning something about a set of phones that cannot be defined in terms of a set of features is difficult. Such groupings are thus called ‘unnatural’ classes, whereas those that can be described with a set of features are ‘natural’ classes. If a process is more complex, it will be more difficult to learn, and, as a consequence, will tend to be less common across and within languages. Thus, natural groupings are predicted to be widespread, whereas unnatural ones should emerge only as an effect of historical accidents. For example, vowels, nasals and labial fricatives pattern together in Edoid, according to Mielke (2004b:8), but this should be explained diachronically, as a confluence of separate changes rather than a single change that targets the three kinds of segments.13 It is obvious that it is impossible to falsify the claim that features are part of Universal Grammar, since contradictory evidence only speaks against a given feature set or feature definition, but never against the hypothesis itself. Nevertheless, this approach does make some falsifiable predictions in terms of acquisition, provided a learning pathway is described. 
According to a common view, infants have fully specified initial representations and must later trim down to the actual set of features that are contrastive

12 I will assume that features are binary rather than unary. As Hall (2001: Introduction) points out, it is difficult to predict how natural classes are defined with unary features. This assumption is not problematic here given that acquisition models that build on linguistic features take this position and further assume that the initial representation is fully specified. A discussion of underspecification in relationship with unary feature definitions can be found in Steriade (1995).
13 However, this pattern may have a very sensible phonetic explanation. Ohala and Ohala (1993) state several phonetic reasons for why some fricatives may act in the phonology as sonorants.

(e.g. Hale, Kissock, & Reiss, in press).14 Representations are fully specified because first-language learners are endowed with a grammar that includes not only the set of abstract features but also their universal phonetic interpretation.15 Such a view can be found in Halle (1964) and Chomsky and Halle (1968), where, for example, a [+continuant] segment is produced without a complete obstruction in the vocal tract. Thus, in all languages, nasals and stops, which share an oral closure gesture, are [-continuant] and fricatives, vowels and glides [+continuant]. An alternative (articulatory) definition, not present in Chomsky and Halle (1968), would propose that [-continuant] sounds entail complete blockage of air flow through both the oral and nasal cavity, so that only oral stops would be [-continuant], whereas nasal stops, which allow flow of air through the nose, would be classed together with fricatives and vowels, since airflow is not obstructed. From the point of view of the linguistic-features hypothesis, either definition should hold cross-linguistically, since the implementation (as explained in Section 1.3.1) is thought to be the same in all speakers of all languages. In other words, natural classes should include the same segments in all languages. It should not be the case that the class

14 According to another view, mostly found in those studying production, children acquire features as they acquire contrasts (e.g. Dresher, 2003a, 2003b, and van der Hulst, 1995; Brown, 2000). Thus, first-language learners would start by assuming that all segments are allophones of the same phoneme until they establish that a lexical contrast exists between two segments. Then, learners would posit a feature to underlie that contrast. This kind of explanation has been criticized by other supporters of the linguistic features hypothesis on account of its circularity, in the sense that it assumes that the child will notice that two segments are contrastive because he or she has the phonological structure that differentiates them, while at the same time arguing that the child acquires the structure by detecting that the segments are contrastive. Indeed, Brown (2000:14) writes: ‘The child is able to contrast the two phonemes in his or her grammar once that phonological structure has been acquired and a representation has been constructed’, and yet ‘once a child notices that two segments are used contrastively (i.e., are distinct phonemes), the phonological structure that differentiates the two segments is added to his or her grammar’. In order to escape this circularity, some proponents of the linguistic features theory assume that the language learner will parse his or her input in terms of distinctive features, in a fully specified manner that must, therefore, be universal and part of the Universal Grammar. Also, for reasons explained in section 3.1 below, hypotheses based on children’s production data will not be explored. Thus, accounts like that of Dresher and Brown will not be further addressed in this thesis.
15 This approach can be traced back to the Natural Phonology Theory (Stampe, 1973; Donegan & Stampe, 1979), which proposed that not only representations but also phonological processes were innate.

of sonorants includes nasals and vowels in one language, but nasals, vowels and labial fricatives in another. A specific example of a theory of acquisition that espouses this view of features is proposed by Hale and colleagues (Hale & Kissock, 2005; Hale & Reiss, 2003; and Hale, Kissock & Reiss, in press). As supporters of a rich Universal Grammar, they build on the ‘poverty of stimulus’ argument. The speech signal is supremely complex, showing the interaction of many linguistic and non-linguistic factors (e.g., differences in the precise realization of sounds depending on the speaker’s vocal tract size), commonly referred to as the invariance problem. Hale and Reiss (2003) consequently claim that children will not be able to parse the signal at all, unless they encode their phonological input in terms of linguistic features.16 Approaches like this one assume that speech perception is purely symbolic, in the sense that it must always take place in a plane of abstract, linguistic representations. In other words, this view assumes that the synchronic phonology is a computational system that manipulates abstract categories, which do not take into account phonetic naturalness. As Buckley (2000) points out, even if all the rules in a given language are phonetically grounded, they will always be a subset of all possible grounded rules, for which reason the Universal Grammar need not encode such restrictions. On the other hand, theories incorporating markedness into Universal Grammar (e.g. Kager, Pater, & Zonneveld, 2004; but see Hayes & Steriade, 2004) will make a different

16 This model of acquisition makes several false predictions. The model (detailed further in Hale and Kissock, 2005; Hale, Kissock and Reiss, 2006) predicts that all features will be available to infants, who should be able to tell apart all sounds that are contrasted in any language of the world. Further, infants should be able to compensate for individual variation with minimal exposure, since the Universal Grammar provides them with all the features and with nothing else with which to parse the signal. Finally, it predicts that infants will establish native categories when their lexicon is large enough to hold minimal pairs for every feature. Each of these predictions is refuted by experimental evidence. First, unlike what the authors report, not all infants are able to distinguish all contrasts at birth, and even infants who are exposed to a contrast may be able to detect it only when they are as old as ten months (for example, the difference between velar and coronal nasals; Narayan, 2006). Second, infants habituated to a word by a female speaker can recognize the same word by another female speaker by nine months of age, but can only do so with a male speaker at ten months (Houston and Jusczyk, 2003). Third, infants begin to lose their ability to tell apart sounds which are not present in the target language before their first birthday, when they still have a very small lexicon (Werker and Tees, 1984).

prediction. If that innate module incorporates markedness notions, we would expect marked or unattested processes to be harder to learn.17 In conclusion, this section presented the main predictions of the view that assumes that features are linguistic: they are abstract and part of the innate linguistic endowment. An alternative view will be explored in the next section, a view that proposes that features are based on the medium of language, and are thus phonetic, rather than linguistic.
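The notion of a ‘natural’ class used throughout this section can be made concrete with a small sketch. The feature set, the six-segment inventory and the function names below are hypothetical simplifications introduced only for illustration; they are not taken from any of the feature systems cited. Nasals are coded as [-continuant], following the Chomsky and Halle (1968) definition discussed above. Under these assumptions, a group of segments counts as natural if the feature values its members share pick out exactly that group and no other segment of the inventory.

```python
# Toy inventory with binary, fully specified features (illustrative values only).
SEGMENTS = {
    "p": {"sonorant": False, "continuant": False, "nasal": False, "voice": False},
    "b": {"sonorant": False, "continuant": False, "nasal": False, "voice": True},
    "f": {"sonorant": False, "continuant": True, "nasal": False, "voice": False},
    "v": {"sonorant": False, "continuant": True, "nasal": False, "voice": True},
    "m": {"sonorant": True, "continuant": False, "nasal": True, "voice": True},
    "a": {"sonorant": True, "continuant": True, "nasal": False, "voice": True},
}


def shared_features(group):
    """Return the feature values common to every segment in a non-empty group."""
    first, *rest = [SEGMENTS[s] for s in group]
    return {f: v for f, v in first.items() if all(seg[f] == v for seg in rest)}


def is_natural_class(group):
    """True if the shared feature values select exactly the given group."""
    shared = shared_features(group)
    matching = {
        s for s, feats in SEGMENTS.items()
        if all(feats[f] == v for f, v in shared.items())
    }
    return matching == set(group)


print(is_natural_class({"p", "b"}))       # oral stops
print(is_natural_class({"m", "a"}))       # sonorants
print(is_natural_class({"m", "a", "v"}))  # vowel, nasal and labial fricative
```

In this toy inventory the first two groups are natural, whereas the third, which loosely mirrors the Edoid grouping of vowels, nasals and labial fricatives, is not: the only value its members share is [+voice], which also selects /b/. Recoding nasals as [+continuant], as in the alternative definition mentioned above, would change which groups come out as natural, which is why the choice of definition matters for the predictions of the linguistic view.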


Features are originally phonetic

The second explanation for the presence of features in language proposes that features encode phonetic similarities between phones. This phonetic similarity operates on language either directly or mediated by the synchronic grammar. According to the first view, features are the similarities between phones. In other words, processes and alternations affect groups of phones because processes of language change tend to affect any and all sounds bearing certain characteristics. Thus, features are not part of the grammar of the language users at any time. A second view supposes that language users extend the processes that affect speech to groups of sounds that share those characteristics, relying on categories that they develop from the signal. Both accounts thus assume that the origin of classes of sounds lies not in Universal Grammar, but in the physical grounding of speech, such that classes of sounds arise from language change. According to Ohala (1983, 2005), however, the physical factors affecting language change are by themselves best able to account for phonological distributions. In contrast, Hume and Johnson (2001) argue that grounded phonological representations mediate these changes, because the language learner has abstracted certain categories from the distributions present in the input language. The specific predictions of each account in terms of acquisition, learning and configuration of the synchronic grammar are laid out in the following two subsections.

17 I am using the term ‘markedness’ here following the literature that defends this notion, although, as Haspelmath (2006) shows, this concept may actually be a misnomer.

2.2.1 Diachronic features

As mentioned above, one of the main motivations for features is that they allow the determination of classes of sounds that pattern together in phonological processes. According to the diachronic-features hypothesis, these ‘natural’ classes are merely ‘common’ classes. Certain segments tend to be classified together because they are all prone to undergo or trigger certain processes whose cause lies in the physics of speech articulation or perception. Features are thus theoretical constructs that describe common groupings of sounds, but do not exist in the synchronic grammars of language users. To develop this with an example, obstruents become voiced or spirantize intervocalically in many languages as a kind of lenition (e.g. Spanish and Tiberian Hebrew); become voiceless at the end of the word (e.g. Russian and German); and tend to shorten preceding vowels (e.g. American English) (Kenstowicz, 1994). The phonetic view of features suggests that these processes are common because of the physical properties of human vocal tracts. The fact that, when they recur, they define the same set of phones as triggers and undergoers in different languages is explained through the precise articulation or perception of the phones. For instance, word-final devoicing is unlikely to target vowels or nasals because these sounds are produced without a constriction that causes turbulence or stoppage of the airflow, and thus the vocal folds vibrate easily, even before a pause. This anatomic fact provides the grounding for a feature such as [sonorant], but the positive and negative poles of [sonorant] are extremes on a continuum, perhaps represented by open vowels and the velar stop respectively (Wright, 2004; cf. Pulleyblank, 2003). 
Other sounds would occupy intermediate positions depending on the complex acrobatics necessary to hold the vocal folds in modal vibration: /b/ would be the stop that is closest to the [+sonorant] end, close vowels would be the ones nearer to the [-sonorant] side, and so forth. Such a view can be found in Blevins (2004). In Blevins’s evolutionary model of phonology, recurrent sound patterns are a consequence of common sound changes which are the result of indirect transmission in which human physiology plays a vital role. In other words, recurrent sound patterns are a result of a ‘natural selection’ by which only some of the set of random changes are phonologized. Synchronic grammars, which are assumed

to be a model of the purely grammatical knowledge of speakers, do not include any groundedness factors. In chapter 9 of Blevins (2004), she further proposes that the majority of linguistic knowledge may be learned from data-driven analyses.18 Similarly, Ohala (2005) argues against phonological models that attempt to encode naturalness using discrete categories and assume them to be psychologically real. He compares these theories to phonetic models, whose primitives are continuous and are not assumed to reflect psychological reality. He then tests these two kinds of theories as explanations for sound patterns (e.g., emergent stops). Emergent stops arise in English whenever a nasal consonant is followed by an obstruent, as in the word ‘empty’. This phenomenon has a straightforward explanation if the vocal tract is viewed as a system of tubes through which the air flows. For the production of a nasal consonant, the soft palate must be lowered, allowing the passage of air through the nose, and there must be an oral closure. For the obstruent, the soft palate should block the passage of air through the nasal cavity. If these two gestures are not properly timed, with the velum being raised before the initial oral closure is modified, a stop with the same place of articulation as the nasal emerges. Following Ohala’s reasoning, a phonological theory would assume that this process occurs as a result of the spreading of the [place] feature of the nasal consonant onto the following obstruent; in doing so, this theory disregards the fact that the process originates in the anatomical configuration of the vocal tract. Thus, this featural representation is not primitive, but derived from the phonetic grounding. Ohala (2005) consequently argues that conditions that arise from the configuration of the articulatory and auditory systems should not be encoded in the speaker’s grammar. Doing so would be doubly undesirable. 
First, it is undesirable on empirical grounds, since such theories make erroneous predictions. For example, these theories predict that all languages will group sounds in natural classes, while Mielke (2004a) shows that languages vary as to which sounds pattern together. Second, it is redundant and, therefore, theoretically undesirable to encode phonetic constraints in the grammar, given that those constraints

18 Nevertheless, Blevins concedes that distinctive features (as well as prosodic categories), or plausibly ‘the learning strategies that converge on them’ (p. 218), may be innate.

are already present in the medium of speech. Finally, he points out that phonological features (as any other theoretical construct developed to represent patterns of sounds) should only be considered a part of the speaker’s grammar insofar as there is psycholinguistic evidence for their existence. This purely phonetic account of classes of sounds predicts that groundedness will influence diachronic change, but should have no weight in either the developing or the developed grammar. That is, infants and adults should be able to learn a phonetically unlikely distribution in which consonants are palatalized before back vowels, provided the consonants and vowels are sufficiently distinct. The fact that such a rule is physically unlikely should have no bearing on learning.

2.2.2 Emergent features

According to the emergent variety of the phonetic-features hypothesis, groupings of phones arise when language users rely on the articulatory and/or perceptual similarity of sounds to learn phonological distributions and to extend the class that triggers or undergoes a process (e.g., Hume & Johnson, 2001; Pierrehumbert, 2001; Hayes & Steriade, 2004; Mielke, 2004a).19 Both the diachronic and the emergent views of features predict that groups of sounds may vary across languages. The difference, however, lies in that the latter assumes that these features are incorporated into the developed grammar. Pierrehumbert (2001) spells out how this might occur. Here she defines a category as a mental construct which relates two levels of representation, a discrete level and a parametric level. A category defines a density distribution over the parametric level, such that phonological categories are labels over a phonetic map, with the frequency distribution for each label being updated through ongoing exposure. Following Pierrehumbert (2001), it can be argued that representations must be tied to the signal for

19 Some linguistic accounts of features resemble this emergentist perspective. Clements (2001:72), for example, argues that phonological features are the subset of universal features that can be discovered by language learners as they notice that a feature is contrastive or plays a role in patterns or alternations. Since the ‘universal’ feature hierarchy can also be learned (Clements, 2002:84), at least at some point in the linguistic development, this account makes the same predictions as an emergentist one.

at least the following reasons. First, languages vary in unpredictable levels of detail; in particular, the same phonetic input must be interpreted differently according to the language. For example, a stop that is voiced in English should be deemed voiceless in Spanish. Second, the phonological system continues to evolve in time, as shown in studies of perceptual learning (e.g. Francis & Nusbaum, 2002). Third, linguistic feature models seem to be based primarily on the notion of contrastivity. As noted, features are necessary to encode contrasts between speech sounds. However, a theory purely based on contrast does not account for the fact that elements that vary unpredictably lack linguistic relevance, in a similar way to those that do not vary at all. For example, in English, all sonorants are voiced, so that for this language voicing in nasals is linguistically irrelevant by virtue of being invariant. Likewise, a phonetic feature that varies unpredictably would be irrelevant, as is for example the variation in Voice Onset Time within a given category. The emergent model, thus, correctly predicts that these two types of behavior will have the same null effect.20 Theoretical developments espousing the presence of phonetic-based features in the synchronic grammar are numerous. Efforts to find invariant correlates to features, like those of Stevens (1972, 1989) and Fant (1973), can be read as early attempts to ground phonological features in the phonetic medium of speech (although they might have been guided by the idea that the phonetic realization of features would be universal). Furthermore, the earliest definitions of features (e.g. Troubetzkoy, 1939/1969) proposed that they arise from the contrasts that phones establish with one another in each particular language, and thus assumed them to be language-dependent. More recently, Mielke (2004a, 2004b, 2005) provides strong empirical arguments for why sound classes would be emergent. 
In his dissertation, Mielke (2004a) constructed a database of the phonological processes, alternations and patterns in over five hundred languages. He defined any group of sounds that patterned together in some process as a phonologically active class. He then compared these active classes with the natural classes predicted by three distinctive feature systems: Jakobson, Fant and Halle (1963), Chomsky and Halle (1968), and Clements and Hume (1995). He found that up to a quarter of these phonologically active classes were unnatural in all three feature systems, which suggests that the assumption that sound groupings are only rarely unnatural might be too strong. In a more detailed analysis of nasals, laterals and rhotics (Mielke, 2004b; chapter 4 in 2004a), Mielke advanced the hypothesis that phonetically ambiguous sounds are also phonologically ambiguous. For example, nasals pattern with [-continuant] sounds, such as stops, in 55% of the languages (including languages like Catalan and Comanche), but with [+continuant] sounds like fricatives in the remaining 45% (including unrelated languages, like Korean and Russian).

A second prediction the emergent-features hypothesis makes is that adults’ learning will be influenced by the phonetic likelihood of a distribution or process. Thus, adults without prior exposure to palatalization should find it more difficult to learn that consonants palatalize before back vowels than to learn that they palatalize before front vowels. However, this hypothesis does not predict that infants will be likewise influenced by the phonetic naturalness of changes.

20 Phonological theory also has ways to deal with these types of behavior. Specifically, the feature [voice] in sonorants would be underspecified because it is predictable. Further, differences in how redundant features operate in phonology have been adduced as arguments for underspecification and rule ordering. For example, neither Spanish nor Russian has phonemic voiceless sonorants, so that in both languages sonorants should be underspecified for voicing. Both languages also have a regressive voicing assimilation rule, but this rule is ordered differently in the two languages. The question of underspecification is not explored in this thesis, although recent discussions of the matter can be found in Steriade (1995) and Calabrese (2005).
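The comparison between phonologically active classes and feature-defined natural classes described above can be illustrated with a minimal sketch. It assumes that a class counts as natural when the conjunction of the feature values shared by its members picks out exactly those members and no others. The toy inventory, the feature values (including treating /m/ as [-continuant]) and the function names are hypothetical; they do not reproduce Mielke's database or any of the three feature systems he tested.

```python
# Hypothetical toy inventory: each segment maps to binary feature values.
# The values are illustrative only and belong to no published feature system.
INVENTORY = {
    "p": {"voice": "-", "continuant": "-", "nasal": "-"},
    "b": {"voice": "+", "continuant": "-", "nasal": "-"},
    "f": {"voice": "-", "continuant": "+", "nasal": "-"},
    "v": {"voice": "+", "continuant": "+", "nasal": "-"},
    "m": {"voice": "+", "continuant": "-", "nasal": "+"},  # assumed [-continuant]
}


def shared_features(segments, inventory):
    """Return the feature values that every segment in the class has in common."""
    segs = list(segments)
    first = inventory[segs[0]]
    return {
        feature: value
        for feature, value in first.items()
        if all(inventory[s][feature] == value for s in segs)
    }


def is_natural_class(segments, inventory):
    """A class is natural here if its shared feature values select it exactly."""
    target = set(segments)
    shared = shared_features(target, inventory)
    selected = {
        seg
        for seg, feats in inventory.items()
        if all(feats[f] == v for f, v in shared.items())
    }
    return selected == target


if __name__ == "__main__":
    for active_class in ({"p", "b"}, {"b", "m"}, {"p", "v"}):
        label = "natural" if is_natural_class(active_class, INVENTORY) else "unnatural"
        print(sorted(active_class), label)
```

In this toy inventory, a class such as /p, v/ shares only [-nasal], which also selects /b/ and /f/, so it would be counted as unnatural. An active class that fails this test in every feature system under comparison corresponds to the unnatural classes discussed above.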


Conclusions on the origin of features

To summarize, both phonetic-features hypotheses attribute the existence of features to the physical medium of speech, whereas linguistic-features hypotheses place them in the innate Universal Grammar.21 The linguistic-features hypothesis assumes that features must have the same phonetic realization in all languages22 and, therefore, that the same segments must have the same featural interpretation in all languages and for all speakers. With respect to language acquisition, the linguistic-features hypothesis predicts that phonological patterns that constrain natural classes will be favored over patterns constraining arbitrary groupings. If markedness is part of the Universal Grammar as well, infants (but not necessarily adults) are expected to learn phonetically grounded distributions more easily than phonetically arbitrary ones.

Both phonetic-features hypotheses predict cross-linguistic variation in groupings of phones, and do not predict that natural classes will be favored in learning. In addition, the emergent-features account proposes that features are induced into the synchronic grammar. The diachronic-features view, by contrast, makes no claims with regard to the synchronic status of features. In terms of language acquisition, both the emergent and the diachronic accounts predict that infants will rely on the phonological patterns present in the input language to group segments into classes. Thus, although it is important to keep in mind that these two views make different predictions for the adult grammar, they do not differ in their predictions for infants’ acquisition. For this reason, from this point onward I focus mainly on contrasting a phonetic hypothesis, which predicts that infants will group segments on the basis of distributions present in the ambient language, and a linguistic hypothesis, which predicts that the learning of phonological patterns concerning groups of sounds is constrained by Universal Grammar. The next chapter will review some evidence for the presence of features in the developing (infant) grammar and the impact of phonetic naturalness on learning. Chapter 4 targets these questions in the developed (adult) grammar.

21 Notice that, in order to account for language acquisition across modalities, the linguistic-features hypotheses must assume either that the same feature sets are to be found in spoken and signed language, or that the Universal Grammar holds both sets of features. If features are, on the other hand, based on the physical medium of speech, they are naturally expected to vary across media.
If, as predicted by the linguistic accounts, features are present in the developed and developing grammar, we would expect psycholinguistic experiments to show evidence of their activity; if features are emergent, their effects might not be evident in infancy but should be present in adulthood. However, it will be argued that results from previous experiments do not provide conclusive evidence, mainly because they have not tested abstract features, but rather phonetic classes. Finally, the experiments reported in chapter 5 attempt to address more directly the question of whether abstract phonological features play a role in learning.

22 In theory, it would be possible to interpret the linguistic hypothesis as stating that phonetics is irrelevant rather than invariant. However, at least in the literature reviewed here (prominent examples being Chomsky and Halle, 1968; Halle, 1964; Hale, Kissock and Reiss, 2006), it is stated that phonetic realization is invariant. For instance, Chomsky and Halle (1968) define continuant sounds as those produced without complete blockage of airflow in the vocal tract. That is a very specific phonetic definition, and thus entails that phonetics is relevant for determining the featural category of sounds. A purely linguistic (though admittedly coarse and insufficient) definition might run as follows: continuant sounds are those that, in some phonological phenomena, pattern with vowels.



This chapter presents a review of literature that addresses three main questions. First, whether there is evidence for a featural level in infants’ perception, memory and learning. Second, if such a level is found, there remains the need to establish the nature of these features, whether they are concrete and tied to the phonetic realization or if they are more abstract, closer to the phonological definition. Finally, I discuss results that suggest that phonetic naturalness of a given distribution does not affect learning in infancy. In section 3.1 I argue that methodological as well as conceptual problems lead me to abandon data from children’s production as a possible source of answers to the questions addressed here. The literature on infants’ perception and memory, which is summarized in section 3.2, either plainly supports a phonetic features level or can be reanalyzed in terms of phonetic categories. On the other hand, positive evidence in favor of features comes mainly from learning studies where a feature-based condition was compared with a segment-based one and (better) learning was found in the first one (at least for young infants). However, the feature-based condition used segments that could be related to each other at the phonetic level (for example, voiceless fricatives, all of which share distinct acoustic-perceptual characteristics), so that their results need not resort to an abstract level of phonology. Finally, infant learners do not seem to be biased to learn phonetically-grounded patterns. These learning studies are reported in section 3.3.


Development of production

There is no little disagreement as to whether children’s productions are relevant to phonology or whether their production would be best studied from a different perspective. As has frequently been noted (for example, Camarata & Gandour, 1984; Fey & Gandour, 1981; Vihman, 1996), children’s output is strikingly systematic. For example, Camarata and Gandour (1984) describe the case of an English-learning child, G.G. (who was two years and ten months old at the beginning of treatment). G.G. used alveolar stops before high vowels and velar stops before non-high vowels, treating these two segments as allophones of the same phoneme. A variety of such cases, where a pair of segments that are contrastive in the target language are produced in complementary distribution, are addressed in Dinnsen and Chin (1995). These authors note that, many times, remediation is immediately followed by correction in the whole of the lexicon, which would only be possible if the child had the ‘right’ underlying representation to begin with. One such example is the one described by Bedore, Leonard and Gandour (1994). The child in this study produced all sibilants as clicks. Upon starting treatment for words beginning with /s/, the child not only started producing the trained sound correctly, but all other sibilants as well, and in all relevant positions. It would seem, then, that this child’s productions were, in a sense, governed by constraints which arose from her own developing system, for which reason they are both systematic and distinct from the characteristics of the ambient language. From this point of view, therefore, developing systems contribute to the study of phonology.

With regard to features, many researchers (Rice & Avery, 1995; Beers, 1996; Ingram, 1989) argue that children’s output testifies to those features present in their phonology. In Jakobson’s (1968) seminal work, all children are said to display the same pattern of development, regardless of their input language and without individual variation. Their initial productions always represent the most basic contrast between consonants and vowels. Later on, further additions to the repertoire would show how the child activates a further feature as encoded in a contrast.
For example, a child producing [pa] and [ta] would have acquired the features [consonant] and [place]. Although more recent work abandons the assumption that the order of acquisition will be the same across languages and individuals, the method of establishing which features are active in a given child’s phonology remains the same for many researchers in this line of work. To take an example, Beers (1996) attempts to determine what variation is possible in the order of acquisition of features for children learning Dutch. To this end, spontaneous utterances of a large number of children, subdivided into age groups, were transcribed, and a segment was assumed to have been acquired if it was produced correctly 75% of the time by a given age group and half of the group’s members used it correctly that percentage of the time. By analyzing the segments which were present and comparing them to a feature hierarchy, Beers determined which features would have to underlie the contrasts established between the segments that had been acquired.

Three types of criticism have been leveled against this line of work. The first is conceptual, and concerns the fact that such research assumes either that a child who cannot produce a given contrast is also unable to perceive it, or that one set of features is present in perception and a different one active in production. The second is methodological, and relates to the fact that a considerable proportion of this research relies on analyses of transcriptions. Taking the child’s production at ear value might be problematic, in view of a growing body of literature suggesting that children might be producing a contrast that the transcriber cannot perceive, a phenomenon called covert contrast (for example, Faber & Best, 1994, cited in Hale & Reiss, 1998; Hewlett & Waters, 2004). Thus, analyses of transcriptions of children’s production risk assuming the absence of a contrast that is in fact present. Finally, it has been noted that the motor apparatus mediates between the grammar’s command to produce a contrast and the production itself. Accordingly, a child may fail to produce a target whose underlying representation is actually well formed but for which the motor routines (or, conceivably, the vocal tract itself) are not fully developed.
For example, the tongue body is much larger relative to the vocal tract in young children than in adults, which appears to have consequences for the ease of production of certain sounds over others (Inkelas & Rose, 2003). It would appear wise, therefore, not to resort to young children’s output for confirmation of the featural level present in their phonological systems (Hale & Reiss, 2003; Maye, 2000), since evidence garnered from this domain is, at the very least, controversial.


Infants’ and children’s perception and memory

The present section addresses the question of whether there is evidence for features in the way children and infants perceive and remember spoken language. In all, little support for abstract features can be drawn from perceptual and memory experiments with infants and children. In an early confusability study, Graham and House (1971) tested 30 English-speaking girls between the ages of 3 and 4.5 years by producing pairs of syllables such that 16 English consonants were contrasted with each other. The pattern of errors, such as responses of ‘same’ when the pair was actually different, was judged to be similar to that exhibited by adults in the classical confusability study by Miller and Nicely (1955). However, the phonological features proposed by Chomsky and Halle (1968) do not seem to fit this pattern. Interestingly, neither do phonetically based features such as those espoused by Wickelgren (1965, 1966; see discussion in section 4.2 below). In other words, although errors seem to be more frequent when the pair of sounds differed in one or two features, the sheer number of features shared was not predictive of the pattern of errors, regardless of the featural system chosen.23 The authors suggest that this might be an effect of using live presentation, for children might have been responding to uncontrolled variables. However, data from Gierut (1998) suggest a different explanation. In this article, Gierut assumes that children’s confusion patterns reveal their knowledge of distinctive features as phonological categories, and attempts to test the effects of treatment on the perceptual and productive knowledge of six language-impaired children between 3 and 6 years of age. The focus of the study was the voiceless interdental fricative. Children were asked to put a chip on a bank every time this sound was played. Other sounds played included /s, t, p/. Of interest to us, the voiceless interdental fricative differs from /s/ in one feature, distributedness; from /t/ in two, distributedness and continuancy; and from /p/ in three, adding place to the other two.
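The feature counts just given can be made explicit with a short sketch. It assumes, as the feature-counting view discussed here does, that confusability falls as the number of differing features rises. The feature table encodes only the three features named in the text, and its values (in particular the value of [distributed] for /p/) are illustrative assumptions chosen to match the counts above, as are the function names.

```python
# Illustrative feature values for the four sounds discussed in the text.
# Only the three features named there are included; the values are assumptions.
FEATURES = {
    "θ": {"distributed": "+", "continuant": "+", "place": "coronal"},
    "s": {"distributed": "-", "continuant": "+", "place": "coronal"},
    "t": {"distributed": "-", "continuant": "-", "place": "coronal"},
    "p": {"distributed": "-", "continuant": "-", "place": "labial"},
}


def feature_distance(a, b, table=FEATURES):
    """Count the features on which two segments have different values."""
    return sum(1 for feature in table[a] if table[a][feature] != table[b][feature])


def rank_by_predicted_confusability(target, others, table=FEATURES):
    """Order segments from most to least confusable with the target,
    under the assumption that fewer differing features means more confusion."""
    return sorted(others, key=lambda seg: feature_distance(target, seg, table))


if __name__ == "__main__":
    for seg in ("s", "t", "p"):
        print(seg, feature_distance("θ", seg))
    print(rank_by_predicted_confusability("θ", ["p", "t", "s"]))
```

Under these assumptions the predicted ordering is /s/, then /t/, then /p/, which is the expectation against which the confusion data are evaluated.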
We would thus expect that children would confuse the voiceless interdental fricative with /s/ more often than with the latter two. Yet inspection of the data on the six subjects in Gierut (1998, Appendix) reveals a picture not unlike the one Graham and House (1971) found: the number of times that /s/ and /p/ were judged to be the interdental fricative did not differ significantly, and /p/ was confused with the interdental fricative significantly more often than /t/ was (in fact, only one child in only one session confused the interdental fricative with /t/ more than with /p/). These patterns of results actually speak against at least one of the two assumptions underlying these experiments: that children’s responses reflect how confusable two sounds are, or else that these confusion matrices echo their phonological knowledge.

Of long-standing interest, the phenomenon of categorical perception was originally thought to provide evidence in favor of a language-specific device. Categorical perception is a phenomenon in which ease of identification predicts discrimination and peaks in discrimination occur at category boundaries. In other words, a listener’s ability to discriminate between two sounds depends on the listener identifying them as belonging to different categories. For voicing of stops, for example, Lisker and Abramson (1964) found that speakers of several languages produced segments differing in voicing and aspiration with Voice Onset Times (VOT) that clustered in two or three areas, depending on how many categories each language had. Thai speakers, whose language has three categories, produced their voiced stops with voicing lead (in which vocal fold vibration precedes release), the voiceless ones with a short lag (in which the vocal folds begin vibrating shortly after release of the obstruction), and the voiceless aspirated stops with a long lag. English speakers, on the other hand, produced voiced sounds with a short lag and voiceless sounds with a long lag.

23 Interestingly, a phonological theory incorporating markedness would not fare any better. In particular, Graham and House’s (1971) analyses show that the highest percentage of errors involved, for instance, the feature coronal, which would be unmarked, and thus easier. Feature geometry does not solve the problem either, since continuant, which is a higher class node, was among the features with more errors, while strident, a dependent of continuant, was amongst the lowest. Finally, although the authors do not remark upon it, the distribution of errors was not the same across classes. For example, labial and coronal fricatives were highly confusable, while labial and coronal nasals were not.
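The claim that identification predicts discrimination can be made concrete with the following sketch. It assumes (this is not stated in the text) that a listener in an ABX trial covertly labels each stimulus with one of two categories, answers from the labels whenever A and B receive different labels, and guesses otherwise. The function name and the example labeling probabilities are hypothetical.

```python
from itertools import product


def predicted_abx_accuracy(p_a: float, p_b: float) -> float:
    """Predicted proportion correct in an ABX trial under covert labeling.

    p_a and p_b are the probabilities that stimuli A and B are labeled as
    category 1. X is assumed to be A or B equally often, and each stimulus
    is labeled independently.
    """
    total = 0.0
    for x_is_a in (True, False):
        p_x = p_a if x_is_a else p_b
        # Enumerate every combination of labels for A, B and X
        # (True = category 1, False = category 2).
        for label_a, label_b, label_x in product((True, False), repeat=3):
            prob = (
                (p_a if label_a else 1 - p_a)
                * (p_b if label_b else 1 - p_b)
                * (p_x if label_x else 1 - p_x)
            )
            if label_a == label_b:
                # Labels give no basis for a decision: the listener guesses.
                p_correct = 0.5
            else:
                # The listener answers "A" if X's label matches A's, else "B".
                answers_a = label_x == label_a
                p_correct = 1.0 if answers_a == x_is_a else 0.0
            total += 0.5 * prob * p_correct
    return total


if __name__ == "__main__":
    # Hypothetical labeling probabilities for pairs along a continuum.
    print(predicted_abx_accuracy(0.95, 0.90))  # same side of a boundary
    print(predicted_abx_accuracy(0.90, 0.10))  # straddling a boundary
```

Under these assumptions, two stimuli that are labeled alike are predicted to be discriminated at chance, and predicted accuracy grows with the difference between their labeling probabilities, which yields a discrimination peak at the category boundary.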
Furthermore, when speakers were asked to identify and discriminate stops along a VOT continuum, they showed better discrimination of items that spanned a boundary between categories in their language than of those whose VOT values fell within a category (Abramson & Lisker, 1968).

Categorical perception, however, does not seem to be determined by the phonology; that is, it does not occur if and only if a feature is active in a given language. First, it has been found that both adults (Pisoni & Lazarus, 1974) and infants (McMurray & Aslin, 2005) are in fact much better able to discriminate within a category than earlier results predicted.24 Second, some of these experiments have been replicated with both non-speech sounds (e.g., Aslin & Pisoni, 1980) and non-human animals (e.g., chinchillas; Kuhl, 1981), showing that this phenomenon is not linguistically based. Third, categorical perception does not seem to be a necessary condition for phonologically relevant discrimination and categorization, as it occurs only for some contrasts, with vowels notably being discriminated quasi-continuously (e.g. Fry, Abramson, Eimas & Liberman, 1962). Furthermore, some featural contrasts in sign languages do not seem to be perceived categorically (Newport, 1982; although see Emmorey, McCullough & Brentari, 2003), suggesting that languages need not base their contrasts solely on perceptual discontinuities.

A second interpretation of categorical perception hypothesizes that speech is simply making use of natural discontinuities arising from the interacting factors of production and perception. In his Quantal Theory of Speech Perception, Stevens (1972, 1989) notes that, given the shape of the vocal tract, variation in the percept is not monotonic with respect to variation in production, with some variation in production causing very little change in perception, and other variation causing great changes in the percept. Languages, however, do not always align their boundaries with these discontinuities. For example, Spanish (among many other languages) does not place its voiced-voiceless boundary so as to coincide with the short lag/long lag boundary, a region of natural sensitivity. This suggests that not everything is given by the auditory system; rather, languages can place their precise boundaries relatively freely (see Ladefoged & Cho, 2001, for an example from Voice Onset Time). Other examples of contrasts that are not as easily discriminated are voicing, place of articulation and stridency in fricatives (Eilers, Gavin & Wilson, 1979, 1980; Aslin & Pisoni, 1980; Jusczyk, 1997:53 and references therein).
Indeed, the perception of these fricative contrasts develops well into the school years (Hazan & Barrett, 2000). Further support for the hypothesis that the developmental trend in infancy goes from easily perceivable contrasts to those that are less perceptually distinct comes from the fact that vowel categories seem to be established before consonantal ones. Kuhl, Williams, Lacerda, Stevens, and Lindblom (1992) found that adults exhibit a prototype effect in vowels, such that the best exemplar of a native vowel acts as a ‘magnet’, by virtue of which other vowel sounds in its vicinity are perceived as more similar to the prototype than would be predicted by acoustic space alone. They found the perception of six-month-old Swedish- and English-learning infants to coincide with that of adults when tested on a front rounded vowel, which was a good prototype for Swedish adults, and a front unrounded vowel, considered a good exemplar by English adults. In contrast, infants do not reliably demonstrate native consonant categories until closer to their first birthday.25 In fact, although eight-month-old infants have been found to begin shifting their attention away from non-native consonantal categories (Werker & Tees, 1984), they do not seem to lose sensitivity to all non-native contrasts at the same time (Werker & Tees, 1984; Werker & Lalonde, 1988). For example, Werker and Tees (1984) report that English-learning eight- to ten-month-olds found a contrast between uvular and velar ejectives quite difficult to perceive, whereas they could still distinguish between retroflex and coronal stops.26 By twelve months of age, however, infants also seemed less able to perceive the latter contrast. Finally, some especially salient (perhaps perceived as extralinguistic) non-native contrasts seem to be perceived even by adults. Best, McRoberts and Sithole (1988) established that English-speaking adults could easily distinguish between several clicks, which are used in several African languages but absent from English.

Further support for such a signal-based approach comes from recent experiments by Maye, Werker, and Gerken (2002). In these experiments, infants were exposed to a continuum with one or two peaks in the frequency distribution of its members. Infants thus trained were found to have developed one and two phonological categories, respectively. The phonetic feature manipulated in these experiments was voicing, and Maye and Weiss (2003) further show that infants trained on one place of articulation are able to generalize to a different place of articulation.

To conclude this section, results from perceptual training experiments with infants, as well as the way in which the perception of contrasts develops, strongly suggest that not all contrasts are in place from the youngest age. On the contrary, phonetic categories seem to develop over time, constrained by the development of the auditory system and possibly guided by the distribution of speech sounds present in the language to which the learner is exposed. Crucially, however, the evidence in favor of features as a level of representation is scanty.

24 In fact, sub-phonemic variation has been found to affect lexical decision tasks (cf. Utman, Blumstein and Burton, 2000).

25 Kuhl (1991) further demonstrated that Rhesus monkeys did not display a perceptual magnet effect, which led her to speculate that this effect was a human-specific mechanism for speech perception. However, the perceptual magnet theory does not work so well with consonants, and thus it would not be a necessary condition for speech sound categories.

26 Anderson, Morgan and White (2003) argue that there are also frequency effects that might have an impact on the order in which sensitivity to non-native contrasts is lost. They found that English-learning eight-month-olds’ performance on a non-native velar contrast was better than that on a non-native coronal contrast. They reason that the coronal category would be established before the velar one because coronals are more frequent than velars in English, a conclusion that is most compatible with an input-based theory of the formation of categories.
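The distributional manipulation attributed above to Maye, Werker, and Gerken (2002), a continuum presented with one or two peaks in its frequency distribution, can be illustrated with a minimal sketch. The eight-step continuum, the token counts and the function name are hypothetical and are not the stimuli of the original study; the peak count is only a stand-in for whatever mechanism a learner might use to track the distribution.

```python
def count_modes(frequencies):
    """Count strict local maxima in a frequency distribution over continuum steps.

    Each entry is the number of tokens heard at one step of the continuum.
    Flat-topped peaks (two equal neighboring maxima) are not counted.
    """
    modes = 0
    last = len(frequencies) - 1
    for i, count in enumerate(frequencies):
        left = frequencies[i - 1] if i > 0 else -1
        right = frequencies[i + 1] if i < last else -1
        if count > left and count > right:
            modes += 1
    return modes


# Hypothetical exposure sets over an eight-step continuum (same total tokens).
UNIMODAL = [1, 2, 4, 6, 5, 3, 2, 1]
BIMODAL = [1, 4, 6, 2, 1, 5, 4, 1]

if __name__ == "__main__":
    print("unimodal exposure:", count_modes(UNIMODAL), "peak(s)")
    print("bimodal exposure:", count_modes(BIMODAL), "peak(s)")
```

On an input-driven account, the number of categories a learner forms would be expected to track the number of peaks in the exposure distribution, which is the pattern of results reported above.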


Learnability, complexity and groundedness

The fact that experiments involving perception and memory have not been able to establish a featural level of representation might be due to the methodologies employed. However, indirect evidence in favor of features can be found in learning experiments with infant subjects. This featural level, however, need not be abstract in the sense discussed in chapter 1, that is, a level that abstracts away from the particular contextual realization and encompasses several types of sounds.

Hillenbrand (1983) used a habituation-dishabituation paradigm to test discrimination across categories in six-month-old infants. Infants were habituated with a series of syllables uttered by several speakers, and were expected to dishabituate when a syllable outside the trained series was presented. There were two conditions, a phonetic and a non-phonetic condition. Infants in the phonetic condition were habituated with syllables in which all the onsets were voiced stops, and they dishabituated when presented with syllables beginning with nasals. Infants in the non-phonetic condition heard a group of syllables such as /ba, ŋa/ and were then presented with /na, ga/. Crucially, infants in the non-phonetic condition did not detect the addition of new syllables, while infants in the phonetic condition were able to do so. This presents a first indication that phonetic classes are relevant for infants of this age, although the stimuli do not allow any conclusions as to the abstractness of the features present in infants’ representations of the pattern.

A second indication comes from a more recent study by Saffran and Thiessen (2003), who used an artificial language to test whether eight-month-old infants would be able to discover some regularities in the phonotactic pattern of pseudo-words after a short exposure. The results of their second and third experiments are most relevant to the question at hand. In their second experiment, infants were exposed to a phonotactic regularity concerning a phonetic class of sounds: for one group, all initial stops were voiced and all final ones voiceless, whereas the second group of infants listened to the opposite pattern. The constraint in the third experiment concerned arbitrary sets of sounds, such that syllables could begin with /p/, /d/ or /k/, and would end with /b/, /t/, or /g/. Unlike in the second experiment, infants in the third showed no difference between test items. Hence, eight-month-olds were not able to learn a constraint on an arbitrary group of segments, but could do so when a phonetic or phonological class was affected. The authors concluded that perhaps the fact that these patterns were arbitrary or inconsistent made the rule harder to learn, adding that phonotactic constraints only rarely ban only one segment and not a class.27

Nevertheless, rules involving arbitrary sets of sounds were learned by 16.5-month-olds in Chambers, Onishi and Fisher (2003). In this experiment, one group of toddlers heard C1VC2 words where C1 could only be /b, k, m, t, f/ and C2 only /p, g, n, tʃ, s/. The other group listened to words of the form C2VC1. Each group was tested on both templates, and both exhibited a preference for the pattern that was illegal with respect to the one to which they had been exposed. For this reason, the authors conclude that infants, like the adults tested in Onishi, Chambers and Fisher (2002), are able to represent phonotactic constraints in terms of individual segments.
In contrast, some support for the idea that infants can learn feature-based distributions comes from a set of experiments intended to challenge the view that infants are predisposed to learn phonetically grounded patterns, that is, patterns that are favored by the physics of speech. Seidl and Buckley (2005) show not only that phonetic groundedness has no discernible effect on infant phonotactic learning, but also that this learning can indeed be expressed in terms of features. In these experiments, nine-month-olds were familiarized with either a grounded or an ungrounded pattern. In one of the grounded patterns, fricatives occurred only between vowels and stops at the beginning of the word, whereas the corresponding arbitrary pattern restricted stops to intervocalic positions, a rule both unattested and arbitrary from the point of view of ease of articulation, though not from the point of view of the segments grouped. Regardless of this, infants in both groups learned the patterns to which they were familiarized, as demonstrated by the fact that they oriented longer to test trials consisting of illegal items (for example, in the arbitrary condition, items where fricatives occurred between vowels). Furthermore, test items were constructed using both familiar and novel segments in the phonetic classes of interest. Thus, infants in the arbitrary condition were familiarized with /s, z/ in word-initial position and tested with pseudo-words beginning with these segments or with /f, v/. In other words, it would seem that infants encoded the experiment-wide constraints in terms general enough not to treat the new segments as illegal. However, since novel and trained segments were interspersed in test trials, true generalization could not be assessed.

27 This is not necessarily true. English has numerous constraints on single segments; for example, the velar nasal only occurs word-finally while /h/ only occurs word-initially. However, Saffran and Thiessen’s results do indicate that infants in the second experiment cannot have been learning a set of constraints on single sounds, that is, they did not learn that /p/ could occur word-initially, /t/ word-initially, /k/ word-initially, since, if they had, they should have found the constraints in experiment 3 equally easy to learn.


Conclusions on infants’ representation of speech sounds

The evidence in favor of features in infancy is unclear. Most results can be reinterpreted in terms of other levels of representation, like the syllable or the segment. For example, Pierrehumbert (2001) proposes that language acquisition entails the construction of several kinds of knowledge. In particular, parametric phonetics is a quantitative map of the acoustic and articulatory space, over which the phonetic encoding involves low-level, concrete categorization, such that the minimal units are positional variants of phonemes. Werker and colleagues (Werker & Fennell, 2004; Werker & Pegg, 1992) propose that there are four factors in speech perception: the first one is acoustic; second, younger infants exhibit a broad-based phonetic sensitivity; third, one-year-olds exhibit language-specific perception; and finally, older children and adults, who have developed these categories based on minimal contrasts present in the language, would exhibit a phonemic-based perception. These factors need not be taken as stages, since adults still show some sensitivity to the acoustic and phonetic levels when probed with an adequate task (for example, in discrimination experiments such as those mentioned in reference to categorical perception, a very short interstimulus interval allows participants to discriminate tokens within a single category). Both Pierrehumbert’s and Werker’s models assume that infants tune into featural categories, while toddlers and adults pay attention to segments or phonemes.

A different account is put forward by Jusczyk (1992, 1994, 1997), who notes that it has not been established that it is the level of phonological features that infants are using to effect discriminations. When a voicing contrast is tested, for example, that contrast is necessarily embedded in a syllable, and there is some evidence that infants encode syllables and not features or segments by themselves (Jusczyk, 1997). For instance, a series of experiments on two- to three-month-old infants (Jusczyk, Jusczyk, Kennedy, Schomberg, & Koenig, 1995; Houston, Jusczyk & Jusczyk, 2003) demonstrated that a repeated syllable was encoded and remembered, whereas an equivalent repetition of segments was not. Thus, one group of infants heard [ba.mit], [ba.zi], and so on, whereas another group heard [ma.bit], [la.bo], where the segments /a/ and /b/ are also repeated. However, only the first group noticed the inclusion of a word like [pa.mal], which changed in just one feature for both conditions. The authors therefore concluded that the repeated syllable, not the repeated segment, was encoded by the infants. In Jusczyk, Goodman and Baumann (1999) this problem was explored with 9-month-olds in a series of seven experiments. In these experiments infants listened to a list of syllables that shared a common aspect, and their performance was compared to that of a control group that listened to unrelated syllables.
Of interest to us, infants preferred listening to lists of syllables sharing manner of articulation and voicing over unrelated lists, but not to lists sharing place of articulation.28 The authors speculate that place of articulation is less salient for infants, or perhaps it is less robustly represented in the speech signal. In sum, the results from these experiments are at odds with the linguistic-features hypotheses, but are highly compatible with a perception-based theory of features. However, as Jusczyk et al. point out, these results cannot be taken as evidence in favor of a featural level of representation. In fact, infants might have been responding to the similarity in the acoustic characteristics of the first portion of the syllables, without needing to access a separate phonetic level.

It would thus seem that most psycholinguistic models have no place for features, apart from models which are based on gestural theories (e.g. Direct Realism Theory, Goldstein & Fowler, 2003; the Motor Theory, Liberman & Mattingly, 1985; Perceptual Assimilation Model, Best, 1994). Furthermore, even these gestural theories would have difficulty in explaining infant perception based on gestures, since infants’ production is certainly limited.29 An alternative is to assume that infants are endowed with an innate representation of these motor scores, in which case these theories make the same predictions as the linguistic-features hypothesis.

In conclusion, given that scarcity of evidence cannot be taken as counterevidence, most experiments reviewed in this chapter have not provided a definitive answer to the question of whether infants access a featural level of representation and what kind of features this level holds. However, the experiments reported in Seidl and Buckley (2005) suggest that further research might shed more light on the matter.

28 Another interesting finding was that infants responded differentially only to lists where the first segment exhibited the regularity. That is, infants preferred listening to lists of syllables whose onset shared a characteristic (was the same segment, or shared manner of articulation and voicing), but did not display such a preference when the regularity was found in the coda or when the vowel was repeated.

29 It is compelling to hypothesize that the Direct Realism Theory would predict an advantage of place of articulation as a feature, given that this feature, unlike voicing, can make use of visual cues as well as auditory cues. The fact that infants in Jusczyk, Goodman and Baumann (1999) did not respond to place of articulation, but did exhibit a preference for manner and voicing, might run counter to this prediction. It would certainly be of interest to replicate Jusczyk et al.’s experiment incorporating visual cues.



This chapter summarizes experimental results that bear on the question of whether features are psychologically real. It will be shown that the evidence for features acting on either perception or production (or both) is not particularly compelling. On the other hand, experiments in which adult subjects are required to learn a phonological pattern seem to provide indirect evidence for features being active (Pycha, Nowak, Shin & Shosted, 2003; Wilson, 2003, in press). Further, learning tasks involving production (Wilson, in press; but see Morrison, 2005) found an effect of grounding. However, it will be argued below that previous studies have confounded groundedness and complexity (for instance, Pycha et al., 2003), have confounded the phonology and the production system (as in Wilson, in press, and most studies on phonotactic learning, such as Warker & Dell, 2006), or have drawn conclusions on features based on relationships of identity or nonidentity (as in Wilson, 2003). Further, some studies whose results seem to suggest that phonotactic constraints on classes of sounds are easier to learn than phonotactic constraints on random sets of sounds do not in truth provide evidence for abstract features, and may be explained through low-level acoustic or phonetic similarity between sounds. Finally, research on phonotactic learning through speech errors overwhelmingly argues against featural representations (with one exception discussed below), yet these studies only test on the same phones that speakers were trained on.


Features in production

Regardless of their status for perception, features would play a role in language through language change if they were active in speakers’ production. I will focus particularly on a recent experiment involving induced speech errors that attempts to argue for the presence of a featural level (contra most of the literature on both experimentally-induced and spontaneous speech errors). However, the results of this experiment do not seem entirely convincing.

Fromkin (1971), analyzing spontaneous speech errors, showed that they are not random, and seem, on the contrary, to be quite systematic, consisting mainly of substitution, deletion or exchange of one segment. Interestingly, errors tended overwhelmingly to respect the phonotactic constraints of the language they are produced in, that is, rules restricting the position of a certain phoneme or class. For example, [h] would never be pronounced at the end of a syllable in a speech error in English, where it is illegal in that position, though it might appear there in Arabic speech errors, where no phonotactic constraint forbids it. More recent research (Onishi, Chambers & Fisher, 2002) has established that subjects can learn experiment-wide phonotactic constraints after brief exposure, and experimentally-induced speech errors tend to preserve these constraints as well (Dell, Reed, Adams & Meyer, 2000; Warker & Dell, 2006). However, most work in this area based the constraints on the segmental level or the gestural level. Roelofs (1999), for example, argues that a segmental level of encoding is able to explain a further regularity in speech errors like ‘glear plue sky’, where the interchanged segments share a segmental environment, the [l]. This effect occurs only when the segments in the context are exactly the same, but not when they are very similar, even differing in a single feature. On the other hand, Goldstein and Fowler (2003) review several experiments in which most speech errors can be accounted for if gestures, not features, are basic planning units.

Against this backdrop, Goldrick (2004) represents an attempt to prove that features are indeed active in speech production. In order to do so, he had his participants read words on a screen, first at a comfortable speed and afterwards at a faster speed.
Subjects internalize the constraints present in the words during the normal reading and are later made to produce speech errors when their reading rhythm is accelerated. In this experiment, the words presented featural, gradient (rather than categorical) constraints on labiodental fricatives, such that /f/ occurred always in the onset and /v/ half the time in the onset and the other half in the coda; in other words, labiodentals as a class occurred in the onset 75% of the time and in the coda 25% of the time. Goldrick hypothesized that, unlike segmental constraints, featural constraints would induce violation of the target position, as would gradiency. The results matched these hypotheses, and Goldrick concluded that constraints had thus been represented at the featural level.

Even if one accepts these hypotheses, the conclusion does not seem straightforward. Indeed, stronger evidence for a featural constraint would arise from eliciting its effect on a novel segment (one on which the subject was not trained), given that, being trained on the same segments they were tested on, subjects might well have encoded a number of constraints, one for each segment, and the interference would have been caused by the phonetic-articulatory similarity between segments (and not through a more abstract featural level). Furthermore, a second experiment in the same paper reports that those results could not be replicated with velar stops. It seems unreasonable that constraints could operate at the featural level for labiodental fricatives but not for velar stops, suggesting that if features are at play, they are certainly not wholly abstract.
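
As a check on the arithmetic behind the design of Goldrick (2004) described above, the class-level figures follow from the segment-level ones only if /f/ and /v/ tokens were equally frequent in the materials. That equal-frequency condition is an assumption made here for illustration; the names `onset_prob` and `class_onset_share` are likewise hypothetical. A minimal sketch in Python:

```python
from fractions import Fraction

# Positional probabilities as described in the text:
# /f/ always in the onset; /v/ in the onset half of the time.
onset_prob = {"f": Fraction(1), "v": Fraction(1, 2)}


def class_onset_share(probs):
    """Share of onset tokens for the whole class.

    Assumes (hypothetically) that every segment in the class
    contributes the same number of tokens.
    """
    return sum(probs.values()) / len(probs)


onset = class_onset_share(onset_prob)
coda = 1 - onset
print(f"onset: {float(onset):.0%}, coda: {float(coda):.0%}")
```

Under that assumption the labiodental class as a whole is gradient (three quarters onset, one quarter coda) even though one of its members, /f/, is categorically restricted, which is the contrast the experiment exploits.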


Features in perception

This section presents some evidence that can be easily accommodated in a phonetic-based view of features. Experiments involving confusability of segments, short-term memory and dichotic listening show that phonemes are not the ultimate level of representation, but something akin to phonetic features must exist. They do not, however, provide much evidence for abstract, linguistic features. Yet, in some recent papers reporting automatic brain reaction experiments, it has been claimed that phonological representation is underspecified, a finding that is consistent with an abstract view of features.

In confusability experiments, subjects listened to a variety of phones masked with noise and with distorted frequencies, and the patterns of confusion between phones were taken to be an index of similarity. In general terms, these studies (e.g. Klatt, 1967; Miller & Nicely, 1955) provided some evidence that the more features two sounds hold in common, the more confusable they would be. However, sheer number of features in common did not predict how often a sound would be confused with another one, since some features seemed to be much more robust in noisy conditions than others. For example, Wang and Bilger (1973) reanalyzed the confusion matrices of Miller and Nicely (1955) as well as several other studies and concluded there was little support for features being a unified category in perception, finding that some features were well perceived both in noise and in the open (specifically, nasal, voice and round), while others were not perceived well at all (strident and low). Thus, the results of confusability experiments are most compatible with features being based on the realization of the sounds, that is, phonetic and not necessarily a part of the phonology.

In addition, results from short-term memory experiments are more compatible with phonetic, rather than phonological, categories. Wickelgren (1965, 1966) carried out experiments in which the participants heard lists of words (or pseudo-words) in which only the vowel (Wickelgren, 1965) or the consonant (Wickelgren, 1966) varied. The subjects first wrote down the list as they heard it and then attempted to recall the words in the order presented. The working assumption was that, if the varying element was encoded in features in short-term memory, then these features could be forgotten independently. For example, presented with /p/, the features [labial], [-voice], [-continuant] etc. might be forgotten independently, such that some subjects will recall /b/ (voicing forgotten), /f/ (continuancy forgotten), and so on. The matrices of mistakes in recall were then assessed to see which system of features best fitted them. Wickelgren compared several phonetic systems against a few phonology-based ones, in particular, Chomsky and Halle (1968) and Jakobson, Fant and Halle (1963) for vowels, and Chomsky and Halle (1968) and Halle (1964) for consonants. The results suggest that the most fine-grained phonetic system fares much better than any of the phonologically-based ones, which leads the author to conclude that short-term memory is closely related to the articulatory or perceptual system but likely not to an abstract, phonological level.
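
The working assumption behind the short-term memory experiments just described (features forgotten independently of one another) can be made concrete with a toy sketch. The feature bundles below are illustrative assumptions, not any of the feature systems Wickelgren actually compared, and the function name is hypothetical:

```python
# Toy feature bundles (illustrative assumption, not a published system).
FEATURES = {
    "p": {"place": "labial", "voice": False, "continuant": False},
    "b": {"place": "labial", "voice": True, "continuant": False},
    "f": {"place": "labial", "voice": False, "continuant": True},
    "v": {"place": "labial", "voice": True, "continuant": True},
    "t": {"place": "coronal", "voice": False, "continuant": False},
}


def intrusions_after_forgetting(target, forgotten):
    """Segments that match the target on every feature except the forgotten one.

    These are the recall errors predicted if exactly one feature is lost.
    """
    kept = {k: v for k, v in FEATURES[target].items() if k != forgotten}
    return sorted(
        seg
        for seg, feats in FEATURES.items()
        if seg != target and all(feats[k] == v for k, v in kept.items())
    )


for feature in ("voice", "continuant", "place"):
    print(feature, intrusions_after_forgetting("p", feature))
```

With this inventory, losing voicing on /p/ predicts /b/ as the intrusion, and losing continuancy predicts /f/, matching the example in the text. Comparing feature systems then amounts to asking which system's predicted intrusions best fit the observed error matrices.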
Similarly, some dichotic listening experiments, in which subjects are presented with a different stimulus in each ear and must identify the segments or syllables they heard, lend further support to the possibility that phonetic categories are active in the developed grammar. Studdert-Kennedy and Shankweiler (1970) found that syllables were more easily identified the larger the number of features their initial segments shared. Studdert-Kennedy, Shankweiler and Pisoni (1975) followed up on that study by varying the vowels in each pair of syllables presented, so that acoustic similarity was affected but not featural similarity. Thus, for example, at one time the subject might be presented with [bu] and [pi], a pair of syllables whose initial segments share the feature of place of articulation but, crucially, do not have the same formant transition. Studdert-Kennedy et al. found that identification was higher when the segments shared a feature of voicing or place (the two features that varied) than when neither was shared. These results indeed support a differentiation of the auditory and the phonetic levels, given that the segments that shared the phonetic feature of place of articulation were indeed identified more accurately, despite the acoustic variability. Notice, however, that this latter level need not be phonological; in other words, if listeners were compensating for coarticulation, they need not have accessed phonological features to notice similarities in the segments presented to them.

On the other hand, several neurolinguists claim to have found evidence for features in studies involving an automatic brain response to change (using MMN, mismatch negativity, or its magnetic counterpart). In this kind of study, the subject is typically not attending to the speech sounds being played, but reading a book or watching a movie. The stimuli are usually a series of sounds that are repeated or vary within a category, called frequent or ‘standards’, with a ‘deviant’ sound occasionally being presented. The electric or magnetic changes in brain activity are measured, and in this event-related design one can predict that, if the difference between standard and deviant is detected, then the average of the evoked response to the deviant sounds will diverge from the one elicited by the frequent sounds within a certain latency range. This method has shown that a change from the /d/ category to the /t/ category is thus detected, even though the tokens in each category were acoustically varied (Phillips et al., 2000).
Interestingly, similar results were elicited when the standard category was voiced stops, so that /bae, dae, gae/ were used as ‘frequent’ sounds and /pae, tae, kae/ as deviants. Yet the category of voiced stops can nevertheless be phonetically based, since regardless of place of articulation, the voicing distinction always uses the same set of perceptual cues (Lisker, 1978). As a consequence, it is not necessary to resort to abstract phonological features to explain this result.

Similarly, Eulitz and Lahiri (2004) claim to have found evidence for underspecification in vowels. Their stimuli comprised three German vowels, the front unrounded [e], the front rounded [ø] and the back rounded [o]. In the featural terms the authors use, [e] and [ø] will be unmarked for place, since they are coronal, and [e] is unmarked for roundness. On the other hand, if [e] is the standard, a change to [ø] should result in a conflict, given that the novel vowel is specified for roundness. Notice further that the acoustic distance in both changes is the same, so if the evoked response is to acoustic dissimilarity, results ought to be the same for both exchanges. After playing one of the vowels for some time, they presented the listeners with a different vowel. The authors found that the mismatch negativity was of higher magnitude and peaked earlier in conflict situations than in nonconflict ones. They conclude that there is a mental representation of place-of-articulation features, and that these representations may be underspecified. Further, Dupoux, Fushimi, Kakehi, & Mehler (1999) argue that this methodology may target the phonological representation, between the acoustics and the lexical access.30

However, further research suggests a reinterpretation of these results. For example, Mitterer, Csepe, Honbolygo, and Blomert (2006) summarize other studies that show that the lack of response found in experiments by Lahiri and colleagues may have to do with compensation patterns rather than with phonological representations. Whereas sequences like ‘lea[m#b]acon’ did not trigger a response, mismatching sequences like ‘lea[!#b]acon’ did. If the place of articulation of the nasal was truly a symbol that is absent from the representation, both situations should have been processed the same. Mitterer et al. suggest that Lahiri’s model speaks about the lexical or the perceptual levels, but not the phonological one. The experiments reported in Mitterer et al. (2006) provide evidence of compensation for assimilatory rules not present in the native language of the subjects, possibly because assimilation is perception-oriented.31

In short, even strong evidence for features seems to support phonetic similarity rather than phonological features. However, absence of evidence is not evidence of absence: it may still be the case that these experiments tap only into perception at a phonetic level, but not the phonological system.

30 Dupoux et al.’s study focuses on syllabic constituency. French and Japanese subjects in an oddball paradigm are exposed to a sequence in which ‘ebzo’ is the standard and ‘ebuzo’ the deviant (or the reverse). Only French subjects show a larger evoked response. Japanese subjects would ‘fill in’ the vowel necessary for ‘ebzo’ to be a well-formed word, according to Japanese phonotactics, so that Japanese listeners would be unable to detect this change, even below the level of consciousness.

31 It should be noted, however, that the latency of the MMN in Mitterer et al.’s study was near 396 ms, which is too late according to the traditional definition of this component.


Phonological learning in adults

In this section, I will focus on experiments in which subjects are required to learn a pattern or variation involving features. Some of these experiments also provide evidence towards establishing whether phonetic groundedness (such as knowledge of ease of production or perception) is relevant for adult learning. As pointed out above, the linguistic-features hypotheses predict that groundedness has no effect, provided phonological complexity is matched. On the other hand, phonetically-driven phonology accounts predict that grounded distributions will be easier to learn than arbitrary ones. Some attention will be paid to the particular procedure being used, since it is this aspect which might account for differential results. Studies on laboratory training can be classified in two groups: those that require participants to produce a verbal response, and those that only ask them to classify the stimuli in testing using button presses. It is possible that studies measuring subjects’ production find a different pattern of results from perception-only studies simply by virtue of engaging the production system (Moreton, 2005).

I will begin by summarizing three studies that did not properly control for complexity. In the first one, Wilson (2003) exposed subjects to a pattern in which trisyllabic pseudowords ended either in ‘la’ or ‘na’. In the ‘assimilation’ condition, ‘na’ appeared only if the second syllable of the word already exhibited a nasal consonant, whereas in the ‘dissimilation’ condition ‘la’ occurred in that same environment. These two conditions were matched to a ‘random’ or ungrounded one, such that, for example, the ‘random’ distribution matched with the assimilation condition restricted ‘na’ to pseudo-words where the second syllable contained a velar stop. After hearing 20 familiarization items twice, participants heard an item and had to decide whether they thought they had heard that item in the familiarization or not.
Crucially, all test items were novel, but some (the grammatical items) followed the same pattern as the familiarization items, and others did not. If participants encoded the distributions to which they had been exposed, they should be more likely to (mistakenly) guess that they have heard an item when it is grammatical than when it is not. Wilson (2003) found an equal advantage of both the assimilation and dissimilation patterns over the random ones, arguing that this provided evidence for a bias in favor of phonetically-grounded distributions. However, it may be argued that the rules were not matched in complexity, since, within each paradigm, the allomorph of the last syllable was determined by the identity (in the assimilation rule) or non-identity (in the dissimilation one) of the initial segment of the preceding syllable.

Second, Pycha, Nowak, Shin and Shosted (2003) asked their subjects to learn rules involving alternations in an artificial language. In the first condition, the alternation was a phonetically-grounded vowel harmony rule, in which the suffix vowel agreed with the stem vowels in backness. Subjects in the second condition learned a ‘disharmony’ rule, equally simple in formal terms: the suffix exhibited a back vowel when the stem had a front vowel, and a front one when preceded by a back vowel in the stem. This second condition was therefore matched to the first in terms of complexity but, unlike the first one, is not phonetically grounded. The third condition was arbitrary, with a back vowel in the suffix when the stem bore, for example, a high front tense vowel or a back high lax one, but if the vowel in the stem was high front lax or high back tense then the vowel in the suffix had to be front. Hence, subjects in the complex condition could not describe the segments participating in the alternation in terms of a class, probably having to memorize the suffix corresponding to each segment. The results clearly show an advantage for the first two, simple, conditions over the third one, although the phonetically grounded rule was not significantly easier than the ungrounded one.

Using a similar design, Morrison (2005) arrives at a comparable conclusion.
In this study, some subjects had to learn one pattern that was grounded on ease of production and others a pattern that was arbitrary from this point of view. The procedure involved producing a set of forms and receiving feedback when the form was not appropriate to the rule being learned. For the grounded pattern, a bilabial fricative occurred between two low vowels, and a bilabial stop elsewhere. This rule is grounded in the sense that producing the full closure of a stop requires the greatest jaw movement between two low vowels. The arbitrary rule restricted the fricative allophone to the environment between a low vowel and a mid one, having a bilabial stop elsewhere. Although in general subjects found both alternations hard to learn, the subjects in the phonetically grounded group fared significantly better than those in the ungrounded group. However, the second condition is also more complex than the first one, since the first one involves an alternation between two identical vowels, and the second an alternation between two different vowels. Interestingly, when complexity was controlled for, the grounded pattern was not easier than the ungrounded one.

Finally, Wilson (in press) proposes a specific model to account for how phonetic factors may affect phonology: the substantively biased framework. In this model, phonetic factors hinder learning of rules that are not phonetically grounded, but do not impede it, whereas phonetic factors aid learning of grounded rules. In order to test this prediction (namely that learning of phonological rules would be biased by phonetic factors), Wilson taught his subjects a language game in which stops were palatalized. In these experiments, subjects would sit in front of a computer which guided them through the study. In the training phase, the computer displayed the message ‘I say’, and then played the ‘original’ (unpalatalized) pseudo-word. Then, it would display the message ‘you say’ and play the palatalized version of the pseudo-word. Crucially, the training stimuli were organized so as to train subjects on some environments and targets and then test their generalization to untrained environments or targets. For example, in two conditions, some subjects were trained only with stops followed by [e] and others heard only [i] as the environment in which the palatalization occurred. During the training, subjects did not hear words in which the stop that was palatalized was followed by the vowel used in the other condition.
The question of interest, therefore, was whether the pattern of generalization (to the untrained vowel) followed cross-linguistic patterns: palatalization before the front mid vowel is rarer than that before the front high vowel, and palatalization before [e] implies palatalization before [i] but not vice versa. In the testing phase, the displays remained unchanged, and the computer played the unpalatalized version, whereas subjects had to produce the ‘game’ word. Results confirmed the hypothesis that the extension of palatalization would follow crosslinguistic patterns, so that the tendency in subjects trained on [e] to palatalize before [i] was more robust than the tendency to palatalize before [e] by subjects trained with [i].32 These results constitute an indication that adults’ production in laboratory training is consistent with cross-linguistic generalizations and exhibits some effect of phonetic factors. Further, they suggest that, under some conditions, adults seem to be able to generalize to new segments, which would be consistent with features being active in the developed grammar. And yet, they would also be consistent with features being gestural.

A crucial piece of evidence comes from two almost identical studies reported in Peperkamp, Skoruppa, and Dupoux (2006) and Peperkamp and Dupoux (in press). The training phase across these two studies consisted in learning ‘numerals’ in a made-up language, so that subjects were shown, e.g., pictures of animals and short phrases describing them. The phrases evidenced allophonic variation; for example, in the ‘natural’ condition stops were voiced between vowels, and in the ‘unnatural’ condition the intervocalic allophones were completely unmotivated (e.g., the allophone of /k/ between vowels was /v/). For example, ‘two rabbits’ in the ‘natural’ language was ‘na bevi’, while the word ‘rabbit’ on its own was ‘pevi’. The descriptions were presented graphically and not orally. The two studies differ, however, in the testing phase. In Peperkamp and Dupoux (in press) subjects were presented with a phrase and had to match it to one of two pictures. With this test, subjects did not generalize to other segments within the same class, nor did they find it more difficult to learn the arbitrary, ‘unnatural’ rule. In order to account for this latter pattern of results, the authors propose that subjects might have learned the distributions based on orthography. However, the results of Peperkamp, Skoruppa and Dupoux (2006) argue against this possibility. In this study, the test phase consisted of being shown a picture and having to produce the correct phrase.
Participants in the ‘natural’ condition of this study showed better learning than those in the unnatural condition. Peperkamp et al. interpret this different result in terms of resources: picture matching would be an easier task than producing the phrase, and thus all participants would be at ceiling. Nevertheless, the better learning found in the ‘natural’ condition in the more stringent, production, task cannot be attributed solely to groundedness, since the ‘unnatural’ condition was not only ungrounded but also more complex.

32 This pattern of results can also be explained in a purely phonological theory that incorporates markedness. Specifically, [e] in English is more marked than [i], having a larger number of features, and a rule that affects a more marked segment is liable to affect the less marked one within the natural class.

Some support for features, however, comes from Endress and Mehler (under review). This paper focuses on perceptual primitives, structures that are particularly salient and thus more amenable to being learned. Endress and Mehler evaluate the hypothesis that phonology acts as a symbolic computation system, whereby input is processed simply in terms of symbols. They compared data gathered from subjects in two conditions, one in which the phonotactic pattern appeared in the word edges and the other where the pattern appeared in the middle of the word. Specifically, participants in the first condition had to learn a phonotactic regularity restricting a set of arbitrary elements to the onset of the first syllable and the coda of the second in words of the shape Cvc.cvC (where ‘C’ stands for a consonant within a constrained group, and ‘c’ and ‘v’ are filler consonants and vowels). In the second condition, these consonants could only occur in the coda of the first syllable and onset of the second (cvC.Cvc). In other words, if phonology operates as a purely symbolic system, word edges should not be more salient than word middles (when metrical conditions are controlled for). After a short familiarization with 36 ‘Martian’ words, subjects had to choose which of a pair of minimally different novel words sounded Martian. Subjects in the word-middle condition performed at chance, which was significantly worse than those learning the same constraint on word edges. In a follow-up experiment, the authors found the performance at word-middles to improve dramatically if the phonotactics constrained a natural manner class (so that, for example, words must be of the form CVF.SVC, with a fricative in the coda of the first syllable, and a stop as onset of the second).
It appears, then, that with a very short familiarization the task of learning a constraint on a segment (in certain positions) is indeed more difficult than encoding a regularity in terms of classes. This, in turn, implies that classes of sounds are easier to encode, although it does not presuppose that they are encoded in terms of features. That is, subjects in the latter group may have fared better because of a perceptual salience of classes, where it is only necessary to attend to simple cues (say, the presence of frication in the coda of the first syllable) and resources are freed for committing this pattern to memory. Likewise, even if features are assumed to underlie this encoding, the question of whether they are linguistic or phonetic remains, given that, once again, the classes chosen were rather concrete (voiceless fricatives and voiceless stops).

In summary, although there seems to be some evidence of phonetic groundedness affecting learning when the task involves production (Wilson, in press), most experimental results become difficult to interpret given that conditions should be matched in terms of perception as well as complexity. Further, while it appears that constraints on phonetically related groups of sounds are easier to encode (at least in certain positions; Endress & Mehler, under review), it has not been proven that abstract features underlie these groupings.


Conclusions on adults’ representations of speech sounds

In this chapter, we have reviewed the results of a number of experiments that attempt to answer the question of whether linguistic features are relevant in adult perception and production, short-term memory and learning. Positive evidence seems to be limited to the results of some neurolinguistic experiments in perception, and in production to the arguable conclusions of one paper documenting experimentally-induced speech errors. Finally, the results from the short-term memory experiments summarized are best accommodated by phonetic features.

The strongest support for perceptual categories that can be identified as features comes from neurolinguistic studies. However, in a review of the literature concerning findings of speech categories in electrophysiology, Phillips (2001) notes that experiments of this kind have only proved that such categories display one of the characteristics of abstract, phonological features, that is, that variation within a category is irrelevant. These studies have not yet shown, for example, that these features are able to group sounds that are phonetically quite distinct. In the experiment by Phillips et al. (2000) summarized in section 4.2, a response was elicited when the standard was a voiced stop and the deviant a voiceless one, but this experiment would have to be expanded to show that the same results would be obtained if the standards were to include voiced fricatives and the deviants, voiceless fricatives, in spite of the fact that voicing in fricatives uses different perceptual cues from those used in stops. Finally, laboratory learning of phonological patterns provides evidence for phonetic features, but, due to the stimuli chosen, these studies cannot decide the question of whether language users were accessing abstract or only phonetic features.

As a provisional conclusion, then, it would appear that the diachronic view of features is not supported insofar as some evidence of features was found both in infants and in adults (Saffran & Thiessen, 2003; Seidl & Buckley, 2005; Endress & Mehler, under review; Wilson, in press). The fact that the features involved in these experiments were rather concrete cannot be taken to imply, however, that abstract features are nonexistent, since the tasks and stimuli chosen may not have been sensitive to this particular level. The experiments reported in the following chapter attempt to address this question directly.



In this chapter, I report the results of two experiments that specifically assessed the question of whether adults and infants are able to learn phonotactic constraints affecting an abstract class of segments after a short exposure. At least when production is not involved, subjects seem to restrict phonological learning to the particular segments they were trained on. However, the stimuli in these experiments consisted of a small set of words, focusing on an equally small number of segments. Furthermore, the segments involved often could not be grouped into a class, thus forcing a narrow representation of the constraint. Experiment 1 addresses these problems. In this experiment, adult participants were exposed to a phonotactic pattern involving a class of sounds that could only be expressed in abstract phonological terms. In order to avoid confounding the effects of the phonological and production system, a perceptual task was chosen. Also, so as to prevent orthography from affecting subjects’ responses, no written input was presented from the moment in which the experiment began. Test items included both trained and untrained segments, which would allow us to establish whether generalization within the class occurred. Experiment 2 addresses the question of learning a phonotactic constraint on a class in infants. As seen in chapter 3, previous research with infants has shown that, when directly comparing a phonetic and an arbitrary class, young infants fared better in a task where the phonotactic rule concerned a phonetic class rather than an arbitrary set of segments. Experiment 2 attempts to establish whether infants can learn a distribution concerning a class that can in principle only be defined in terms of abstract features. The phonotactic patterns chosen were nasals and stops, for one condition, and nasals and fricatives, for the other. 
Although nasals do not pattern with either fricatives or stops in English, these patterns were chosen because they occur with roughly equal frequency in languages across the world (Mielke, 2004a,b). It is possible, however, that the two patterns differ in difficulty, since only nasals and stops are considered a natural class according to the linguistic-features account.


5.1 Experiment 1: Adults’ learning

In this experiment, I sought to establish whether adults could learn a phonotactic constraint involving a phonological class after brief auditory exposure.

5.1.1 Methods

Participants heard a set of pseudo-words exhibiting a phonotactic regularity in the initial segment for a short period of time. They were then tested on items following or breaking this pattern, and were asked to judge whether these items had been presented in the familiarization or not. Following Wilson (2003), if participants respond ‘yes’ significantly more often to legal test items (that is, to items that respect the phonotactic pattern) than to illegal test items, this suggests that they have learned the regularity. In particular, if participants have encoded the regularity as affecting only the trained segments, they will report having heard test items beginning with these particular consonants significantly more often than they do with untrained legal or illegal segments. On the other hand, if subjects generalize the constraint to the class, they will respond similarly to pseudo-words beginning with either trained or untrained legal (from here on, called ‘novel’) segments, while they will treat illegal onsets differently. The dependent measure, therefore, was the proportion of ‘heard’ responses, which was expected to vary across test item types (trained, novel and illegal).

Participants

Thirty-two native speakers of English (average age was 20.5, range 19-22; 30 female, 28 right-handed) participated in this study. Participants were randomly assigned to one condition and order. They received extra credit in a class for their participation in this experiment.

Equipment and apparatus

A female speaker of American English recorded all stimuli in infant-directed speech in a sound-proof booth (IAC, model 403a), using a Marantz Professional Solid State Recorder (PMD 660) and a hypercardioid microphone (Audio-Technica D1000HE). The stimuli were then redigitized at 48 kHz and 16 bits. Amplitude was normalized to 65-70 dB and one-second silences were inserted between sentences recorded in a randomized order. Small silences (0.1 seconds) were inserted between words so that total time was the same for both conditions in familiarization, as well as across testing trials in the infants’ experiment presented in the next section. These sound manipulations were achieved using Praat (Boersma and Weenink, 2005) and Amadeus II (http://www.hairersoft.com). Familiarization and testing were controlled using PsyScope 9, running on a Macintosh PowerPC G4 (2.1). Participants heard all stimuli through Sennheiser HD 580 Precision headphones and responded using the keyboard. Position of buttons was counterbalanced across subjects taking handedness into account. Response time was collected, but will not be reported here because the equipment used measured this variable with low accuracy.

Stimuli

There were four possible familiarizations, two in which participants heard the natural class pattern of stops and nasals (which I will call the natural condition) and two involving the arbitrary pattern of fricatives and nasals (referred to as the arbitrary condition). The segments involved in the patterns in each were counterbalanced, so that half of the subjects in the natural condition were familiarized with certain stops, which the other half did not hear in syllable-initial position during familiarization. All items were generated as follows.
Pseudo-words were closed syllables (CVC), where the segments respected the following restrictions:
a) The initial consonants of words in familiarization were drawn from the following sets for one group: nasals /m, n/ and either fricatives /f, z/ or oral stops /t, g/; the other group was familiarized with words beginning with /m, n/ and either /v, !/ or /b, k/.
b) The only vowels were /i, a, !, u/;
c) The final consonant was drawn from the following set: /m, !, l, r, s, z, !, f, v, !, t!, p, b, t, d, k, g/.

The pseudo-words were generated randomly, although minimizing the variation across conditions. For that reason, the words of interest had approximately the same rhymes (VC) across sets, but the onset varied across conditions; while in the natural condition a given word began with a stop, in the arbitrary condition the corresponding word began with a fricative. Items that interfered with common English words were discarded and replaced by others. Care was taken to ensure that there were no other highly likely distributions apart from the one of interest. The familiarization stimuli were then grouped in 19 strings of 3 pseudo-words each, which were played once for a total of about two minutes of exposure (1’40”).

There were forty test items, all of them novel pseudo-words for all the conditions. Further, the same test items were used for all conditions and orders, so that test items that were novel for one familiarization were trained or illegal for another. This design is represented in Table 1.

Table 1: Initial consonants in test items for experiment 1

                 Familiarization onsets
Test onsets      /t,g/      /b,k/      /f,z/      /v,!/
/t,g/            Trained    Novel      Illegal    Illegal
/b,k/            Novel      Trained    Illegal    Illegal
/f,z/            Illegal    Illegal    Trained    Novel
/v,!/            Illegal    Illegal    Novel      Trained

Thus, for a given condition and order, the participant heard some items which began with trained segments, others that began with untrained but legal segments, and finally others outside the trained class. It should be noted that after running Order A with one set of initial consonants, it was brought to our attention that one of the test words was a common slang term. The item was excluded from further analysis, but so as not to introduce a spurious interaction between conditions and orders, one item was excluded from the other test types as well.
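The coding of test items into trained, novel and illegal types can be illustrated with a short sketch. It is an illustration only, not the procedure actually used to build the stimuli: the function name `classify_onset`, the `CLASSES` dictionary and the ASCII labels are assumptions made here, only the onsets that are legible in the text are included, and the nasals are left out for brevity.

```python
# Illustrative sketch: label a test item's onset relative to one familiarization.
# Names and data layout are hypothetical; only onsets legible in the text are used.

CLASSES = {
    "stop": {"t", "g", "b", "k"},
    "fricative": {"f", "z"},  # the second fricative pair is omitted here
}


def classify_onset(onset, trained_onsets):
    """Return 'trained', 'novel' (legal but untrained) or 'illegal'."""
    trained_onsets = set(trained_onsets)
    if onset in trained_onsets:
        return "trained"
    # An untrained onset is legal if it shares a class with the trained onsets.
    for members in CLASSES.values():
        if trained_onsets <= members and onset in members:
            return "novel"
    return "illegal"


# Natural condition, familiarized with /t, g/.
labels = {c: classify_onset(c, {"t", "g"}) for c in ["t", "b", "k", "f", "z"]}
print(labels)
```

Under these assumptions, the same test onset receives a different label depending on the familiarization, which is the property the design relies on.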

Procedure

Subjects were asked to sit in front of a computer. They were presented with oral and written instructions that stated that they would hear a set of non-words (words in Martian) and that they would then see a fixation symbol on the screen, which signaled the beginning of the testing phase. In this phase, they would hear a word, and would have to decide whether they thought they had heard that word in the first part of the experiment or not. If they thought they had heard the word in the first part, they would press a button marked with a drawing of an old man (because it would be an ‘old’ word). If they thought they had not, they would press the button marked with a baby, signaling a ‘new’ word. The drawings were designed controlling for number of lines and general shape, and key position was counterbalanced across subjects. In order to reduce the cognitive load of learning how the experiment would proceed, the experiment was preceded by a pretest which followed exactly the design of the experiment, but whose stimuli were English words. After this pretest, subjects had the opportunity to ask questions. No feedback was provided during either the pretest or the experiment.

All test items were in fact new and had not appeared in the familiarization of any of the groups. All groups were tested on the same items. The test items were blocked by kind of initial segment and familiarity, yielding four groups across conditions (stops familiar, stops novel, fricative familiar, fricative novel; notice, however, that for the natural condition the distinction between novel and familiar fricatives would be irrelevant, and vice versa). Items within blocks were randomly selected (without replacement) by the program controlling the experiment. Order of presentation of the blocks was counterbalanced across subjects.

5.1.2 Results

A repeated measures ANOVA was computed with Condition and Order as factors, and Proportion of ‘Heard’ Responses as the dependent measure, which revealed a main effect of Test Item Type (F(1,31) = 9.646, p < .0001), no other effects and no interactions. The following chart plots the Proportion of ‘Heard’ Responses by Test Item Type. ‘Trained’ item types are those that begin with a trained segment, ‘novel’ are words that begin with a legal but untrained segment, and ‘illegal’ are items whose initial consonant violates the phonotactic constraint that was trained.

Figure 5: Proportion of 'heard' responses by test item type in experiment 1

Two-tailed paired t-tests showed a significant difference between trained and novel items (t(31) = 3.61, p < 0.002; all t-tests reported in this thesis are two-tailed), a significant difference between trained and illegal items (t(31) = 4.07, p < .001) and no significant difference between novel and illegal items (t(31) = 1.28, p > .2). Table 2 below reports means and standard deviations by test item type.

Table 2: Means and Standard Deviations of ‘heard’ responses to test item types for Experiment 1
[Table values not recoverable; surviving labels: Test Item Type, Trained, Std. Dev.]

Trained items elicited a higher proportion of ‘heard’ responses in 22 of the 32 subjects as compared to novel items, and in 26 participants as compared to illegal items.
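For readers who wish to see how a paired comparison of this kind is computed, the following is a minimal sketch using only the Python standard library. The function name `paired_t` and the four data points are hypothetical and are not the data of Experiment 1; the sketch only shows the arithmetic of the statistic, t = mean(d) / (sd(d) / sqrt(n)), where d are the within-participant differences.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two matched samples of length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical proportions of 'heard' responses for four participants.
trained = [0.6, 0.7, 0.8, 0.9]
novel = [0.5, 0.5, 0.5, 0.5]
t, df = paired_t(trained, novel)
print(f"t({df}) = {t:.2f}")
```

The p value would then be read from a t distribution with the returned degrees of freedom; that step is omitted here because it requires a statistics package outside the standard library.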

5.1.3 Discussion

This experiment replicates previous results in non-production tasks. Subjects learned the phonotactic distribution they were exposed to, as shown by the significantly higher level of ‘old’ responses to items beginning with a trained segment. However, participants did not interpret the distribution as a constraint on a class, since they did not treat items beginning with segments within the trained class any differently from the items outside the class. In other words, participants failed to generalize to novel segments within the class. The results also suggest that this particular procedure is able to reveal learning, as a significant difference was established between types of test items. Finally, the fact that there was no difference between the natural and arbitrary conditions is not informative, given that there was no learning of classes.

This lack of generalization can be attributed to three factors: the procedure, the stimuli and the speech sound representations. As to the procedure, it can be argued that the present task does not tap into phonological representations, given that participants were not asked to attend to the familiarization in a phonological mode, but were asked whether they had heard the test items or not. In fact, many artificial grammar learning studies simply ask participants to rate the grammaticality of test items, or to judge whether the item presented belongs to the trained language or not (e.g. Dienes, Broadbent & Berry, 1991). The latter procedure would induce a ‘phonological’ mode, while the one used in this experiment would not. On the other hand, the procedure used in this experiment tried to target implicit learning. It is reasonable that referring to grammaticality or asking subjects to learn the rules of the language may induce subjects to pursue explicit strategies that focus on the phonological material.
These strategies could have produced overlearning, which would have been detrimental to the objective of attaining generalization. Further, there is evidence from semantic categorization that suggests that memory can be tricked by presentation of closely related tokens. For example, in Schacter et al. (1996) participants heard a set of items that were semantically related (for example, they might hear ‘sofa, armoire, table’, etc.) and were then presented with items that were part of the familiarization, items that had not been included in the familiarization but were semantically related (for example, ‘couch’), and unrelated novel words (such as ‘dog’). Participants responded similarly to the first two kinds of items, reporting that they had heard them significantly more often than they reported having heard the unrelated targets. The authors report both behavioral and functional results, neither of which shows that real and illusory memories differ. It was thus reasonable to expect that a similar method used for phonological material might have elicited a pattern of responses that showed evidence of categories. Nevertheless, it is true that unlike Schacter et al.’s, this experiment does not allow a comparison to a baseline of truly heard items. Further, by instructing participants to pay close attention to the words as a whole, irrelevant aspects of the enunciation might have biased their behavior.33 It would be of interest to carry out another experiment that attempted to replicate this pattern of results using a different procedure.

A second drawback of this design resides in the stimuli. Although this experiment improves on previous ones by presenting training items only once, and by exposing subjects to a wide range of exemplars, it is nonetheless limited in that the number of segments in the fricative and the stop conditions was reduced. The phonemic inventory of English comprises six stops /p,b,t,d,k,g/ and six fricatives /f,v,s,z,!,!/ (not counting the voiced interdental, which has a limited distribution, or palatal, which occurs only word medially), but only two of either were used in training.
Perhaps the presence of only a third of the members of a class is a sign, to the learner, that a constraint pertains not to a class, but to the particular segments. Furthermore, participants only heard one talker producing the familiarization stimuli. Hearing just one speaker may also lead to less abstraction. In fact, there is evidence from second language learning that a better transfer of knowledge is attained when there is higher variability (Hardison, 2003; Kingston, 2003; Wang, Jongman & Sereno, 2003). Finally, the stimuli in this experiment were recorded in infant-directed speech, which is characterized, among other things, by relatively longer durations. Both words and pauses in the stimuli in this experiment were long, possibly allowing a very concrete representation of the pseudo-words. In an experiment I do not report here, these problems were resolved by having four speakers of different American English dialects, raising the number of trained onsets to four, and varying the register that speakers used. Nevertheless, subjects failed to generalize even under these conditions, which argues against this second explanation.

A third possibility is that adults do not access phonological features in learning, but rely only on segmental categories. One way of assessing this hypothesis would be by using a training paradigm and comparing learning outcomes in two conditions. In both, subjects would have to learn a phonotactic constraint on a group of segments, but in one of them this group would be arbitrary and in the other it would be phonetically or phonologically based. Learning outcomes would be measured by the amount of training that is necessary to learn the regularity. Albeit indirect, the training method might provide some evidence as to whether classes are useful in learning. A second possible design would be to replicate Endress and Mehler (under review) but using phonological rather than phonetic classes. Nevertheless, the present experiment lines up with most previous results in showing that learning of classes is not necessary; on the contrary, adults seem to be biased to learn on the basis of segments. The fact remains, however, that classes seem to be important and widespread in languages. Since adults seem not to be sensitive to them (at least in these perceptual learning tasks), one way of explaining the pervasiveness of features is through language acquisition. If infants, unlike adults, do base their speech sound categorization on features, classes of sounds would arise as successive generations map the regularities of their input language onto feature-based classes. Experiments 2 and 3 assess infants’ sensitivity to sound classes.

33 For example, several of the participants ventured a guess that the objective of the study was related to the intonation of the words.

5.2 Experiment 2: Young infants’ learning

In this section, I present the results of an experiment very similar to that reported in the previous section, although here the participants were seven-month-old infants. Preliminary results of an identical experiment on fourteen-month-old infants are presented in Experiment 3, Section 5.3 below. It was expected that these two age groups would behave differently, since they are at different points in development. Specifically, seven-month-olds are still universal listeners (Werker & Tees, 1984) and they are still in a canonical babbling stage, producing a very limited range of sounds. In contrast, fourteen-month-olds have started producing their first words (though still with a limited segment inventory), and are perceptually already attuned to their language.

5.2.1 Methods

In this experiment, infants were exposed to the same familiarizations as adults in Experiment 1. To recap, in one condition, infants listened to words beginning with stops or nasals, a pattern that would affect a natural class according to phonological theory. Infants in the other condition heard fricative- and nasal-initial words, which constitute an acoustic or articulatory class, but not a phonological class. Infants in both groups were subsequently tested on words that began with new segments that either belonged to the trained class (legal) or not (illegal). If infants learned the pattern and generalized it to the whole class, they would attend differently to illegal (fricatives, for stop- and nasal-trained babies; stops, for fricative- and nasal-trained babies) and novel legal items.

Participants

Twenty-four seven-month-old (M = 6.19, range 6.5-7.2; 9 female) infants were tested. A further 13 infants were tested whose results are not reported for the following reasons: 3 for being more than 4 weeks premature and/or having a birth-weight below 6 pounds; 6 for fussing or crying; 2 for being exposed to languages other than English more than 20% of the time; and 2 for having looking times for difference scores (illegal-legal) more than 2.5 standard deviations above or below the mean.
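The last exclusion criterion can be sketched as follows. This is a hedged illustration rather than the analysis script used for the thesis: the function name `exclude_outliers` is hypothetical, the data are invented, and the sketch assumes that the mean and standard deviation are computed once over all tested infants (the text does not state whether they were computed per condition or recomputed after exclusion).

```python
import statistics


def exclude_outliers(diff_scores, criterion=2.5):
    """Split scores into (kept, excluded) using a mean +/- criterion * SD rule."""
    mean = statistics.mean(diff_scores)
    sd = statistics.stdev(diff_scores)  # sample SD over all scores (assumption)
    kept, excluded = [], []
    for score in diff_scores:
        target = excluded if abs(score - mean) > criterion * sd else kept
        target.append(score)
    return kept, excluded


# Hypothetical difference scores (illegal minus legal looking time, in seconds).
scores = [0.0] * 11 + [100.0]
kept, excluded = exclude_outliers(scores)
print(len(kept), excluded)
```

Note that with very small samples no score can lie 2.5 sample standard deviations from the mean, so a rule of this kind only has an effect once a reasonable number of infants has been tested.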

Equipment and Apparatus

The same recording and editing equipment was used as in Experiment 1. For familiarization and testing, stimuli were stored in a Macintosh G4 and presented through a Yamaha audio-amplifier on two Cambridge Soundworks Ensemble II speakers.

Stimuli

Testing consisted of three different orderings of three items, all of which began with untrained segments, both legal and illegal. For example, infants familiarized with the stops /b, k/ heard items beginning with /t, g/ and /f, z/ in testing. The following table summarizes the stimuli used.

Table 3: Initial consonants in familiarization and testing in Experiment 2
[Table body not recoverable; surviving labels: Familiarization, Testing, Order A /t,g/, Order B /b,k/, Legal]
Procedure

The testing room consisted of a small, three-sided enclosure, whose panel walls were approximately 5 ft. tall. The caregiver sat down on a chair in the center of this enclosure, with the infant seated on his or her lap. White curtains hung from the ceiling and met the walls of the enclosure, concealing from the sight of the subject the equipment and experimenter, who coded the infant’s head-turns using a button box. In the wall in front of the infant there were three small holes, two that allowed the experimenter and a second observer, if there was one, to see the experiment and a third, slightly larger one, through which a camera videotaped the session. On this wall there was also a green light located approximately at the infant’s eye level. The lateral walls also bore lights, although these were red. Hidden behind the panel, a speaker was placed behind each of these red lights. The light of the room was attenuated to allow the lights on the walls to stand out more.

During the familiarization period, speech was played through both speakers without interruption. Lights were contingent on the infant’s looking times, as follows. At the beginning of the experiment, the green light at the front began flashing. When the infant looked at it, this front light was extinguished and one of the side lights began flashing. The light was extinguished when either the maximum playing time was reached or the infant looked away for longer than two seconds. If the infant looked away for less than two seconds, the light continued flashing, although this time was not counted towards the total tally. When a side light was extinguished, the light at the front started flashing again, to get the infant reoriented to an unbiased position.

During the test phase, both the lights and the sound were contingent on infants’ looking. After the baby reoriented to the green light, this was turned off and one of the side lights would begin to flash. When the infant oriented towards one of these lights, speech began to play from this side only, following the same pattern as the light flashing described above. When speech from this side stopped playing, the green light at the front began flashing again, starting over the procedure until all the test trials were completed. The order of presentation of the test trials as well as the order in which the side lights flashed were randomized and controlled by a computer program. Both the caregiver and the experimenter listened to masking music through tight-fitting earphones.
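The contingency rule for a single trial can be summarized in a short sketch. This is not the program that controlled the experiment: the function name `score_trial`, the representation of a trial as a list of coded segments, and the 20-second maximum are all assumptions made for illustration (the maximum playing time is not stated in the text), and the sketch further assumes that the maximum applies to accumulated looking time.

```python
def score_trial(segments, max_time=20.0, max_look_away=2.0):
    """Return (looking_time, end_reason) for one trial.

    segments is an ordered list of (is_looking, duration_in_seconds) pairs,
    as they might be coded from the experimenter's button presses.
    """
    looking = 0.0
    for is_looking, duration in segments:
        if is_looking:
            if looking + duration >= max_time:
                return max_time, "max_time"
            looking += duration
        elif duration > max_look_away:
            # A look-away longer than the criterion ends the trial.
            return looking, "look_away"
        # Shorter look-aways do not end the trial and are not counted.
    return looking, "segments_exhausted"


# Hypothetical trial: 3 s look, 1 s look-away, 4 s look, then a 2.5 s look-away.
print(score_trial([(True, 3.0), (False, 1.0), (True, 4.0), (False, 2.5), (True, 5.0)]))
```

In this hypothetical trial the one-second look-away is ignored, the looking time accumulates to seven seconds, and the trial ends at the 2.5-second look-away.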

5.2.2 Results

An ANOVA with Condition (Stops and Nasals, Fricatives and Nasals) and Order (A, B) as factors and difference looking times (illegal-legal) as the dependent measure showed a main effect of condition (F(1,23) = 7.26, p < 0.02), no effect of order (F(1,23) < .04) and no interaction between order and condition (F(1,23) < .14). For this reason, I collapsed across orders and calculated t-test values comparing orientation to legal versus illegal items within each condition. For the Stops and Nasals condition, there was a significant effect of test item type, t(11) = 3.06, p < .01, with longer orientation times to illegal items. Ten out of twelve infants in this condition followed this pattern. In the Fricatives and Nasals condition, there was no effect of test item type, t(11) = 1.39, p > .19. Only five out of twelve infants oriented longer to illegal test items. Means and standard deviations can be found in Table 4.

Table 4: Means and Standard Deviations (in parentheses) of looking times to test item types in Experiment 2

Condition                Illegal         Legal
Stops and Nasals         12.19 (6.16)    9.08 (5.17)
Fricatives and Nasals    9.71 (4.18)     11.94 (6.4)

Finally, average looking times to illegal and legal test item types for both conditions are plotted in Figure 6.

Figure 6: Looking times by condition and test item type in Experiment 2 (error bars represent standard error)

5.2.3 Discussion

These results suggest that infants are able to learn a phonotactic constraint affecting a class of sounds, rather than particular segments, given that the test items comprised only novel phonemes. However, not all classes are equally easy to learn. On the contrary, while seven-month-olds familiarized with a constraint affecting stops and nasals exhibited the expected novelty preference, infants in the fricatives and nasals condition did not show such a preference. Nevertheless, this pattern of results need not be interpreted as proof in favor of the linguistic-features hypothesis, on two counts.

First, it may be the case that fricatives are (for some reason) inherently difficult. This caveat receives some support from multiple studies that show that contrasts in fricatives take longer to learn; for example, while voicing in stops is in place by one month of age, even school-age children make mistakes when discriminating voiced and voiceless fricatives (Hazan & Barrett, 2000). However, nine-month-old babies in Seidl and Buckley (2005) learned a rule concerning stops and one concerning fricatives with equal ease.34 It should nevertheless be shown that this is also the case for seven-month-olds. Specifically, it would be crucial to conduct a control experiment in which babies hear the exact same familiarization and testing of this experiment minus the nasal-initial words. If it is the presence of these nasal-initial words (which force babies to learn a more general class) that prevents them from learning, infants in this control experiment should display a novelty preference. A second control experiment should also be carried out in order to demonstrate that the constraint truly concerned the natural class of nasals and stops. In this second control, infants would be exposed to the same familiarization as above, but would be tested on novel words beginning with trained segments.
One test item type would be nasal-initial words, and the other stop-initial words. Infants should not display a preference for either of these items.

An alternative analysis of this pattern of results that is consistent with the phonetic-features hypothesis also suggests itself. It is possible that it is not only the perceptual linguistic experience that aids infants in the formation of speech sound categories, but also their production. Indeed, even at the younger age of 7 months, infants already produce nasals and stops when they babble, but not fricatives. In fact, fricatives do not arise in children’s inventory until 24 months of age (by which age /s/ was present in the inventory of 50% of children in Stoel-Gammon, 1985). Their lack of experience in producing fricative sounds might prevent young infants from analyzing or constructing their featural specification.35

34 It should be noted, however, that infants in Seidl and Buckley (2005) were familiarized for about three minutes. For that reason, it is possible that they were at ceiling.


5.3 Experiment 3: Learning in older infants

This section presents preliminary results of an experiment that probed learning of phonotactic constraints on classes of sounds by fourteen-month-old infants. As mentioned before, fourteen-month-old infants are expected to be at a different stage of phonological development. Given the results of Experiments 1 and 2, it might be predicted that fourteen-month-old infants would behave like adults (or 16.5-month-old babies, as in Chambers et al., 2003) and learn constraints on segments rather than classes. In this case, no difference should be found between legal (within the class) and illegal (outside the class) test items. In other words, if older infants learn like adults, both within-class and outside-class items are untrained items, and therefore illegal. As an illustration of this, adults did not display a statistically significant difference between novel and illegal items, and if they had been tested only on these two kinds, no difference would have been found between test item types.36 Alternatively, it can be hypothesized that older infants can focus on classes, and thus behave the same as seven-month-olds do. In this case, older infants would display a novelty preference. A third possibility is that fourteen-month-olds are able to learn constraints on classes, but find this more difficult than seven-month-olds do. The latter hypothesis predicts that fourteen-month-old infants would display a familiarity preference.

35 It might be argued that infants of this age do have some experience with continuant sounds: vowels, glides and the bilabial trill. Such an argument, however, still rests on the assumption that all continuant sounds bear the feature [+continuant], which is exactly the kind of circularity this thesis attempts to avoid.

36 In fact, a pilot experiment not reported here pursued this possibility with adults. Unlike Experiment 1, participants in the pilot heard a minute of music between familiarization and testing, and were only tested on novel and illegal items. No difference was found between these item types.

Participants

Twenty-two fourteen-month-old (M = 13.93, range 13.4-14.5; 12 female) infants were tested. A further 7 infants were tested whose results are not reported for the following reasons: 1 for being more than 4 weeks premature and/or having a birth-weight below 6 pounds; 1 for being exposed to languages other than English more than 20% of the time; 5 for fussing or crying. No infants had difference looking times more than 2.5 standard deviations above or below the mean.

Equipment and Apparatus

The same recording and editing equipment was used as in Experiment 2.

Stimuli

The same familiarization and testing stimuli were used as in Experiment 2.

Procedure

The same procedure was followed as in Experiment 2.

5.3.2 Preliminary results An ANOVA with Condition (Stops and Nasals, Fricatives and Nasals) and Order (A, B) as factors and difference looking times (illegal-legal) as the dependent measure showed no main effect of condition ( F (1,23) = 0.12), no main effect of order ( F (1,23) = 0.1), but a significant interaction between order and condition ( F (1,23) = 4.72, p < 0.05).

Pair-wise comparisons between each condition and order and every other condition and order are reported in Table 5. Since the comparison between the Fricatives and Nasals Condition in Order A and Order B is marginally significant, I did not collapse across orders or conditions.

Table 5: Pair-wise comparisons between orders and conditions in Experiment 3
Order A

Order B




Order A


t(18) > 1.86, p < .08

Order B


t(18) < 1.37

t(18) < .32


t(18) < .14

t(18) > 1.74, p = .1

t(18) < 1.25

I then calculated t-test values comparing orientation to legal versus illegal items within each condition and order. None of these tests, reported in Table 6, were significant.

Table 6: t and p values for each condition and order in Experiment 3
Order



t value

p value


Stops and Nasals




Fricatives and Nasals




Stops and Nasals




Fricatives and Nasals
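One common way to compare orientation to legal versus illegal items within a single cell of the design is a paired t-test on each infant's two looking times. The sketch below computes that statistic. It assumes a paired test with n - 1 degrees of freedom, which may differ from the exact test used for Table 6, and the function name and looking times are invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(illegal, legal):
    """Paired t statistic and degrees of freedom (illegal vs. legal items)."""
    diffs = [i - l for i, l in zip(illegal, legal)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical looking times in seconds (NOT data from Experiment 3).
illegal = [10.0, 10.0, 12.0, 13.0]
legal = [8.0, 9.0, 10.0, 11.0]

t_value, df = paired_t(illegal, legal)
print(t_value, df)
```

A positive t value here means longer orientation to illegal items, that is, a novelty preference; a negative value would indicate a familiarity preference.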





Means and standard deviations can be found in Table 7.

Table 7: Means and Standard Deviations by test item type in Experiment 3
Order
Stops and Nasals         10.76 (3.44)    8.19 (3.32)
Fricatives and Nasals    10.4 (7.07)     14.28 (8.04)
Stops and Nasals         9.75 (11.08)    12.44 (9.67)
Fricatives and Nasals    8.51 (2.56)     6.4 (2.64)


Four out of 6 infants in the Stops and Nasals Condition in Order A oriented longer to illegal items, while only 1 of 4 did so in Order B. In the Fricatives and Nasals Condition, 2 out of 6 infants oriented longer to illegal items in Order A, but 4 out of 6 did so in Order B. Average looking times to illegal and legal test item types by condition and order are plotted in Figure 7.

Figure 7: Looking times to legal and illegal items in Experiment 3

5.3.3 Discussion

Although a significant interaction between Order and Condition was found in the present experiment, t-tests do not reveal that fourteen-month-olds, in any Order or Condition, respond differently to legal than to illegal items. Thus, it appears that the interaction was driven by a preference to listen to certain segments over others in the test phase of the experiment. These preliminary results suggest that fourteen-month-olds cannot learn a phonotactic rule concerning a class of sounds, which supports the developmental hypothesis by which fourteen-month-olds are more similar to adults than they are to younger infants, having completed some of their phonological development.


General discussion

The three experiments reported here shed some light on the question of how classes of sounds arise in the phonological systems of languages and also raise some questions. In summary, it appears that neither adults nor fourteen-month-old babies readily generalize to members of a phonological class after brief auditory exposure to a phonotactic pattern. In contrast, younger infants are able to generalize within certain groups of sounds, although not just any grouping. Further research is necessary to establish why the nasals and stops pattern was easier to learn than the fricatives and nasals one. A second important finding is the possibility of delimiting a developmental trend in acquisition of speech sound categories. Research by Werker and Tees (1984) showed that young infants acted as universal listeners by paying attention to distinctions that were not present in their ambient language. By twelve months of age, however, most infants had turned their attention away from contrasts that were not phonemic. The timeline found here for speech sound classification echoes these findings. Future experiments may reveal exactly at what age infants stop being able to develop sound classes, and whether there is a relationship between phonetic tuning and phonological classification.



While phonological theory seems to place great weight on the notion of features, the literature reviewed in chapters 3 and 4 showed little evidence for the psychological reality of these important phonological notions. On the other hand, the results of the experiments reported in chapter 5 help to explain exactly why phonological features might be present in language but absent from the developed, adult grammar. Experiment 2 suggests that young infants can indeed learn a generalization concerning an abstract class of sounds and project it to other members of the category. Adults and older infants exposed to the exact same familiarization failed to generalize at test, a lack of generalization that echoes previous results in the literature. Bearing in mind the results of the experiments reported in Chambers et al. (2003) and Onishi et al. (2002), it stands to reason that features may only be present in young infants’ grammar but are later overridden by phonemic representations, which are dominant in the developed grammar. This state of affairs would explain both the pervasiveness of features in languages and their elusiveness in people’s minds.

As mentioned before, it is still possible that the lack of generalization found in Experiment 1 was an artifact of the design. For this reason, it would be important to replicate other designs which seem to show an effect of phonetic classes (e.g. Endress & Mehler, under review), and test whether this effect is also evident for phonological classes. In this way, we could compare learning of a low-level phonetic class, such as voiceless stops, with, for example, the phonological class of nasals and stops. If participants found it easier to learn a constraint on word-middles due to the phonological representation of this constraint, a distribution concerning voiceless stops ought to be as easy to learn as one concerning nasals and stops, since both of these classes are phonologically natural.
It might be tempting to relate the fact that the former two types of sounds are present in babies’ production to the outcome of phonological learning studies with adults that require production of responses. From this perspective, production would force a kind of abstraction that perceptual tasks do not require; a second possibility would argue that features arise in production and perhaps are to be identified with the gestural scores necessary to implement a speech sound. In line with the latter possibility, Pulvermüller et al. (2006) show that there is an overlap in the activation of the motor cortex between perception and articulation of syllables that vary in place of articulation. For example, articulation of /pa/ activates a region of the precentral and central cortex that is superior to that activated in the production of /ta/, and the same pattern arises when subjects perceive /pa/ and /ta/ syllables. However, it is difficult to explain, within this perspective, how patterns concerning nasals and fricatives are so widespread cross-linguistically. In other words, if gestural scores underlie the formation of phonological classes, and this formation occurs only at the younger ages (as suggested by the present results), it is difficult to explain how the nasals and fricatives patterns arise. A further possibility is to attribute the difficulty in grouping fricatives and nasals to low-level acoustic characteristics. It is possible to hypothesize that languages where nasals and fricatives pattern together tend to have voiceless nasals (which are similar to fricatives, according to Ohala & Ohala, 1993), or tend to have tense stops, and tenseness in stops prevents their being grouped with nasals.37

37 The latter possibility is proved wrong by the fact that in Russian, where nasals and fricatives pattern together, stops are not tense. Nonetheless, other acoustic variables may prove more useful.

Beyond these great questions that are here left unanswered, these experiments begin to provide an answer to the question of how features come to shape languages. These results, together with the literature reviewed in previous chapters, suggest that features find their way into phonological patterns through one of two means: in demanding tasks (involving production, as suggested by Peperkamp et al., in press; or perception, Endress & Mehler, under review), and through perception in young infants, as suggested by the results of Experiment 2. Further, the present results may provide some evidence in favor of the linguistic-features hypothesis. If infants had been merely tracking probabilities of basic phonetic classes (such as stops) and then inducing a constraint on the position of two such classes, both groups of infants should have learned the familiarized constraint. On the contrary, the fact that infants in the non-phonological condition were not able to do so suggests that infants may be relying on representations that do not allow the categorization of some sounds (namely fricatives and nasals) into a uniform class.

This consideration raises a further question regarding how cross-linguistic frequency may be correlated with Universal Grammar. Since one of the groupings studied here is easier for babies to learn, one would expect it to be more widespread cross-linguistically than the other. And yet, Mielke (2004a, b) argues that they are equally likely in languages across the world. Previous research has often assumed that cross-linguistic generalizations represent fairly directly the cognitive biases present in the language faculty of humans (Chomsky & Halle, 1968; Stampe, 1973; Wilson, in press). At most, it has been assumed that these biases are further shaped by the phonetic medium of language, such that two patterns that are equally likely taking into account their cognitive status may in the long run have different probabilities of occurrence. For example, Moreton (2006) reports how well adult subjects learned three different phonotactic constraints: ‘height-height’, in which the height of one vowel depended on the height of the preceding vowel (a fairly common harmonic or disharmonic process, which is also phonetically grounded, that is, it might be related to a perceptual basis); ‘height-voice’, in which the height of a vowel depended on the voicing of a previous consonant (an infrequent process cross-linguistically, although, like the height-height pattern, phonetically sound); and ‘voice-voice’, in which the voicing of a consonant depended on the voicing of a previous consonant (a very rare constraint in natural languages, and also phonetically unlikely). Learning was significantly better in the height-height and voice-voice conditions as compared with the height-voice condition.
These results suggest that frequency in the world’s languages is dependent both on the phonetic precursors of a sound change (that is, how perception and production might impact on the likelihood of a change), and on other cognitive biases. In particular, frequent patterns should be favored by both phonological biases and phonetics. The second experiment reported in this thesis suggests that it is possible for a disfavored pattern to catch up. In other words, if these results truly show that nasals and stops are easier to categorize together than nasals and fricatives, the fact that these two patterns are equally likely would require that the nasals and fricatives one be favored by the phonetic factors converging on language change such that the initial difference between them is erased. These results may also be interpreted as a warning against assuming that cross-linguistic frequency is a direct correlate of ease of acquisition.

However, it remains open to question whether experiments such as the ones reported here constitute a measure of learning strategies, or whether they reveal something about grammar itself. For example, although it has been pointed out that the pattern of results found is consistent with the linguistic hypothesis, which assumes features to be a part of Universal Grammar, it is also possible to postulate a modified version of the phonetic hypothesis to accommodate them. When measuring language learning, one necessarily measures a performance that includes mediating systems (in the case of learning of phonological patterns, the production and perception systems).

A second important question has not been addressed here. As mentioned in the Introduction, distinctive feature theory was proposed to resolve two questions: first, how sounds are grouped into phonological classes, which is the aspect tackled in this study; and second, how sounds are distinguished phonologically. In phonological theory, two sounds are allophones of different phonemes if their substitution results in a change in meaning, which would imply that language learners rely on lexical distinctions to resolve what changes in the acoustic signal are linguistically relevant in their input language. On the other hand, extensive research has shown that the phonological representation of words is highly underspecified in the initial stages of word learning (reviews can be found in Pater, Stager & Werker, 2004; Werker & Fennell, 2004). This relative independence of phonological and lexical representation operates both ways.
In one sense, toddlers, who have been shown to be attuned to the phonological contrasts of their language, do not rely on this knowledge to determine when a word in their input is different from one they have stored in memory. In the other, phonological tuning occurs too early to be triggered by stored lexical representations. Furthermore, features do not seem to operate in the same manner in another modality. Specifically, many features which are active in sign languages may not be encoded in minimal pairs, and even their hierarchical relationship with respect to segments may be questioned (Brentari, 2002, for example, proposes that while segments in spoken languages dominate features, the opposite occurs in sign languages). These may constitute indications that the concept of feature really conflates more than one phenomenon, and it may be both theoretically and empirically desirable to revise it.

Nevertheless, phonological features are useful constructs for linguistic description. The use of features allows elegant descriptions of sound changes as well as phonotactic distributions in natural languages. The experimental results reported in this thesis suggest that sound classes might arise from the way infants learn phonotactic classes, although this learning is not simply the result of an unconstrained, probabilistic mechanism; other cognitive and phonetic factors also influence acquisition.



Abramson, A. S., & Lisker, L. I. (1968). Voice timing: Cross-language experiments in identification and discrimination. The Journal of the Acoustical Society of America, 44 (1), 377. (Abstract).
Anderson, J. L., Morgan, J. L., & White, K. S. (2003). A statistical basis for speech sound discrimination. Language and Speech, 45 (2-3), 155-182.
Archangeli, D., & Pulleyblank, D. (1994). Grounded phonology. Cambridge, Mass. and London, UK: MIT Press.
Archibald, J. (ed.) (1995). Phonological acquisition and phonological theory. Hillsdale, NJ: Lawrence Erlbaum.
Aslin, R. N., & Pisoni, D. B. (1980). Effects of early linguistic experience on speech discrimination by infants: A critique of Eilers, Gavin and Wilson (1979). Child Development, 51, 107-112.
Baltaxe, C. (1978). Foundations of distinctive feature theory. Baltimore, MD: University Park Press.
Bedore, L. M., Leonard, L. B., & Gandour, J. (1994). The substitution of a click for sibilants: A case study. Clinical Linguistics & Phonetics, 8, 283-293.
Beers, M. (1996). Acquisition of Dutch phonological contrasts within the framework of feature geometry theory. In B. Bernardt, J. Gilbert & D. Ingram (Eds.), Proceedings of the UBC International Conference on Phonological Acquisition (pp. 28-41). Somerville, MA: Cascadilla Press.
Best, C. T. (1994). The emergence of native-language phonological influences in infants: a perceptual assimilation model. In J. C. Goodman and H. C. Nusbaum (Eds.) The Development of Speech Perception: The Transition from Speech Sounds to Spoken Words (pp. 167-224). Cambridge, MA: MIT Press.

Best, C. T., McRoberts, G. W. & Sithole, N. M. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults. Journal of Experimental Psychology: Human Perception and Performance, 14(3), 345-360.
Blevins, J. (2004). Evolutionary Phonology. Cambridge, UK: Cambridge University Press.
Boersma, P., & Weenink, D. (2005). Praat: Doing phonetics by computer. Downloaded from http://www.praat.org on 12/08/2005.
Brentari, D. (1998). A prosodic model of sign language phonology. Cambridge, Mass. and London, UK: MIT Press.
Brentari, D. (2002). Modality differences in sign language phonology and morphophonemics. In R. Meier, D. Quinto, & K. Cormier (Eds.) Modality in Language and Linguistic Theory (pp. 35-64). Cambridge, UK: Cambridge University Press.
Brown, C. A. (2000). The interrelation between speech perception and phonological acquisition from infant to adult. In J. Archibald (Ed.), Second language acquisition and linguistic theory (pp. 4-63). Malden, MA: Blackwell.
Buckley, E. (2000). What should phonology explain? Handout from Linguistics Colloquium, State University of New York at Buffalo, March 17.
Calabrese, A. (2005). Markedness and economy in a derivational model of phonology. Berlin and New York: Mouton de Gruyter.
Camarata, S., & Gandour, J. (1984). On describing idiosyncratic phonologic systems. Journal of Speech and Hearing Disorders, 49, 262-266.
Chambers, K. E., Onishi, K. H., & Fisher, C. (2003). Infants learn phonotactic regularities from brief auditory experience. Cognition, 87, B69–B77.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper & Row.
Clements, G., & Hume, E. (1995). The internal organization of speech sounds. In J. Goldsmith (Ed.) The handbook of phonological theory (pp. 245-306). Oxford and Cambridge, MA: Blackwell.

Clements, G. N. (2001). Representational economy in constraint-based phonology. In T. A. Hall (ed.) Distinctive feature theory (pp. 71-146). Berlin and New York: Mouton de Gruyter.
Cohn, A. (1993). Nasalization in English: Phonology or phonetics. Phonology, 10, 43-81.
Cohn, A. (2003). Phonetics in phonology and phonology in phonetics. Paper presented at the 11th Manchester Phonology Meeting, Manchester, UK, May 24-26.
Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in language production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(6), 1355-67.
Dienes, Z., Broadbent, D., & Berry, D. (1991). Implicit and explicit knowledge bases in artificial grammar learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 17(5), 875-887.
Dinnsen, D. A., & Chin, S. B. (1995). On the natural domain of phonological disorders. In J. Archibald (ed.) Phonological acquisition and phonological theory (pp. 135-150). Hillsdale, NJ: Erlbaum.
Donegan, P. J., & Stampe, D. (1979). The study of natural phonology. In D. Dinnsen (Ed.), Current approaches to phonological theory (pp. 126-173). Bloomington: Indiana University.
Dresher, B. E. (2003a). Contrast and asymmetries in inventories. In A.-M. di Sciullo (ed.) Asymmetry in Grammar, Volume 2: Morphology, Phonology, Acquisition (pp. 239-257). Amsterdam: John Benjamins.
Dresher, B. E. (2003b). The contrastive hierarchy in phonology. In D. C. Hall (ed.) Toronto Working Papers in Linguistics (Special Issue on Contrast in Phonology) 20 (pp. 47-60). Toronto: Department of Linguistics, University of Toronto.
Dresher, B. E., & H. van der Hulst (1995). Global determinacy and learnability in phonology. In J. Archibald (ed.) Phonological Acquisition and Phonological Theory (pp. 1-21). Hillsdale, NJ: Lawrence Erlbaum.

Dupoux, E., Fushimi, T., Kakehi, K., & Mehler, J. (1999). Prelexical locus of an illusory vowel effect in Japanese. Eurospeech ’99 Proceedings, ESCA 7th European Conference on Speech, Communication and Technology.
Durand, J., & Laks, B. (2002). (Eds.) Phonetics, phonology and cognition. Oxford, UK: Oxford University Press.
Eilers, R. E., Gavin, W., & Wilson, W. R. (1979). Linguistic experience and phonemic perception in infancy: A crosslinguistic study. Child Development, 50(1), 14-18.
Eilers, R. E., Gavin, W., & Wilson, W. R. (1980). Effects of early linguistic experience on speech discrimination by infants: A Reply. Child Development, 51(1), 113-117.
Eimas, P. D. (1975). Speech perception in early infancy. In L. B. Cohen & P. Salapatek (Eds.), Infant Perception, 2: From Sensation to Cognition (pp. 341-347). New York: Academic Press.
Emmorey, K., McCullough, S., & Brentari, D. (2003). Categorical perception in American Sign Language. Language and Cognitive Processes, 18(1), 21-45.
Endress, A. D., & Mehler, J. (under review). Perceptual constraints in phonotactic learning.
Eulitz, C., & Lahiri, A. (2004). Neurobiological evidence for abstract phonological representations in the mental lexicon during speech recognition. Journal of Cognitive Neuroscience, 16(4), 577-83.
Faber, A., & Best, C. T. (1994). The perceptual infrastructure of early phonological development. In S. D. Lima, R. L. Corrigan, & G. K. Iverson (Eds.) The reality of linguistic rules (pp. 281-280). Amsterdam: John Benjamins.
Fant, G. (1973). Speech Sounds and Features. Cambridge, Mass.: MIT Press.
Fey, M. E., & Gandour, J. (1982). Rule discovery in early phonological acquisition. Journal of Child Language, 9(1), 71-81.
Fleischhacker, H. A. (2005). Similarity in phonology: Evidence from reduplication and loan adaptation. PhD Dissertation, UCLA.
Flemming, E. (2001a). Auditory representations in phonology. New York: Garland Press.
Flemming, E. (2001b). Scalar and categorical phenomena in a unified model of phonetics and phonology. Phonology, 18, 7–44.

Flemming, E. (2004). Contrast and perceptual distinctiveness. In B. Hayes, R. Kirchner, & D. Steriade (Eds.) Phonetically-based phonology (pp. 232-276). Cambridge, UK: Cambridge University Press.
Francis, A. and Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28(2), 349-66.
Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language, 47(1), 27-52.
Fry, D. B., Abramson, A. S., Eimas, P. D. and Liberman, A. M. (1962). The identification and discrimination of synthetic vowels. Language and Speech, 5, 171-189.
Gierut, J. (1998). Production, conceptualization and change in distinctive featural categories. Journal of Child Language, 25, 321-341.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251-279.
Goldrick, M. (2004). Phonological features and phonotactic constraints in speech production. Journal of Memory and Language, 51(4), 586-603.
Goldsmith, J. (1990). Autosegmental and metrical phonology. Oxford and Cambridge, MA: Blackwell.
Goldsmith, J. (1995). Phonological Theory. In J. Goldsmith (Ed.) The handbook of phonological theory (pp. 1-23). Oxford and Cambridge, MA: Blackwell.
Goldsmith, J. (Ed.) (1995). The handbook of phonological theory. Oxford and Cambridge, MA: Blackwell.
Goldstein, L., & Fowler, C. A. (2003). Articulatory phonology: A phonology for public language use. In N. O. Schiller and A. S. Meyer (Eds.) Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 159-207). Berlin and New York: Mouton de Gruyter.
Goodman, J. C., & Nusbaum, H. C. (1994). (Eds.) The development of speech perception: The transition from speech sounds to spoken words. Cambridge, MA, and London, UK: MIT Press.

Goyvaerts, D. L., & Pullum, G. K. (Eds.) (1975). Essays on the Sound Pattern of English. Belgium: Story-Scientia Ghent.
Graham, L. W., & House, A. S. (1971). Phonological oppositions in children: A perceptual study. Journal of the Acoustical Society of America, 49(2B), 559-566.
Grijzenhout, J. (2001). Representing nasality in consonants. In T. A. Hall (ed.) Distinctive feature theory (pp. 177-210). Berlin and New York: Mouton de Gruyter.
Gussenhoven, C., & Kager, R. (2001). Introduction: Phonetics in phonology. Phonology, 18, 1-6.
Hale, M. & Kissock, M. (2005). Underspecification: Neutralization, and Acquisition. Paper presented at the 13th Manchester Phonology Meeting, May 26-28.
Hale, M., & Reiss, C. (1998). Formal and empirical arguments concerning phonological acquisition. Linguistic Inquiry, 29(4), 656-683.
Hale, M., & Reiss, C. (2003). The Substance Principle in phonology: Why the tabula can’t be rasa. Journal of Linguistics, 39(2), 219-244.
Hale, M., Kissock, M., & Reiss, C. (in press). Microvariation, variation, and the features of universal grammar. Lingua.
Hall, T. A. (2001). (ed.) Distinctive feature theory. Berlin and New York: Mouton de Gruyter.
Halle, M. (1964). On the Bases of Phonology. In J. A. Fodor, & J. J. Katz (Eds.) The structure of language: Readings in the philosophy of language (pp. 324-333). Englewood Cliffs, NJ: Prentice-Hall.
Hardison, D. (2003). Acquisition of second language speech: Effects of visual cues, context, and talker variability. Applied Psycholinguistics, 24, 495-522.
Haspelmath, M. (2006). Against markedness (and what to replace it with). Journal of Linguistics, 42, 25-70.
Hayes, B. and Steriade, D. (2004). Introduction: the phonetic bases of phonological Markedness. In B. Hayes, R. Kirchner, & D. Steriade (Eds.) Phonetically-based phonology (pp. 1-33). Cambridge, UK: Cambridge University Press.
Hayes, B., Kirchner, R., & Steriade, D. (Eds.) (2004). Phonetically-based phonology. Cambridge, UK: Cambridge University Press.

Hazan, V., & Barrett, S. (2000). The development of phonemic categorization in children 6-12. Journal of Phonetics, 28, 377-396.
Hewlett, N., & Waters, D. (2004). Gradient change in the acquisition of phonology. Clinical Linguistics and Phonetics, 18, 6-8.
Hillenbrand, J. (1983). Perceptual organization of speech sounds by infants. Journal of Speech Language and Hearing Research, 26, 268-282.
Holt, L. L., Lotto, A. J., & Diehl, R. L. (2004). Auditory discontinuities interact with categorization: Implications for speech perception. Journal of the Acoustical Society of America, 116(3), 1763-1773.
Houston, D. M., & Jusczyk, P. W. (2003). Infants’ long-term memory for the sound patterns of words and voices. Journal of Experimental Psychology: Human Perception and Performance, 29(6), 1143-54.
Houston, D., Jusczyk, P. W., & Jusczyk, A. M. (2003). Memory for bisyllables in 2-month-olds. In D. Houston, A. Seidl, G. Hollich, E. Johnson, & A. Jusczyk (Eds.) Jusczyk Lab Final Report. Retrieved from http://hincapie.psych.purdue.edu/Jusczyk.
Hume, E. and Johnson, K. (Eds.) (2001). The role of speech perception in phonology. San Diego, California, USA: Academic Press.
Ingram, D. (1989). First language acquisition. Method, description and explanation. Cambridge: Cambridge University Press.
Inkelas, S., & Rose, Y. (2003). Velar fronting revisited. In B. Beachley, A. Brown and F. Conlin (Eds.) Proceedings of the 27th annual Boston University Conference on Language Development (pp. 334-345). Somerville, MA: Cascadilla.
Jakobson, R. (1968). Child language, aphasia and phonological universals. The Hague, Mouton.
Jakobson, R., Fant, G., & Halle, M. (1963). Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. Cambridge, Mass.: MIT Press.
Jusczyk, P. W. (1992). Developing phonological categories from the speech signal. In C. E. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research and implications. Parkton, MD: York Press.

Jusczyk, P. W. (1994). Infant speech perception and the development of the mental lexicon. In J. C. Goodman and H. C. Nusbaum (Eds.) The development of speech perception: The transition from speech sounds to spoken words (pp. 227-270). Cambridge, MA: MIT Press.
Jusczyk, P. W. (1997). The discovery of spoken language. Cambridge, MA: MIT Press.
Jusczyk, P. W., Luce, P., & Charles-Luce, J. (1994). Infants’ sensitivity to phonotactic patterns in the native language. Journal of Memory and Language, 33, 630–645.
Jusczyk, P. W., Friederici, A. D., Wessels, J. M., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants’ sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32, 402–420.
Jusczyk, P. W., Jusczyk, A. M., Kennedy, L. J., Schomberg, T., & Koenig, N. (1995). Young infants' retention of information about bisyllabic utterances. Journal of Experimental Psychology: Human Perception and Performance, 21(4), 822-36.
Jusczyk, P., Goodman, M. B., & Baumann, A. (1999). Nine-month-olds’ attention to sound similarities in syllables. Journal of Memory and Language, 40(1), 62-82.
Jusczyk, P., Rosner, B. S., Cutting, J. E., Foard, L. B. (1977). Categorical perception of nonspeech sounds by 2-month-old infants. Perception and Psychophysics, 21, 50-54.
Kager, R., Pater, J., & Zonneveld, W. (Eds.) (2004). Constraints in phonological acquisition. Cambridge, UK: Cambridge University Press.
Keating, P. (1987). A survey of phonological features. UCLA Working Papers in Phonetics, 66, 124-150.
Keating, P. (1988). Underspecification in phonology. Phonology, 5, 275-292.
Kenstowicz, M. J. (1994). Phonology in generative grammar. Cambridge, MA: Blackwell.
Kenstowicz, M. J. and Kisseberth, C. W. (1979). Generative phonology: Description and theory. San Diego, California: Academic Press.
Kingston, J. (2003). Learning foreign vowels. Language and Speech, 46(2-3), 295-349.
Kiparsky, P. (1995). The phonological basis of sound change. In J. Goldsmith (ed.) The handbook of phonological theory (pp. 640-670). Oxford and Cambridge, MA: Blackwell.
Klatt, D. H. (1967). Psychophysical reality of the distinctive features of phonology. The Journal of the Acoustical Society of America, 42(5), 1181-1182.

Kuhl, P. K. (1981). Discrimination of speech by nonhuman animals: Basic auditory sensitivities conducive to the perception of speech-sound categories. The Journal of the Acoustical Society of America, 70(2), 340-349.
Kuhl, P. K. (1987). The special-mechanisms debate in speech perception: Nonhuman species and nonspeech signals. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition (pp. 355-386). New York: Cambridge University Press.
Kuhl, P. K. (1991). Human adults and human infants show a ‘perceptual magnet effect’ for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50(2), 93-107.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., & Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255, 606-608.
Labov, W., Karen, M., & Miller, C. (1991). Near-mergers and the suspension of phonemic contrast. Language Variation and Change, 3, 33-74.
Ladefoged, P. (2005). Features and parameters for different purposes. UCLA Working Papers in Phonetics, 104, 1-13.
Ladefoged, P., & Cho, T. (2001). Linguistic contrasts to reality: The case of VOT. In N. Gronnum, & J. Rischel (Eds.), Travaux Du Cercle Linguistique De Copenhague, Vol. XXXI (pp. 212-225). Copenhagen: C.A. Reitzel.
Lass, R. (1975). How intrinsic is content? Markedness, sound change, and ‘family universals’. In D. L. Goyvaerts, & G. K. Pullum (Eds.) Essays on the Sound Pattern of English. Belgium: Story-Scientia Ghent.
Liberman, A., & Mattingly, I. (1985). The motor theory of speech perception revised. Cognition, 21(1), 1-36.
Liberman, A., Harris, K., Hoffman, H., & Griffith, B. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54(5), 358-368.
Lindblom, B. (1990). On the notion of ‘possible speech sound’. Journal of Phonetics, 18, 135-152.

Lisker, L. (1978). Rapid versus Rabid: A catalogue of acoustic features that may cue the distinction. Haskins Laboratories Status Report on Speech Research SR-54, 127–132.
Lisker, L., & Abramson, A. S. (1964). A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements. Word, 20, 384-422.
Lisker, L., & Abramson, A. S. (1971). Distinctive features and laryngeal control. Language, 47(4), 767-785.
Maye, J. (2000). Learning speech sound categories from statistical information. PhD Dissertation, University of Arizona.
Maye, J., & Weiss, D. (2003). Statistical cues facilitate infants’ discrimination of difficult phonetic contrasts. In B. Beachley, A. Brown & F. Conlin (Eds.), Proceedings of the 27th Annual Boston University Conference on Language Development (pp. 508-518). Somerville, MA: Cascadilla Press.
Maye, J., Werker, J., & Gerken, L. A. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101–B111.
McMurray, B., & Aslin, R. N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition, 95(2), B15-26.
Mielke, J. (2002). Turkish /h/ deletion: evidence for the interplay of speech perception and phonology. ZAS Papers in Linguistics 28, 55-72.
Mielke, J. (2004a). The Emergence of Distinctive Features. PhD Dissertation, Ohio State University.
Mielke, J. (2004b). What ambivalent segments can tell us about the universality of distinctive features. Talk given at the 78th Linguistics Society of America, Boston, Jan 8-11.
Mielke, J. (2005). Ambivalence and ambiguity in laterals and nasals. Phonology, 22, 169-203.
Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. The Journal of the Acoustical Society of America, 27(2), 338-352.
Miller, J. L. (1997). Internal structure of phonetic categories. Language and Cognitive Processes, 12(5-6), 865-869.

88 Mitterer, H., Csecpec, V., Honbolygoc, F., & Blomert, L. (2006). The recognition of phonologically assimilated words does not depend on specific language experience. Cognitive Science, 30, 451–479. Moreton, E. (2006). Phonotactic learning and phonological typology. Paper presented at NELS 37, University of Illinois Urbana-Champaign, October 14. Morrison, G. S. (2005). Phonetic naturalness and phonological learnability. Paper presented at The 13th Manchester Phonology Meeting, Manchester, UK, May 26-28. Narayan, C. R. (2006). Acoustic-perceptual salience and developmental speech perception. PhD Dissertation, University of Michigan. Nearey, T. M. (1997). Speech perception as pattern recognition. Journal of the Acoustical Society of America, 101(6), 3241-3245. Newport, E. L. (1982). Task specificity in language learning. Evidence from speech perception and American Sign Language. In Wanner, E. and Gleitman, L. R. (Eds.) Language acquisition: The state of the art (pp. 450-486). Cambridge: Cambridge University Press. Newport, E.L., & Aslin, R.N. (2004). Learning at a distance: I. Statistical learning of nonadjacent dependencies. Cognitive Psychology, 48, 127-162. Obleser, J., Eulitz, A., & Lahiri, C. (2004). Magnetic brain response mirrors extraction of phonological features from spoken vowels. Journal of Cognitive Neuroscience, 16(1), 3139. Obleser, J., Lahiri, C., & Eulitz, A. (2003). Auditory-evoked magnetic field codes place of articulation in timing and topography around 100 milliseconds post syllable onset. NeuroImage, 20, 1938-1847. Ohala, J. J. (1983). The Origin of Sound Patterns in Vocal Tract Constraints. In P. F. MacNeilage (Ed.), The production of speech (pp.189-216). New York: SpringerVerlag.. Ohala, J. J. (1995). Phonetic explanations for sound patterns: Implications for grammars of competence. Proceedings of the Thirteenth International Congress of Phonetic Sciences, Vol. 2 (pp.52-59). Stockholm.

89 Ohala, J. J. (2005). Phonetic explanations for sound patterns: Implications for grammars of competence In W. J. Hardcastle & J. M. Beck (Eds.) A figure of speech. A festschrift for John Laver. (pp. 23-38) London, UK: Erlbaum. Ohala, J. J., & Ohala, M. (1993). The phonetics of nasal phonology: Theorems and data. In M. K. Huffman & R. A. Krakow (Eds.), Nasals, nasalization, and the velum. Vol.5: Phonetics and Phonology Series (pp. 225-249). San Diego, CA: Academic Press. Onishi, K. H., Chambers, K. E., & Fisher, C. (2002). Learning phonotactic constraints from brief auditory experience. Cognition, 83(1), 13-23. Padgett, J. (1995). Stricture in Feature Geometry. Stanford, California: CSLI. Pater, J., & Tessier, A. M. (2003). Phonotactic knowledge and the acquisition of alternations. In M.J. Solé, D. Recasens, & J. Romero (Eds.) Proceedings of the 15th International Congress on Phonetic Sciences (pp.1777-1780). Barcelona. Pater, J., C. Stager, & J. Werker. (2004). The perceptual acquisition of phonological contrasts. Language, 80(3). Peperkamp, S. LeCalvez, R., Nadal, J.P., & Dupoux, E. (in press). The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition. Peperkamp, S., K. Skoruppa, & E. Dupoux (2006). The role of phonetic naturalness in phonological rule acquisition. In Bamman, T. Magnitskaia & C. Zaller (Eds.) Proceedings of the 30th Annual Boston University Conference on Language Development (pp. 464-475). Somerville, MA : Cascadilla Press. Phillips, C., Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., et al. (2000) Auditory cortex accesses phonological categories: An MEG Mismatch study. Journal of Cognitive Neuroscience, 12, 1038-1055. Phillips. C. (2001) Levels of representation in the electrophysiology of speech perception. Cognitive Sicence, 25(5), 711-731. Pierrehumbert, J (2001). Exemplar dynamics: Word frequency, lenition, and contrast. In Bybee, J. and P. 
Hopper (eds) Frequency effects and the emergence of linguistic structure (pp. 137-157). John Benjamins, Amsterdam. Pierrehumbert, J. (2000). The phonetic grounding of phonology. Les Cahiers de l’ICP, Bulletin de la Communication Parle, 5, 7-23.

90 Pierrehumbert, J. (in press) Why phonology is so coarse-grained. McQueen, J. and Cutler, A. (Eds) special issue of Language and Cognitive Processes. Pisoni, D. B. and Lazarus, J. H. (1974). Categorical and non-categorical modes of speech perception along the voicing continuum. The Journal of the Acoustical Society of America, 55(2), 328-333. Pulleyblank, E. G. (2003). Non-contrastive features or enhancement by redundant features? Language and Linguistics. 4(4), 713-755. Pulvermüller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O. & Shtyrov, Y. (2006) Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences of the United States of America, 103(20), 7865-7870. Pycha, A.; Nowak, P; Shin, E., & Shosted, R. (2003). Phonological rule-learning and its implications for a theory of vowel harmony. In Tsujimura, M. and Garding, G (Eds.) WCCFL 22 Proceedings (pp.101-114). Somerville, MA: Cascadilla Press. Rice, K. & P. Avery (1995). Variability in a deterministic model of language acquisition: a theory of segmental elaboration. In Archibald (Ed.) Phonological acquisition and phonological theory (pp. 23-42). Hillsdale, NJ: Lawrence Erlbaum. Roelofs, A. (1997). The WEAVER model of word-form encoding in speech production. Cognition, 64, 249-284. Roelofs, A. (1999). Phonological segments and features as planning units in speech production. Language and Cognitive Processes, 14(2); 173-200. Rosen, S., & Howell, P. (1987). Auditory, articulatory, and learning explanations of categorical perception in speech. In S. Harnad (Ed.) Categorical perception: The groundwork of cognition (pp. 113-160). New York: Cambridge University Press. Saffran, J. R., & Thiessen, E. D. (2003). Pattern induction by infant language learners. Developmental Psychology. 39(3), 484-494. Sagey, E. (1990). The representation of features in non-linear phonology: The articulator node hierarchy. New York and London: Garland Publishing. 
Sandler, W., & Lillo-Martin, D. C. (2006) Sign language and linguistic universals. Cambridge: Cambridge University Press.

91 Schacter, D., Reiman, E., Curran, T., Yun, L. S., Bandy, D., McDermott, K. B., et al. (1996) Neuroanatomical correlates of veridical and illusory recognition memory: Evidence from Positron Emission Tomography. Neuron, 17, 267-274. Scobbie, J. M. (2005). The phonetics-phonology overlap. QMUC Speech Science Research Centre Working Paper WPI. Seidl, A. and Buckley, E. (2005). On the learning of arbitrary of arbitrary phonological rules. Language Learning and Development. 1(3-4), 289-316. Stampe, D. (1973). A dissertation on natural phonology. NY: Garland. Steriade, D. (1995). Underspecification and markedness, In J. Goldsmith (Ed.) The handbook of phonological theoryy. (pp.114-174). London: Blackwell. Steriade, D. (2000). Paradigm Uniformity and the Phonetics/Phonology Boundary. In J. Pierrehumbert and M. Broe (Eds.) Papers in Laboratory Phonology vol. 6 (pp.313-334). Cambridge, UK: Cambridge University Press. Steriade, D. (2001). Directional asymmetries in place assimilation: A perceptual account. In E. Hume and K. Johnson (Eds.) The role of speech perception in phonology (pp.219250). San Diego, California, USA: Academic Press. Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David and P. B. Denes (Eds.), Human communication: A unified view. (pp. 51-66) New York: McGraw-Hill. Stevens, K. N. (1989). On the quantal nature of speech. Journal of the Acoustical Society of America, 17, 3-45. Stevens, K. N. (2003). Acoustic and perceptual evidence for universal phonological features. In M.J. Solé, D. Recasens, & J. Romero (Eds.) Proceedings of the 15th International Congress on Phonetic Sciences (pp. 33-38). Barcelona. Stevens, K. N. (2003). Toward a model for lexical access based on acoustic landmarks and distinctive features. Journal of the Acoustical Society of America. 111(4), 1872-1891. Stoel-Gammon, C. (1985). Phonetic inventories, 15-24 months: A longitudinal study. 
Journal of Speech and Hearing Research, 28, 505-512. Studdert-Kennedy, M. & Shankweiler, D. (1970). Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 48, 579-594.

92 Studdert-Kennedy, M., Shankweiler, D. & Pisoni, D (1972). Auditory and phonetic processes in speech perception: Evidence from a dichotic study. Cognitive Psychology, 3, 455-466. Swingley, D. and Aslin, R. N. (2002). Lexical neighborhoods and the word-form representation in very young children. Cognition, 76, 147-166. Troubetzkoy, N. S. (1939/1969). Principles of Phonology. Berkeley & Los Angeles: University of California Press. Utman, J., Blumstein, S., & Burton, M. (2000) Effects of subphonetic and syllable structure variation on word recognition. Perception and Psychophysics, 62(6), 12971311. Vihman, M. (1996). Phonological development: The origins of language in the child. Cambridge, MA: Blackwell. Wang, M. D., & Bilger, R. C. (1973). Consonant confusions in noise: a study of perceptual features. Journal of the Acoustical Society of America, 54(5), 1248-1266. Wang, Y., Yongman, A. and Sereno, J. (2003) Acoustic and perceptual evaluations of tone production before and after perceptual training. Journal of the Acoustical Society of America, 113(2), 1033-1043. Warker, J., & Dell, G. (2006). Speech Errors Reflect Newly Learned Constraints. Journal of Experimental Psychology. Learning, Memory and Cognition. 32(2), 387-399. Werker, J, & Lalonde, C. (1988). Cross-Language Speech Perception: Initial Capabilities and Developmental Change. Developmental Psychology, 24(3), 672-683. Werker, J. F. and Pegg, J. E. (1992). Speech perception and phonological acquisition. In C. E. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research and implications. Parkton, MD: York Press. Werker, J. F. and Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63. Werker, J. F., Gilbert, J. H. V., Humphrey, K., & Tees, R. C. (1981). Developmental aspects of cross-language speech perception. Child Development, 52, 349-353.

93 Werker, J., & Fennell, C. (2004). From listening to sounds to listening to words: Early steps in word learning. In: D. G. Hall, & S. R. Waxman (Eds.) Weaving a Lexicon (pp. 79-109). Cambridge, MA: MIT Press. Wickelgren, W. A. (1965). Distinctive features and errors in short-term memory for English consonants. Journal of the Acoustical Society of America. 38, 583-588. Wickelgren, W. A. (1966). Distinctive features and errors in short-term memory for English vowels. Journal of the Acoustical Society of America. 39:2, 388-398. Wilson, C. (in press). An experimental and computational study of velar palatalization. Cognitive Science. Wilson. C. (2003) Experimental investigation on phonological naturalness. In G. Garding and M. Tsujimura (Eds.) WCCFL 22 Proceedings, (pp. 101-114). Somerville, MA: Cascadilla Press. Wright, R. (2004). A review of perceptual cues and cue robustness. In B. Hayes, R. Kirchner, & D. Steriade (Eds.) Phonetically-based phonology (pp. 34-57). Cambridge, UK: Cambridge University Press. Zamuner, T. S. (2006) Sensitivity to word-final phonotactics in 9- and 16-month-old infants. Infancy, 10(1), 77-95.

