On the role of the amplitude envelope for the perception of [b] and [w]

Phil Shinn and Sheila E. Blumstein
Department of Linguistics, Brown University, Providence, Rhode Island 02912

(Received 22 April 1983; accepted for publication 28 November 1983)

This study investigated the role of the amplitude envelope in the vicinity of consonantal release in the perception of the stop-glide contrast. Three sets of acoustic [b-w] continua, each in the vowel environments [a] and [i], were synthesized using parameters derived from natural speech. In the first set, amplitude, formant frequency, and duration characteristics were interpolated between exemplar stop and glide endpoints. In the second set, formant frequency and duration characteristics were interpolated, but all stimuli were given a stop amplitude envelope. The third set was like the second, except that all stimuli were given a glide amplitude envelope. Subjects were given both forced-choice and free-identification tasks. The results of the forced-choice task indicated that amplitude cues were able to override transition slope, duration, and formant frequency cues in the perception of the stop-glide contrast. However, results from the free-identification task showed that, although presence of a stop amplitude envelope turned all stimuli otherwise labeled as glides to stops, the presence of a glide amplitude envelope changed stimuli labeled otherwise as stops to fricatives rather than to glides. These results support the view that the amplitude envelope in the vicinity of the consonantal release is a critical acoustic property for the continuant/noncontinuant contrast. The results are discussed in relation to a theory of acoustic invariance.

PACS numbers: 43.70.Dn, 43.70.Ve
J. Acoust. Soc. Am. 75 (4), April 1984, p. 1243. 0001-4966/84/041243-10$00.80. © 1984 Acoustical Society of America

INTRODUCTION

Recent measurements of natural speech tokens suggest that an invariant acoustic correlate of the stop-glide contrast is the relative change in the amplitude envelope in the vicinity of the stop and glide release. In particular, Mack and Blumstein (1983) charted the relative changes in the amplitude of the waveform for voiced stops and glides at adjacent samples immediately prior to and following the consonant release. They found that for stops there was an abrupt increase in relative amplitude at the stop release. Unlike the stops, glides exhibited no such large change in relative amplitude. In fact, for glides there was a gradual change in the amplitude envelope at the release relative to nearby portions of the waveform.

Consideration of the articulatory configuration in the production of stops and glides suggests that these amplitude differences are a natural consequence of the way these two classes of speech sounds are produced. For stops, there is a complete closure of the vocal tract with a resultant increase in air pressure behind the constriction. With the release of the closure, there is an abrupt and transient change in pressure. The consequence is a rapid rise of acoustic energy at the release of a stop consonant. In contrast, glides are produced with only a partial constriction in the vocal tract and a gradual release into the configuration of the following vowel. As a result, for the glides the relative change in amplitude at the release is considerably less and more gradual than that of stops.

The purpose of this experiment is to investigate the role of amplitude characteristics in the perception of the contrast between the stop [b] and the glide [w]. We were interested in determining whether changes in the amplitude characteristics in the vicinity of the consonant release would be able to override the acoustic cues provided by formant onset frequencies and formant transition rate, extent, and duration.

Previous research on the perception of the [b-w] contrast focused on the role of transition duration, rate, and extent (Liberman et al., 1956; O'Connor et al., 1957; Cooper et al., 1976; Diehl, 1976; Miller and Liberman, 1979; Godfrey and Millay, 1981; Schwab et al., 1981). Results indicated that stimuli with rapid formant transitions (about 40 ms) are perceived as stops and those with slow formant transition rates (about 80 ms) are perceived as glides. Holding the higher formants constant, Suzuki (1969) showed a complementary relation between the rate of frequency change of F1 and the amount of F1 transition. With an increase in transition rate, there was a reduction in the frequency extent required to perceive a stop compared to a glide. In contrast, Kasuya et al. (1982) found that the extent of F1, and not its duration, was the relevant variable in the perception of [b] and [w]. Schwab et al. (1981) found that while holding F1 constant, the duration and extent of F2 transitions significantly affected the perception of stops and glides. Short F2 durations and extents signaled [b], whereas long F2 durations and extents signaled [w]. Rate of frequency change of F2 did not seem to contribute significantly to perception of the stop-glide contrast.

These studies also showed that the perception of the [b-w] contrast seems to be significantly affected by phonetic context. In particular, the locus of the phonetic boundary for a [b-w] continuum varied across vowel contexts as a function of the different extents of the formant transitions occurring in these different vowel contexts (Liberman et al., 1956), and as a function of the duration of the entire CV syllable (Miller and Liberman, 1979; Miller, 1980).

However, in none of these studies were the amplitude characteristics of the stimuli a controlled variable. In general, the rise time of formant amplitudes was covaried with formant transition rate and duration. For example, a 40-ms transition stimulus would reach maximum amplitude in 40 ms and an 80-ms transition stimulus would reach maximum amplitude in 80 ms. The rise time was also linear over the formant transitions. These characteristics are incompatible with those obtained from the measurement of natural speech (Mack and Blumstein, 1983), where analysis of the relative change in amplitude showed a gradual change for glides and an abrupt change for stops. Moreover, none of the synthetic stimuli contained the acoustic attributes characteristic of natural stop consonants, where, especially if there is a burst, there is a transient (about 10-15 ms) but very large amplitude increase at consonantal release.

In this study, we investigate the perceptual consequences of the amplitude changes found in natural speech in the perception of a [b-w] continuum. Would a stimulus with the formant frequency characteristics of a [bi] or [ba], but with amplitude characteristics of a [wi] or [wa] (or vice versa), be perceived as beginning with a stop or glide? To this end, we synthesized two baseline continua, [ba-wa] and [bi-wi], in which the formant frequencies, transition rates, and durations, and amplitude characteristics for the endpoint stimuli were derived from natural speech measurements, and the other stimuli on the continua were linear interpolations between these endpoints. We then created four more continua in which the frequencies, formant transitions, and duration characteristics were the same as in the first two continua, but all stimuli on each continuum contained either a [w]-amplitude envelope, i.e., a gradual onset of energy at consonantal release for the entire test series, or a [b]-amplitude envelope, i.e., a rapid onset of energy at the consonant release for the entire series.

The strongest result would be if perception of the fully specified endpoint stimuli [b] and [w] were altered by the change in amplitude envelope. A weaker demonstration of the effect would be if there was a shift in the locus of the phonetic boundary for a [b-w] continuum when the amplitude envelope was altered, compared to the baseline continuum.

I. METHODS

A. Stimuli

Stimuli were generated using the parallel branch of the Klatt (1980) parallel/cascade synthesizer. The synthesizer was implemented on a PDP 11-34 computer by J. Mertus at the Brown University Phonetics Laboratory. Formant frequency values and timing characteristics for the [bi-wi] series were adapted from average values for one male speaker, as reported by Mack and Blumstein (1983). Formant values and timing characteristics for the [ba-wa] series came from LPC analysis of six tokens each of [ba] and [wa] produced by one male speaker. For the LPC analysis, a full Hamming window of 20 ms was used to compute the spectral transfer function every 10 ms. It is worthwhile noting that the measured values derived from natural speech are different from those used in synthetic speech experiments where the particular parameters chosen are based in part on idealized values derived from the acoustic theory of speech production (Fant, 1960) and in part on the experimenter's perception of the synthesized stimuli (cf. Kasuya et al., 1982). The values used in the experiment reported here are similar to the values obtained by Mack and Blumstein (1983) in their analysis of stop consonants and glides, and are also similar to the values obtained by Kewley-Port (1982) in her analysis of stop consonants.

Amplitude values were chosen so that the synthesized stimuli contained amplitude envelopes similar to those found in natural speech. Figure 1 shows a three-dimensional display of two natural speech syllables, [ba] and [wa]. Note that for [ba], there is a rapid rise in the formant amplitudes at all frequencies in the vicinity of the stop release (around 40 ms), whereas for [wa], the change is not as abrupt. Amplitude parameters used in the [ba-wa] series were identical to those used in the [bi-wi] series.

Input parameters to the synthesizer for the endpoint [ba-wa] and [bi-wi] stimuli are shown in Table I. All stimuli were of 445-ms duration and contained the same fundamental frequency (F0). There was 50-ms prevoicing prior to the consonantal release for all stimuli. We included prevoicing for all the stimuli for the following reasons: (1) natural speech measurements indicate that prevoicing is common for stops in English and is virtually always present for glides read in citation form (cf. Mack and Blumstein, 1983), (2) the presence of prevoicing for all stimuli made their structure more similar than if prevoicing were only included on the [w] end of the continua, and (3) we wanted to insure that the basic structure of our stimuli was similar to that used by
FIG. 1. Three-dimensional LPC spectra for the first 160 ms of the natural syllables [ba] (top) and [wa] (bottom). Spectra were generated using a 20-ms full Hamming window and a frame movement of 10 ms.
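The framing scheme named in the Fig. 1 caption (a 20-ms full Hamming window advanced in 10-ms steps) can be sketched as follows. This is only an illustration of the windowing step, not of the LPC analysis itself and not the authors' software; the function names and the 10-kHz sampling rate used in the example are assumptions introduced here.

```python
import math


def hamming(n_samples):
    """Full Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (N - 1))."""
    if n_samples < 2:
        return [1.0] * n_samples
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n_samples - 1))
            for k in range(n_samples)]


def frame_starts(total_ms, window_ms=20, hop_ms=10):
    """Start times (ms) of every full analysis frame that fits in total_ms."""
    return list(range(0, total_ms - window_ms + 1, hop_ms))


def windowed_frames(signal, sample_rate_hz, window_ms=20, hop_ms=10):
    """Cut a sampled signal into overlapping Hamming-weighted frames."""
    win_len = int(round(sample_rate_hz * window_ms / 1000))
    hop_len = int(round(sample_rate_hz * hop_ms / 1000))
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop_len):
        segment = signal[start:start + win_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# The first 160 ms of a syllable yields 15 full 20-ms frames at a 10-ms hop.
starts = frame_starts(160)

# Hypothetical 10-kHz sampling rate (not stated in the text): 200-sample frames.
frames = windowed_frames([1.0] * 1600, sample_rate_hz=10000)
```

Each frame would then be passed to whatever spectral estimator is in use; with a 10-ms hop, successive 20-ms frames overlap by half.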
TABLE I. Synthesizer input parameters for endpoint stimuli [bi], [ba], [wi], and [wa]. Values of the synthesis parameters are given in columns. For a description of parameters see text. Time is represented by the rows in ms. The synthesizer updated values every 5 ms. Where the time value between entries is greater than 5 ms, or where blanks appear, intermediate values were calculated by piecewise linear interpolation.
ms
AV
AI
A2
A3
A4
A5
b
w
b
w
b
w
b
w
b
w
b
w
50 5O
40 44
50
50
10
10
10
10
10
10
10
10
30
10 40
10 45
0
Fo
B2
83
95
500
500
500 110
500 170
110
170
35
30
44
50
50
10
10
10
10
95
40 45 50 55
10 5 60 60
45 47 49 51
10 10 40 40
43 43 50 50
14 18 22 26
13 16 19 22
13 16 19 22
120 120 120
60 65 70 75 80 85 90 95 100 410 445
55 55 55 60 60
53 55 57 59 60
45 50 55
65
65
65
65
0
0
38
16 21 27 33
40 55 55
39 44 50
55 58
30 33 36 39 43 46 49 52 55
40 45
25 28 32 35 38 41 44 47 50
50
50
58
55
45
50
10 35
10 55 55
40
45
25 28 32 35 38 41 44 47 50 100
55
50
45
50
80
Formant frequency parameters for each vowel series
vowel[a] ms
FI b
vowel[i]
F2 w
b
F3 w
FI
b
F2
F3
w
b
w
b
w
b
w
0
274
346
1300
500
2000
2300
259
323
500
500
1500
1300
35
362
412
1300
664
2000
2311
254
337
500
690
1500
1625
40 45 50
466 466 480
421 423 425
1248 1248 1210
687 710 734
2366 2366 2362
2313 2315 2316
367 367 362
339 341 342
1594 1594 1825
717 802 897
2460 2460 2445
1671 1746 1746
65
653
431
1139
804
2359
2321
357
352
356
356 2041 2048
1656 2041
2701 2710
2401 2701
2135
2135
2832
2832
80 100 125
656
653
150 445
675
675
1124 1073
1139 1073
2363 2400
2350 2400
336
336
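The Table I caption states that the synthesizer updated its values every 5 ms and that gaps between tabulated entries were filled by piecewise linear interpolation. A minimal sketch of that expansion is given below. The function name is hypothetical, the breakpoint values in the example are illustrative rather than a transcription of the table, and breakpoint times are assumed to be multiples of the update step.

```python
def interpolate_track(breakpoints, step_ms=5):
    """Expand sparse (time_ms, value) breakpoints into one value per update.

    breakpoints must be sorted by time; times are assumed to be multiples
    of step_ms so that every breakpoint falls on an update instant.
    """
    track = []
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        t = t0
        while t < t1:
            # Straight line between the two neighbouring table entries.
            track.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
            t += step_ms
    track.append(breakpoints[-1])
    return track


# Illustrative formant track: entries at 0, 35, and 40 ms (values in Hz).
f1_track = interpolate_track([(0, 274), (35, 362), (40, 466)])
```

The 0-35 ms gap is filled with six intermediate values, while the 35-40 ms segment needs none because its entries are already one update apart.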
other researchers who found context effects in the perception of [b] and [w] (cf. Miller and Liberman, 1979). In fact, synthesis of the endpoint [ba] stimuli with and without prevoicing resulted in no perceptual differences in the stoplike quality of the consonants.

To simulate the fact that prevoicing in natural speech contains predominantly low-frequency periodicity, the bandwidths of the higher formants (B2 and B3) were set at their maximum value of 500 Hz throughout prevoicing, and the amplitudes of the higher formants (A2-A5) were kept low (around 10 dB) and gradually increased prior to the onset of formant transitions. The bandwidth of the first formant was kept constant throughout the stimuli. The result of this manipulation of bandwidth and amplitude parameters was that there was very little energy in the upper frequencies during prevoicing. Thus Table I is somewhat misleading since the abrupt changes in frequency between prevoicing and consonantal release are nullified by the very low amplitude values and wide bandwidths throughout prevoicing. Parameters which controlled formant transition motions from the consonant release to the vowel began for the stop consonants at 50 ms. Note that although abrupt formant motions actually began for F2 and F3 at 40 ms, they were minimally excited, for as stated above, the amplitude for these formants was set at a minimum value and the bandwidth was set at a maximum during prevoicing. Just before the stop release, the amplitude of voicing (AV) was decreased to simulate the similar decrease often observed in natural speech.

We shall refer to the four basic endpoint stimuli as [bi], [wi], [ba], and [wa]. From these four sets of input parameters, six continua were generated, three for each vowel environment. Each continuum consisted of 11 stimuli whose input parameters were derived by interpolating the differences between the endpoint stimuli for each point in time shown in Table I.²

The normal continua contained interpolated values from the [b] and [w] endpoint stimuli for all parameters, i.e., frequency, amplitude, and duration. The [b]-amplitude and [w]-amplitude continua included interpolated values along the continuum only for the frequency parameters. For the [b]-amplitude continua, the amplitude parameters (AV, A1-A5) for all stimuli on the continuum were the same as the [b]-endpoint stimulus. Similarly, for the [w]-amplitude continua, the amplitude parameters for all stimuli were equivalent to the [w]-endpoint stimuli. Thus, for each vowel environment, the three continua contained identical formant frequency and duration interpolations, while the amplitude characteristics varied. Figure 2 depicts three-dimensional LPC spectra of the endpoint stimuli for the normal amplitude [bi-wi] and [ba-wa] continua (a)-(d) and for the [b]-amplitude (f) and (h) and [w]-amplitude continua (e) and (g).

B. Subjects

Subjects were 20 Brown University students who were paid for their participation. There were ten males and ten females, all were native speakers of English, and none reported any hearing loss. No subject had any previous experience in phonetic transcription or in listening to synthetic speech.

FIG. 2. Three-dimensional LPC spectra for the first 160 ms of the synthesized test stimuli. (a)-(d) are the normal continuum endpoints (both frequencies and amplitudes are interpolated): (a) [ba]; (b) [wa]; (c) [bi]; (d) [wi]. (e)-(h) are the constant amplitude continua endpoints: (e) [ba] frequencies, glide amplitudes; (f) [wa] frequencies, stop amplitudes; (g) [bi] frequencies, glide amplitudes; (h) [wi] frequencies, stop amplitudes.
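The construction of the three continua per vowel described in Sec. I A (11 steps between the [b] and [w] endpoints, with the amplitude parameters either interpolated or held at one endpoint) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: each endpoint is reduced to a single value per parameter rather than a full time-varying track, and the function names and the numbers in the example are hypothetical.

```python
# Amplitude parameters named in the text: voicing amplitude and A1-A5.
AMPLITUDE_KEYS = ("AV", "A1", "A2", "A3", "A4", "A5")


def lerp(a, b, frac):
    """Linear interpolation between a (frac = 0) and b (frac = 1)."""
    return a + (b - a) * frac


def make_continuum(b_end, w_end, amplitude="interpolated", n_steps=11):
    """Build an n_steps continuum from the [b] endpoint to the [w] endpoint.

    amplitude = "interpolated": every parameter is interpolated (normal continuum)
    amplitude = "b": amplitude parameters are fixed at the [b] endpoint values
    amplitude = "w": amplitude parameters are fixed at the [w] endpoint values
    """
    stimuli = []
    for i in range(n_steps):
        frac = i / (n_steps - 1)
        stimulus = {}
        for key in b_end:
            if key in AMPLITUDE_KEYS and amplitude == "b":
                stimulus[key] = b_end[key]
            elif key in AMPLITUDE_KEYS and amplitude == "w":
                stimulus[key] = w_end[key]
            else:
                stimulus[key] = lerp(b_end[key], w_end[key], frac)
        stimuli.append(stimulus)
    return stimuli


# Hypothetical single-frame endpoint values (Hz for F2, dB for A2).
b_endpoint = {"F2": 1300.0, "A2": 60.0}
w_endpoint = {"F2": 500.0, "A2": 40.0}

normal = make_continuum(b_endpoint, w_endpoint)
stop_amp = make_continuum(b_endpoint, w_endpoint, amplitude="b")
glide_amp = make_continuum(b_endpoint, w_endpoint, amplitude="w")
```

In this sketch stimulus 1 of the glide-amplitude continuum pairs [b] frequencies with [w] amplitudes, which corresponds to the kind of conflicting-cue endpoint described for Fig. 2(e) and (g).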
C. Procedure

Stimuli were balanced for level at synthesizer output to be within 0.1 dB of each other. Four test tapes were generated, two for each vowel environment. The first two tapes (one for each vowel) contained the normal amplitude continua and consisted of ten repetitions of each of the 11 stimuli, totaling 110 stimuli. The second two tapes consisted of ten repetitions of each of the stimuli on the [b]-amplitude and [w]-amplitude continua, totaling 220 stimuli. There were 2 s of silence between stimuli and a 6-s pause after each block of ten stimuli. At the beginning of each tape, all stimuli were played once to familiarize the subjects with the test items. The normal amplitude tapes took approximately 10 min to complete; the [b]- and [w]-amplitude tapes took about 20 min. The tapes were played at a comfortable listening level over an MCI tape recorder and AKG headphones.

Subjects were divided into two groups of ten subjects each. One group received the three [bi-wi] continua and the other the three [ba-wa] continua. For each group, subjects were given each of the test tapes twice. In the first presentation, they were told that they would hear either [bi]-like or [wi]-like (alternatively, [ba]-like or [wa]-like) syllables. Their task was to mark either [b] or [w] on the answer sheet, depending on whether the initial consonant of the test stimulus was more similar to a [b] or [w] (i.e., a two-alternative forced-choice task). In the second presentation, subjects were told that they would hear CV syllables and were instructed to write down whatever initial consonant or consonants they heard (i.e., a free-identification task). Subjects were tested in groups of five. They were first given the normal amplitude continua for both tasks and then the [b]- and [w]-amplitude continua for both tasks. The two-alternative forced-choice task always preceded the free-identification task. The reason for choosing this order of presentation (which might have biased the subjects towards [b] and [w] responses on the second task) was that previous studies had only employed the forced-choice method, and we wanted the methods of the present study to be as similar as possible to these earlier studies. Breaks were given between test tapes. Each session lasted about an hour and a quarter.

II. RESULTS

Figure 3 presents the results of the two-alternative forced-choice task for both the [ba-wa] and [bi-wi] series. For the normal continua in which both frequency and amplitude characteristics were interpolated, there is the expected categorical-like identification function. In contrast, for both the [b]-amplitude and [w]-amplitude continua (in which the formant frequency and duration values are interpolated), the identification function is no longer categorical. Rather, the amplitude characteristics for these two continua have perceptual effects on all of the stimuli on the test continua. Although perception of stimuli towards the [b] end and stimuli towards the [w] end of the continua were less affected by the amplitude changes, they were no longer perceived as good exemplars of their original phonetic categories. Thus with a [b]-amplitude envelope, all of the stimuli on both vowel series, including the prototype [w] stimulus, were perceived mostly as [b]-initial syllables. For the [a] vowel series, subjects' percentages of [b] responses ranged from 64%-100%, and for the [i] vowel series, from 76%-100%. With the [w]-amplitude envelope, all stimuli save the prototype [ba] stimulus were perceived predominantly as glide-initial syllables. For the [a] vowel series, overall scores ranged from 52%-100% and for the [i] vowel series, they ranged from 79%-100%. Thus the amplitude characteristics of the stimuli were able to override formant frequency, rate, and duration characteristics in determining perception of the stop-glide contrast.

FIG. 3. Results of the forced-choice task; percent "B" responses by stimulus number for the [i] (top) and [a] (bottom) vowel series. The normal continua responses are represented by circles, stop-amplitude continua responses by X's, and glide-amplitude continua by stars.

It appears from Fig. 3 that there may have been some differences in the effects of the three amplitude manipulation conditions across the two vowel contexts. In particular, the glide-amplitude continuum seems to be more effective in producing [w] responses in the [i] vowel series than in the [a] vowel series stimuli. In contrast, there seem to be minimal differences across vowel series in the other amplitude conditions. Results of three two-way ANOVAs with vowels as the between-subject variable and stimulus number as the within-subject variable confirmed these observations. For the glide-amplitude continua there was a vowel by stimulus number interaction [F(10,190) = 2.84, p < 0.01]. Post hoc analyses revealed that the interaction was due to differences in [w] responses between the [i] and [a] vowel series for stimuli 1, 2, 3, and 4 on the continua. There was no vowel by stimulus interaction for either the normal continuum (F = 0.39) or the stop-amplitude continuum (F = 0.91).

Table II depicts the results for the free-identification task for all three continua of the [ba-wa] and [bi-wi] series. There were 15 unique single consonant and cluster response categories for the vowel [i] series and six response categories for the [a] series. Intersubject variation was greater than intrasubject variation (i.e., each subject was more or less consistent in choosing his or her response categories). Surprisingly, there was a fairly large proportion of [d] responses to stimuli 1-3 for the [i] vowel series in the normal continuum as well as the stop-amplitude continuum. With the exception of [bl] responses for stimuli 5-11 with stop amplitudes in the [i] series, there were very few cluster responses. Consequently, we reduced the total number of response categories by analyzing responses according to the phonetic category of the initial segment, and further simplifying these to the categories stop, fricative, and glide, as shown in Figs. 4 and 5.

The most striking aspect of this analysis is the number of fricative responses (i.e., [v]) obtained, particularly for the [ba-wa] series (see bottom panel of Fig. 4 and the two right-hand side panels of Fig. 5). This effect was even found in the baseline continuum containing appropriately interpolated amplitude values. Although not totally unexpected (which was the reason for including the free-identification task in the experimental design), the magnitude of the effect was not foreseen. Moreover, it should be remembered that if there was any response bias for the subjects, it would be towards stops and glides, for these were explicitly mentioned in the instructions to the subjects in the two-alternative forced-choice task, and the forced-choice task preceded the free identification.

Figure 4 shows the results of the free-identification task for the normal continua in the [i] and [a] vowel series. For the [ba-wa] series, fricative responses occurred for every
TABLE II. Identification task results for the three continua of vowel [i] (A), and vowel [a] (B). Rows represent responses to each stimulus along each particular continuum, with stimulus number 1 corresponding to the [b]-frequency endpoint and 11 corresponding to the [w]-frequency endpoint. Columns represent subject's response categories. Cell values equal the number of responses per stimulus per category out of 100 (ten subjects × ten repetitions of each stimulus). Missing values occurred when subjects failed to respond to a particular stimulus.
(A•-[i] series •
B
W
V
D
(3
BR
BW
•
(B}--[a]series
(3W
TH
WB
WV
BL
(3L
VB
Interpolatedampfitudes I 25 2 35 3 74 4 86 5 - 71 6 33 7 10 8 0 9 0 10 0 11 1
0 0 0 I 6 35 64 85 96 98 97
•
B
W
V
BW
•
WB
Interpolated ampHtudes
0 3 0 6 16 16 12 13 4 2 1
73 61 22 3 I 0 2 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 I 0 9 5 0 0 0 0
0 0 0 0 0 0 0 0 0 0 I
0 0 0 0 0 0 0 0 0 0 0
2 1 4 3 3 I 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 ! 0 0 0 0 0
0 0' 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 3 5 6 I 0 0 0
Glideamp•tudes
I 2 3 4
5 6 7 8 9 10 11
71 79 82 81 62 8 I 0 0 0 0
0 0 0 0 4 31 50 69 -79 82 82
26 21 16 18 32 •2 40 21 11 9 8
0 0 0 0 0 0 0 0 0 0 0
3 0 2 I
10 9 10
2 9 9 10
0 0 0 0 0 0 0 0 0 0 0
(31ideamplitudes
1 2 3 4 5
18 13 8 5 6
12 13 18 27 29
56 63 63 65 61
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 I 0
6 2 5 0 4
I 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 I 2 2 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
6 7
2 2
54 68
30 12
0 0
0 0
0 I
0 0
4 6
0 0
0 0
0 I
8 9
0 0
0 0
g 9 10 11
I 0 0
85 96 97
3 0 1
0 0 0 0
0 0 0 0
0 0 0 0
$ 4 2
0 0 0
0
0 0 0 0
2 0 0
100
0 0 0 0
I 0 0
0
0 0 0 0
0
0
0
0
4 5
28 28 30 18 6
16 18 21 25 49
56 54 49 54 41
0 0 0 3 3
0 0 0 0 0
0 0 0 0
2 I
6 7
4 0
62 74
31 19
0 2
I 4
2 !
0 0 0
0 0 0
8 9 10
17 5 5 2
3 I 0 !
1 0 0
0
71 89 90 90
3 4 5
0
5 I 0 0
7
0
$topamp•tudes
I 2 3
11
Stopamp•tudes
I 2
17 24
0 I
0 I
75 70
7 4
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
I 2
gl 88
3 2
16 10
0 0
0 0
0 0
3 4 $ 6 7
62 74 71 62 42
0 0 0 0 4
I 0 0 ! 0
30 12 4 ,6 5
6 7 I 0 0
0 0 I 5 11
0 0 I I 0
0 0 0 0 0
0 0 0 2 6
0 0 0 0 0
0 0 0 0 4
0 0 0 0 0
0 6 20 23 28
! I 2 0 0
0 0 0 0 0
3 4 $ 6 7
85 83 80 70 55
0 ! 3 4 9
15 16 17 24 24
0 0 0 2 8
0 0 0 0 3
0 0 0 0 1
8
.18
8
0
7
0
19
I
I
9
O'
7
0
30
0
0
8
61
9
17
8
4
1
9
16
9
0
6
0
22
I
0
10
0
9
0
27
0
0
9
56
14
16
9
5
0
10 11
19 10
11 !!
I 0
6 7
0 0
17 24
0 0
0 0
9 10
0 0
I0 10
0 2
27 25
0 0
0 1
10 !1
51 50
23 30
12 4
8 10
5 6
1 0
more fricative responses in'the[a]vowel environment than in the[i] vowelenvironment. For the [w]-amplitude series, an interesting patternof resultsemerged. Althoughthepresence of a [w]-amplitude envelope affected theperception ofallofthestimuliofthe[bw] continuain bothcontexts, stimulion the [hi-endof the continuawereperceivedprimarilyasfricativesandlesssoas glides,and stimuli on the [w]-endof the continuumwere perceivedalmostexclusively as [w].
88-
46-
III. DISGU$$1ON 26-
Theresults of thisexperiment indicatethattheamplitudeenvelope in thevicinityof theconsonantal releaseplays a criticalrole in the perceptionof the stop-glidecontrast. The presence of a [hi-amplitude envelopewassufficiently strongto orehidetheformantfrequency, rate,andduration cuesfor a glide,and the presenceof a [w]-amplitudeenve-
0=
lopewassufficiently strong tooverride theformant frequen-
190
cy, rate, and durationcuesfor a stop.As a result,the cate-
gorical-like identifieatior• function ofa [b-w]continuum was
R
•,o/
H 60-
T
',,, .
'/
,
•,
/
i
.
E
S
P 0
_
48-
S
6 E
$ 29-
/:
changedto a singlecategoryfunctionasa resultof theparticularamplitudeenvelope presentin thestimuli.In threeout of thefourendpointstimuli,i.e.,[bi]with [w]-amplitude envelopeand[wa]and[wi]with [hi-amplitude envelope, perceptionof thephoneticcategorywasdictatedby theamplitudecues,andin thefourthcase,[ba]formantfrequency and durationbut glideamplitude,perception wasspritaboutfifty-fifty. In all theotherstimulion thecontinua,theamplitudecharacteristics weretheprimarydeterminerof perception.
Nevertheless, results from the free-identification task
indicatedthatalthoughthepresence of a stopamplitudeene ¾ I '"•'--f " ¾ a 2 4 6 6 19 velopeturnedall stimulitypicallylabeledasglidesto stops, ST IPIULUS 14UPIBœR the presenceof a [w]-amplitudeenvelopeturnedall stimuli typicallylabeledasstopsto fricativesratherthan to glides. FIG. 4. Results of the œtee-iclent,ficatlontask foœthe normal continua showThese resultsstronglysupportthe viewthat the amplitude in8 categoryresponse asa functionof stimulusnumberfor boththe[i] (top) envelopein thevicinityof theconsonantal releaseisa critical and[a] (bottom) series. C•rcle•repre•t stop-iaitial Fesponses, X's representglide-initialresponses, andsta•srepresentœricative responses. acousticpropertyfor the continuant-noncontinuant contrast. That is, thosespeechsoundswith a rapid increasein stimulusonthecontinuum,andparticularlyfor thosestimuamplitudein the vicinityof the releasewill be perceivedas and thosespeechsoundswith a li aroundtheboundary between thestopandglidecategories the classof noncontinuants (i.e.,stimuli 5-7;compare Fig.3).Forthe[bi-wi]se. ries,the gradualincreasein amplitudein the vicinityof the release numberof fricativeresponses wasmuchlessthanthatforthe will beperceived astheclassof continuants. [ba-wa]series.However,similarto the [ba-wa]series,the However,it islessclearwhattheseresultssignifyforthe majorityof thefricativeresponses occurredforthosestimuli claim that the amplitudeenvelopeprovidesan inuariant aroundthe phoneticboundary. propertyfor the stop-glidecontrast.To addressthis quesFigure 5 showsthe resultsof the free-identification tion,it isimportant toconsider whythe[b]-frequency stimutasksfor the[b]-amplitude and[w]-amplitude seriesforboth li on the [b-w] continuawereperceivedasfricativesand not vowel environments. 
As in the normal continua, the reglides.A reviewof the articulatoryconfigurations of the vosponses wereclassified as eitherinitial stop,glide,or fricacaltractandtheconsequent acousticcharacteristics of stops, five.It is clearthat,asin thetwo-alternative forced-choice glides,and fricativesin natural speechmay provide some clues._ results, presentation of [b]-amplitudes in boththe[i] and[a] vowelenvironments significantly affectedall stimulion the The spectralcharacteristics for placeof articulationare continuaincludingtheprototype[w] stimulus.Thusstimuli determinedby the area of closureand constrictionin the containinga [hi-amplitudeenvelopewereidentifiedasstops vocaltract.For mannerof articulation,it hasbccnsuggestcoT between80% and 100%of thetimein theenvironment of[i] by Kasuyaet al. (1982)that it is the degreeof lip opening and 60%-8•1% of the time in the environmentof [a]. It is whichdistinguishes stopsandnasalsfromglidesand.liquids. Kasuya et al. (1982) consideredthe acousticcorrelatesof worthwhilenotingthat,asin thefree-identification resultsof degreeof lip openingto bethefrequencyvalueoff I andthe the amplitudemodulatednormal continuum,there were 1249
J. Acoust.Soc.Am.,VoL75, No. 4, April1984
P. Shinnand S. E. Blumstoin: Perceptionof lb] and [w]
1249
ill
lal
lb] 46-
40-
2
4
6
8
I
tO
2
•
-1 4
ST I rIULUS HU/•Oœ•
'
I
$
'
I
$
'
I
19
1
ST IIIULUS HUNBœR
IOn.
86-
88-
amp
'
I
2
'
I
4
'
I
(;
ST HIULUS
'
I
9
NUHSER
2
4
6
8
I0
ST U•ULU$ FIUtIOER
FIG. 5. Resultsof theidentificationtaskfor the stop-amplitude (top)andglide-amplitude (bottom)continuafor the [i] series(lei•)andthe [al series(fight),
showing cate8oryresponse for thestop-initial (circles}, fricativc-initial (stars), andglide-initial ( X 's)asa functionof stimulus number. duration of the formant transitions. Their results showed
that only the valuesof F 1 significantly contributedto the perception of thestop-glidecontrast.However,theydidnot consideranotheracousticcorrelateof lip opening--thenatureof the pressure buildupaccompanying a completeclosure(zerolip opening) versus a partialclosure.Both[w] and Iv] havepartialclosures, in contrastto [b]. Thusthe glide and fricativesharea commonacousticpropertyrelatingto the nature of the articulatoryclosure.Moreover, although [hi, [v], and [w] all involvethe labialplaceof articulation, they are not identicalin the articulatoryconfiguration for placeof articulation.For [b] the lipsare closedprior to the consonant release, for [w] thelipsareprotractedandrounded,andthetonguepositionisbothhighandback,andfor [v] the bottom lip forms a partial constrictionwith the upper teeth,i.e., labiodentalplaceof articulation.As a resultof thesearticulatorydifferences,the spectralcharacteristics at the releaseof thesethreesoundswill be different.Of particulaximportance,the formantfrequencies of the glide[w] will beverylow, whereastheywill notbeaslow in the secondand higherformantsfor the labial[b] nor for thefricativeIv]. Turning to the syntheticspeechcontinuausedin this experiment, the [b]prototypestimulihad,similarto natural speechmeasurements, muchhigherstartingF2 frequencies thandid [w] (seeTableI). Changingthe mannercuesof the 1250
J. Acoust.Soc.Am.,VoL75. No. 4, April1984
[hi stimulus(i.e.,thenatureof thereleasecharacteristics of the stimulusby manipulating the amplitudeenvelope), produceda stimuluswith a gradualonsetof energy,but with spectralproperties characterized by higherformantexcitation, a propertymore nearlylike the labiodentalfricative than the labial glide.Thus, it is not surprisingthat subjects perceived the[b]endof thecontinuaasthefricative[v]rather than the glide[w]. Thefactthata largenumberof [d]-responses weremade to the stimuli 1-3 for the [i] vowelserieswassurprising,but againmay be a consequence of the distributionof the spectral propertiesinherentin thesestimuli. In particular,the startingfrequencies were relativelyhigh from the onsetof the transitionsand into the vowel,and the amplitudesof theseformantswere alsosetat a relatively high dB level (cf.
Table I). Thus the spectral characteristics of these stimuli had a predominance of high-frequency energy from the moment of stop release and onset of voicing relative to the lower frequencies [cf. Fig. 2, spectra (c)]. These characteristics have been shown to provide critical perceptual cues for coronal consonants, such as [d] (Lahiri et al., 1984). Presumably, we did not generate exemplar [bi] stimuli with sufficient spectral energy in the low frequencies.

While we are making several claims about the acoustic properties of stops and glides, it is the case that the perceptual effect of the amplitude envelope varied somewhat as a function of vowel context, as did the number of [v] responses in the open categorization task. In particular, in the [a] series the glide-amplitude continuum was less effective in the forced-choice task, and there were more [v] responses obtained in the open categorization task. We are hesitant to attribute these differences to vowel context effects per se, since we used exactly the same input parameters for amplitude across the vowel series. In fact, we first synthesized the [i] series and then transferred the amplitude parameters directly to the [a] series. While this ensured some degree of uniformity across the two vowel continua, it ignored the fact that the fine details of the amplitude characteristics would not be the same across vowel contexts due to the interaction of the different frequency values with the amplitude parameters. Thus we may not have synthesized optimal amplitude characteristics for the [a] series.

It is also important to consider the different pattern of results which emerged when subjects were given a free-identification versus a forced-choice response task. It is critical to know not only how subjects respond when given alternatives, but, perhaps more importantly, what they perceive the particular test stimuli to be without such alternatives. Unless these measures are taken, the experimenter may come to inappropriate conclusions concerning the effects of particular acoustic variables on the speech perception process.

The results of this experiment support the view that the amplitude envelope is a critical perceptual cue distinguishing [b] and [w], and, perhaps more importantly, provides an invariant acoustic property distinguishing the class of stops from the class of continuants. Support for the argument that this property is invariant comes from (1) the acoustic analysis of natural speech showing that the amplitude envelope provides an invariant property distinguishing [b], [d], and [g] from [w] and [v] across speakers and vowel contexts (Mack and Blumstein, 1983), and (2) the perception data obtained in this experiment showing that listeners can and do use this property in distinguishing the class of stops from the class of continuants. That this property does not uniquely distinguish the class of stops from glides, but rather the class of stops from continuants (which includes glides), suggests that stops and glides are not distinguished by a single invariant acoustic property. These findings are consistent with traditional phonological analyses in which [b] and [w], as well as [d], [g], and [y], are distinguished by a number of features, including [+consonantal] versus [-consonantal], [-continuant] versus [+continuant], [-sonorant] versus [+sonorant], and [-back] and [-high] versus [+back] and [+high] (Jakobson et al., 1963; Chomsky and Halle, 1968). The phonetic consequences of these differences would mean that the contrast between [b] and [w] would not turn on one property, but rather would reside in several properties, among them amplitude characteristics as well as the particular spectral shape characteristics at the moment of consonantal release.

ACKNOWLEDGMENTS
We would like to thank John Mertus for development of the synthesizer program and Molly Mack for some measurements of natural speech. This research was supported in part by Grant NS15123.
¹The greatest disparity between the formant frequency values of our stimulus parameters and those typically used in synthetic speech concerns the F1 and F2 values for [b] (cf. Table I). According to theoretical predictions (Fant, 1960), F1 for [b] should be lower than that of [w], whereas in our stimuli F1 is lower for [w] than for [b]. Also, F2 for [b] should be sharply rising and not flat or falling. We have no ready explanation for the disparity between the theoretical calculations and our obtained measures from natural speech. One reason may be that there are limitations on our measurement procedures. For example, if there is a very rapid change in the formant frequencies (on the order of •-10 ms), then the LPC analysis, even when using a window as small as 10 ms, would derive higher frequency values than those given by the theoretical predictions. Another possible reason for the disparity might be that the theoretical calculations of the formant frequency values are based on static states of the vocal tract and do not incorporate acoustic effects of dynamic articulatory movements actually occurring in natural speech.
²There is some question whether use of the particular parameter values specified for the endpoint [ba] stimulus produced an unambiguous [b] percept. Two reviewers who received copies of the test stimuli had reservations about the quality of the stimulus. To explore this, we performed a post hoc test using ten naive subjects who were asked to tell us "what they heard" when presented with this stimulus. All reported [ba]. Nevertheless, as will be seen in the results (Fig. 4), the endpoint [ba] was not as good an exemplar as the endpoint [bi], as some subjects reported percepts other than [ba].
³It is important to note that although the amplitude characteristics of the synthetic [ba] and [wa] endpoint stimuli are similar to the natural speech stimuli shown in Fig. 1, the spectral tilt at the onset of [ba] is quite different from that of natural speech. This difference reflects the fact that we used the amplitude values from the [bi] endpoint stimulus to generate the [ba] endpoint stimulus. We decided to use the same values for both vowel continua in order to maintain some degree of consistency across the vowel environments. However, owing to the different formant frequencies of the stimuli and the interaction effects of frequency and amplitude, appropriate spectral tilt characteristics were sacrificed for consistency in amplitude values across vowels.
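As an informal illustration of the amplitude-envelope property discussed above (an abrupt rise in relative amplitude at release for stops, a gradual rise for continuants), the following minimal Python sketch compares the largest frame-to-frame change in RMS level for two toy signals. Everything in it is an assumption made for illustration and is not the synthesis or measurement procedure used in this study or in Mack and Blumstein (1983): the 10-kHz sampling rate, the 100-Hz sine "voicing" source, the 10-ms frames, the 5-ms and 50-ms rise times, and the hypothetical helper names `toy_syllable`, `frame_rms`, and `max_rise_db`.

```python
import math

SR = 10000    # sampling rate in Hz (assumed for this toy example)
FRAME = 100   # 10-ms frames: exactly one period of the 100-Hz toy source


def toy_syllable(rise_ms, pre_ms=30, total_ms=120, floor=0.1):
    """Sine 'voicing' whose amplitude rises linearly from `floor` to 1.0
    over `rise_ms` milliseconds, starting at the release point `pre_ms`."""
    release = pre_ms * SR // 1000
    rise_n = max(1, rise_ms * SR // 1000)
    n = total_ms * SR // 1000
    out = []
    for i in range(n):
        # ramp is 0 before release, grows linearly, and saturates at 1
        ramp = min(1.0, max(0.0, (i - release) / rise_n))
        amp = floor + (1.0 - floor) * ramp
        out.append(amp * math.sin(2 * math.pi * 100 * i / SR))
    return out


def frame_rms(signal):
    """RMS amplitude of consecutive non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in signal[k:k + FRAME]) / FRAME)
        for k in range(0, len(signal) - FRAME + 1, FRAME)
    ]


def max_rise_db(signal):
    """Largest increase in RMS level (in dB) between adjacent frames."""
    rms = frame_rms(signal)
    eps = 1e-9  # guards against division by zero for silent frames
    return max(20 * math.log10((b + eps) / (a + eps))
               for a, b in zip(rms, rms[1:]))


stop_like = max_rise_db(toy_syllable(rise_ms=5))    # abrupt onset
glide_like = max_rise_db(toy_syllable(rise_ms=50))  # gradual onset
print(f"stop-like: {stop_like:.1f} dB, glide-like: {glide_like:.1f} dB")
```

Under these assumptions the abrupt-onset signal shows a much larger single-frame jump in level than the gradual-onset signal. The sketch measures relative change between adjacent frames rather than absolute level, in the spirit of the relative-amplitude measure described in the text.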
Chomsky, N., and Halle, M. (1968). The Sound Pattern of English (Harper and Row, New York).
Cooper, W. E., Ebert, R. R., and Cole, R. A. (1976). "Perceptual analysis of stop consonants and glides," J. Exp. Psychol.: Human Percept. Perf. 2, 92-104.
Diehl, R. (1976). "Feature analyzers for the phonetic dimension 'stop vs. continuant,'" Percept. Psychophys. 19, 267-272.
Fant, G. (1960). Acoustic Theory of Speech Production (Mouton, The Hague).
Godfrey, J. J., and Millay, K. K. (1981). "Discrimination of the 'tempo of frequency change' cue," J. Acoust. Soc. Am. 69, 1444-1448.
Jakobson, R., Fant, G., and Halle, M. (1963). Preliminaries to Speech Analysis (MIT, Cambridge, MA).
Kasuya, H., Takeuchi, S., Sato, S., and Kido, K. (1982). "Articulatory parameters for the perception of bilabials," Phonetica 33, 61-70.
Kewley-Port, D. (1982). "Measurements of formant transitions in naturally produced stop consonant-vowel syllables," J. Acoust. Soc. Am. 72, 379-389.
Klatt, D. (1980). "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am. 67, 971-995.
Lahiri, A., Gewirth, L., and Blumstein, S. E. (1984). "A reconsideration of acoustic invariance for place of articulation in diffuse stop consonants: Evidence from a cross-language study," J. Acoust. Soc. Am. (submitted for publication).
Liberman, A. M., Delattre, P. C., Gerstman, L. J., and Cooper, F. S. (1956). "Tempo of frequency change as a cue for distinguishing classes of speech sounds," J. Exp. Psychol. 52, 127-137.
Mack, M., and Blumstein, S. E. (1983). "Further evidence of acoustic invariance in speech production: The stop-glide contrast," J. Acoust. Soc. Am. 73, 1739-1750.
Miller, J. L. (1980). "Contextual effects in the discrimination of stop consonants and semivowels," Percept. Psychophys. 28, 93-95.
Miller, J. L., and Liberman, A. M. (1979). "Some effects of later-occurring information on the perception of stop consonants and semi-vowels," Percept. Psychophys. 25, 457-465.
O'Connor, J. D., Gerstman, L. J., Liberman, A. M., Delattre, P. C., and Cooper, F. S. (1957). "Acoustic cues for the perception of initial /w,j,r,l/ in English," Word 13, 24-43.
Schwab, E., Sawusch, J., and Nusbaum, H. C. (1981). "The role of second formant transitions in the stop-semi-vowel distinction," Percept. Psychophys. 29, 121-128.
Suzuki, H. (1969). "Mutually complementary effect of rate and amount of formant transition in distinguishing vowel, semi-vowel and stop consonant," Q. Prog. Rep. Res. Lab. Electron., MIT 96, 164-172.