
Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

A quantitative theory of cardinal vowels and the teaching of pronunciation


Lindblom, B. and Sundberg, J.

journal: STL-QPSR   volume: 10   number: 2-3   year: 1969   pages: 019-025

http://www.speech.kth.se/qpsr

STL-QPSR 2-3/1969

B.

A QUANTITATIVE THEORY OF CARDINAL VOWELS AND THE TEACHING OF PRONUNCIATION*

B. Lindblom and J. Sundberg


Abstract. The problem of devising a reference system for specifying the phonetic value of vowels is discussed. The classical theory of Cardinal Vowels is examined, as well as previous quantitative frameworks for describing vowel pronunciation. An attempt is made to construct a model of vowel production that combines the idea of a reference system with an objective numerical method of specification. A set of vowels is generated that represent the most extreme vowels in terms of the total acoustic vowel space characteristic of the model. These sounds are also selected so as to be approximately equidistant acoustically. These model-based "cardinal vowels" are compared with a set of true cardinal vowels pronounced by Daniel Jones. The two sets display many qualitative similarities, both acoustically and auditorily. The implications of a vowel reference system that could indeed be produced "from written descriptions" are touched upon, in particular with regard to the teaching of pronunciation to hard-of-hearing as well as normal students. An analysis-by-synthesis application is suggested in which the model is used to supplement visual spectral displays of vowel sounds with articulatory interpretations. Some of the problems that would have to be overcome in such a method of "automatic articulation instruction" are mentioned, for instance those of normalization and compensatory articulation.

The Cardinal Vowel System

In the teaching of pronunciation one principle is to describe the unknown sounds to be learned in relation to sounds that are known to the student.
Phoneticians have pointed out that the sounds of a given language cannot be used for this purpose, since there are large variations among speakers of the same language owing to dialectal, sociological and other factors (1). Instead of teaching, for instance, vowel pronunciation in terms of the vowel sounds of a particular language, a reference system that is independent of any given language has been devised. A famous example of such a system is the Cardinal Vowels. These sounds are said to be "a set of fixed vowel-sounds having known acoustic qualities and known tongue and lip positions" (2). There are a primary set and a secondary set of cardinal vowels, each set comprising eight vowels.

*This paper was presented at the Second International Congress of Applied Linguistics, Cambridge, England, 8-12 Sept. 1969.


It is customary to refer to a given cardinal vowel by its number and to describe its articulation in terms of three dimensions: tongue height, front-back position of the tongue, and degree of rounding. Table I-B-1 shows the articulatory specifications and symbols used to describe the primary and secondary sets.

TABLE I-B-1

                PRIMARY CARDINAL VOWELS     SECONDARY CARDINAL VOWELS
                front       back            front       back
close           [i]         [u]             [y]         [ɯ]
half-close      [e]         [o]             [ø]         [ɤ]
half-open       [ɛ]         [ɔ]             [œ]         [ʌ]
open            [a]         [ɑ]             [ɶ]         [ɒ]

The following features are said to characterize these sounds (3):
(1) They are independent of the vowels of any language.
(2) They are fixed reference points of "exactly determined and invariable quality".
(3) They are peripheral vowels. Thus, in principle, it should be possible to describe an arbitrary vowel quality of any language by interpolating between the reference points.
(4) They are auditorily equidistant.
(5) Moreover, "the values of cardinal vowels cannot be learnt from written descriptions; they should be learnt by oral instruction from a teacher who knows them".

The last property indicates that the system is not an objective and quantitative one but relies heavily on the motor skills and perceptual acuity of the student. It is passed on by oral tradition.

Quantitative Frameworks of Vowel Specification

From the early fifties onwards acoustic phonetics has made rapid progress. Among the achievements in this field are the schemes devised by Fant (4) and Stevens and House (5) to study the relation between vowel articulations and their acoustic results. We are referring to the three-parameter models of these investigators. The following three parameters are controlled in these models:
(1) length and opening area of the lip section,
(2) position of maximal tongue constriction,
(3) the magnitude of this constriction.
On the basis of these three numbers the cross-sectional areas along the vocal tract are derived with the aid of rules that differ somewhat between the two versions of the three-parameter model. Given the distribution of cross-sectional areas along the tract, or the area function, the acoustic determinants of vowel quality, the formant frequencies, are computed. According to this type of model a possible vowel articulation is defined as any permissible combination of parameter values, the parameters being dimensions of the area function. Although these frameworks of vowel specification are objective and quantitative, which the cardinal vowel system is not, they sometimes specify "possible vowel articulation" in too generous a fashion and in a manner that is not always easy to interpret in intuitively meaningful articulatory terms such as open-close, front-back, etc.
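To make the last step concrete, the computation from area function to formant frequencies can be sketched as follows. This is a minimal sketch under our own simplifying assumptions, not the procedure of (4) or (5): the tract is treated as a chain of equal-length, lossless cylindrical sections, rigidly closed at the glottis and ideally open at the lips, with c = 35000 cm/s; wall, viscous and radiation losses are ignored. The function name `formants_from_area_function` is hypothetical. Formants are taken as the frequencies at which the lips-to-glottis volume-velocity transfer function has poles, i.e. the zeros of the lower-right element of the tract's chain matrix.

```python
import math


def formants_from_area_function(areas, section_len, c=35000.0,
                                f_max=5000.0, df=1.0, n_formants=3):
    """Formant frequencies (Hz) of a lossless concatenated-tube model.

    areas: cross-sectional areas (cm^2) of equal-length sections, ordered
    from glottis to lips; section_len: length of each section (cm);
    c: speed of sound (cm/s). Assumes a rigid closed glottis end, an ideal
    open lip end (zero pressure) and no losses.
    """

    def d_element(f):
        # Chain matrix from glottis to lips. For lossless sections it has
        # the form [[a, j*b], [j*g, d]] with a, b, g, d real, so only the
        # four real numbers are tracked.
        k = 2.0 * math.pi * f / c
        a, b, g, d = 1.0, 0.0, 0.0, 1.0
        for area in areas:
            z = 1.0 / area  # characteristic impedance, up to the factor rho*c
            cs, sn = math.cos(k * section_len), math.sin(k * section_len)
            a, b, g, d = (a * cs - b * sn / z, a * z * sn + b * cs,
                          g * cs + d * sn / z, d * cs - g * z * sn)
        # With zero pressure at the lips, U_glottis = d * U_lips, so the
        # transfer function U_lips / U_glottis = 1 / d peaks where d = 0.
        return d

    formants = []
    f_prev, d_prev = df, d_element(df)
    f = df
    while f < f_max and len(formants) < n_formants:
        f += df
        d_cur = d_element(f)
        if (d_prev < 0.0) != (d_cur < 0.0):
            # Sign change in [f_prev, f]: refine the zero by bisection.
            lo, hi, lo_neg = f_prev, f, d_prev < 0.0
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                if (d_element(mid) < 0.0) == lo_neg:
                    lo = mid
                else:
                    hi = mid
            formants.append(0.5 * (lo + hi))
        f_prev, d_prev = f, d_cur
    return formants


# A uniform 17.5 cm tube should give about 500, 1500 and 2500 Hz.
print(formants_from_area_function([5.0] * 10, 1.75))
```

The uniform tube reproduces the familiar odd-quarter-wavelength resonances, which serves as a sanity check; narrowing the section at the lip end lowers F1, as lip rounding does.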
A Model of Vowel Production

In the present paper we shall report on an attempt to construct a model that combines the idea of an articulatory and perceptual reference system inherent in the cardinal vowel theory with an objective, quantitative method of specification. Our aim has been to build into the model all that we know at present about the natural degrees of freedom of the vocal tract. In so doing we hope that we might arrive at an improved definition of the notion of "possible vowel articulation".

A. Articulatory Properties

Our model is controlled by means of the following independent components:
(1) the mandible, whose movements we restrict to a single fixed path;
(2) the tongue, whose shape can be varied continuously by linear interpolation between three basic configurations of tongue contours corresponding to [i], [a] and [u], respectively. A certain mixture of palatalization, velarization and pharyngealization ("[i]-ness", "[u]-ness" and "[a]-ness", respectively) corresponds to certain numerical values of these parameters.


This choice can be justified partly on the basis of data obtained from lateral X-ray profiles of Swedish vowels. It turns out that three main families of tongue contours are obtained, provided that these contours are plotted with the lower jaw as reference (Fig. I-B-1). An explanation of this rather restricted set of contours is readily apparent when we think of the arrangement of the major extrinsic muscles of the tongue. We find the genioglossus, styloglossus and hyoglossus muscles, which seem mechanically capable of participating in the contraction patterns underlying the production of [i], [u] and [a], respectively (7).
(3) labio-muscular activity (rounding-spreading), which is independent of jaw position;
(4) larynx height.

All of these parameters lend themselves naturally to an interpretation in terms of "muscle lengths". To compute a sound wave from such specifications the procedure is the following:
1. Choose parametric values (jaw opening, tongue shape, rounding-spreading, larynx height).
2. Compute the associated articulatory profile, that is, the contours of the vocal tract and its length in a lateral projection.
3. Translate the result of step 2 into an area function (the variation of cross-sectional area along the tract).
4. Compute the formant frequencies corresponding to this area function.
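Steps 1 to 3 can be sketched in a few lines, assuming a much simplified geometry. Everything in this sketch is hypothetical and for illustration only: the contour numbers, the wall radii, the sampling at five polar angles, and the rule converting midsagittal distance d to area, A = αd^β, a commonly used power-law form whose coefficients we have simply chosen. Each angular sample is treated as one tube section, and jaw rotation, lip area and larynx height are left out.

```python
# Hypothetical tongue contours for the three basic shapes: radial distance
# (cm) of the tongue surface from a jaw-anchored origin, sampled at the same
# five polar angles. The numbers are invented for illustration only.
CONTOURS = {
    "i": [3.0, 3.5, 4.0, 3.8, 3.2],
    "a": [2.0, 2.5, 3.0, 3.5, 3.8],
    "u": [2.5, 3.2, 3.6, 4.0, 3.0],
}
# Hypothetical radial distance of the opposite tract wall at the same angles.
WALL = [4.5, 4.6, 4.7, 4.6, 4.5]


def interpolate_tongue(weights, contours):
    """Steps 1-2: linear interpolation between the basic tongue contours.

    weights maps "i", "a", "u" to non-negative numbers summing to 1
    (the degree of "[i]-ness", "[a]-ness" and "[u]-ness").
    """
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(next(iter(contours.values())))
    return [sum(w * contours[k][j] for k, w in weights.items()) for j in range(n)]


def sagittal_to_area(distances, alpha=1.5, beta=1.5):
    """Step 3 (assumed rule): A = alpha * d**beta; negative d clipped to 0."""
    return [alpha * max(d, 0.0) ** beta for d in distances]


def area_function(weights, contours=CONTOURS, wall=WALL):
    """Parameters -> tongue profile -> midsagittal distances -> areas (cm^2)."""
    tongue = interpolate_tongue(weights, contours)
    distances = [w - t for w, t in zip(wall, tongue)]
    return sagittal_to_area(distances)


# Usage: a tongue shape that is mostly [i] with some [u].
print(area_function({"i": 0.7, "a": 0.0, "u": 0.3}))
```

The resulting areas could then be handed to any tube-model formant computation to carry out step 4.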

In Fig. I-B-2 the three basic tongue shapes are shown. A coordinate system anchored on the mandible is also depicted; this system is used to compute interpolated tongue shapes. At the top left we see the parameters of width and height of lip separation, with the aid of which the opening area is derived. Below, a lateral profile tracing is shown with another coordinate system. This system is used to compute the area function.

B. Acoustic Properties

Now assume that, like the child learning to talk, we combine different mandible positions, tongue shapes, lip states and larynx heights in all possible ways and listen to the acoustic result in each individual case. Whatever we do with our articulatory components, it is clear that the human speech organs are constrained so as to permit only certain vowel qualities, or combinations of formant frequency values.
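The "all possible ways" experiment can be imitated by brute force: try every combination of parameter values on a grid, collect the resulting (F1, F2) points, and outline the cloud. The sketch below assumes a hypothetical `forward_model` callable (any articulatory model returning F1 and F2) and outlines the cloud by its convex hull, computed with Andrew's monotone-chain algorithm. The hull is only an approximation, since the real boundary of a vowel space need not be convex, and the `toy` model in the usage example is a stand-in, not an articulatory model.

```python
import itertools


def convex_hull(points):
    """Andrew's monotone-chain convex hull; vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def vowel_space_outline(forward_model, grid):
    """Try every combination of parameter values and outline the (F1, F2) cloud.

    forward_model: hypothetical callable mapping a dict of parameter values
    to an (F1, F2) pair in Hz.
    grid: dict mapping parameter names to lists of values to try.
    """
    names = list(grid)
    points = [tuple(forward_model(dict(zip(names, values))))
              for values in itertools.product(*(grid[n] for n in names))]
    return convex_hull(points)


# Usage with a toy stand-in for an articulatory model.
def toy(p):
    return (300 + 500 * p["jaw"], 800 + 1500 * p["tongue"])


print(vowel_space_outline(toy, {"jaw": [0.0, 0.5, 1.0], "tongue": [0.0, 0.5, 1.0]}))
```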

Fig. I-B-1. Midsagittal tongue contours for Swedish vowels in relation to the outline of the mandible. Top left: [i, e, ...]. Top right: [u]. Below: [a, o, ...].

Fig. I-B-2. In the upper left-hand part the parameters determining the mouth opening area A are shown: h = vertical separation between the lips; w = distance between the mouth corners; p = a number that specifies the curvature of the lip contours. These contours, when projected on a frontal plane, are assumed to be given by

In the upper right part the basic tongue shapes of the model are shown. A polar coordinate system defined in relation to the mandible is also indicated. With the aid of this coordinate system, interpolated tongue shapes associated with [i], [u] and [a] were computed. In the lower part of the figure a lateral X-ray tracing can be seen. Superimposed on the profile is a coordinate system defined in relation to fixed structures such as the maxilla. This system was used in the determination of area functions.

Certain other F1-F2-F3 combinations characterize vowels that we could produce only with the aid of a terminal analogue synthesizer, not with our mouths. From the point of view of the human speech mechanism such combinations would be impossible. The space that characterizes the acoustic possibilities of our model is shown in Fig. I-B-3. In this figure we have separated the rounded and spread subspaces. The lower fields refer to all possible combinations of F2 and F1 values; the top areas pertain to the corresponding F3 values. When we explore the contours of these spaces auditorily, we find the qualities indicated by the vowel symbols. These points have been selected at approximately equidistant F1 steps.
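The selection procedure is not spelled out here; one simple way, sketched as an assumption, is to take one edge of the vowel-space boundary as a polyline sorted by F1, divide its F1 range into equal steps in Hz, and interpolate F2 linearly at each step. The function `equidistant_f1_points` and the edge values in the usage example are invented for illustration. Equal steps in Hz are themselves an assumption; an auditory frequency scale would space the points differently.

```python
def equidistant_f1_points(edge, n):
    """Pick n points at equal F1 steps along one edge of a vowel space.

    edge: (F1, F2) pairs in Hz, sorted by strictly increasing F1.
    Returns (F1, F2) pairs, with F2 linearly interpolated along the edge.
    """
    if n < 2 or len(edge) < 2:
        raise ValueError("need n >= 2 and at least two edge points")
    f1_lo, f1_hi = edge[0][0], edge[-1][0]
    step = (f1_hi - f1_lo) / (n - 1)
    points, j = [], 0
    for i in range(n):
        f1 = f1_lo + i * step
        # Advance to the edge segment [edge[j], edge[j + 1]] containing f1.
        while j < len(edge) - 2 and edge[j + 1][0] < f1:
            j += 1
        (x0, y0), (x1, y1) = edge[j], edge[j + 1]
        t = (f1 - x0) / (x1 - x0)
        points.append((f1, y0 + t * (y1 - y0)))
    return points


# Usage with invented values for a high-F2 (front, spread) edge.
front_edge = [(250, 2300), (450, 2000), (750, 1700)]
print(equidistant_f1_points(front_edge, 3))
```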
True and Model-Based Cardinal Vowels

Clearly the first four features mentioned above as characterizing cardinal vowels apply also to the vowels generated by the model. They are independent of any language, they are peripheral and fixed reference points, and they are acoustically (if not auditorily) equidistant. Consequently it would be of some interest to compare a set of model-based vowels with a set of true cardinal vowels. Fig. I-B-4 demonstrates the results of an acoustic comparison of this type. The first three formant frequencies were measured in a set of cardinal vowels as spoken by Daniel Jones (1). The left plot in this figure shows the extreme vowels generated with the model (also shown in Fig. I-B-3). It is seen that the true cardinal vowels span a somewhat larger range. This difference is probably due to the fact that, among other factors, Daniel Jones has a shorter overall tract length than that of our model, and that he also deliberately shortened his tract by elevating his larynx to an extreme position when producing [i] and probably [a]. Qualitative similarities do exist between the sets, however.
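The effect of tract length can be made explicit with the idealized uniform tube, closed at the glottis and open at the lips, whose formants fall at odd multiples of the quarter-wavelength resonance. The lengths below are illustrative only, not measurements of Daniel Jones or of our model, and c = 35000 cm/s is assumed.

```latex
\begin{aligned}
F_n &= \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \ldots \\
L = 17.5\ \mathrm{cm}:\quad F_1 &= \frac{35\,000}{4 \times 17.5} = 500\ \mathrm{Hz},\quad F_2 = 1500\ \mathrm{Hz},\quad F_3 = 2500\ \mathrm{Hz} \\
L = 15\ \mathrm{cm}:\quad F_1 &= \frac{35\,000}{4 \times 15} \approx 583\ \mathrm{Hz},\quad F_2 = 1750\ \mathrm{Hz},\quad F_3 \approx 2917\ \mathrm{Hz} \\
\frac{F_n(15\ \mathrm{cm})}{F_n(17.5\ \mathrm{cm})} &= \frac{17.5}{15} \approx 1.17
\end{aligned}
```

Shortening the tract from 17.5 to 15 cm thus multiplies every formant, and hence every distance between two vowels on a linear frequency scale, by about 1.17; an F1 range of 250-750 Hz would become roughly 292-875 Hz. Real vocal tracts are not uniform tubes and do not scale uniformly (8), so this is only a first-order indication of why a shorter tract spans a larger acoustic range.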

Table I-B-2 contains a specification of the articulatory parameters that underlie the model-generated sounds.

Table I-B-2. The articulatory dimensions of peripheral vowel types. Tongue shape: palatal, palato-pharyngeal, pharyngeal, velar. Jaw: close to open. Vowels entered: [i], [e], [ɛ], [a], [ɑ], [o], [u].

Fig. I-B-3. The maximal rounded and spread vowel spaces that the model is capable of generating. Panels: ROUNDED VOWEL SPACE (rounded and larynx depressed; [u] marked) and SPREAD VOWEL SPACE. Abscissa in both panels: first formant frequency (kHz).

Implications of a Quantitative Theory of Cardinal Vowels for the Teaching of Pronunciation

In theory, a physiological model of vowel production should be a useful tool in the teaching of pronunciation to second-language learners and hard-of-hearing children. In practice, such an application would require solutions to some technical problems which, however, by no means appear insurmountable.

Imagine that the formant pattern of a vowel in a given language or dialect could be measured automatically and represented as a dot on an oscilloscope screen. F2 and F3, or some measure combining these two frequencies, could be plotted along the ordinate and F1 along the abscissa. Technologically this is not wishful thinking. Attempts have already been made along these lines (6). Suppose that we first plot a target vowel as indicated in Fig. I-B-5. Next we ask our subject, who might for instance be a deaf child, to produce a vowel as close to the target as possible. In all probability our pupil will miss the target, perhaps in the manner indicated in Fig. I-B-5. We might at this stage have the child try to improve his pronunciation by a trial-and-error procedure. An even better method would be to supplement the visual display with an indication of how the child should change his articulation in order to reach the target. Such information could be given in terms of trajectories depicting the formant frequency shift associated with "isolingual" and "isomandibular" conditions. A set of such curves is given in Fig. I-B-5.
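The choice of instruction from such trajectories can be sketched as a greedy, one-step rule. We assume, hypothetically, that an analysis-by-synthesis fit has already produced, for the pupil's current articulation, the local formant shift per unit change for each candidate instruction (the directions of the isolingual and isomandibular curves through the pupil's dot). The shift values in `SHIFTS` and the function `choose_instruction` are invented for illustration; the second coordinate stands for whatever the display plots on the ordinate (F2, or a measure combining F2 and F3).

```python
import math

# Hypothetical local formant shifts (Hz) per unit of each instruction at the
# pupil's current articulation, as (F1 shift, ordinate shift). In practice
# they would come from an analysis-by-synthesis fit of the model; these
# numbers are invented for illustration.
SHIFTS = {
    "MOVE TONGUE FORWARD": (-20.0, 300.0),
    "MOVE TONGUE BACKWARD": (20.0, -300.0),
    "LOWER JAW": (120.0, -40.0),
    "RAISE JAW": (-120.0, 40.0),
}


def choose_instruction(pupil, target, shifts=SHIFTS, max_step=1.0):
    """Return (instruction, step) that brings the pupil's point closest to
    the target, or (None, 0.0) if no single instruction helps."""
    best_name, best_step = None, 0.0
    best_dist = math.dist(pupil, target)
    for name, (d1, d2) in shifts.items():
        norm2 = d1 * d1 + d2 * d2
        if norm2 == 0.0:
            continue
        # Least-squares step along this direction, clipped to [0, max_step]
        # because an instruction can only be followed, not reversed.
        s = ((target[0] - pupil[0]) * d1 + (target[1] - pupil[1]) * d2) / norm2
        s = min(max(s, 0.0), max_step)
        moved = (pupil[0] + s * d1, pupil[1] + s * d2)
        dist = math.dist(moved, target)
        if dist < best_dist:
            best_name, best_step, best_dist = name, s, dist
    return best_name, best_step


# Usage: the pupil's ordinate value is too low for the target.
print(choose_instruction((400.0, 1500.0), (400.0, 1800.0)))
```

Choosing one instruction at a time keeps the feedback simple for the pupil, at the cost of ignoring combined moves such as lowering the jaw while moving the tongue forward.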

This instrument would serve as a sort of "automatic articulation instructor". This goal might also sound utopian, but it should be possible given a computer and an acceptable theory of vowel production. The present model of vowel production is obviously unsatisfactory as such a theory. There are a number of modifications that are clearly needed to improve it*). There is, for instance, the problem of normalizing F-pattern data for talkers whose vocal tracts differ, often non-uniformly, in size (8). There is also the problem of compensatory articulations, which our model at present often incorrectly allows in the case of non-peripheral vowels. Although it might be possible to produce a rounded vowel with spread lips by compensating elsewhere along the tract, it is extremely rare to find a Swedish native talker who consistently spreads his lips for such sounds. Among other things, there seems to be a principle of disambiguation that is yet to be discovered in future research.

*) E.g., independent control of the tongue blade in retroflexion and coronal consonant articulation; independent control of pharynx width in connection with the tense-lax distinction on which the vowel harmony of the Akan languages seems to be based; hyperpalatalization, velarization and pharyngealization to produce compensatory articulations and different types of dorsal constrictions larger than those for vowels; control of the form of the cross-sectional area (laterals, fricatives, etc.).

Fig. I-B-5. Stylized visual display of spectral properties of vowels, intended to be used in improving the pronunciation of vowels by hard-of-hearing subjects. The target vowel and the pupil's vowel are represented by dots. The trajectories indicate the formant frequency shifts that would result if the pupil changed his articulation as indicated, that is, by lowering his jaw and by moving his tongue forward. These trajectories are assumed to be based on an analysis-by-synthesis of the pupil's vowel using a quantitative model of vowel production of the type described in the paper. (Display labels: "MOVE TONGUE FORWARD!", "PUPIL'S VOWEL"; abscissa: first formant frequency.)

References:

(1) Cardinal Vowels (spoken by D. Jones), Linguaphone Institute.

(2) D. Jones: An Outline of English Phonetics, 8th edition (Cambridge 1956).

(3) D. Abercrombie: Elements of General Phonetics (Edinburgh 1967).

(4) G. Fant: Acoustic Theory of Speech Production ('s-Gravenhage 1960).

(5) K.N. Stevens and A.S. House: "Development of a quantitative description of vowel articulation", J. Acoust. Soc. Am. 27 (1955), pp. 484-493.

(6) J.M. Pickett and A. Constam: "A visual speech trainer with simplified indication of vowel spectrum", Am. Ann. of the Deaf 113 (1968), pp. 253-258.
I.B. Thomas and R.C. Snell: "Articulation training through the use of a real-time visual display of speech parameters", MS submitted for publication in Am. Ann. of the Deaf.
A.J. Goldberg: "Visual perception of speech stimuli", Quarterly Progress Report No. 92 (MIT, RLE, Cambridge, Mass.), Jan. 15, 1969, pp. 335-338.

(7) P. Ladefoged: "Physiological characterization of speech", Working Papers in Phonetics (UCLA), June 1964, pp. 2-9.

(8) G. Fant: "A note on vocal tract size factors and non-uniform F-pattern scalings", STL-QPSR 4/1966 (KTH, Stockholm), pp. 22-30.

Acknowledgments

This work was supported by the National Institutes of Health Research Grant No. NB 04003-07 and the Tri-Centennial Fund of the Bank of Sweden, Contract No. 67/48.
