Marco Zemke
Institut für Informatik, Humboldt University Berlin, Rudower Chaussee 25, 12489 Berlin, Germany
Malte Kob
Erich Thienhaus Institute, University of Music Detmold, Neustadt 22, 23756 Detmold, Germany
Hanspeter Herzel
Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, 10115 Berlin, Germany
This paper is based on a talk presented at the 6th ICVPB, Tampere, Finland, 6-9 August 2008.
b) Author to whom correspondence should be addressed. Electronic mail: isao@jaist.ac.jp
I. INTRODUCTION
Pages: 1528-1536
[Figure residue, Fig. 1: labels from the schematic of the four-mass polygon model (vocal tract, trachea, masses, springs, heights h1-h4, positions x1-x4).]
al., 2008; Zhang, 2009) describing anatomical and physiological details. However, many parameters are not precisely known and a comprehensive bifurcation analysis is difficult (Berry et al., 1994). Consequently, we constructed a low-dimensional model that is consistent with the basic experimental observations.
Our model is based on the body-cover differentiation proposed by Story and Titze (1995), a three-mass representation of the cover (Tokuda et al., 2007), and a smooth vocal fold geometry as in Lous et al. (1998). An advantage of dividing the cover layer into three masses is that they can represent the coexistence of different vibratory patterns, which may correspond to chest and falsetto registers (Tokuda et al., 2007). The chosen parameter values are similar to the values in these papers and are consistent with muscle activation rules (Titze and Story, 2002). A detailed discussion of the modeling and a complete set of equations and parameters are given in Appendixes A and B and in a recent thesis (Zemke, 2008).
Figure 1 visualizes our four-mass polygon model. The three cover masses allow wave-like vibrations of the whole vocal folds with a complete closure of the glottis in chest-register simulation. For other parameter sets, high-pitched oscillations with diminished closure of the glottis are simulated, which resemble the falsetto register. Figure 2 shows chest-like vibrations (fundamental frequency of 96 Hz) for the default parameters listed in Appendix A. The phase shifts in the opening areas shown in the upper graph allow the energy transfer from the air flow to the masses and contribute to a skewing of the glottal pulses in the lower graph.
In order to simulate register transitions, we recall rules for controlling low-dimensional vocal fold models with muscle activation (Smith et al., 1992; Titze and Story, 2002). An active CT muscle decreases the vibrating mass and increases stiffness. We introduce a tension parameter T which mimics the CT muscle (Steinecke and Herzel, 1995). Masses are divided by T and stiffness parameters are multiplied by T, and thus the fundamental frequency of the model is roughly proportional to the parameter T. Increasing T from 1 to 7
J. Acoust. Soc. Am., Vol. 127, No. 3, March 2010
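The scaling rule for the tension parameter T can be checked on a single undamped mass-spring oscillator. The sketch below is only an illustration under that simplifying assumption: it ignores coupling, damping, collision, and aerodynamic forces, so the absolute frequency it prints is not the model's fundamental frequency. The function names are hypothetical; the mass and stiffness values are those listed for cover mass 1 in Appendix A.

```python
import math


def natural_frequency_hz(mass_g, stiffness_n_per_m):
    """Natural frequency of an undamped mass-spring oscillator."""
    mass_kg = mass_g * 1e-3
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def apply_tension(mass_g, stiffness_n_per_m, tension):
    """Tension rule from the text: mass divided by T, stiffness multiplied by T."""
    return mass_g / tension, stiffness_n_per_m * tension


M1_G = 0.009       # cover mass 1 (Appendix A)
K1_N_PER_M = 6.0   # stiffness of cover mass 1 (Appendix A)

base = natural_frequency_hz(M1_G, K1_N_PER_M)
for T in (1.0, 2.0, 4.0):
    m, k = apply_tension(M1_G, K1_N_PER_M, T)
    f = natural_frequency_hz(m, k)
    print(f"T = {T:.0f}: f = {f:6.1f} Hz, ratio to T = 1: {f / base:.2f}")
```

Because sqrt((kT)/(m/T)) = T sqrt(k/m), the frequency ratio equals T exactly for this isolated oscillator, which is why the full model's F0 is described as only roughly proportional to T.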
[Figure residue from Figs. 1, 2, and 4(a). Fig. 1: symmetry axis, air flow, cover masses m1-m3 and body mass mb with springs k and dampers r, opening areas a1-a3. Fig. 2: axis Time [msec]. Fig. 4(a): axes Time [s] and Frequency [Hz].]
Register transitions and source-tract interaction are often studied using glissando singing (Henrich et al., 2005), experimental variation in vocal fold tension (Tokuda et al., 2007), or gliding of the fundamental frequency in biomechanical models (Titze, 2008). In Fig. 4, we compare a glissando of an untrained singer with simulations of a corresponding F0 glide in our four-mass model coupled to sub- and supraglottal resonators. The singer's glissando in Fig. 4(a) exhibits register transitions with frequency jumps around 3.3 and 7.8 s at slightly different pitches. There is an abrupt phonation onset at 1.2 s and a smoother offset with some irregularities. Glissando is simulated in Fig. 4(b) by varying our tension parameter T from 1 to 5.5 and then back. We find a frequency jump at 6.7 s (T = 3.8, F0 = 390 Hz) and a backward transition at 17 s (T = 3.3, F0 = 350 Hz). These differences between chest-falsetto and falsetto-chest transitions are a hallmark of hysteresis (see Tokuda et al., 2007 for a detailed discussion of bifurcations leading to hysteresis). Hysteresis indicates that there are coexisting vibratory regimes (limit cycles) for a range of parameters. Moreover, hysteresis implies that there are voice breaks instead of the passaggi of trained singers.
In addition to register transitions, subharmonics are occasionally observed, e.g., at 8.7 and 14.1 s. It has been discussed earlier (Berry et al., 1996; Tokuda et al., 2007) that register transitions are often accompanied by nonlinear phenomena such as subharmonics and chaos. The gross features of the experimental and simulated F0 glides in Fig. 4 are similar. The study of hysteresis at phonation onset/offset requires a more detailed Hopf bifurcation analysis.
FIG. 4. (Color online) (a) Spectrogram of human voice with a gliding fundamental frequency F0. (b) Model simulation of the gliding F0.
discussion. It is known that bifurcations, i.e., sudden transitions due to slow parameter variations, might depend strongly on system parameters. Thus, it is possible that even moderate effects of the resonances (see Titze et al., 2008 for data on source-filter interactions) can shift register transitions drastically. It has been suggested that particular subglottal resonances govern involuntary register transitions (Titze, 2000; Zhang et al., 2006).
In order to study source-filter coupling, we implemented the wave-reflection model (Kelly and Lochbaum, 1962; Liljencrants, 1985; Story, 1995; Titze, 2006). Details of the simulations are given in Appendix B. For simplicity, we approximate the resonators by uniform tubes characterized by their length and area. This simplification gives direct insight into how the resonance frequencies, which are set by the tube lengths, affect the location of register transitions.
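To make the link between tube length and resonance frequency concrete, the following sketch evaluates the textbook quarter-wavelength formula F_n = (2n - 1) c / (4 L) for an ideal uniform tube that is closed at the glottis and open at the lips. This idealization is an assumption made here for illustration: it neglects wall losses, the radiation load, and the time-varying glottal opening, so the resonances of the simulated tubes will differ somewhat. The function name is hypothetical; the length of 17.5 cm and c = 350 m/s are taken from the text.

```python
SOUND_SPEED_CM_PER_S = 35000.0  # c = 350 m/s, as in Appendix B


def quarter_wave_resonances(length_cm, n_modes=3, c=SOUND_SPEED_CM_PER_S):
    """Resonances (Hz) of an ideal lossless tube, closed at one end and open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_modes + 1)]


DEFAULT_SUPRAGLOTTAL_CM = 17.5
for scale in (0.75, 1.0, 1.25):
    length = DEFAULT_SUPRAGLOTTAL_CM * scale
    f1, f2, f3 = quarter_wave_resonances(length)
    print(f"L = {length:6.3f} cm: F1 = {f1:6.1f} Hz, F2 = {f2:6.1f} Hz, F3 = {f3:6.1f} Hz")
```

Under this idealization a shorter tube raises all resonances in inverse proportion to its length, so a 25% change in length moves the first resonance by well over 100 Hz.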
[Figure residue from Figs. 5 and 6: panels (a) and (b); axes Subglottal Pressure [kPa], Time [ms], and Volume Flow; register labels Chest and Falsetto.]
FIG. 5. Phonation onset. No vocal tract is attached to the vocal fold model in (a), whereas a vocal tract is attached in (b). The tension parameter is set to T = 4. Local maxima of the opening area of the lower mass a1 = l h1 are plotted for both increasing (crosses) and decreasing (circles) subglottal pressure. The small graphs inside each diagram represent the volume flow U (cm^3/ms) corresponding to each branch of the onset curve.
We induced register transitions in our model by changing the tension parameter T gradually and by measuring the fundamental frequency F0, the amplitude of the opening area, and the number of colliding cover masses. We observed a steady increase in F0, with collision of all three cover masses at low F0 and collision of only the top mass at high F0. For simplicity, a binary classification is applied to draw the register transitions of Figs. 6-8 as follows: collision of all three cover masses is termed chest, whereas collision of fewer masses is termed falsetto. If only the upper masses collide, the open quotient (OQ), defined as OQ = (duration of the open phase of the glottis)/(pitch period), becomes large, as known from measurements in singers (Henrich et al., 2005). In our bifurcation diagrams, with the tension T as the bifurcation parameter, we plotted the fundamental frequency F0 on the x-axis instead of T since this allows a direct comparison with glissando spectrograms.
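The open quotient and the binary register rule described above can be sketched as follows. This is a minimal illustration, assuming the glottal area has been sampled uniformly over exactly one pitch period; the function names and the two sample waveforms are hypothetical and are not output of the four-mass model.

```python
def open_quotient(glottal_area, closed_threshold=0.0):
    """OQ = (samples with the glottis open) / (samples in one pitch period)."""
    if len(glottal_area) == 0:
        raise ValueError("need at least one sample covering one pitch period")
    open_samples = sum(1 for a in glottal_area if a > closed_threshold)
    return open_samples / len(glottal_area)


def classify_register(n_colliding_cover_masses):
    """Binary rule from the text: all three cover masses collide -> chest, fewer -> falsetto."""
    return "chest" if n_colliding_cover_masses == 3 else "falsetto"


# Hypothetical one-period area waveforms (arbitrary units, 10 samples each).
chest_like = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0]      # long closed phase
falsetto_like = [1, 2, 3, 4, 5, 4, 3, 2, 1, 0]   # brief closure

print(open_quotient(chest_like), classify_register(3))
print(open_quotient(falsetto_like), classify_register(1))
```

In practice the pitch period would first have to be estimated from the signal, and a small positive threshold may be preferable to zero when the area signal is noisy.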
FIG. 6. Register transition of the vocal fold model without vocal tract. Frequency domains for chest and falsetto registers are drawn in the upper graph, whereas the corresponding bifurcation diagrams are drawn in the lower graph. The curves were drawn by both increasing (dotted line with crosses) and decreasing (solid line with circles) the tension parameter T. In the bifurcation diagram, local maxima of the opening area of the lower mass a1 = l h1 were plotted.
Figures 6 and 7 compare register transitions of the isolated four-mass model with those of the complete model including sub- and supraglottal resonances. In both cases, we find a pronounced hysteresis of about 30-40 Hz but relatively small jumps of the amplitudes. Most notable is the dramatic shift in the transition due to the coupling to vocal tract resonators. This observation reveals that the chest-falsetto transition depends sensitively on source-tract interactions.
In order to substantiate this finding, we varied the length of the sub- and supraglottal tubes. First, we decreased and increased the length of the subglottal tube by 25%. It turned out that there are only minor effects on the register transition. The length changes led to shifts in the transition point by 10-15 Hz (no graphs shown). In contrast, the supraglottal resonance had a profound effect: changing the default length of 17.5 cm to 75% or 125% induced major shifts in the
FIG. 7. Register transition of the vocal fold model with vocal tract. The default lengths for sub- and supraglottis are Lsub = 24.7 cm and Lsup = 17.5 cm, respectively.
Tokuda et al.: Modeling register transitions
[Figure residue: register diagrams for long and short supraglottis (axis Frequency [Hz]; labels Chest, Falsetto; inset Volume Flow), and a two-panel figure with (a) Frequency [Hz] versus Time [s] and (b) Open Quotient versus Time [s].]
V. EXPERIMENT
Subject | /a/ (Hz) | /i/ (Hz)
I | 270 ± 19 | 240 ± 14
II | 271 ± 18 | 275 ± 13
III | 238 ± 20 | 231 ± 20
increasing F0, were computed from the ten data sets, as summarized in Table I. Because of the high variability of the register transitions, the standard deviation was estimated to be relatively large. According to Welch's t-test, the mean frequency difference between /a/ and /i/ was statistically significant for subject I at the 1% level. For the other two subjects, the difference was not significant.
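Welch's t-test can be evaluated directly from summary statistics. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom; the function name is hypothetical. The example numbers are one reading of the Table I entries for subject I (mean and standard deviation in Hz), and the sample size of ten per vowel is an assumption based on the "ten data sets" mentioned in the text.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of sample 1
    v2 = sd2 ** 2 / n2   # squared standard error of sample 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


# Assumed reading of Table I, subject I: /a/ = 270 (SD 19), /i/ = 240 (SD 14), n = 10 each.
t_stat, dof = welch_t(270.0, 19.0, 10, 240.0, 14.0, 10)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

The statistic is then compared with the two-sided critical value of Student's t distribution for the computed degrees of freedom, which is about 2.9 at the 1% level for 16 to 17 degrees of freedom.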
We remark that Titze et al. (2008) used the same experimental framework in the context of voice instability induced by source-tract coupling. Their main focus was, however, on the frequency jumps, and not much attention was paid to the register change. They found many frequency jumps induced by the F0-F1 crossing accompanied by hysteresis, in particular for male subjects. Our observation essentially agrees with their study.
From the nonlinear dynamics point of view, voice registers are distinct types of limit cycle oscillations. In this context, phonation onset refers to a Hopf bifurcation and register transitions are associated with bifurcations of limit cycles. In Tokuda et al. (2007), we characterized register transitions in excised larynx experiments and simulations by two-dimensional bifurcation diagrams. In that paper, we analyzed a simple three-mass cover model.
Here we introduced a more realistic four-mass polygon
model coupled to sub- and supraglottal resonators. This
model was used to study bifurcations at phonation onset and
the chest-falsetto transition.
In Mergell et al. (2000), a smooth phonation onset was quantified using the normal form of a supercritical Hopf bifurcation. This model explained high-speed glottographic data in a reasonable way. In excised larynx experiments, however, amplitude jumps and hysteresis were reported at phonation onset (Berry et al., 1996). The present simulation without vocal tract exhibits a subcritical Hopf bifurcation, where the associated hysteresis is in good agreement with the excised larynx experiments. Another simulation with vocal tract showed that the coupling to the resonators lowers the phonation onset threshold. It remains to be tested experimentally under which circumstances the phonation onset can be regarded as a super- or subcritical Hopf bifurcation.
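The difference between the two onset types can be illustrated with the amplitude equation of a Hopf normal form. The sketch below is a generic stand-in and not the four-mass model: it assumes the subcritical form dr/dt = mu r + r^3 - r^5 with unit coefficients, where mu plays the role of the control parameter (for example subglottal pressure relative to threshold) and r is the oscillation amplitude. The parameter is swept up and then down, and each run starts from the final state of the preceding one. All names, the step size, and the small noise floor are assumptions for illustration.

```python
def settle(mu, r0, dt=0.05, steps=20000):
    """Integrate dr/dt = mu*r + r**3 - r**5 with explicit Euler; return the final amplitude."""
    r = r0
    for _ in range(steps):
        r2 = r * r
        r += dt * r * (mu + r2 - r2 * r2)
    return r


def sweep(mus, r_start, floor=1e-3):
    """Quasi-static sweep: each run starts from the final state of the previous one.

    The small floor stands in for background noise that perturbs the rest state.
    """
    amplitudes = []
    r = r_start
    for mu in mus:
        r = settle(mu, max(r, floor))
        amplitudes.append(r)
    return amplitudes


mus_up = [i / 100 for i in range(-40, 21, 2)]   # mu from -0.40 to 0.20
mus_down = list(reversed(mus_up))

amp_up = sweep(mus_up, r_start=0.0)
amp_down = sweep(mus_down, r_start=amp_up[-1])

THRESHOLD = 0.5  # amplitude above which the state is counted as oscillating
onset_mu = next(mu for mu, r in zip(mus_up, amp_up) if r > THRESHOLD)
offset_mu = next(mu for mu, r in zip(mus_down, amp_down) if r < THRESHOLD)

print(f"onset on the way up:    mu = {onset_mu:+.2f}")
print(f"offset on the way down: mu = {offset_mu:+.2f}")
```

For this normal form the rest state loses stability at mu = 0, while the large-amplitude branch persists down to the saddle-node point at mu = -1/4, so the onset and offset values differ and the amplitude jumps at both. In the supercritical form dr/dt = mu r - r^3 the amplitude grows continuously as sqrt(mu) and no hysteresis appears.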
It is well known that register transitions in untrained singers are accompanied by vocal breaks (see, e.g., Švec and Pešák, 1994). In our simulations, we indeed find sudden jumps of pitch and amplitude while varying the tension parameter T smoothly. In computer simulations, the associated phenomenon of hysteresis can be studied more easily than in
ACKNOWLEDGMENTS
We thank Markus Hess and Frank Müller for the opportunity of acoustic and EGG recordings. We are grateful to Tobias Riede for stimulating discussions. This work was supported by SCOPE Grant No. 071705001 of the Ministry of Internal Affairs and Communications (MIC), Japan.
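m_1 \ddot{y}_1 + r_1(\dot{y}_1 - \dot{y}_b) + k_1(y_1 - y_b) + \Theta(-h_1)\, c_1 \frac{h_1}{2} + k_{1,2}(y_1 - y_2) = F_1, (A1)
m_2 \ddot{y}_2 + r_2(\dot{y}_2 - \dot{y}_b) + k_2(y_2 - y_b) + \Theta(-h_2)\, c_2 \frac{h_2}{2} + k_{1,2}(y_2 - y_1) + k_{2,3}(y_2 - y_3) = F_2, (A2)
m_3 \ddot{y}_3 + r_3(\dot{y}_3 - \dot{y}_b) + k_3(y_3 - y_b) + \Theta(-h_3)\, c_3 \frac{h_3}{2} + k_{2,3}(y_3 - y_2) = F_3, (A3)
[...] = 0. (A4)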
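h_{i,i-1}(x,t) = [...] (A5)
[...] \frac{\rho}{2} \left( \frac{U}{h(x,t)\, l} \right)^2 = P_0 + \frac{\rho}{2} \left( \frac{U}{h_{\min}\, l} \right)^2 (A6)
[...] \int_{x_{i-1}}^{x_i} \frac{x - x_{i-1}}{x_i - x_{i-1}} P(x,t)\, dx + \int_{x_i}^{x_{i+1}} \frac{x_{i+1} - x}{x_{i+1} - x_{i-1}} P(x,t)\, dx. (A7)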
Parameter | Symbol | Nominal value
Cover mass 1 | m1 | 0.009 g
Cover mass 2 | m2 | 0.009 g
Cover mass 3 | m3 | 0.003 g
Body mass | mb | 0.05 g
Stiffness of cover mass 1 | k1 | 6.0 N/m
Stiffness of cover mass 2 | k2 | 6.0 N/m
Stiffness of cover mass 3 | k3 | 2.0 N/m
Stiffness of body mass | kb | 30.0 N/m
Stiffness between cover masses 1 and 2 | k1,2 | 1.0 N/m
Stiffness between cover masses 2 and 3 | k2,3 | 0.5 N/m
Damping ratio of cover mass 1 | ζ1 | 0.1
Damping ratio of cover mass 2 | ζ2 | 0.4
Damping ratio of cover mass 3 | ζ3 | 0.4
Damping ratio of body mass | ζb | 0.4
Collision stiffness of cover mass 1 | c1 | 3k1
Collision stiffness of cover mass 2 | c2 | 3k2
Collision stiffness of cover mass 3 | c3 | 3k3
Prephonatory displacement of cover mass 1 | h01 | 0.036 cm
Prephonatory displacement of cover mass 2 | h02 | 0.036 cm
Prephonatory displacement of cover mass 3 | h03 | 0.036 cm
Height at vocal fold entrance | h0 | 1.8 cm
Height at vocal fold exit | h4 | 1.8 cm
Mass displacement at point 0 | x0 | 0 cm
Mass displacement at point 1 | x1 | 0.05 cm
Mass displacement at point 2 | x2 | 0.2 cm
Mass displacement at point 3 | x3 | 0.275 cm
Mass displacement at point 4 | x4 | 0.3 cm
Glottal length | l | 1.4 cm
A8
i = 1,2,3,b.
A9
Sub- and supraglottal resonances were described by using the wave-reflection model (Kelly and Lochbaum, 1962; Liljencrants, 1985; Story, 1995; Titze, 2006), which is a time-domain model of the propagation of one-dimensional planar acoustic waves through a collection of uniform cylindrical tubes. The supraglottal system was modeled as a simple uniform tube (area of 3 cm^2 and length of 17.5 cm), which is divided into 44 cylindrical sections. The area function for the subglottal tract was based on the one proposed by Zañartu et al. (2007). The area function is composed of 62 cylindrical sections. For both sub- and supraglottal systems, the section length Δz was set to 17.5/44 cm. This determines the sampling time interval as Δt = Δz / c = 11.4 μs, where c = 350 m/s stands for the sound velocity. The corresponding sampling frequency is 88 kHz.
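The discretization figures quoted above follow from the section length and the sound velocity. The short sketch below reproduces that arithmetic; the variable names are introduced here for illustration, and the last line checks that 62 sections of the same length give a subglottal tract length close to the 24.7 cm quoted in the caption of Fig. 7.

```python
SUPRAGLOTTAL_LENGTH_CM = 17.5
SUPRAGLOTTAL_SECTIONS = 44
SUBGLOTTAL_SECTIONS = 62
SOUND_SPEED_CM_PER_S = 35000.0  # c = 350 m/s

# One wave-reflection section is traversed in exactly one sampling interval.
section_length_cm = SUPRAGLOTTAL_LENGTH_CM / SUPRAGLOTTAL_SECTIONS
sampling_interval_s = section_length_cm / SOUND_SPEED_CM_PER_S
sampling_frequency_hz = 1.0 / sampling_interval_s
subglottal_length_cm = SUBGLOTTAL_SECTIONS * section_length_cm

print(f"section length     = {section_length_cm:.4f} cm")
print(f"sampling interval  = {sampling_interval_s * 1e6:.1f} microseconds")
print(f"sampling frequency = {sampling_frequency_hz / 1e3:.1f} kHz")
print(f"subglottal length  = {subglottal_length_cm:.1f} cm")
```

Tying the time step to the section length in this way is what makes the wave-reflection scheme exact for lossless plane-wave propagation between junctions, at the cost of fixing the sampling rate once the spatial resolution is chosen.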
The attenuation factor for the resonators was approximated as a_k = 1 - (0.007 / A_k^{1/2}) \Delta z (A_k is the area of the k-th cylinder). Radiation resistance and radiation inertance at the lips were
R_r = \frac{128 \rho c}{9 \pi^2 A_L}, \quad I_r = \frac{8 \rho}{3 \pi^{3/2} A_L^{1/2}}. (B1)
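U = \frac{a_g c}{k_t} \left\{ -\frac{a_g}{A} + \left[ \left( \frac{a_g}{A} \right)^2 + \frac{2 k_t}{\rho c^2} \left( P_l + 2 p_s^{+} - 2 p_e^{-} \right) \right]^{1/2} \right\} (B2)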
Adachi, S., and Yu, J. (2005). "Two-dimensional model of vocal fold vibration for sound synthesis of voice and soprano singing," J. Acoust. Soc. Am. 117, 3213-3224.
Alipour, F., Berry, D. A., and Titze, I. R. (2000). "A finite-element model of vocal-fold vibration," J. Acoust. Soc. Am. 108, 3003-3012.
Berry, D. A., Herzel, H., Titze, I. R., and Krischer, K. (1994). "Interpretation of biomechanical simulations of normal and chaotic vocal fold oscillation with empirical eigenfunctions," J. Acoust. Soc. Am. 95, 3595-3604.
Berry, D. A., Herzel, H., Titze, I. R., and Story, B. H. (1996). "Bifurcations in excised larynx experiments," J. Voice 10, 129-138.
Fant, G. (1960). The Acoustic Theory of Speech Production (Mouton, The Hague, The Netherlands).
Gömmel, A., Butenweg, C., and Kob, M. (2008). "Calculation model of the influence of the vocal fold shape and the ventricular folds on the laryngeal flow," in Proceedings of the 6th International Conference on Voice Physiology and Biomechanics, Tampere, Finland, pp. 194-196.
Guckenheimer, J., and Holmes, P. (1983). Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields (Springer-Verlag, New York).
Hatzikirou, H., Fitch, W. T., and Herzel, H. (2006). "Voice instabilities due to source-tract interactions," Acta Acust. 92, 468-475.
Henrich, N., d'Alessandro, C., Doval, B., and Castellengo, M. (2005). "Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency," J. Acoust. Soc. Am. 117, 1417-1430.
Hirano, M. (1974). "Morphological structure of the vocal cord as a vibrator and its variations," Folia Phoniatr. (Basel) 26, 89-94.
Hirano, M., and Kakita, Y. (1985). "Cover-body theory of vocal cord vibration," in Speech Science, edited by R. G. Daniloff (College Hill Press, San Diego, CA), pp. 1-46.
Hirano, M., Vennard, W., and Ohala, J. (1970). "Regulation of register, pitch, and intensity of voice," Folia Phoniatr. (Basel) 22, 1-20.
Hollien, H. (1974). "On vocal registers," J. Phonetics 2, 125-143.
Horáček, J., Švec, J. G., Veselý, J., and Vilkman, E. (2004). "Bifurcations in excised larynges caused by vocal fold elongation," in Proceedings of the International Conference on Voice Physiology and Biomechanics, edited by A. Giovanni, P. Dejonckere, and M. Ouaknine, Laboratory of Audio-Phonology, Marseille, France, pp. 87-89.
Ishizaka, K., and Flanagan, J. L. (1972). "Synthesis of voiced sounds from a two-mass model of the vocal cords," Bell Syst. Tech. J. 51, 1233-1268.
Kelly, J. L., and Lochbaum, C. (1962). "Speech synthesis," in Proceedings of the 4th International Congress on Acoustics, Paper No. G42, pp. 1-4.
Liljencrants, J. (1985). "Speech synthesis with a reflection-type line analog," Ph.D. thesis, Royal Institute of Technology, Stockholm, Sweden.
Lous, N. J., Hofmans, G. C., Veldhuis, R. N. J., and Hirschberg, A. (1998). "A symmetrical two mass vocal fold model coupled to vocal tract and trachea, with application to prosthesis design," Acta Acust. 84, 1135-1150.
Lucero, J. C. (1998). "Subcritical Hopf bifurcation at phonation onset," J. Sound Vib. 218, 344-349.
McCandless, S. (1974). "An algorithm for automatic formant extraction using linear prediction spectra," IEEE Trans. Acoust., Speech, Signal Process. 22, 135-141.
Mergell, P., Herzel, H., and Titze, I. R. (2000). "Irregular vocal fold vibration - High-speed observation and modeling," J. Acoust. Soc. Am. 108, 2996-3002.
Miller, D. G., Švec, J. G., and Schutte, H. K. (2002). "Measurement of characteristic leap interval between chest and falsetto registers," J. Voice 16, 8-19.
Pelorson, X., Hirschberg, A., van Hassel, R. R., Wijnands, A. P. J., and Auregan, Y. (1994). "Theoretical and experimental study of quasi-steady flow separation within the glottis during phonation," J. Acoust. Soc. Am. 96, 3416-3431.
Riede, T., and Zuberbühler, K. (2003). "Pulse register phonation in Diana monkey alarm calls," J. Acoust. Soc. Am. 113, 2919-2926.
Roubeau, B., Chevrie-Muller, C., and Arabia-Guidet, C. (1987). "Electroglottographic study of the changes of voice registers," Folia Phoniatr. (Basel) 39, 280-289.
Salomão, G. L., and Sundberg, J. (2008). "Relation between perceived voice register and flow glottogram parameters in males," J. Acoust. Soc. Am. 124, 546-551.
Sciamarella, D., and d'Alessandro, C. (2004). "On the acoustic sensitivity of a symmetrical two-mass model of the vocal folds to the variation of control parameters," Acta Acust. 90, 746-761.
Shipp, T., Robert, E., and McGlone, R. E. (1971). "Laryngeal dynamics associated with voice frequency change," J. Speech Hear. Res. 14, 761-768.
ODE45 and it was confirmed that essentially the same results can be obtained.
To draw the bifurcation diagrams of Figs. 5-7, 20 local maxima of the opening area of the lower mass a1 = l h1 were plotted after discarding the transients. For each subsequent parameter value, the final state of the preceding simulation was used as the initial condition.
The spectrogram of Fig. 4(b) was computed using the minimum glottal area amin = l hmin with the following parameters: sampling rate of 44 kHz, window length of 8192 sample points, overlap of 496 sample points, and Hanning window.
Smith, M. E., Berke, G. S., Gerrat, B. R., and Kreiman, J. (1992). "Laryngeal paralyses: Theoretical considerations and effects on laryngeal vibration," J. Speech Hear. Res. 35, 545-554.
Steinecke, I., and Herzel, H. (1995). "Bifurcations in an asymmetric vocal fold model," J. Acoust. Soc. Am. 97, 1874-1884.
Stevens, K. (1999). Acoustic Phonetics (MIT, Cambridge, MA).
Story, B. H. (1995). "Physiologically-based speech simulation using an enhanced wave-reflection model of the vocal tract," Ph.D. thesis, University of Iowa, Iowa City, IA.
Story, B. H., Laukkanen, A.-M., and Titze, I. R. (2000). "Acoustic impedance of an artificially lengthened and constricted vocal tract," J. Voice 14, 455-469.
Story, B. H., and Titze, I. R. (1995). "Voice simulation with a body-cover model of the vocal folds," J. Acoust. Soc. Am. 97, 1249-1260.
Sundberg, J., and Gauffin, J. (1979). "Waveform and spectrum of the glottal voice source," in Frontiers of Speech Communication Research, edited by B. Lindblom and S. Oehman (Academic, London), pp. 301-320.
Švec, J. G., and Pešák, J. (1994). "Vocal breaks from the modal to falsetto register," Folia Phoniatr. (Basel) 46, 97-103.
Švec, J. G., Schutte, H. K., and Miller, D. G. (1999). "On pitch jumps between chest and falsetto registers in voice: Data from living and excised human larynges," J. Acoust. Soc. Am. 106, 1523-1531.
Švec, J. G., Sundberg, J., and Hertegård, S. (2008). "Three registers in an untrained female singer analyzed by videokymography, strobolaryngoscopy and sound spectrography," J. Acoust. Soc. Am. 123, 347-353.
Tembrock, G. (1996). Akustische Kommunikation bei Säugetieren (Acoustic Communication in Mammals) (Wissenschaftliche Buchgesell, Darmstadt, Germany).
Titze, I. R. (1988). "The physics of small-amplitude oscillation of the vocal folds," J. Acoust. Soc. Am. 83, 1536-1552.
Titze, I. R. (2000). Principles of Voice Production, 2nd ed. (National Center for Voice and Speech, Iowa City, IA).
Titze, I. R. (2006). Myoelastic Aerodynamic Theory of Phonation (National Center for Voice and Speech, Iowa City, IA).