ACUSTICA
1: Center for Advanced Science and Innovation, Osaka University, 2-1 Yamada-oka, Suita, Osaka 565-0871, Japan. kato@casi.osaka-u.ac.jp
2: Yoshimasa Electronics Inc., Japan
3: Faculty of Education and Masters Course in Education, Kumamoto University, Japan
4: Graduate School of Science and Technology, Kumamoto University, Japan
5: Professor Emeritus, Kobe University, Japan
1. Introduction
In opera performances in a hall, it is important for singers
to attain both clarity with regard to the lyrics of the song
and a proper degree of liveness for the song in the audience area; here, these subjective attributes are considered to be in opposition [1]. Some singers tend to retain
their own performance styles, which have been developed and established during long-term training in their practice rooms.
Other singers attempt to adapt their interpretation styles to
suit the acoustical conditions of the sound fields in a given
performance hall. Tsuyoshi Tsutsumi, a professional cellist and the president of Toho Gakuen School of Music,
stated that the idea of the concert hall as a great big instrument is very important and asserted that there are
many things we can do to help make the concert hall sound
better [2]. In order to discuss the latter approach with
respect to the relationship between operatic singing and
room acoustics, we define the concept of blending of operatic singing with a given room. This concept includes
both the efforts by operatic singers to adapt their performance to a given room and the efforts by opera-house
acousticians to adjust the sound fields.
To describe the relationship between the temporal characteristics of sound source signals and those of room
acoustics, the decay characteristic of the autocorrelation
function (ACF) of a sound source signal has been investigated. This characteristic has been shown to be a parameter in the analysis of the subjective responses to sound
fields, and it represents the temporal coherence [3, 4, 5].
The effective duration τ_e also represents the total amount
of randomness of the sound signals due to fluctuating factors such as vibrato, intonation, and jitter [6, 7, 8, 9]. The
parameter τ_e decreases with the increasing randomness of
the fluctuations.
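As a rough illustration of how an effective duration can be read from the decay of an ACF envelope, the following Python sketch fits a straight line to the envelope peaks of 10 log10 |ACF| and extrapolates it to -10 dB. It rests on two assumptions that are not stated in this section: that τ_e is taken as the delay at which the fitted line reaches -10 dB (the ten-percentile delay used in the cited literature [3]), and that only peaks above -5 dB are fitted. The function name `effective_duration`, its parameters, and the synthetic test ACF are hypothetical.

```python
import numpy as np

def effective_duration(acf, fs, fit_floor_db=-5.0, target_db=-10.0):
    """Estimate tau_e (s) from a normalized ACF sampled at fs (Hz).

    Assumption: tau_e is the delay at which a straight line fitted to the
    initial envelope of 10*log10|ACF| reaches target_db.
    """
    level = 10.0 * np.log10(np.maximum(np.abs(acf), 1e-12))
    lags, peaks = [], []
    # Local maxima of the level for lag > 0 approximate the envelope;
    # the peak at lag 0 is never visited, so it is excluded from the fit.
    for k in range(1, len(level) - 1):
        if level[k] >= level[k - 1] and level[k] > level[k + 1]:
            if level[k] < fit_floor_db:
                break  # keep only the initial part of the decay
            lags.append(k / fs)
            peaks.append(level[k])
    if len(lags) < 2:
        raise ValueError("not enough envelope peaks above the fit floor")
    slope, intercept = np.polyfit(lags, peaks, 1)
    return (target_db - intercept) / slope

# Hypothetical test ACF: a 200 Hz cosine under an exponential envelope
# with a 20 ms time constant (not data from the study).
fs = 8000
t = np.arange(1600) / fs
acf = np.exp(-t / 0.020) * np.cos(2 * np.pi * 200 * t)
tau_e = effective_duration(acf, fs)
print(f"tau_e = {1000 * tau_e:.1f} ms")
```

For an exponential envelope with time constant τ_0, the fitted line reaches -10 dB at τ_0 ln 10, so the sketch should return about 46 ms here. A faster envelope decay, as expected for a more strongly fluctuating signal, gives a smaller value.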
If musicians and composers are not familiar with the parameter τ_e, it would be important to describe the τ_e value in
relation to the musical score and musical expressivity. Previous studies measured (τ_e)min for several types of music
signals although their main purpose was to clarify the rela-
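\[
\phi_p(\tau; t, T) = \frac{\Phi_p(\tau; t, T)}{\sqrt{\Phi_p(0; t, T)\,\Phi_p(0; \tau + t, T)}} \qquad (1)
\]
where
\[
\Phi_p(\tau; t, T) = \frac{1}{2T} \int_{t-T}^{t+T} p'(s)\, p'(s + \tau)\, ds. \qquad (2)
\]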
In the above expression, 2T represents the integration interval and p'(t) = p(t) * s(t), where * denotes convolution. The function p(t) denotes
the amplitude of the original waveform of the recorded
signals, and the function s(t) is chosen as the impulse response of the A-weighting filter corresponding to the ear
sensitivity. Note that the r-ACF is normalized by the geometric mean of the energy at t (Φ_p(0; t, T) in equation 1)
and at τ + t (Φ_p(0; τ + t, T) in equation 1) and should not
be normalized by only the energy at t; this ensures that
the normalized r-ACF satisfies φ_p(0) = 1 and φ_p(τ) ≤ 1 at
τ > 0. The procedure for calculating the r-ACF is shown in
Appendix A1. If the signals are processed by applying an
FFT (fast Fourier transform) algorithm (FFT method A,
see Appendix A1) without the concept of time lag, window functions such as Hamming, Hanning, or Blackman
are sometimes used, for example, for technically detecting
the fundamental frequency (F0) of a sound signal. However, for simplification, we use a rectangular window as
the time window function for the sound data in this study.
The shape of the window function affects the decay rate
of the r-ACF and should be chosen carefully; it cannot
be easily determined at the present stage. We choose FFT
Vol. 93 (2007)
Figure 2. Examples of the measured τ_e values of the r-ACF for twenty sung vowels (as sung by Sop. 3 and with four variably pitched
vowels) obtained from the analyses of three different integration intervals (2T = 100 ms, 2T = 200 ms, and 2T = 500 ms).
Figure 3. Distribution of the (τ_e)min values measured at six integration intervals for all the 205 data samples.
level was 0.0 dBA. Figure 4b shows an example of the running relative SPL (integration interval: 2T = 500 ms, running step: 100 ms, time window: rectangular), which was
calculated by using Φ_p(0; t, T) in equation (2). For calculating the (τ_e)min values, the value of the running relative
SPL within the desired range was extracted as an explanatory variable termed relative SPL.
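The running level described above (2T = 500 ms, running step 100 ms, rectangular window) can be sketched in Python as follows. This is only a minimal illustration: the A-weighting filter is omitted, the 0 dB reference is assumed to be the energy-averaged level of the excerpt (the text defining the reference is not available in this section), and the function names `running_level_db` and `relative_level_db` are hypothetical.

```python
import numpy as np

def running_level_db(p, fs, window_s=0.5, step_s=0.1):
    """Running level in dB (arbitrary reference), rectangular window."""
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    starts = range(0, len(p) - win + 1, step)
    # Mean square over each window corresponds to the zero-lag term
    # of equation (2); A-weighting is NOT applied in this sketch.
    power = np.array([np.mean(p[s:s + win] ** 2) for s in starts])
    return 10.0 * np.log10(np.maximum(power, 1e-20))

def relative_level_db(levels_db):
    """Shift levels so that the energy-averaged level is 0 dB (assumed)."""
    mean_power = np.mean(10.0 ** (levels_db / 10.0))
    return levels_db - 10.0 * np.log10(mean_power)

# Hypothetical signal: a 1 kHz tone whose amplitude drops by a factor
# of 10 (20 dB) after one second.
fs = 8000
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
tone[fs:] *= 0.1
levels = running_level_db(tone, fs)
relative = relative_level_db(levels)
```

The difference between the first and last running values reflects the 20 dB amplitude step, while the choice of reference only shifts all values by a constant.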
\[
\frac{1}{t_{k+1} - t_{k-1}} \quad \text{[cycles/s]}, \qquad (3)
\]
\[
\frac{a_{k-1} - 2a_k + a_{k+1}}{a_{k-1} + 2a_k + a_{k+1}}, \qquad (4)
\]
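Equation (3) expresses a local rate as the reciprocal of the interval t_{k+1} - t_{k-1}. A minimal Python sketch of that calculation is given below, assuming that t_k are the times of successive extrema (alternating maxima and minima) of the fundamental-frequency contour, so that the interval spans one full vibrato cycle. This interpretation, the function names, and the synthetic contour are assumptions for illustration; the ratio in equation (4) is not implemented because its definition is not available in this section.

```python
import numpy as np

def extrema_times(f0, frame_rate):
    """Times (s) of local maxima and minima of an F0 contour."""
    d = np.diff(f0)
    # A sign change of the first difference marks an extremum.
    idx = np.where(d[:-1] * d[1:] < 0)[0] + 1
    return idx / frame_rate

def local_vibrato_rates(t):
    """Rate 1 / (t[k+1] - t[k-1]) in cycles/s at each interior extremum."""
    t = np.asarray(t, dtype=float)
    return 1.0 / (t[2:] - t[:-2])

# Hypothetical contour: 440 Hz tone with a 6 cycles/s sinusoidal vibrato,
# sampled at 1000 frames per second.
frame_rate = 1000
time = np.arange(frame_rate) / frame_rate
f0 = 440.0 + 5.0 * np.sin(2 * np.pi * 6.0 * time)
rates = local_vibrato_rates(extrema_times(f0, frame_rate))
print(rates.mean())
```

Because each rate uses the two neighbouring extrema, the estimate follows slow changes of the vibrato rate within a tone, at the cost of a resolution limited by the frame rate of the contour.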
Table I. Correlation coefficients among four quantitative variables. Log: log10 (τ_e)min, RSPL: Relative SPL, Vr: Vibrato rate,
Ve: Vibrato extent. **: 1% significance level, *: 5% significance level.

        Log        RSPL       Vr         Ve
Log     1.00
RSPL    -0.35**    1.00
Vr      +0.23*     -0.02      1.00
Ve      -0.47**    +0.05      -0.45**    1.00
Figure 5. Distribution of the quantitative variables for the individual singers: (a) (τ_e)min, (b) relative SPL, (c) pitch, (d) vibrato
rate, and (e) vibrato extent.
3. Experiment II: Investigation of the relation between (τ_e)min and operatic singing with different subjective singing volume
3.1. Experimental method and analysis
The first aim of this experiment is to investigate whether
singers can consciously vary the (τ_e)min values of the
steady-state part of vowels in relation to (1) subjective
singing volume with regard to the expression marks shown
as ppp (pianississimo), mf (mezzo forte), and fff (fortississimo) on vocal scores, (2) tone pitch, and (3) vowel selection. The second aim is to compare the results obtained
from Experiment I with those obtained from this experiment.
3.1.1. Subjects and recording conditions
Ten subjects, including professional singers, students of
a conservatory, and trained amateur singers, participated
in this experiment. The profiles of the subjects are listed
in Table II. Each singer's voice was recorded in an anechoic
chamber using a 1/2-inch condenser microphone placed 25
cm in front and 5 cm to the side of the singer's mouth. The
voice signals were sampled at 44.1 kHz.
          Voice cl.        Prof. edu.   Music exp.   Age
Sop. 1    Soprano          27           33           43
Sop. 2    Soprano          4            20           25
Sop. 3    Soprano          4            17           22
Mez. 1    Mezzo soprano    26           26           42
Mez. 2    Mezzo soprano    14           30           34
Mez. 3    Mezzo soprano    4            14           22
Ten. 1    Tenor            0            20           45
Ten. 2    Tenor            0            20           25
Bar.      Baritone         39           39           54
Bas.      Bass             27           27           38
         DF      Sum of squares   Mean sq.   F-ratio
Model    109     217              1.99       80.60 (p < 0.001)
Error    2590    64               0.02
Total    2699    281
Table III. b. Results of the analysis of variance (ANOVA). Significance level: 1%. SV: Subjective singing volume; TP: Tone pitch.
The values larger than 10% are typed in bold font. Cont. rat.:
Contribution ratio [%].

Factor           DF    Sum of squares   Cont. rat.
SV               2     46.1             16.4
TP               2     1.2              0.4
Vowel            4     13.4             4.8
Subject          9     101              35.8
SV*TP            4     1.6              0.6
SV*Vowel         8     2.5              0.9
TP*Vowel         8     1.9              0.7
SV*Subject       18    11.3             4.0
TP*Subject       18    29.1             10.4
Vowel*Subject    36    9.4              3.3
Total                  217              77.2
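The listed contribution ratios appear to be consistent with the ratio of each factor's sum of squares to the total sum of squares (281 in Table III a), expressed in percent. This is an inference from the tabulated numbers rather than a definition stated in this section. The short Python check below recomputes the ratios and the F-ratio of the model from the tabulated values; small differences are expected because the table entries are rounded.

```python
# Sums of squares copied from Table III b; totals from Table III a.
sum_of_squares = {
    "SV": 46.1, "TP": 1.2, "Vowel": 13.4, "Subject": 101.0,
    "SV*TP": 1.6, "SV*Vowel": 2.5, "TP*Vowel": 1.9,
    "SV*Subject": 11.3, "TP*Subject": 29.1, "Vowel*Subject": 9.4,
}
listed_ratio = {
    "SV": 16.4, "TP": 0.4, "Vowel": 4.8, "Subject": 35.8,
    "SV*TP": 0.6, "SV*Vowel": 0.9, "TP*Vowel": 0.7,
    "SV*Subject": 4.0, "TP*Subject": 10.4, "Vowel*Subject": 3.3,
}
ss_total = 281.0

# Assumed definition: contribution ratio = 100 * SS_factor / SS_total.
computed_ratio = {k: 100.0 * v / ss_total for k, v in sum_of_squares.items()}

# F-ratio of the model: (SS_model / DF_model) / (SS_error / DF_error).
f_ratio = (217.0 / 109.0) / (64.0 / 2590.0)

for factor, value in computed_ratio.items():
    print(f"{factor:14s} computed {value:5.1f}   listed {listed_ratio[factor]:5.1f}")
print(f"F-ratio computed from rounded entries: {f_ratio:.1f}")
```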
(5)
where a1(SV), a2(Pitch), a3(Vowel), and k1 are values calculated from a multiple regression analysis with
             Category
a1(SV)       pp (pianissimo)     +0.17
             mf (mezzo forte)    -0.03
             ff (fortissimo)     -0.14
a2(Pitch)    Low                 +0.03
             Middle              -0.02
             High                -0.01
a3(Vowel)    /a/                 -0.09
             /e/                 -0.04
             /i/                 -0.01
             /o/                 +0.02
             /u/                 +0.12
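To make the size of these coefficients concrete, the Python sketch below compares two conditions under an additive model. It assumes that the coefficients add on the log10 (τ_e)min scale and that their signs are those for which each group sums to zero; both points are assumptions, since the regression equation itself is not available in this section. The constant k1 cancels in a comparison of two conditions, so no value is needed for it. The function names are hypothetical.

```python
# Coefficients from Table IV; signs assumed so that each group sums to zero.
A1_SV = {"pp": 0.17, "mf": -0.03, "ff": -0.14}
A2_PITCH = {"low": 0.03, "middle": -0.02, "high": -0.01}
A3_VOWEL = {"a": -0.09, "e": -0.04, "i": -0.01, "o": 0.02, "u": 0.12}

def log_offset(sv, pitch, vowel):
    """Sum of the three category coefficients (assumed log10 units)."""
    return A1_SV[sv] + A2_PITCH[pitch] + A3_VOWEL[vowel]

def predicted_ratio(cond_a, cond_b):
    """Predicted ratio of (tau_e)min between two conditions; k1 cancels."""
    return 10.0 ** (log_offset(*cond_a) - log_offset(*cond_b))

# Same pitch and vowel, softest versus loudest singing volume.
ratio = predicted_ratio(("pp", "middle", "a"), ("ff", "middle", "a"))
print(f"pp / ff ratio: {ratio:.2f}")
```

Under these assumptions, the softest singing volume corresponds to a (τ_e)min roughly twice that of the loudest one when pitch and vowel are held fixed.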
(6)
(j)
(7)
Figure 7. Coefficients obtained from multiple regression analyses with dummy variables for the ten singers. The mean values of
the coefficients among the ten singers correspond to a1(SV),
a2(Pitch), and a3(Vowel) in equations (5) and (6) and are listed in
Table IV.
varied among the individual singers, SV showed a similar trend for the individual singers; the (τ_e)min values decreased with a louder vocal effort. Table V lists the contribution ratios of SV, TP, and vowel
to the (τ_e)min values for each singer. As noted, the values
greatly varied among the singers. The total contribution
ratio for each singer's model varied from 41% (Sop. 3) to
88% (Sop. 2). Figure 8 shows the relationship between
the measured values of (τ_e)min and the value calculated
by equation (7) for each singer. Although the residuals
with respect to (1) the interactions between factors and
(2) intra-individual changes within 6 trials were included,
the correlation coefficient between the measured and calculated (τ_e)min values was sufficiently high, amounting to
0.86 (p < 0.01).
Figure 9a shows the distribution of the (τ_e)min values
for the sung vowels (10 singers). The values showed an
          Singing volume   Tone pitch   Vowel   Total
Sop. 1    46**             14**         3**     63**
Sop. 2    7**              79**         2**     88**
Sop. 3    19**             17**         5**     41**
Mez. 1    41**             4**          9**     55**
Mez. 2    32**             <1           31**    63**
Mez. 3    4**              31**         22**    57**
Ten. 1    44**             9**          8**     61**
Ten. 2    44**             3**          21**    69**
Bar.      40**             1*           17**    58**
Bas.
34**
12**
47**
Figure 9. Distribution of the quantitative variables for the individual singers: (a) (τ_e)min value, (b) relative SPL, (c) absolute
SPL, (d) vibrato rate, and (e) vibrato extent.
ual tones ranged between 6.8 ms and 1482 ms, and the geometric mean for the individual singers ranged between
18 ms (Sop. 2) and 100 ms (Mez. 3). The geometric mean
across the singers was 39 ms.
The relative SPL, absolute SPL, VR, and VE were considered as explanatory variables for (τ_e)min in this study.
Figures 9b-e show the distribution of these variables for
the sung vowels of all the singers.
The s.d. of the relative SPL, which showed the dynamic
range of the SPL for each singer, varied between 7.5 dBA
(Sop. 3) and 11 dBA (Sop. 2), while the global mean
across the singers was 9.0 dBA (Figure 9b). The absolute
SPL varied between 77 dBA (Mez. 3) and 95 dBA (Sop.
1), while the global mean across the singers was 88 dBA
(Figure 9c).
Figure 9d shows observations of the VR. For individual
tones, this ranged between 2.4 cycles/s and 9.2 cycles/s,
and the mean for the individual singers varied between
5.0 cycles/s (Bar. 1) and 6.9 cycles/s (Mez. 3). The global
mean across the singers was 5.7 cycles/s. The s.d. of each
singer's VR, which expressed the intra-individual variation of the VR, varied between 0.3 cycles/s (Sop. 2) and 1.3
cycles/s (Mez. 3); the global average of the s.d. of the VR
across the singers was 0.6 cycles/s. The results of
the one-way ANOVA showed that the variable subject
contributed significantly to the VR (R^2 = 0.34, p < 0.01).
Figure 9e shows the findings for the VE. For individual
tones, this ranged between 0.0 cents and 120 cents and
the singer mean varied between 8.0 cents (Mez. 3) and
73 cents (Sop. 2). The global mean across the singers
was 36 cents. The s.d. of each singer's VE, which expressed the intra-individual variation of the VE, varied
between 5.0 cents (Mez. 3) and 24 cents (Sop. 2),
and the global average of the s.d. of the VE across the
singers was 13 cents. The result of the one-way
ANOVA showed that the variable subject contributed
significantly to the VE (R^2 = 0.56, p < 0.01).
Table VI lists the correlation matrix among five quantitative variables: log10 (τ_e)min, relative SPL, absolute SPL,
VR, and VE. The relative SPL, absolute SPL, and VE had
a significant negative correlation less than -0.35 with the
log10 (τ_e)min values (r = -0.37, p < 0.01; r = -0.52,
p < 0.01; and r = -0.72, p < 0.01, respectively); the
(τ_e)min values decreased with a relatively louder voice, an
absolutely louder voice, and a greater VE. A small positive correlation was observed between log10 (τ_e)min and the
VR (r = +0.34, p < 0.01).
                   log10 (τ_e)min   Relative SPL   Absolute SPL   Vibrato rate   Vibrato extent
log10 (τ_e)min     1.00
Relative SPL       -0.37            1.00
Absolute SPL       -0.52            +0.89          1.00
Vibrato rate       +0.34            +0.01          -0.09          1.00
Vibrato extent     -0.72            +0.22          +0.41          -0.35          1.00

Figure 10a shows the distributions of (τ_e)min for the five
vowels. The geometric mean values of (τ_e)min for the vowels were /a/: 32 ms, /e/: 36 ms, /i/: 38 ms, /o/: 41 ms, and
/u/: 51 ms. The result of the one-way ANOVA showed that
vowel contributed significantly to the log10 (τ_e)min values (R^2 = 0.05, p < 0.01). The results of the one-sample
t-test for each pair of vowels showed the relationship /a/
< /e/ = /i/ = /o/ < /u/ (p < 0.05). The s.d. of log10 (τ_e)min
for each vowel, which described the intra-vowel variation,
was /a/: 0.24, /e/: 0.31, /i/: 0.37, /o/: 0.30, and /u/: 0.34.
The distribution of the relative SPL for the five vowels is
shown in Figure 10b. The mean values of the relative SPL
for the vowels were /a/: +1.6 dBA, /e/: +0.8 dBA, /i/: -1.3
dBA, /o/: +1.1 dBA, and /u/: -2.3 dBA. The results of the
one-way ANOVA showed that vowel selection contributed
significantly to the relative SPL (R^2 = 0.03, p < 0.01).
The results of the one-sample t-test for each pair of vowels
showed the relationship /a/ = /o/ = /e/ > /i/ = /u/ (p <
0.05); the singers produced a lower SPL for /i/ and /u/ than
for the other vowels. The s.d. of the relative SPL for each
vowel, which represented the intra-vowel variation, was
/a/: 8.8 dBA, /e/: 8.8 dBA, /i/: 8.8 dBA, /o/: 8.8 dBA, and
/u/: 9.2 dBA. The intra-vowel variation was larger for the
vowel /u/ than for the other vowels.
The distribution of the absolute SPL for the five vowels is shown in Figure 10c. The mean values of the absolute SPL for the vowels were /a/: 90 dBA, /e/: 89 dBA,
/i/: 87 dBA, /o/: 89 dBA, and /u/: 86 dBA. The results of
the one-way ANOVA showed that vowel selection contributed significantly to the absolute SPL (R^2 = 0.02,
p < 0.01). The results of the one-sample t-test for each
pair of vowels showed the relationship /a/ = /o/ = /e/ > /i/
= /u/ (p < 0.05); the singers produced a lower SPL for /i/
and /u/ than for the other vowels. The s.d. of the absolute
SPL for each vowel was equal to that of the relative SPL.
Figures 10d and e show the distributions of the VR and
VE for the five vowels. In both cases, the mean values for each vowel were similar. The result of the one-way
ANOVA showed that vowel selection does not make a substantial contribution to the VR (R^2 = 0.003, p = 0.06) and
VE (R^2 = 0.01, p < 0.01). The intra-vowel variations in
both the VR and VE were similar among the vowels; the
s.d. of the values was on average 0.9 cycles/s for the VR
and 21 cents for the VE.
Figure 10. Distribution of the quantitative variables for the five
vowels: (a) (τ_e)min, (b) relative SPL, (c) absolute SPL, (d) vibrato rate, and (e) vibrato extent.
4. Discussion
The present study provided a method for evaluating the
randomness of fluctuating acoustical components of the
steady-state part of vowels of operatic singing by using the
parameter (τ_e)min. In the following discussion, the results
are compared with previous studies.
4.1. Range of (τ_e)min values for vowels sung by operatic singers
5. Conclusions
A computational investigation into the characteristics of
operatic singing voices was conducted using a time-domain parameter (τ_e)min derived from the running autocorrelation function (r-ACF) of the sound signals. To gain
good controllability of the (τ_e)min value for singing voice
signals, we proposed the use of a new sound source model
of the steady-state part of vowels of operatic singing that
can connect the (τ_e)min values to both subjective and objective parameters of voice acoustics and musical acoustics.
The resulting observations were as follows:
1. The (τ_e)min value decreases with a louder voice and also
decreases with a greater vibrato extent (see Tables I
and VI).
2. The contribution ratios of the subjective singing volume, tone pitch, and vowel selection
to the value of (τ_e)min depend on the individual singer
(see Figures 7 and 8 and Table V).
Figure A1. Direct method to determine the running autocorrelation function (r-ACF) in the time domain.
Acknowledgments
The authors are grateful to Ken-Ichi Sakakibara and Dennis Noson for many helpful suggestions and stimulating
discussions. The authors would like to thank the singers
who participated in the recording session. The authors
would also like to thank Kazuki Eguchi for technical support in building a calculation program for our data set. This
work was supported by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science for Young Scientists.
Appendix
A1. Procedure for calculating running autocorrelation function
The procedure for calculating the r-ACF by using the direct method in the time domain is illustrated in Figure A1.
The ACF is well known as a method for estimating the
fundamental frequency (F0) of a sound signal; this is derived by determining the time lag between the origin and
the first major peak of the function. Since the fundamental
frequency of a musical sound signal is higher than 100 Hz
in most cases, the required maximum lag (τ_max) to obtain
the fundamental period is around 10 ms at most. Yet, in
order to obtain the effective duration (τ_e) of the r-ACF,
the required value of τ_max is greater than 50 ms as far
as the operatic singing voice of vowels is concerned (see
Appendix A2).
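A direct time-domain calculation of the normalized r-ACF can be sketched in Python as follows. The sketch follows the form of equations (1) and (2), with a rectangular window of length 2T and normalization by the geometric mean of the energies of the reference segment and the lagged segment. It is only an illustration under stated assumptions: the A-weighting filter is omitted, the function name `running_acf` and its parameters are hypothetical, and it is not the authors' calculation program.

```python
import numpy as np

def running_acf(p, fs, t_center, half_T, max_lag_s):
    """Normalized r-ACF phi_p(tau; t, T) for lags 0..max_lag_s (direct method).

    p: signal samples (A-weighting is NOT applied in this sketch)
    t_center: centre time t of the integration interval in seconds
    half_T: half of the integration interval, T, in seconds
    """
    start = int(round((t_center - half_T) * fs))
    n = int(round(2 * half_T * fs))
    max_lag = int(round(max_lag_s * fs))
    if start < 0 or start + n + max_lag > len(p):
        raise ValueError("signal too short for this window and maximum lag")
    ref = p[start:start + n]
    energy_ref = np.dot(ref, ref)          # proportional to Phi_p(0; t, T)
    phi = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        seg = p[start + k:start + k + n]   # window shifted by the lag
        energy_seg = np.dot(seg, seg)      # proportional to Phi_p(0; tau + t, T)
        phi[k] = np.dot(ref, seg) / np.sqrt(energy_ref * energy_seg)
    return phi

# Hypothetical input: 1 s of a 200 Hz tone, 2T = 100 ms, maximum lag 50 ms.
fs = 8000
time = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 200 * time)
phi = running_acf(tone, fs, t_center=0.3, half_T=0.05, max_lag_s=0.05)
```

With this normalization the value at zero lag is 1 and, by the Cauchy-Schwarz inequality, no value exceeds 1 in magnitude. Note that the lagged segment draws on samples beyond the reference window, so the signal must extend at least τ_max past the end of the integration interval.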
Figure A2 shows a comparison of ACFs and power
spectra obtained by different methods. ACFs that employ
direct methods are obtained in the time domain. Based on
the Wiener-Khintchine theorem, ACFs that use FFT methods are obtained by a transform in the frequency domain
(by FFT) followed by the performance of an inverse FFT
(IFFT) calculation. It is important to note that the Wiener-Khintchine theorem is mathematically satisfied only for
completely periodic or infinite-length signals, and it is not
satisfied for a quasi-periodic signal obtained for the analysis of operatic singing voices. Even in a practical situation,
the variation in both the ACFs and power spectra for different calculation methods is evident (see Figures A2a-t).
It is not possible to find even one matched pair of the running ACF and running power spectrum for quasi-periodic
signals. Thus, we reiterate that the transform methods and
their precise definitions should be carefully examined before conducting an analysis of voice signals.
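The dependence on the calculation method can be illustrated with a small Python comparison on a single frame. The two FFT variants below (with and without zero padding) are generic textbook forms chosen for illustration; they are not claimed to correspond to the specific FFT methods compared in Figure A2, and the function names and the test frame are hypothetical.

```python
import numpy as np

def acf_fft(frame, zero_pad=True):
    """ACF of one frame via FFT and inverse FFT, normalized at lag 0."""
    n = len(frame)
    nfft = 2 * n if zero_pad else n
    spectrum = np.fft.rfft(frame, nfft)
    acf = np.fft.irfft(np.abs(spectrum) ** 2, nfft)[:n]
    return acf / acf[0]

def acf_direct_in_frame(frame):
    """Time-domain ACF using only the samples inside the frame."""
    n = len(frame)
    acf = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(n)])
    return acf / acf[0]

# Hypothetical frame: 50 ms of a 210 Hz tone, which does not contain
# a whole number of periods.
fs = 8000
time = np.arange(400) / fs
frame = np.sin(2 * np.pi * 210 * time)

direct = acf_direct_in_frame(frame)
padded = acf_fft(frame, zero_pad=True)
circular = acf_fft(frame, zero_pad=False)
print(np.max(np.abs(padded - direct)), np.max(np.abs(circular - direct)))
```

With zero padding, the FFT route reproduces the in-frame time-domain sum, whereas without padding it yields a circular correlation that differs whenever the frame is not exactly periodic. Both differ again from a running ACF in which the lagged segment keeps the full window length by using samples beyond the frame.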
This definition of the initial part of the r-ACF for calculating τ_e was efficient as far as the operatic singing voice
of vowels was concerned.
It is possible to consider the peak at τ = 0 when we fit
the envelope decay plot with a straight line. However, the
regression coefficient of the fitting model obtained by the
least mean square method by excluding the peak at τ = 0
has been better than that of the fitting model that includes
the peak at τ = 0, particularly when the fundamental frequency is low (e.g., the bass singer). The lower the pitch,
the weaker is the first major peak corresponding to the fundamental frequency. Therefore, we chose to exclude the
peak at τ = 0.
References
[1] T. Hidaka, L. Beranek: Objective and subjective evaluations of twenty-three opera houses in Europe, Japan, and Americas. J. Acoust. Soc. Am. 107 (2000) 368-383.
[2] T. Tsutsumi: The relationship between music and the concert hall. J. Temporal Des. Arch. Environ. 6 (2006) 78-81. http://www.jtdweb.org/.
[3] Y. Ando: Architectural acoustics: blending sound sources, sound fields, and listeners. Chapters 3, 4, 6, and 7. AIP Press/Springer-Verlag, New York, 1998.
[4] Y. Ando, H. Sakai, S. Sato: Formulae describing subjective attributes for sound fields based on the model of auditory-brain system. J. Sound Vib. 232 (2000) 101-127.
[5] Y. Ando, H. Sakai, S. Sato: Correlation factors describing primary and spatial sensations of sound fields. J. Sound Vib. 258 (2002) 405-417.
[6] J. Sundberg: The science of the singing voice. Northern Illinois University Press, Dekalb, 1987.
[7] I. Titze: Principles of voice production. Prentice-Hall, 1994.
[10] T. Taguti, Y. Ando: Characteristics of the short-term autocorrelation function of sound signals in piano performances. In: Music and concert hall acoustics. Y. Ando, D. Noson (eds.). Academic Press, London, 1997, Chapter 23.
[11] K. Kato, Y. Ando: A study of the blending of vocal music with the sound field by different singing styles. J. Sound Vib. 258 (2002) 463-472.
[12] I. Nakayama: CDs singings in Japanese using a common verse. Singing voices data base 17-18, 2002.
[13] K. Mouri, K. Akiyama, Y. Ando: Preliminary study on recommended time duration of source signals to be analyzed, in relation to its effective duration of the auto-correlation function. J. Sound Vib. 241 (2001) 87-95.
[14] K. Kato, T. Hirawa, K. Kawai, Y. Yano, Y. Ando: Investigation of the relation between (τ_e)min and operatic singing with different vibrato styles. J. Temporal Des. Arch. Environ. 6 (2006) 35-48. http://www.jtdweb.org/.
[15] N. Henrich: Mirroring the voice from Garcia to the present day: Some insights into singing voice registers. Logopedics Phoniatrics Vocology 31 (2006) 3-14.
[16] C. Seashore: Psychology of music. McGraw-Hill, New York, 1938.