
ACTA ACUSTICA UNITED WITH ACUSTICA
Vol. 93 (2007) 421-434

Investigation of the Relation between Minimum Effective Duration of Running Autocorrelation Function and Operatic Singing with Different Interpretation Styles
Kosuke Kato¹, Kenji Fujii², Takatsugu Hirawa³, Keiji Kawai⁴, Takashi Yano⁴, Yoichi Ando⁵
¹ Center for Advanced Science and Innovation, Osaka University, 2-1 Yamada-oka, Suita, Osaka 565-0871, Japan. kato@casi.osaka-u.ac.jp
² Yoshimasa Electronics Inc., Japan
³ Faculty of Education and Masters Course in Education, Kumamoto University, Japan
⁴ Graduate School of Science and Technology, Kumamoto University, Japan
⁵ Professor Emeritus, Kobe University, Japan

Invited paper based on a presentation at Forum Acusticum 2005 in Budapest


Summary
Operatic singers may be interested in knowing how their vocal sound is related to the sound field in a given room. This paper analyzes singing voice signals by treating the steady-state part of the vowels in operatic singing as a sound source. This simplified sound source model enables us to investigate the relationship between operatic singing and room acoustics in terms of the minimum value of the effective duration of the running autocorrelation function, (τe)min, of the voice signals. The analysis shows that it is useful to introduce two further factors, vibrato extent and singing volume, which play an important role in decreasing the (τe)min value of voice signals. These two factors depend greatly on the individual singer; therefore, it is important for each singer to know his or her own range of variation in vibrato extent and singing volume.
PACS no. 43.75.Rs, 43.75.St, 43.70.Gr, 43.55.Hy, 43.55.Br

1. Introduction
In opera performances in a hall, it is important for singers to attain both clarity of the lyrics and a proper degree of liveness of the song in the audience area; these two subjective attributes are considered to be in opposition [1]. Some singers tend to retain their own performance style, which has been developed and established during long-term training in their practice rooms. Other singers attempt to adapt their interpretation styles to suit the acoustical conditions of the sound field in a given performance hall. Tsuyoshi Tsutsumi, a professional cellist and the president of Toho Gakuen School of Music, stated that the idea of the concert hall as a great big instrument is very important and asserted that there are many things we can do to help make the concert hall sound better [2]. In order to discuss the latter approach with respect to the relationship between operatic singing and room acoustics, we define the concept of blending of operatic singing with a given room. This concept includes both the efforts by operatic singers to adapt their performance to a given room and the efforts by opera-house acousticians to adjust the sound fields.

Received 22 November 2005, revised 20 September 2006, accepted 22 December 2006.
© S. Hirzel Verlag · EAA
To describe the relationship between the temporal characteristics of sound source signals and those of room acoustics, the decay characteristic of the autocorrelation function (ACF) of a sound source signal has been investigated. This characteristic, which represents the temporal coherence of the signal, has been shown to be an important parameter in the analysis of subjective responses to sound fields [3, 4, 5]. The effective duration τe of the ACF also represents the total amount of randomness of a sound signal due to fluctuating factors such as vibrato, intonation, and jitter [6, 7, 8, 9]; τe decreases as the randomness of the fluctuations increases.
Since musicians and composers may not be familiar with the parameter τe, it is important to describe τe in relation to the musical score and to musical expressivity. Previous studies measured (τe)min for several types of music signals, although their main purpose was to clarify the relationship between (τe)min for a sound source signal and the psychological responses to the sound fields of rooms [3].
In an investigation of the variation in the running ACF (r-ACF) of musical signals related to performing style, tempo, articulation, and damper pedaling were shown to be the main expressive factors controlling τe for piano music [10]. However, prior to our studies, no systematic attempt had been made to examine variations in τe for singing voices.
In order to achieve the temporal blending of an operatic singing voice with a given sound field, our study started by investigating the features of τe for the voices of amateur tenors singing musical pieces in different falsetto and operatic styles; here, the singing styles were defined by the instructions provided to the singers by the authors [11]. However, controlling τe for singing voice signals of musical pieces that included transient musical tones, consonant-vowel sounds (CV), and voice onsets and offsets was too complicated. Our attention was subsequently drawn to the fact that vibrato is one of the most important characteristics of operatic singing, and that its effect on (τe)min could therefore be examined within the steady-state part of single-tone vowels.
This study is conducted in three steps. The first stage (Experiment I) describes the τe characteristics of professional operatic singing voices for the steady-state part of vowels in relation to (1) vowel selection, (2) sound pressure level, (3) vibrato rate, and (4) vibrato extent. The second stage (Experiment II) investigates whether singers can consciously vary the (τe)min values of the steady-state part of vowels in terms of subjective singing volume, following the expression marks ppp (pianississimo), mf (mezzo forte), and fff (fortississimo) on vocal scores. Finally, the consistency of the results from the two experiments is discussed.

2. Experiment I: Investigation of τe for the running autocorrelation function of vowels sung by thirteen professional opera performers
2.1. Method
2.1.1. Singing voice data
The singing voice data were obtained from a commercially available anechoic recording of thirteen professional operatic singers (4 sopranos, 2 mezzo-sopranos, 1 alto, 3 tenors, 2 baritones, and 1 bass-baritone) singing single-tone vowels. The singers were asked to sing the five vowels (/a/, /e/, /i/, /o/, and /u/) at two to five different pitches [12]. The total material included 205 data samples.
In order to statistically isolate the steady-state part of a vocal segment, the segments at the beginning and end of the tone that satisfied the following rules were eliminated in advance.
Rule I: Segments with an A-weighted sound pressure level (SPL) more than 30 dB below the maximum SPL of the tone,


Kato et al: Autocorrelation of operatic singing

Figure 1. Example of the determination of the effective duration (τe) of the r-ACF of a vowel sung in operatic style.

Rule II: Segments with a fundamental frequency (F0) that deviated from the mean fundamental frequency of the tone by more than 150 cents (approximately the maximum range of the vibrato extent [9]).
In this isolation process, the important vocal segments containing the vibrato were not eliminated. The maximum vibrato extent (for the definition, see subsection 2.1.4) across all 205 data samples was 77 cents.
The tone duration for the remaining segments was 1.7 ± 0.4 s (mean ± s.d.).
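The two trimming rules can be illustrated with a minimal Python sketch. It assumes that frame-wise A-weighted levels and positive F0 values of one tone are already available; the function name `steady_state_mask` and the choice to compute the mean F0 over the frames that pass Rule I are our own assumptions, not details given in the text.

```python
import numpy as np

def steady_state_mask(spl_dba, f0_hz, level_drop_db=30.0, max_cents=150.0):
    """Boolean mask of frames kept after trimming the onset and offset of one tone.

    spl_dba : frame-wise A-weighted level [dBA] (assumed input)
    f0_hz   : frame-wise fundamental frequency [Hz], all positive (assumed input)
    Only frames at the beginning and end are removed, following Rules I and II.
    """
    spl = np.asarray(spl_dba, dtype=float)
    f0 = np.asarray(f0_hz, dtype=float)
    # Rule I: more than `level_drop_db` below the tone's maximum level
    fails_level = spl < spl.max() - level_drop_db
    # Rule II: more than `max_cents` away from the mean F0 of the tone
    # (assumption: the mean F0 is taken over the frames that pass Rule I)
    mean_f0 = f0[~fails_level].mean()
    fails_pitch = np.abs(1200.0 * np.log2(f0 / mean_f0)) > max_cents
    ok = np.flatnonzero(~(fails_level | fails_pitch))
    mask = np.zeros(spl.size, dtype=bool)
    if ok.size:
        # keep the contiguous span between the first and last acceptable frame
        mask[ok[0]:ok[-1] + 1] = True
    return mask


# Onset and offset frames are trimmed; the middle of the tone is kept.
print(steady_state_mask([40, 75, 80, 78, 45], [300, 440, 445, 438, 700]))
```

Because only the ends are trimmed, a brief level dip in the middle of a tone is kept, which preserves the vibrato content as described above.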
2.1.2. Analysis of τe for the running autocorrelation function
The r-ACF as a function of the time lag τ can be formulated as

$$\phi_p(\tau) = \phi_p(\tau; t, T) = \frac{\Phi_p(\tau; t, T)}{\sqrt{\Phi_p(0; t, T)\,\Phi_p(0; \tau + t, T)}}, \tag{1}$$

where

$$\Phi_p(\tau; t, T) = \frac{1}{2T}\int_{t-T}^{t+T} p'(s)\,p'(s+\tau)\,ds. \tag{2}$$

In the above expressions, 2T is the integration interval and p'(t) = p(t) ∗ s(t), where ∗ denotes convolution. The function p(t) is the amplitude of the original waveform of the recorded signal, and s(t) is chosen as the impulse response of the A-weighting filter, which corresponds to the ear's sensitivity. Note that the r-ACF is normalized by the geometric mean of the energies at t (Φp(0; t, T) in equation (1)) and at τ + t (Φp(0; τ + t, T) in equation (1)), and should not be normalized by the energy at t alone; this ensures that the normalized r-ACF satisfies φp(0) = 1 and |φp(τ)| ≤ 1 for τ > 0. The procedure for calculating the r-ACF is shown in Appendix A1. When the signals are processed with an FFT (fast Fourier transform) algorithm without the concept of time lag (FFT method A, see Appendix A1), window functions such as Hamming, Hanning, or Blackman are sometimes used, for example, to detect the fundamental frequency (F0) of a sound signal. For simplicity, however, we use a rectangular time window for the sound data in this study. The shape of the window function affects the decay rate of the r-ACF and should be chosen carefully, but the best choice cannot easily be determined at present. We use FFT method C instead of the direct method for faster computation (compare Figures A2a, A2b, and A2e).

Figure 2. Examples of the measured τe values of the r-ACF for twenty sung vowels (five vowels at four pitches, sung by Sop. 3) obtained with three different integration intervals (2T = 100 ms, 200 ms, and 500 ms).
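As an illustration of equations (1) and (2), the following sketch evaluates the normalized r-ACF directly in the time domain with a rectangular window. It is not the FFT method C used in this study (Appendix A1); it assumes that the input has already been A-weighted, and the function name `running_acf` is hypothetical.

```python
import numpy as np

def running_acf(p, fs, t, two_T, max_lag):
    """Normalized running ACF phi_p(tau; t, T) of equations (1) and (2).

    p       : A-weighted signal p'(s) (filtering by s(t) is assumed done already)
    fs      : sampling rate [Hz]
    t       : centre of the integration interval [s]
    two_T   : integration interval 2T [s]
    max_lag : largest time lag tau [s]
    Direct time-domain evaluation with a rectangular window.
    """
    p = np.asarray(p, dtype=float)
    n_win = int(round(two_T * fs))
    n_lag = int(round(max_lag * fs))
    start = int(round((t - two_T / 2.0) * fs))
    if start < 0 or start + n_win + n_lag > p.size:
        raise ValueError("integration interval plus maximum lag exceeds the signal")
    seg = p[start:start + n_win]
    energy_t = np.dot(seg, seg)  # proportional to Phi_p(0; t, T)
    phi = np.empty(n_lag + 1)
    for lag in range(n_lag + 1):
        shifted = p[start + lag:start + lag + n_win]
        cross = np.dot(seg, shifted)             # proportional to Phi_p(tau; t, T)
        energy_shift = np.dot(shifted, shifted)  # proportional to Phi_p(0; t + tau, T)
        # the common 1/(2T) factor cancels in the normalization of equation (1)
        phi[lag] = cross / np.sqrt(energy_t * energy_shift)
    return phi


fs = 8000
n = np.arange(fs)                          # 1 s of signal
tone = np.sin(2 * np.pi * 200 * n / fs)    # 200 Hz tone: period of 40 samples
phi = running_acf(tone, fs, t=0.3, two_T=0.2, max_lag=0.02)
```

For this pure 200-Hz tone, the normalized r-ACF is 1 at a lag of one period (40 samples) and -1 at half a period, a quick check of the geometric-mean normalization in equation (1).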
Figure 1 illustrates an example of the logarithm of the absolute value of the r-ACF as a function of τ. Note that this figure does not show a decrease in the sound pressure level of the room sound field, as a reverberation curve does; it shows the decrease in the absolute value of the signal autocorrelation. The parameter τe is defined as the time lag at which the envelope of the r-ACF falls to -10 dB; it can be obtained by extrapolating the decay rate to -10 dB (i.e., |φp(τ)| = 0.1).
In previous studies [3, 4, 5], the envelope decay for the initial part of the logarithm of the r-ACF of musical pieces was considered to be linear. In this study, a straight line obtained by the least-squares method was fitted to the major local peaks corresponding to multiples of the fundamental period, in the range extending from the amplitude of the first major peak down to 5 dB below that amplitude. However, if all the major local peaks from the origin to τ = 50 ms remained above the level 5 dB below the first major peak, the straight line was fitted through all these major local peaks (see Appendix A2).
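A simplified version of the τe rule can be sketched as follows. The peak-picking step (the largest |φp(τ)| within ±10% of each multiple of the fundamental period) and the fallback of anchoring the regression line at the origin when only one peak is available are assumptions made for this example; the actual procedure is given in Appendix A2, which is not part of this section, and the function name is hypothetical.

```python
import numpy as np

def effective_duration(phi, fs, period, drop_db=5.0, max_lag=0.050, target_db=-10.0):
    """Estimate tau_e [s] by extrapolating the decay of the r-ACF peaks to -10 dB.

    phi    : normalized ACF with phi[0] = 1, sampled at fs [Hz]
    period : fundamental period [s] (e.g. from the first major ACF peak)
    Assumptions: a major peak is the maximum of |phi| within +/-10 % of each
    multiple of the period, and the fit uses the leading run of peaks that
    stay within `drop_db` of the first major peak.
    """
    phi = np.asarray(phi, dtype=float)
    half = max(1, int(round(0.1 * period * fs)))
    lags, levels = [], []
    k = 1
    while True:
        centre = int(round(k * period * fs))
        if centre / fs > max_lag or centre + half >= phi.size:
            break
        lo = centre - half
        window = np.abs(phi[lo:centre + half + 1])
        lags.append((lo + int(np.argmax(window))) / fs)
        levels.append(10.0 * np.log10(max(window.max(), 1e-12)))
        k += 1
    if not lags:
        raise ValueError("no major peaks found within max_lag")
    lags, levels = np.array(lags), np.array(levels)
    # keep peaks from the first major peak down to `drop_db` below it;
    # if none fall below, every peak up to max_lag (50 ms) is kept
    below = np.flatnonzero(levels < levels[0] - drop_db)
    end = below[0] if below.size else levels.size
    x, y = lags[:end], levels[:end]
    if x.size < 2:
        # assumption: anchor the line at the origin (0 dB at tau = 0)
        x, y = np.r_[0.0, x], np.r_[0.0, y]
    slope, intercept = np.polyfit(x, y, 1)
    if slope >= 0:
        raise ValueError("the r-ACF envelope does not decay")
    return (target_db - intercept) / slope


fs, f0, tau_true = 8000, 200.0, 0.030
tau = np.arange(0, 0.06, 1 / fs)
decay = np.log(10) / tau_true              # envelope reaches 0.1 at tau_true
phi = np.exp(-decay * tau) * np.cos(2 * np.pi * f0 * tau)
print(effective_duration(phi, fs, period=1 / f0))
```

With this synthetic ACF, whose envelope falls to 0.1 at 30 ms, the estimate recovers a τe of about 30 ms, because the peak levels lie exactly on a straight line in dB.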
The values of τe can depend on the integration interval (2T) of the r-ACF; preliminary analyses were therefore made for six integration intervals: 2T = 20, 50, 100, 200, 500, and 1000 ms. Figure 2 provides examples of the running τe of variably pitched vowels sung by a professional soprano and analyzed with different integration intervals. Although the τe values varied with the integration interval, some of the local minima, including the minimum value of the running τe, appeared to be rather independent of the integration interval.

Figure 3. Distribution of the (τe)min values measured at six integration intervals for all the 205 data samples.
Figure 3 shows the (τe)min values as a function of the integration interval for all 205 tones. The values of (τe)min for the shortest integration interval, 2T = 20 ms, are significantly greater than those obtained for the other integration intervals (single-sample t-test, p < 0.01; paired t-test, p < 0.01). Notably, the (τe)min values change less for integration intervals longer than 50 ms. Although the two-way analysis of variance (ANOVA) showed that the (τe)min values for individual tones change significantly with the integration interval, the one-way ANOVA showed that the integration interval had a statistically significant (p < 0.01) but very small (R² = 0.01) effect on the (τe)min value.
How should a proper integration interval for the r-ACF analysis be selected? Ando stated that the integration interval for musical source signals should be set to around 2T = 2.0 s, because psychological experiments had shown that (τe)min values obtained with a 2.0-s integration interval predicted the listener-preferred temporal variations of sound fields well [3]. Subsequently, Mouri et al. reported that, for the subjective analysis of loudness perception, the integration interval 2T should be set to around 30 times the value of (τe)min of the sound signal [13].
Considering this background and the behavior of the running τe and (τe)min values of sung vowels described above (Figures 2 and 3), the present study focused on the (τe)min values obtained with 2T = 500 ms and treated them as the representative values. This integration interval is around 30 times the typical value of (τe)min; it is shorter than the shortest tone (data length = 1038 ms) and longer than one vibrato cycle. Figure 4a shows an example of the running τe and (τe)min obtained with 2T = 500 ms.

Figure 4. Comparison of (a) the measured τe values of the r-ACF with a stepping interval of 100 ms (2T = 500 ms), (b) the measured relative A-weighted sound pressure level with a stepping interval of 100 ms (2T = 500 ms), and (c) the vibrato waveform, as sung by Sop. 3 on the vowel /e/ at D5.

2.1.3. Relative sound pressure level (relative SPL)
Singers can vary the sound level by changing the subglottal pressure. We hypothesized that a louder voice decreases the (τe)min values, because the power ratio of the high-frequency components due to vocal noise normally increases with a louder voice [7]. Unfortunately, because no calibration signal for determining the absolute SPL was included in the singers' recordings, only a relative SPL was calculated, by normalizing the sound levels to each singer's mean SPL over all vowel segments; the normalized mean level of each singer was thus 0.0 dBA. Figure 4b shows an example of the running relative SPL (integration interval: 2T = 500 ms, running step: 100 ms, time window: rectangular), which was calculated by using Φp(0; t, T) in equation (2). For each (τe)min value, the running relative SPL within the same analysis range was extracted as an explanatory variable, termed relative SPL.
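The relative SPL described above can be sketched in two steps: a running level computed from Φp(0; t, T) with 2T = 500 ms and a 100-ms step, followed by normalization to the singer's mean level. Taking the arithmetic mean of the dB values as the singer's mean SPL is an assumption, and the function names and the constant test signals are hypothetical.

```python
import numpy as np

def running_level_db(p, fs, two_T=0.5, step=0.1):
    """Running level 10*log10 Phi_p(0; t, T) [dB] with a rectangular window.

    p is assumed to be A-weighted already; the output is uncalibrated, i.e.
    relative to an arbitrary reference, as for the recordings of Experiment I.
    """
    p = np.asarray(p, dtype=float)
    n_win = int(round(two_T * fs))
    n_step = int(round(step * fs))
    starts = range(0, p.size - n_win + 1, n_step)
    return np.array([10.0 * np.log10(np.mean(p[s:s + n_win] ** 2)) for s in starts])

def relative_spl(levels_per_segment):
    """Shift one singer's running levels so that their mean over all vowel segments is 0 dB.

    Assumption: the singer's mean SPL is the arithmetic mean of the dB values.
    """
    segments = [np.asarray(lv, dtype=float) for lv in levels_per_segment]
    reference = np.concatenate(segments).mean()
    return [lv - reference for lv in segments]


fs = 8000
quiet = running_level_db(0.5 * np.ones(fs), fs)   # constant signals as stand-ins
loud = running_level_db(2.0 * np.ones(fs), fs)
rel_quiet, rel_loud = relative_spl([quiet, loud])
```

Because only level differences survive the normalization, the missing calibration signal does not affect the relative SPL.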

2.1.4. Vibrato rate and vibrato extent


The fundamental frequency (F0) contours were tracked by using the first major peak of the r-ACF (integration interval: 2T = 20 ms, running step: 4 ms, time window: rectangular). The resulting F0 contours were then low-pass filtered at 10 Hz to obtain the vibrato waveform, which can be defined as a periodic frequency fluctuation at approximately 4-8 cycles/s. Figure 4c shows an example of the vibrato waveform. The extreme F0 peaks within the analysis range used for calculating (τe)min were extracted, and the vibrato rate (VR) and vibrato extent (VE) were calculated from three adjacent extremes using Prame's strategy [8, 9]:

$$\mathrm{VR}(k) = \frac{1}{t_{k+1} - t_{k-1}} \quad \text{[cycles/s]}, \tag{3}$$

$$\mathrm{VE}(k) = 1200 \log_2\left(1 + \left|\frac{a_{k-1} - 2a_k + a_{k+1}}{a_{k-1} + 2a_k + a_{k+1}}\right|\right) \quad \text{[cents]}, \tag{4}$$

where t1, t2, ..., tn-1, tn and a1, a2, ..., an-1, an are the times and values of the successive extremes of F0(t) within the range of (τe)min, as illustrated in Figure 4c. These calculations were repeated for each k, i.e., for each half cycle. Finally, the mean values of VR and VE were calculated.
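Equations (3) and (4) translate directly into code once the times and F0 values of successive extremes have been extracted. The sketch below assumes that the extremes alternate between maxima and minima; the function name is hypothetical.

```python
import numpy as np

def vibrato_rate_extent(times, values):
    """Mean vibrato rate [cycles/s] and extent [cents] from alternating F0 extremes.

    times  : instants t_k [s] of successive maxima and minima of the F0 contour
    values : F0 values a_k [Hz] at those extremes (equations (3) and (4))
    """
    t = np.asarray(times, dtype=float)
    a = np.asarray(values, dtype=float)
    if t.size < 3:
        raise ValueError("need at least three adjacent extremes")
    # equation (3): t_{k-1} and t_{k+1} are one full vibrato cycle apart
    vr = 1.0 / (t[2:] - t[:-2])
    # equation (4): the absolute value handles both peak-trough-peak
    # and trough-peak-trough triplets
    ratio = (a[:-2] - 2 * a[1:-1] + a[2:]) / (a[:-2] + 2 * a[1:-1] + a[2:])
    ve = 1200.0 * np.log2(1.0 + np.abs(ratio))
    return vr.mean(), ve.mean()


x = 2 ** (50 / 1200) - 1                    # semi-extent chosen so that VE = 50 cents
times = np.arange(8) * 0.1                  # an extreme every half cycle
values = 440.0 * (1 + x * np.array([1, -1] * 4))
vr, ve = vibrato_rate_extent(times, values)
```

For this symmetric vibrato around 440 Hz with an extreme every 0.1 s, the function returns a VR of 5 cycles/s and a VE of 50 cents.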
2.2. Results
Figure 5a illustrates the distribution of (τe)min for the vowels sung by all the singers. The values of (τe)min were approximately log-normally distributed. The values of (τe)min for individual tones ranged from 6.3 ms to 246 ms, and the geometric means for individual singers varied between 15 ms (Ten. 3) and 39 ms (Alt.). The global geometric mean across the singers was 24 ms. The one-way ANOVA showed that the factor subject contributed significantly to the variable log10 (τe)min (R² = 0.30, p < 0.01).
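The geometric means and the one-way ANOVA R² values reported in this section can be computed as sketched below. We assume that R² denotes the between-group sum of squares divided by the total sum of squares of log10 (τe)min; the input values are hypothetical. For F-ratios and p-values, `scipy.stats.f_oneway` could be used instead.

```python
import numpy as np

def one_way_r2(values, groups):
    """R^2 = SS_between / SS_total of a one-way ANOVA of `values` on `groups`."""
    y = np.asarray(values, dtype=float)
    g = np.asarray(groups)
    grand = y.mean()
    ss_total = np.sum((y - grand) ** 2)
    ss_between = sum(np.sum(g == level) * (y[g == level].mean() - grand) ** 2
                     for level in np.unique(g))
    return ss_between / ss_total

def geometric_mean(values):
    """Geometric mean, i.e. 10 ** mean(log10(values))."""
    return 10.0 ** np.mean(np.log10(np.asarray(values, dtype=float)))


# hypothetical (tau_e)min values [ms] for two singers, analysed on a log scale
tau_min = [10.0, 20.0, 30.0, 60.0]
singer = ["A", "A", "B", "B"]
print(geometric_mean(tau_min), one_way_r2(np.log10(tau_min), singer))
```

Working on log10 (τe)min is consistent with the approximately log-normal distribution noted above.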
The relative SPL, VR, and VE were considered as explanatory variables for (τe)min in this study. Figures 5b-e show the distributions of these variables for the vowels sung by all the singers. The standard deviation (s.d.) of the relative SPL, which indicates the dynamic range of the SPL for each singer, varied between 3.2 dBA (Alt.) and 8.0 dBA (Sop. 3), and the global mean across singers was 5.0 dBA (Figure 5b).
The singers sang the five vowels at two to five different pitches. As displayed in Figure 5c, the difference between the highest and lowest tones of each singer varied between 0.2 octave (Mez. 1) and 1.7 octaves (Sop. 2), and the global mean of the difference across singers was 0.9 octave.


Table I. Correlation coefficients among four quantitative variables. Log: log10 (τe)min, RSPL: relative SPL, Vr: vibrato rate, Ve: vibrato extent. **: significant at the 1% level, *: significant at the 5% level.

         Log       RSPL     Vr        Ve
Log      1.00
RSPL     −0.35**   1.00
Vr       +0.23*    −0.02    1.00
Ve       −0.47**   +0.05    −0.45**   1.00

Figure 5. Distribution of the quantitative variables for the individual singers: (a) (τe)min, (b) relative SPL, (c) pitch, (d) vibrato rate, and (e) vibrato extent.

Figure 5d shows the observed VR values. The VR ranged between 4.3 cycles/s and 7.1 cycles/s for individual tones, and the singer mean varied between 4.9 cycles/s (Ten. 2) and 6.5 cycles/s (Mez. 1). The global mean across singers was 5.6 cycles/s. The s.d. of each singer's VR, which expresses the intra-individual variation of the VR, varied between 0.1 cycle/s (Ten. 2) and 0.8 cycle/s (Sop. 1), and its global average across the singers was 0.3 cycle/s. The one-way ANOVA showed that the variable subject made a significant contribution to the VR (R² = 0.65, p < 0.01).
Figure 5e shows the findings for the VE. For individual tones, it ranged between 6.5 cents and 77 cents, and the singer mean varied between 24 cents (Sop. 1) and 59 cents (Bar. 1). The global mean across the singers was 44 cents. The s.d. of each singer's VE, which expresses the intra-individual variation of the VE, varied between 4 cents (Ten. 2) and 13 cents (Mez. 1), with a global average across the singers of 8 cents. The one-way ANOVA showed that the variable subject made a significant contribution to the VE (R² = 0.64, p < 0.01).

Figure 6. Distribution of the quantitative variables for the five vowels: (a) (τe)min, (b) relative SPL, (c) vibrato rate, and (d) vibrato extent.
Table I lists the correlation matrix among four quantitative variables: log10 (τe)min, relative SPL, VR, and VE. The relative SPL and VE had significant negative correlations with the log10 (τe)min values (r = −0.35, p < 0.01 and r = −0.47, p < 0.01, respectively); that is, the (τe)min values decreased with a relatively louder voice and a greater VE. A small positive correlation was observed between log10 (τe)min and VR (r = +0.23, p < 0.05).
Figure 6a shows the distributions of (τe)min for the five vowels. The geometric mean of (τe)min for each vowel was /a/: 20 ms, /e/: 19 ms, /i/: 26 ms, /o/: 26 ms, and /u/: 36 ms. The one-way ANOVA showed that vowel selection made a significant contribution to the values of log10 (τe)min (R² = 0.15, p < 0.01). The one-sample t-tests for each pair of vowels showed the relationship /a/ = /e/ < /i/ = /o/ < /u/ (p < 0.05). The s.d. of log10 (τe)min for each vowel, which describes the intra-vowel variation, was /a/: 0.15, /e/: 0.17, /i/: 0.32, /o/: 0.17, and /u/: 0.30; the intra-vowel variation was larger for the vowels /i/ and /u/ than for /a/, /e/, and /o/.
The distribution of the relative SPL for the five vowels is shown in Figure 6b. The mean relative SPL for each vowel was /a/: +1.2 dBA, /e/: +1.8 dBA, /i/: −0.2 dBA, /o/: +0.8 dBA, and /u/: −3.6 dBA. The one-way ANOVA showed that vowel selection contributed significantly to the relative SPL (R² = 0.15, p < 0.01). The one-sample t-tests for each pair of vowels showed the relationship /a/ = /e/ = /i/ = /o/ > /u/ (p < 0.05); the singers produced a softer voice on /u/ than on the other vowels. The s.d. of the relative SPL for each vowel, which represents the intra-vowel variation, was /a/: 4.4 dBA, /e/: 3.9 dBA, /i/: 4.7 dBA, /o/: 4.1 dBA, and /u/: 5.9 dBA. The intra-vowel variation was somewhat larger for /u/ than for the other vowels.
Figures 6c and 6d show the distributions of VR and VE for the five vowels. In both cases, the mean values for the vowels were similar. The one-way ANOVA showed that vowel selection did not contribute significantly to the VR (R² = 0.01, p = 0.90) or the VE (R² = 0.01, p = 0.84). The two-way ANOVA likewise showed that vowel selection did not contribute significantly to the VR (p = 0.45) or the VE (p = 0.18). The intra-vowel variations in VR and VE were similar among the vowels; the s.d. was on average 0.6 cycle/s for the VR and 14 cents for the VE.

3. Experiment II: Investigation of the relation between (τe)min and operatic singing with different subjective singing volume
3.1. Experimental method and analysis
The first aim of this experiment is to investigate whether singers can consciously vary the (τe)min values of the steady-state part of vowels in relation to (1) subjective singing volume, with regard to the expression marks ppp (pianississimo), mf (mezzo forte), and fff (fortississimo) on vocal scores, (2) tone pitch, and (3) vowel selection. The second aim is to compare the results obtained from Experiment I with those obtained from this experiment.
3.1.1. Subjects and recording conditions
Ten subjects, including professional singers, conservatory students, and trained amateur singers, participated in this experiment. The profiles of the subjects are listed in Table II. Each singer's voice was recorded in an anechoic chamber using a 1/2-inch condenser microphone placed 25 cm in front of and 5 cm to the side of the singer's mouth. The voice signals were sampled at 44.1 kHz.


Table II. Status of the ten singers who participated in Experiment II. Voice cl.: voice classification; Prof. edu.: professional education (years); Music exp.: music experience (years).

Singer   Voice cl.       Prof. edu.   Music exp.   Age
Sop. 1   Soprano         27           33           43
Sop. 2   Soprano         4            20           25
Sop. 3   Soprano         4            17           22
Mez. 1   Mezzo soprano   26           26           42
Mez. 2   Mezzo soprano   14           30           34
Mez. 3   Mezzo soprano   4            14           22
Ten. 1   Tenor           0            20           45
Ten. 2   Tenor           0            20           25
Bar.     Baritone        39           39           54
Bas.     Bass            27           27           38

3.1.2. Recording tasks


The subjects were asked to sing each of the five vowels (/a/, /e/, /i/, /o/, and /u/) six times at three different pitches. The tone pitches were set for each voice classification: soprano (F4, C5, F5), mezzo-soprano (C4#, G4#, C5#), tenor (F3, C4, F4), baritone (C3#, G3#, C4#), and bass (B2b, F3, B3b). The target frequencies were given by a pitch pipe before each change of tone pitch. The duration of each tone was set at 1.8 s and maintained with the aid of a visual metronome (a dotted half note at 100 bpm) located 1.5 m in front of the singer. This duration was determined by the following factors: (I) most of the singers reported that it was easy to sing for this duration without pausing for breath, even in the anechoic chamber, (II) this duration was almost the same as that of the database used in Experiment I of this study, and (III) this duration was sufficient for calculating the r-ACF with the 500-ms integration interval.
The main purpose of this experiment, which was to investigate the effect of subjective singing volume, was explained to the singers as follows:
ppp (pianississimo): sing at the lowest volume that you use for musical pieces in actual situations,
mf (mezzo forte): sing at the volume that you think is midway between pianississimo and fortississimo,
fff (fortississimo): sing at the highest volume that you use for musical pieces in actual situations.
Approximately one hour was required to record all the samples for each subject. During the recording session, none of the singers reported any fatigue effects. The total material included 2,700 tones (5 vowels × 6 trials × 3 pitches × 3 subjective singing volumes × 10 subjects).
3.1.3. Analysis of τe for the running autocorrelation function
In order to statistically isolate the steady-state part of a vocal segment, the segments at the beginning and end of the tone that satisfied the following rules (similar to those of Experiment I) were eliminated in advance.
Rule I: Segments with an absolute A-weighted SPL lower than 60 dBA.
Rule II: Segments with a fundamental frequency (F0) that deviated from the mean fundamental frequency of the tone by more than 150 cents (approximately the maximum range of the vibrato extent [9]).
In this isolation process, the important vocal segments containing the vibrato were not eliminated. The maximum VE over all 2700 vowels was 120 cents.
The tone duration for the remaining segments was 1.8 ± 0.2 s (mean ± s.d.).
The parameter τe was calculated by the same procedure as described in subsection 2.1.2. A fixed integration interval (2T = 500 ms) was used for all the data.
3.1.4. ANOVA and linear prediction models of (τe)min
One purpose of Experiment II was to examine whether the singers could consciously vary the (τe)min values by using musical cues, without knowledge of physical parameters such as the sound pressure level, VR, and VE. Hence, the following three musical factors, which were specified in our instructions to the singers as described in subsection 3.1.2, were treated as qualitative explanatory variables for (τe)min: (1) subjective singing volume (3 levels: ppp, mf, and fff), (2) tone pitch (3 levels, with the absolute pitch depending on the voice classification), and (3) vowel selection (5 levels: /a/, /e/, /i/, /o/, and /u/). If individual singer variations were not of interest, the factor subject (10 levels) could be treated as a residual; however, this study included this factor among the explanatory variables because it focuses on individual singer variations rather than on the average across all singers. The results of an ANOVA including the three musical factors and the factor subject can reflect the relative importance of the musical factors with respect to the individual singer variations. Consequently, a four-way ANOVA was performed.
A linear prediction model for the (τe)min values that employs the three musical factors and the factor subject is of interest because it is convenient to use. In this study, the linear prediction was carried out in three steps. First, we performed linear prediction using only the three musical factors (the simplest case). Second, we performed linear prediction using the three musical factors and the factor subject; this result reflects the relative variation of (τe)min rather than its absolute variation. Lastly, we performed linear prediction using the three musical factors for each individual singer, which makes it possible to compare the variation in (τe)min among the ten singers.
3.1.5. Analysis of the other acoustical parameters
The other acoustical parameters dealt with in Experiment I, namely relative SPL, VR, and VE, were analyzed in order to examine the consistency of the results between Experiment I and Experiment II. Further, because calibration data were available for Experiment II, the value of the running absolute SPL within the analysis range was also extracted as the explanatory variable absolute SPL for the (τe)min values.

Table III.a. Results of the analysis of variance (ANOVA). Mean sq.: mean square.

Factor   DF     Sum of squares   Mean sq.   F-ratio
Model    109    217              1.99       80.60 (p < 0.001)
Error    2590   64               0.02
Total    2699   281

Table III.b. Results of the analysis of variance (ANOVA). Significance level: 1%. SV: subjective singing volume; TP: tone pitch. Cont. rat.: contribution ratio [%]. Contribution ratios larger than 10% are marked with †.

Factor          DF   Sum of squares   Cont. rat.
SV              2    46.1             16.4 †
TP              2    1.2              0.4
Vowel           4    13.4             4.8
Subject         9    101              35.8 †
SV*TP           4    1.6              0.6
SV*Vowel        8    2.5              0.9
TP*Vowel        8    1.9              0.7
SV*Subject      18   11.3             4.0
TP*Subject      18   29.1             10.4 †
Vowel*Subject   36   9.4              3.3
Total                217              77.2
3.2. Results
Tables III.a and III.b list the results of the ANOVA after fitting a model with four variables to the data for all 2700 sung vowels. We performed a full factorial ANOVA in which the main effects of each variable and the interactions among the variables were included in the model. The total contribution ratio of the model was 77.2%.
For practical convenience, the sum of squares of each main effect and interaction was normalized by the total sum of squares. The fourth column of Table III.b lists the resulting contribution ratio of each factor to the log10 (τe)min values, in percent, with the main contributing factors highlighted.
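The contribution ratios in Table III.b are sums of squares normalized by the total sum of squares. The sketch below shows the computation for two factors and their interaction in a balanced design; it is a simplified two-factor illustration rather than the four-way model used here, and the function name and toy data are hypothetical.

```python
import numpy as np

def contribution_ratios(y, a, b):
    """Percent of the total sum of squares explained by A, B and A*B.

    Assumes a balanced design (equal counts in every A x B cell).
    """
    y, a, b = np.asarray(y, dtype=float), np.asarray(a), np.asarray(b)
    grand = y.mean()
    ss_total = np.sum((y - grand) ** 2)

    def main_ss(f):
        # sum over levels of n_level * (level mean - grand mean)^2
        return sum(np.sum(f == lev) * (y[f == lev].mean() - grand) ** 2
                   for lev in np.unique(f))

    ss_a, ss_b = main_ss(a), main_ss(b)
    ss_ab = 0.0
    for la in np.unique(a):
        for lb in np.unique(b):
            cell = (a == la) & (b == lb)
            # interaction deviation: cell mean minus both main effects
            dev = y[cell].mean() - y[a == la].mean() - y[b == lb].mean() + grand
            ss_ab += cell.sum() * dev ** 2
    return {name: 100.0 * ss / ss_total
            for name, ss in (("A", ss_a), ("B", ss_b), ("A*B", ss_ab))}


a = np.array([0, 0, 1, 1] * 2)
b = np.array([0, 1, 0, 1] * 2)
print(contribution_ratios([1, 1, 3, 3] * 2, a, b))   # pure A effect
print(contribution_ratios([1, 3, 3, 1] * 2, a, b))   # pure A*B interaction
```

In the four-way analysis, the same normalization applied to every main effect and two-factor interaction gives the values in the fourth column of Table III.b, whose sum (77.2%) is the contribution ratio of the whole model.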
Because the three factors subjective singing volume (SV), tone pitch (TP), and vowel selection (vowel) depend on musical expression, they can be indicated on a musical score. They are, however, also influenced by the interpretation styles of the individual singers.
To predict the absolute value of (τe)min, a linear prediction model that employs the three factors and excludes the factor subject can be formulated as

$$\log_{10}(\tau_e)_{\min} \approx a_1(SV) + a_2(\mathit{Pitch}) + a_3(\mathit{Vowel}) + k_1, \tag{5}$$

where a1(SV), a2(Pitch), a3(Vowel), and k1 were calculated from a multiple regression analysis with dummy variables fitted to the data. Table IV lists the values of a1(SV), a2(Pitch), and a3(Vowel). The value of the constant k1 in equation (5) is equal to the logarithm of the geometric mean of (τe)min across all the singers. The total contribution of SV, TP, and vowel was only 21.6%. Although SV had a relatively high contribution, this model did not give good predictions of the absolute values of (τe)min (see Table III.b).

Table IV. Values of a1(SV), a2(Pitch), and a3(Vowel) in equation (5).

Factor (Item)   Category              Coefficient (Cat. score)
a1(SV)          ppp (pianississimo)   +0.17
                mf (mezzo forte)      −0.03
                fff (fortississimo)   −0.14
a2(Pitch)       Low                   +0.03
                Middle                −0.02
                High                  −0.01
a3(Vowel)       /a/                   −0.09
                /e/                   −0.04
                /i/                   −0.01
                /o/                   +0.02
                /u/                   +0.12
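A model of the form of equation (5) can be fitted by least squares with one dummy variable per category, after which the category scores of each factor are shifted to sum to zero and the shifts are absorbed into the constant. For a balanced design such as Experiment II, this makes the constant equal to the mean of log10 (τe)min, consistent with k1 being the logarithm of the geometric mean stated above. The function name and the toy data are hypothetical, and the sketch is not necessarily the exact procedure used for Table IV.

```python
import numpy as np

def fit_category_scores(y, factors):
    """Least-squares fit of y ~ sum of category scores + k using dummy variables.

    y       : 1-D array, e.g. log10 (tau_e)min
    factors : dict name -> 1-D array of category labels (same length as y)
    Returns (scores, k): per-factor dicts of category scores, shifted so that
    each factor's scores sum to zero, and the constant k.
    """
    y = np.asarray(y, dtype=float)
    blocks, names = [], []
    for name, labels in factors.items():
        labels = np.asarray(labels)
        levels = list(np.unique(labels))
        # one 0/1 dummy column per category
        blocks.append((labels[:, None] == np.array(levels)[None, :]).astype(float))
        names.append((name, levels))
    X = np.hstack(blocks + [np.ones((y.size, 1))])
    # X is rank-deficient; lstsq returns the minimum-norm exact solution
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    scores, k, col = {}, coef[-1], 0
    for name, levels in names:
        block = coef[col:col + len(levels)]
        col += len(levels)
        k += block.mean()  # move each factor's mean score into the constant
        scores[name] = dict(zip(levels, block - block.mean()))
    return scores, k


# hypothetical balanced data generated from known scores
sv = ["ppp", "ppp", "fff", "fff"] * 2
vowel = ["a", "u", "a", "u"] * 2
true_sv = {"ppp": 0.15, "fff": -0.15}
true_vowel = {"a": -0.10, "u": 0.10}
y = [1.4 + true_sv[s] + true_vowel[v] for s, v in zip(sv, vowel)]
scores, k = fit_category_scores(y, {"SV": sv, "Vowel": vowel})
```

Shifting the scores does not change the fitted values, so the zero-sum representation is only a convention that makes the constant interpretable; the same function accepts an additional factor such as subject, as in equation (6).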
The contribution ratio of subject was high (35.8%). Adding this factor to the model of equation (5) gives the following expression:

$$\log_{10}(\tau_e)_{\min} \approx a_1(SV) + a_2(\mathit{Pitch}) + a_3(\mathit{Vowel}) + a_4(\mathit{Subject}) + k_2, \tag{6}$$

where a1(SV), a2(Pitch), and a3(Vowel) have values identical to those in equation (5), listed in Table IV. The expression a4(Subject) + k2 denotes the logarithm of the geometric mean value of (τe)min for each singer. These values ranged between log10(18) (Sop. 2) and log10(100) (Mez. 3). Fitting this model raised the total contribution ratio to 57.4%, as can be seen from Table III.b.
The contribution ratios for the interactions between subject and the other three factors, especially the interaction of TP and subject, were rather high (Table III.b). We therefore further examined the data by comparing linear prediction models for the individual singers, formulated as follows:

$$\log_{10}(\tau_e)_{\min}^{(j)} \approx b_1^{(j)}(SV) + b_2^{(j)}(\mathit{Pitch}) + b_3^{(j)}(\mathit{Vowel}) + k_3^{(j)}, \quad j = 1, 2, \ldots, J, \tag{7}$$


where b1(j)(SV), b2(j)(Pitch), and b3(j)(Vowel) were calculated from multiple regression analyses with dummy variables for the J individual singers and fitted to the data. Figure 7 shows the values of b1(j)(SV), b2(j)(Pitch), and b3(j)(Vowel). The k3(j) values denote the logarithm of the geometric mean value of (τe)min for each singer and are identical to a4(Subject) + k2 mentioned in the preceding paragraph. Although the values of b1(j)(SV), b2(j)(Pitch), b3(j)(Vowel), and k3(j) varied greatly among the individual singers, SV showed a similar trend for all of them: the (τe)min values decreased with greater vocal effort. Table V lists the contribution ratios of SV, TP, and vowel to the (τe)min values for each singer. As noted, the values varied greatly among the singers. The total contribution ratio for each singer's model varied from 41% (Sop. 3) to 88% (Sop. 2). Figure 8 shows the relationship between the measured values of (τe)min and the values calculated by equation (7) for each singer. Although the residuals include (1) the interactions between factors and (2) intra-individual changes within 6 trials, the correlation coefficient between the measured and calculated (τe)min values was sufficiently high, amounting to 0.86 (p < 0.01).

Figure 7. Coefficients obtained from multiple regression analyses with dummy variables for the ten singers, together with the mean values of the coefficients among the ten singers (corresponding to a1(SV), a2(Pitch), and a3(Vowel) in equations (5) and (6) and listed in Table IV).

Figure 8. Relationship between the measured (τe)min values and those calculated by fitting models formulated using equation (7) for the individual singers.
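The text specifies only "multiple regression analysis with dummy variables". A minimal sketch of one common reading follows: an ordinary least-squares fit of an additive one-hot model, with each factor's category scores centred so that their frequency-weighted mean is zero, which reproduces the sum-to-zero pattern of Table IV. The function name is hypothetical, and the returned `r_squared` is the ordinary coefficient of determination, assumed (not confirmed by the source) to correspond to the "total contribution ratio". Fitting equation (7) would mean calling it once per singer.

```python
import itertools

import numpy as np


def fit_category_scores(factors, y):
    """Additive dummy-variable least-squares fit.

    factors: dict {factor name: sequence of category labels, one per observation}
    y: responses, e.g. log10 (tau_e)min for each sung tone
    Returns (scores, constant, r_squared). The scores of each factor are centred
    so that their frequency-weighted mean is zero; the constant is then mean(y).
    """
    y = np.asarray(y, dtype=float)
    blocks = []  # (factor name, sorted categories, labels as a list)
    columns = [np.ones(len(y))]
    for name, labels in factors.items():
        labels = list(labels)
        cats = sorted(set(labels))
        blocks.append((name, cats, labels))
        for cat in cats:
            columns.append(np.array([1.0 if lab == cat else 0.0 for lab in labels]))
    X = np.column_stack(columns)
    # The full one-hot design is rank-deficient; lstsq returns one least-squares
    # solution, and the centring below removes the arbitrary per-factor offsets.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    constant = coef[0]
    scores, pos = {}, 1
    for name, cats, labels in blocks:
        raw = coef[pos:pos + len(cats)]
        pos += len(cats)
        weights = np.array([labels.count(c) for c in cats], dtype=float)
        shift = float(np.dot(weights, raw) / weights.sum())
        scores[name] = {c: float(r - shift) for c, r in zip(cats, raw)}
        constant += shift
    r_squared = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return scores, float(constant), float(r_squared)


# Noise-free check on a balanced 3 x 3 design built from two factors of Table IV.
true_sv = {"pp": 0.17, "mf": -0.03, "ff": -0.14}
true_pitch = {"Low": 0.03, "Middle": -0.02, "High": -0.01}
cells = list(itertools.product(true_sv, true_pitch))
y_demo = [1.6 + true_sv[s] + true_pitch[p] for s, p in cells]
scores, constant, r2 = fit_category_scores(
    {"SV": [s for s, _ in cells], "Pitch": [p for _, p in cells]}, y_demo)
```

Centring with frequency weights (rather than dropping one reference category) keeps the constant equal to the mean of the response, which matches the statement that k1 is the log of the geometric mean of (τe)min.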
Figure 9a shows the distribution of the (τe)min values for the sung vowels (10 singers). The values showed an approximately normal distribution on a logarithmic scale, not on a linear scale. The (τe)min values for individual tones ranged between 6.8 ms and 1482 ms, and the geometric mean for the individual singers ranged between 18 ms (Sop. 2) and 100 ms (Mez. 3). The geometric mean across the singers was 39 ms.

Table V. Contribution ratios [%] of subjective singing volume, tone pitch, and vowel to the values of (τe)min for each individual singer. The maximum and minimum for each variable are typed in bold. *: 5% significance level, **: 1% significance level.

Singer   Singing volume   Tone pitch   Vowel   Total
Sop. 1   46**             14**         3**     63**
Sop. 2   7**              79**         2**     88**
Sop. 3   19**             17**         5**     41**
Mez. 1   41**             4**          9**     55**
Mez. 2   32**             <1           31**    63**
Mez. 3   4**              31**         22**    57**
Ten. 1   44**             9**          8**     61**
Ten. 2   44**             3**          21**    69**
Bar.     40**             1*           17**    58**
Bas.     34**                          12**    47**

Figure 9. Distribution of the quantitative variables for the individual singers: (a) (τe)min value, (b) relative SPL, (c) absolute SPL, (d) vibrato rate, and (e) vibrato extent.
The relative SPL, absolute SPL, VR, and VE were considered as explanatory variables for (τe)min in this study. Figures 9b–e show the distributions of these variables for the sung vowels of all the singers.

The s.d. of the relative SPL, which reflects the dynamic range of the SPL for each singer, varied between 7.5 dBA (Sop. 3) and 11 dBA (Sop. 2), with a global mean across the singers of 9.0 dBA (Figure 9b). The absolute SPL varied between 77 dBA (Mez. 3) and 95 dBA (Sop. 1), with a global mean across the singers of 88 dBA (Figure 9c).
Figure 9d shows the observed VR values. For individual tones, the VR ranged between 2.4 cycles/s and 9.2 cycles/s, and the mean for the individual singers varied between 5.0 cycles/s (Bar. 1) and 6.9 cycles/s (Mez. 3). The global mean across the singers was 5.7 cycles/s. The s.d. of each singer's VR, which expresses the intra-individual variation of the VR, varied between 0.3 cycle/s (Sop. 2) and 1.3 cycles/s (Mez. 3); the global average of the s.d. of the VR across singers was 0.6 cycle/s. The one-way ANOVA showed that the variable "subject" contributed significantly to the VR (R² = 0.34, p < 0.01).
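Assuming that the R² reported with each one-way ANOVA is the usual eta squared (between-group sum of squares divided by total sum of squares), it can be computed as in the sketch below. The function name and the vibrato-rate values are invented for illustration, and the F-test that yields the p value is not shown.

```python
from collections import defaultdict


def one_way_r_squared(values, groups):
    """Eta squared of a one-way ANOVA: SS_between / SS_total."""
    values = [float(v) for v in values]
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Each group contributes n_g * (group mean - grand mean)^2.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical vibrato rates [cycles/s] for two singers, three tones each.
vr = [5.0, 5.2, 5.4, 6.6, 6.8, 7.0]
singer = ["A", "A", "A", "B", "B", "B"]
r2_demo = one_way_r_squared(vr, singer)
```

Read this way, R² = 0.34 for the VR means that about a third of the tone-to-tone variance in vibrato rate is accounted for by which singer produced the tone.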
Figure 9e shows the findings for the VE. For individual tones, the VE ranged between 0.0 cents and 120 cents, and the singer mean varied between 8.0 cents (Mez. 3) and 73 cents (Sop. 2). The global mean across the singers was 36 cents. The s.d. of each singer's VE, which expresses the intra-individual variation of the VE, varied between 5.0 cents (Mez. 3) and 24 cents (Sop. 2), and the global average of the s.d. of the VE across the singers was 13 cents. The one-way ANOVA showed that the variable "subject" contributed significantly to the VE (R² = 0.56, p < 0.01).
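Vibrato extent is reported in cents (1200 cents per octave). Whether VE is measured from the mean F0 or peak-to-peak is defined in the paper's methods section and is not assumed here; the sketch only converts between cents and frequency ratios, so the reported values can be read as relative frequency deviations. The function names are illustrative.

```python
import math


def interval_cents(f_high, f_low):
    """Size of the interval f_high / f_low in cents."""
    return 1200.0 * math.log2(f_high / f_low)


def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval given in cents."""
    return 2.0 ** (cents / 1200.0)


# Global mean VE (36 cents) and the largest observed VE (120 cents)
# expressed as relative frequency deviations.
mean_ve_percent = 100.0 * (cents_to_ratio(36.0) - 1.0)   # about 2.1 %
max_ve_percent = 100.0 * (cents_to_ratio(120.0) - 1.0)   # about 7.2 %
```

For a tone at 440 Hz, for example, the 36-cent mean VE corresponds to a deviation of roughly 9 Hz.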
Table VI lists the correlation matrix among five quantitative variables: log10(τe)min, relative SPL, absolute SPL, VR, and VE. The relative SPL, absolute SPL, and VE had significant negative correlations, each below −0.35, with the log10(τe)min values (r = −0.37, p < 0.01; r = −0.52, p < 0.01; and r = −0.72, p < 0.01, respectively); the (τe)min values decreased with a relatively louder voice, an absolutely louder voice, and a greater VE. A small positive correlation was observed between log10(τe)min and the VR (r = +0.34, p < 0.01).
Figure 10a shows the distributions of (τe)min for the five vowels. The geometric mean values of (τe)min for the vowels were /a/: 32 ms, /e/: 36 ms, /i/: 38 ms, /o/: 41 ms, and /u/: 51 ms. The one-way ANOVA showed that vowel contributed significantly to the log10(τe)min values (R² = 0.05, p < 0.01). The one-sample t-tests for each pair of vowels showed the relationship /a/ < /e/ = /i/ = /o/ < /u/ (p < 0.05). The s.d. of log10(τe)min for each vowel, which describes the intra-vowel variation, was /a/: 0.24, /e/: 0.31, /i/: 0.37, /o/: 0.30, and /u/: 0.34.
The distribution of the relative SPL for the five vowels is shown in Figure 10b. The mean values of the relative SPL for the vowels were /a/: +1.6 dBA, /e/: +0.8 dBA, /i/: −1.3 dBA, /o/: +1.1 dBA, and /u/: −2.3 dBA. The one-way ANOVA showed that vowel selection contributed significantly to the relative SPL (R² = 0.03, p < 0.01). The one-sample t-tests for each pair of vowels showed the relationship /a/ = /o/ = /e/ > /i/ = /u/ (p < 0.05); the singers produced a quieter voice at /i/ and /u/ than at the other vowels. The s.d. of the relative SPL for each vowel, which represents the intra-vowel variation, was /a/: 8.8 dBA, /e/: 8.8 dBA, /i/: 8.8 dBA, /o/: 8.8 dBA, and /u/: 9.2 dBA. The intra-vowel variation was somewhat larger for vowel /u/ than for the other vowels.

Table VI. Correlation coefficients among the five quantitative variables.

                 log10(τe)min   Relative SPL   Absolute SPL   Vibrato rate   Vibrato extent
log10(τe)min     1.00
Relative SPL     −0.37          1.00
Absolute SPL     −0.52          +0.89          1.00
Vibrato rate     +0.34          +0.01          −0.09          1.00
Vibrato extent   −0.72          +0.22          +0.41          −0.35          1.00

The distribution of the absolute SPL for the five vowels is shown in Figure 10c. The mean values of the absolute SPL for the vowels were /a/: 90 dBA, /e/: 89 dBA, /i/: 87 dBA, /o/: 89 dBA, and /u/: 86 dBA. The one-way ANOVA showed that vowel selection contributed significantly to the absolute SPL (R² = 0.02, p < 0.01). The one-sample t-tests for each pair of vowels showed the relationship /a/ = /o/ = /e/ > /i/ = /u/ (p < 0.05); the singers produced a quieter voice at /i/ and /u/ than at the other vowels. The s.d. of the absolute SPL for each vowel was equal to that of the relative SPL.

Figures 10d and 10e show the distributions of the VR and VE for the five vowels. In both cases, the mean values for each vowel were similar. The one-way ANOVA showed that vowel selection did not make a substantial contribution to the VR (R² = 0.003, p = 0.06) or the VE (R² = 0.01, p < 0.01). The intra-vowel variations in both the VR and VE were similar among the vowels; the s.d. of the values was on average 0.9 cycle/s for the VR and 21 cents for the VE.

Figure 10. Distribution of the quantitative variables for the five vowels: (a) (τe)min, (b) relative SPL, (c) absolute SPL, (d) vibrato rate, and (e) vibrato extent.

4. Discussion
The present study provided a method for evaluating the randomness of fluctuating acoustical components of the steady-state part of vowels in operatic singing by using the parameter (τe)min. In the following discussion, the results are compared with previous studies.

4.1. Range of (τe)min values for vowels sung by operatic singers

Figure 5a shows that the log-normally distributed values of (τe)min for the vowels sung by the thirteen professional operatic performers ranged between 6.3 ms and 246 ms (Experiment I). Figure 9a shows that the log-normally distributed (τe)min values for the vowels sung by the ten trained operatic performers ranged between 6.8 ms and 1482 ms (Experiment II). A singer may be able to produce nearly non-periodic ((τe)min → 0 ms) or nearly periodic ((τe)min → ∞) signals, i.e., extreme values of (τe)min, for example through intentional changes in the vibrato [14]. The effort of operatic singers to reduce the degree of vibrato prolongs the (τe)min value; conversely, the effort to enhance the degree of vibrato decreases the (τe)min value.

4.2. Potential for the variation of (τe)min values by subjective interpretation styles
The ability of the singers to vary the (τe)min values in terms of (1) subjective singing volume, (2) tone pitch, and (3) vowel selection was examined in Experiment II. The model formulated for all subjects on the basis of equation (5) showed that the predictability of the absolute value of (τe)min from the variables SV, TP, and vowel was limited (Table IIIb). Thus, at the present stage, it may be difficult to develop a single model for the prediction of (τe)min values that can be applied to all singers.
It is interesting to note that the contribution ratio of "subject" was high (Table IIIb). This may be due to a wide variation in the voice quality and phonation style [6, 7] of the individual singers. The present results suggest that the intra-individual variation of (τe)min in terms of SV, TP, and vowel was relatively more predictable, because the common model for all the singers formulated using equation (6) resulted in a total contribution ratio of 57.4% (Table IIIb). Thus, for predicting (τe)min values, it may be more effective for singers to be aware of their individual averaged values of (τe)min, which can be measured as described in this study. The results presented in Figure 7 and Table V show that the contribution ratios of SV, TP, and vowel varied greatly among the individual singers. Only SV showed a recognizable trend toward a higher relative contribution to the (τe)min values.

Note that a greater subjective singing volume causes the (τe)min values to decrease (Figure 7). If the voice signals were simply amplified electrically, the (τe)min value would not decrease but would remain constant; the observed decrease may therefore be closely related to changes in the vocal production mechanism when the subjective singing volume is changed. The effort to produce a greater subjective singing volume may cause non-periodic voice signal production from the vocal folds: a larger vocal effort may require a thicker vocalis muscle, the main part of the vocal fold, resulting in non-periodic sound signal production from the voice source [7]. For example, in the frequency domain, where the spectrum tilt is correlated with high-frequency vocal noise, a louder subjective singing volume may result in a gentler spectrum tilt for singers who normally have a steep spectrum tilt. To describe the effect of the subjective singing volume in terms of quantitative parameters, acoustic analyses of the signals, such as spectrum-tilt analysis, need to be addressed in future work.
The results obtained from the (τe)min values of the five vowels (/a/, /e/, /i/, /o/, and /u/) are noteworthy (Figures 6 and 10). The average (τe)min values for the ten singers were in the following order: /a/ < /e/ < /i/ < /o/ < /u/. This finding is useful for discussing the periodicity of the sung vowels in relation to the production mechanism of the singing voice. The present result may be closely related to those of previous studies on vowels that involve factors such as the position of the tongue or the opening of the mouth and jaw during singing [7]; these physiological aspects of singing may need to be addressed in future studies. The fact that the contribution ratios of vowel selection to the (τe)min values for the three sopranos who participated in Experiment II are smaller than those for the other singers (Table V) is encouraging. This result is as expected, since vowels tend to be sung in a more similar manner at high pitches (F0–F1 matching).
The high interaction of TP with "subject" may be related to individual differences in how the vocal register changes [6, 7, 15]. Although the vocal register cannot be easily defined on the basis of past literature on the singing voice [15], it would be interesting to examine the relationship between (τe)min values and the factors extracted from electroglottography (EGG). Future work is required to establish a connection between studies on (τe)min and physiological acoustics.
4.3. Potential for the variation of (τe)min values by referring to the objective musical parameters
The correlation between the (τe)min values and relative SPL was negative in both Experiment I and Experiment II (Tables I and VI). Further, the correlation between the (τe)min values and absolute SPL was negative in Experiment II (Table VI). These results imply that the randomness of the fluctuating component of an operatic singing voice normally increases with a louder voice. Again, this is interesting because the (τe)min value does not decrease but remains constant if the voice signals are amplified electrically. This may also be partially due to the changes in the vocal production mechanism discussed in subsection 4.2. Further, this phenomenon can be explained as follows. It is well established that the exponential decay of the ACF mirrors an equivalent rectangular bandwidth with the same maximum and energy: any modification of the signal characteristics that is likely to enlarge the bandwidth, such as vibrato, jitter, and an increased number of harmonics, will decrease the (τe)min value.
Notice that the (τe)min values had a negative correlation with the VE (Tables I and VI). If pure tones are modulated with perfect periodicity, the amplitude of the r-ACF returns to 0 dB at the modulation period. The decrease of (τe)min with the VE may therefore also be closely related to changes in the vocal production mechanism when the VE is changed. Voices with a natural inclination toward a greater VE may add to the non-periodicity of the voice signal, and they may be accompanied by increased perturbation components such as jitter and shimmer. Further, it is reasonable to measure the bandwidth of the power spectra of the sound signals in the manner discussed in the previous paragraph. A number of investigations employing acoustic analyses and physiological studies have revealed characteristic variations in both the perturbation and the fluctuation of voice signals [7]. However, the relationship between the (τe)min value and the results of these studies has not been examined and is to be addressed in future work.
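Two statements in this subsection can be checked on synthetic signals: a perfectly periodic frequency-modulated tone has an ACF that returns to 1 (0 dB) at the modulation period, and a gain change ("electrical amplification") leaves the normalized ACF unchanged. The sketch below assumes a direct-method ACF normalized by the energies of the two overlapping segments. The sampling rate, F0, vibrato rate, and the 100-cent extent (within the 0 to 120 cent range observed here) are illustrative choices, and f0/fm is made an integer so that the FM tone is exactly periodic.

```python
import numpy as np


def normalized_acf(x, window, max_lag):
    """Direct-method normalized ACF of the first `window` samples of x.

    phi[tau] = sum(x[n] x[n+tau]) / sqrt(sum(x[n]^2) * sum(x[n+tau]^2)), n < window.
    """
    x = np.asarray(x, dtype=float)
    ref = x[:window]
    e_ref = np.dot(ref, ref)
    phi = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        seg = x[tau:tau + window]
        phi[tau] = np.dot(ref, seg) / np.sqrt(e_ref * np.dot(seg, seg))
    return phi


fs = 6000             # sampling rate [Hz]
f0 = 300.0            # mean fundamental frequency [Hz]
fm = 6.0              # vibrato rate [cycles/s]; f0 / fm = 50, so the tone is periodic
extent_cents = 100.0  # peak frequency deviation [cents] (illustrative)
beta = f0 * (2.0 ** (extent_cents / 1200.0) - 1.0) / fm   # FM modulation index

t = np.arange(fs) / fs                                     # 1 s of signal
plain = np.sin(2 * np.pi * f0 * t)
vibrato = np.sin(2 * np.pi * f0 * t + beta * np.sin(2 * np.pi * fm * t))

f0_lag = int(round(fs / f0))   # 20 samples: one fundamental period
fm_lag = int(round(fs / fm))   # 1000 samples: one vibrato period
window = 2000                  # two vibrato cycles

phi_plain = normalized_acf(plain, window, fm_lag)
phi_vib = normalized_acf(vibrato, window, fm_lag)
phi_vib_loud = normalized_acf(10.0 * vibrato, window, fm_lag)  # +20 dB gain

print(f"plain tone:   phi at 1/f0 = {phi_plain[f0_lag]:.4f}")
print(f"vibrato tone: phi at 1/f0 = {phi_vib[f0_lag]:.4f}, at 1/fm = {phi_vib[fm_lag]:.4f}")
```

The vibrato pulls the first-period peak below 1, which steepens the initial envelope decay that determines τe, even though the ACF recovers fully at the 1/fm lag (about 167 ms, well beyond the 50 ms initial part). The gain-scaled copy gives the same φ, as expected from the normalization.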

It appears clear that the (τe)min value decreases with a greater VE, arising mainly from inter-singer differences in the VE, as shown in the results from Experiments I and II (Table I, Table VI, Figure 5, and Figure 9). The observed VE values, from 0.0 cents to 120 cents, are comparable to the range observed by Seashore (30 cents to 70 cents) [16] and to that reported by Prame (34 cents to 123 cents for F. Schubert's Ave Maria) [9]. We can therefore assume that singers may actually be capable of achieving lower (τe)min values at extremely large VEs, a condition we did not observe in this study. Conversely, early baroque music or Gregorian chant sung with a smaller VE may result in significantly larger (τe)min values.
The results of Figures 6a and 10a illustrate that the (τe)min values for the five vowels showed a statistically significant variation. The increase in (τe)min for vowel /u/ may be partially due to the decrease in the relative SPL and/or absolute SPL, which are negatively correlated with the (τe)min values (Tables I and VI); the lower SPL for vowel /u/ is clearly illustrated in Figures 6b, 10b, and 10c. We initially assumed that the VE, which was negatively correlated with the (τe)min values (Tables I and VI), would have a similar influence on the five vowels and thus would cause a similar inter-vowel variation in the (τe)min values. Yet, the results of Figures 6d and 10e show that the VE has no significant trend with vowel selection. This suggests that operatic singers may control the VE to maintain an audible effect at a fixed level for all the vowels. The observed inter-vowel difference in (τe)min values calls for future examination through a comparison with the formant frequencies and formant bandwidths of vowels in operatic singing [7].
The intra-vowel variations of the (τe)min values were larger for vowels /i/ and /u/ than for vowels /a/, /e/, and /o/ (Figures 6a and 10a), although the intra-vowel variations of the relative SPL, absolute SPL, VR, and VE were similar among these vowels (Figures 6b–d and 10b–e). This phenomenon can be examined with regard to the differences in vocal production mechanisms between speech and operatic singing. When singing vowel /i/, operatic singers may move the tongue from a frontal position in the low pitch range toward the back in the high pitch range. This may partially explain the larger intra-vowel variation in the sung vowel /i/, and it suggests that differences in the vocal production mechanism in general should be sought in studies comparing the variations in vowel sounds between speech and operatic singing.

5. Conclusions
A computational investigation into the characteristics of operatic singing voices was conducted using a time-domain parameter (τe)min derived from the running autocorrelation function (r-ACF) of the sound signals. To gain good controllability of the (τe)min value for singing voice signals, we proposed the use of a new sound source model of the steady-state part of vowels of operatic singing that can connect the (τe)min values to both subjective and objective parameters of voice acoustics and musical acoustics. The resulting observations were as follows:
1. The (τe)min value decreases with a louder voice and also decreases with a greater vibrato extent (see Tables I and VI).
2. The contribution ratios of the subjective singing volume, tone pitch, and vowel selection to the value of (τe)min depend on the individual singer (see Figures 7 and 8 and Table V).

Acknowledgments
The authors are grateful to Ken-Ichi Sakakibara and Dennis Noson for many helpful suggestions and stimulating discussions. The authors would like to thank the singers who participated in the recording session. The authors would also like to thank Kazuki Eguchi for technical support in building a calculation program for our data set. This work was supported by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science for Young Scientists.

Appendix
A1. Procedure for calculating running autocorrelation function
The procedure for calculating the r-ACF by using the direct method in the time domain is illustrated in Figure A1. The ACF is well known as a method for estimating the fundamental frequency (F0) of a sound signal; F0 is derived from the time lag between the origin and the first major peak of the function. Since the fundamental frequency of a musical sound signal is higher than 100 Hz in most cases, the maximum lag (τmax) required to obtain the fundamental period is around 10 ms at most. Yet, to obtain the effective duration (τe) of the r-ACF, the required value of τmax is greater than 50 ms as far as the operatic singing voice of vowels is concerned (see Appendix A2).

Figure A1. Direct method to determine the running autocorrelation function (r-ACF) in the time domain.

Figure A2. Comparison of the ACFs and power spectra obtained from different calculation methods.
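A minimal sketch of the direct time-domain method for one analysis frame follows, together with an F0 estimate from the first major peak. It assumes a normalization by the energies of the two overlapping segments; the function names, the 1 ms lower search limit, and the 0.5 threshold used to call a peak "major" are illustrative choices, not values from the paper. The 60 ms maximum lag satisfies the τmax > 50 ms requirement for τe rather than the roughly 10 ms needed for F0 alone.

```python
import numpy as np


def running_acf(x, fs, start_s, window_s, max_lag_s):
    """Normalized ACF of one analysis frame by the direct (time-domain) method.

    Frame: samples [start, start + window); lags 0 .. max_lag. The lagged frame
    reads real signal samples beyond the frame end, so no zero padding or
    circular wrap-around is involved.
    """
    x = np.asarray(x, dtype=float)
    start = int(round(start_s * fs))
    win = int(round(window_s * fs))
    max_lag = int(round(max_lag_s * fs))
    if start + win + max_lag > len(x):
        raise ValueError("signal too short for this frame and lag range")
    ref = x[start:start + win]
    e_ref = np.dot(ref, ref)
    phi = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        seg = x[start + tau:start + tau + win]
        phi[tau] = np.dot(ref, seg) / np.sqrt(e_ref * np.dot(seg, seg))
    return phi


def f0_from_first_major_peak(phi, fs, min_lag_s=0.001, threshold=0.5):
    """F0 from the first local maximum of phi at or above `threshold` beyond min_lag_s."""
    lo = max(1, int(round(min_lag_s * fs)))
    for i in range(lo, len(phi) - 1):
        if phi[i] > phi[i - 1] and phi[i] >= phi[i + 1] and phi[i] >= threshold:
            return fs / i
    return None


fs = 16000
t = np.arange(int(0.2 * fs)) / fs
tone = np.sin(2 * np.pi * 200.0 * t)   # 200 Hz test tone
phi = running_acf(tone, fs, start_s=0.02, window_s=0.05, max_lag_s=0.06)
f0_est = f0_from_first_major_peak(phi, fs)
```

For a sung vowel, the same frame would be moved along the signal to obtain the running function.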
Figure A2 shows a comparison of ACFs and power spectra obtained by different methods. ACFs obtained by direct methods are computed in the time domain. Based on the Wiener–Khintchine theorem, ACFs obtained by FFT methods are computed by a transform into the frequency domain (by FFT) followed by an inverse FFT (IFFT). It is important to note that the Wiener–Khintchine theorem is mathematically satisfied only for completely periodic or infinite-length signals; it is not satisfied for the quasi-periodic signals encountered in the analysis of operatic singing voices. Even in practice, the differences in both the ACFs and the power spectra among calculation methods are evident (see Figure A2). It is not possible to find even one matched pair of running ACF and running power spectrum for quasi-periodic signals. Thus, we reiterate that the transform methods and their precise definitions should be carefully examined before conducting an analysis of voice signals.

Although FFT method A or FFT method B (a method that avoids circular calculation) is usually used for fast computation, together with a window function such as Hamming, Hanning, or Blackman, FFT method C (see Figure A2e) must be used to obtain the ACF corresponding to the direct method. If FFT method C is chosen instead of the direct method for fast calculation, the segment beyond the maximum time lag is omitted, because this segment is obtained from circular calculation, as illustrated in direct method C (see Figure A2b and the corresponding Figure A2e).
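The exact definitions of FFT methods A to C are given in Figure A2, which is not reproduced here. The sketch below therefore shows one FFT computation that reproduces the (unnormalized) direct-method ACF exactly, under our own reading of the text: the frame is cross-correlated with the same frame extended by τmax samples, both are zero-padded so that circular wrap-around cannot reach lags 0 to τmax, and all lags beyond τmax are discarded. For contrast, the naive FFT ACF of the frame alone (circular, no padding) is also computed. All function names are hypothetical.

```python
import numpy as np


def direct_acf_raw(x, window, max_lag):
    """Unnormalized direct-method ACF: r[tau] = sum_{n < window} x[n] * x[n + tau]."""
    x = np.asarray(x, dtype=float)
    ref = x[:window]
    return np.array([np.dot(ref, x[tau:tau + window]) for tau in range(max_lag + 1)])


def fft_acf_raw(x, window, max_lag):
    """The same quantity via FFT cross-correlation with zero padding.

    With n_fft >= window + max_lag, no circular wrap-around reaches lags
    0 .. max_lag; the remaining (circular) lags are discarded.
    """
    x = np.asarray(x, dtype=float)
    ref = x[:window]
    ext = x[:window + max_lag]
    n_fft = 1 << int(np.ceil(np.log2(window + max_lag)))
    spec = np.conj(np.fft.rfft(ref, n_fft)) * np.fft.rfft(ext, n_fft)
    return np.fft.irfft(spec, n_fft)[:max_lag + 1]


def circular_acf_raw(x, window, max_lag):
    """Naive FFT ACF of the frame alone (circular): generally differs from the direct method."""
    ref = np.asarray(x[:window], dtype=float)
    spec = np.abs(np.fft.rfft(ref)) ** 2
    return np.fft.irfft(spec, window)[:max_lag + 1]


rng = np.random.default_rng(0)
x = rng.standard_normal(3000)
r_direct = direct_acf_raw(x, window=1024, max_lag=400)
r_fft = fft_acf_raw(x, window=1024, max_lag=400)
r_circ = circular_acf_raw(x, window=1024, max_lag=400)
```

Only the zero lag coincides for all three. The circular version wraps the frame onto itself, whereas the direct method reads genuine samples beyond the frame end; this is why the transform definition matters for quasi-periodic voice signals.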

A2. Definition of the initial part of the running autocorrelation function (r-ACF) for calculating its effective duration (τe)
The definition of the initial part of the r-ACF is briefly described here. In previous studies [11, 12, 13], the envelope decay of the initial part of the logarithm of the absolute value of the r-ACF of the source signals of musical sounds has been considered to be linear. However, the initial part of the r-ACF must be defined before τe is calculated, because the decay rate of the r-ACF varies, as illustrated in Figure A3. For this study, the initial part of the r-ACF was bounded on both the X-axis (0 ms to 50 ms) and the Y-axis (from the amplitude of the first major peak down to 5 dB below that amplitude).
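A minimal sketch of τe extraction with this definition follows. The level of the peaks of |φ(τ)| (in dB) is fitted with a straight line by least squares, using only peaks within 0 to 50 ms and within 5 dB of the first peak, and excluding the peak at τ = 0, as the text goes on to justify. τe is assumed here to be the lag at which the fitted line reaches −10 dB, the definition customary in the r-ACF literature. The function name is hypothetical, and taking the first local maximum of |φ| after τ = 0 as the "first major peak" is a simplification suited to clean signals; real voice ACFs may need minor peaks pruned.

```python
import numpy as np


def effective_duration(phi, fs, max_lag_s=0.05, drop_db=5.0):
    """Estimate tau_e [s] from a normalized ACF by a straight-line fit in dB.

    Local maxima of |phi| (tau = 0 excluded) are kept if they lie within
    max_lag_s and no more than drop_db below the first one; the fitted line
    is extrapolated to -10 dB.
    """
    mag = np.abs(np.asarray(phi, dtype=float))
    n_max = min(len(mag) - 1, int(round(max_lag_s * fs)))
    peaks = [i for i in range(1, n_max)
             if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]
    if not peaks:
        raise ValueError("no ACF peaks found in the initial part")
    level_db = 10.0 * np.log10(mag[peaks])
    keep = level_db >= level_db[0] - drop_db
    if keep.sum() < 2:
        raise ValueError("fewer than two peaks in the initial part")
    lags = np.array(peaks)[keep] / fs
    slope, intercept = np.polyfit(lags, level_db[keep], 1)
    if slope >= 0:
        raise ValueError("envelope does not decay; tau_e is undefined")
    return (-10.0 - intercept) / slope


# Synthetic ACF with an exponential envelope, for which tau_e = tau0 * ln 10.
fs = 10000
tau0 = 0.02                            # envelope time constant [s]
t = np.arange(1000) / fs               # lags 0 to 99.9 ms
phi_demo = np.exp(-t / tau0) * np.cos(2 * np.pi * 250.0 * t)
tau_e = effective_duration(phi_demo, fs)
```

On this synthetic ACF the retained peaks lie exactly on a straight line in dB, so the extrapolation recovers τ0 ln 10 ≈ 46 ms; on real r-ACFs the 5 dB band keeps the fit to the initial, approximately linear part of the decay.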

This definition of the initial part of the r-ACF for calculating τe was effective as far as the operatic singing voice of vowels was concerned.

It is possible to include the peak at τ = 0 when fitting the envelope decay with a straight line. However, the regression coefficient of the fitting model obtained by the least-squares method excluding the peak at τ = 0 was better than that of the model including the peak at τ = 0, particularly when the fundamental frequency was low (e.g., for the bass singer). The lower the pitch, the weaker the first major peak corresponding to the fundamental frequency. Therefore, we chose to exclude the peak at τ = 0.

Figure A3. Examples of r-ACF waveforms and the extracted values of τe.

References
[1] T. Hidaka, L. Beranek: Objective and subjective evaluations of twenty-three opera houses in Europe, Japan, and the Americas. J. Acoust. Soc. Am. 107 (2000) 368–383.
[2] T. Tsutsumi: The relationship between music and the concert hall. J. Temporal Des. Arch. Environ. 6 (2006) 78–81. http://www.jtdweb.org/.
[3] Y. Ando: Architectural acoustics: blending sound sources, sound fields, and listeners. Chapters 3, 4, 6, and 7. AIP Press/Springer-Verlag, New York, 1998.
[4] Y. Ando, H. Sakai, S. Sato: Formulae describing subjective attributes for sound fields based on the model of auditory-brain system. J. Sound Vib. 232 (2000) 101–127.
[5] Y. Ando, H. Sakai, S. Sato: Correlation factors describing primary and spatial sensations of sound fields. J. Sound Vib. 258 (2002) 405–417.
[6] J. Sundberg: The science of the singing voice. Northern Illinois University Press, DeKalb, 1987.
[7] I. Titze: Principles of voice production. Prentice-Hall, 1994.
[8] E. Prame: Measurements of the vibrato rate of ten singers. J. Acoust. Soc. Am. 96 (1994) 1979–1984.
[9] E. Prame: Vibrato extent and intonation in professional Western lyric singing. J. Acoust. Soc. Am. 102 (1997) 616–621.
[10] T. Taguti, Y. Ando: Characteristics of the short-term autocorrelation function of sound signals in piano performances. In: Music and concert hall acoustics. Y. Ando, D. Noson (eds.). Academic Press, London, 1997, Chapter 23.
[11] K. Kato, Y. Ando: A study of the blending of vocal music with the sound field by different singing styles. J. Sound Vib. 258 (2002) 463–472.
[12] I. Nakayama: CDs singings in Japanese using a common verse. Singing voices data base 17-18, 2002.
[13] K. Mouri, K. Akiyama, Y. Ando: Preliminary study on recommended time duration of source signals to be analyzed, in relation to its effective duration of the auto-correlation function. J. Sound Vib. 241 (2001) 87–95.
[14] K. Kato, T. Hirawa, K. Kawai, Y. Yano, Y. Ando: Investigation of the relation between (τe)min and operatic singing with different vibrato styles. J. Temporal Des. Arch. Environ. 6 (2006) 35–48. http://www.jtdweb.org/.
[15] N. Henrich: Mirroring the voice from Garcia to the present day: Some insights into singing voice registers. Logopedics Phoniatrics Vocology 31 (2006) 3–14.
[16] C. Seashore: Psychology of music. McGraw-Hill, New York, 1938.
