
Applied Acoustics 134 (2018) 138–144


Technical note

Joint sampling theory and subjective investigation of plane-wave and spherical harmonics formulations for binaural reproduction

Zamir Ben-Hur, Jonathan Sheaffer, Boaz Rafaely
Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel

A R T I C L E  I N F O

Keywords:
Microphone arrays
Spherical harmonics
Binaural signals
Binaural reproduction

A B S T R A C T

With the recent proliferation of spherical microphone arrays for sound field recording, methods have been developed for rendering binaural signals from these recordings and free-field head related transfer functions (HRTFs). Employing spherical arrays naturally leads to methods that are formulated in the spherical harmonics (SH) domain, using order-limited SH representations. However, the incorporation of HRTFs and enclosed sound fields typically leads to methods that are formulated in the space domain using plane-wave (PW) representation. Although these two representations are widely used, the current literature does not offer a complete theoretical framework to derive sampled PW representation from the SH representation in the context of binaural reproduction and sound perception. This paper develops a mathematical framework showing that when specific conditions for the joint sampling of the sound field and the HRTFs are maintained, sampled PW representation can be derived from the SH representation without error, and the resulting binaural signals are independent of the employed spatial sampling scheme. Furthermore, analysis of the aliasing error shows that the sound field is more sensitive to aliasing than the HRTFs. The theoretical analysis is complemented by a listening experiment, in which both PW and SH representations are perceptually evaluated for different spatial sampling schemes, SH orders, and levels of aliasing when deviating from the joint sampling conditions.

1. Introduction

Rendering binaural signals from microphone array recordings and from computer simulations plays an important role in the auralization of architectural acoustics [1,2], in hearing research [3], and in virtual reality [4]. Traditionally, binaural signals are obtained using microphones placed at the left and right ear canals of a manikin, in which case the transfer functions of the room and the head are jointly captured. Alternatively, if an array of microphones is used for the recording, it is then possible to synthesize binaural signals in post processing [5,6].

Spherical microphone arrays have recently gained popularity for capturing and analyzing sound fields. One of the advantages of using spherical arrays is the ease of the representation of signals in the spherical harmonics (SH) domain, leading to the well known Ambisonics format in spatial audio [7], and to a full three-dimensional representation of the sound field [8]. However, for spherical arrays with a limited number of microphones, the information on the captured sound field in the SH domain is order limited [9], leading to sound reproduction with limited spatial resolution [10]. One approach to overcome this limitation is to estimate the true, high-order sound field from the limited measurements, using sparse recovery and compressed sensing [11,12], for example. These approaches, although appropriate for sound fields composed of a small number of plane waves, cannot guarantee accurate binaural reproduction in the general case. Therefore, most methods for binaural reproduction from spherical microphone arrays use the measured sound field in a more direct manner. These methods can be broadly divided into two categories, one based on plane-wave (PW) representation and one on SH representation. In addition, this categorization is also relevant for methods of binaural reproduction of simulated sound fields, although typically allowing for higher spatial resolution compared to measured fields [13,14].

In the SH approach, binaural signals are generated by using a summation of the product of the SH coefficients of the sound field [8] with SH coefficients of the free-field HRTFs [15,16]. With this approach, binaural reproduction can be readily integrated with other SH computations, such as head rotations [17] and beamforming [18]. While this representation is natural for spherical arrays [19], accurate computation of the SH coefficients of HRTFs requires suitable sampling grids, leading to errors if the grid does not cover the full sphere of directions [20].



With the aim of relaxing the constraints on the HRTF sampling grid, researchers have developed methods that use PW representation of both the sound field and the HRTFs. While this representation is natural for HRTFs, as it can directly employ the measured transfer functions, various novel methods have been developed for the more challenging task of producing a PW representation of the sound field. Duraiswami et al. [6] employed direct sampling of the order-limited PW expansion of the captured sound field at the measured HRTF directions, which is equivalent to using maximum-directivity beamforming to estimate the PW amplitudes [19]. Other beamforming methods were also investigated within the same context [21–23], including a comparison between modal and delay-and-sum beamforming [24,25]. While the PW approach is attractive, the relation between the various approaches for PW representation of the sound field, the role of the sampling grid in this representation, the effect of the SH orders of the sound field and HRTF functions, and the relation between the PW and the SH representations have not been clearly established in the current literature. For example, while Bernschütz et al. [26] noted in a recent study that the spatial sampling scheme of the sound field affected spatial perception, no theoretical framework that could explain this observation was provided.

This paper presents the derivation of sampling conditions for the joint sampling of sound fields and HRTFs for binaural reproduction. It extends the existing theory for sampling individual functions on the sphere [19]. The joint sampling conditions involve the upper and lower bounds of the SH orders of the sound field and the HRTFs, and the attributes of the joint sampling grid. Furthermore, the sampling grids maintaining these conditions, which are not unique, guarantee equivalence between the PW and the SH representations for binaural reproduction.

The contributions of this paper are as follows:

1. A mathematical formulation of exact conditions for the joint sampling of the sound field and the HRTFs to obtain equivalence of the binaural signals when reproduced using PW and SH representations (Section 4).
2. Objective validation of the joint sampling conditions, providing insight into the effect of the SH order of the PW and HRTF representations on the binaural reproduction error (Section 5).
3. Validation of the joint sampling conditions and objective evaluations by a listening experiment (Section 6).

In addition, Section 2 reviews the basic theory of binaural reproduction from spherical microphone array recordings, Section 3 presents the theory for sampling an individual function on the sphere, and Section 7 outlines the conclusions of the research.

2. Binaural reproduction from spherical microphone array recordings

This section provides a brief overview of current formulations for binaural reproduction using both SH and PW representations. Consider a sound field composed of a continuum of PWs, such that it can be described by a PW density function, a(k,Ω). The pressure observed at the left ear of a listener can be represented by [6,27]:

$$p^{l}(k) = \int_{\Omega \in S^{2}} a(k,\Omega)\, h^{l}(k,\Omega)\, d\Omega, \qquad (1)$$

where k = 2πf/c is the wave number, f is the frequency, c is the speed of sound, Ω ≡ (θ,ϕ) ∈ S² is the spatial angle, and h(k,Ω) is the HRTF [28]. The superscript l denotes the left ear (the computation for the right ear can be done in a similar fashion), p(k) is the pressure at the ear, and $\int_{\Omega \in S^{2}} (\cdot)\, d\Omega \equiv \int_{0}^{2\pi}\!\int_{0}^{\pi} (\cdot)\sin(\theta)\, d\theta\, d\phi$.

Alternatively, by applying Parseval's relation to Eq. (1), the pressure at the ear can be represented in the SH domain [16] by:

$$p^{l}(k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \tilde{a}_{nm}^{*}(k)\, h_{nm}^{l}(k), \qquad (2)$$

where ãnm(k) is the spherical Fourier transform (SFT) of a*(k,Ω), (·)* denotes the complex conjugate, and hlnm(k) is the SFT of hl(k,Ω). For spatially band-limited functions, with SH orders corresponding to Nh and Na for the HRTF and PW density function, respectively, the infinite summation in Eq. (2) is truncated to an order that matches the lower-order function, N = min(Nh,Na):

$$p^{l}(k) = \sum_{n=0}^{N} \sum_{m=-n}^{n} \tilde{a}_{nm}^{*}(k)\, h_{nm}^{l}(k). \qquad (3)$$

Both functions are indeed expected to be order-limited in practice. anm(k), derived from spherical microphone array recordings, can be considered order-limited with a maximum order that depends on the operating frequency and array radius [19]. For example, when capturing a sound field using the Eigenmike [29] with a radius of 4.2 cm, the maximum order in the speech frequency range is about Na = 4. Furthermore, as demonstrated by Zhang et al. [30], the HRTF can also be considered to be order-limited in practice, where the maximum order is also frequency dependent [16,30]. For example, Rafaely and Avni [16] showed that in order to achieve less than −10 dB error in the representation of the HRTF up to 8 kHz, the order should be Nh = 12.

The sampling conditions for computing order-limited functions from their samples are well understood; they have been presented previously [19] and applied to compute both the PW density function from microphone signals [7] and the SH coefficients of the HRTFs from measured transfer functions [15]. These are briefly presented in the next section. However, the joint sampling of these two functions, leading to the discretization of Eq. (1), and its relation to Eq. (3), have not been studied in great detail; they are examined and discussed in Section 4.
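As a computational illustration of Eq. (3), the SH-domain formulation reduces, at each wave number, to a single inner product over the (n,m) indices. The following Python sketch is an illustration only, not the authors' implementation; it assumes NumPy and synthetic, order-limited coefficients stored in arrays ordered by the running index n² + n + m, the same ordering used later in Figs. 1 and 2.

```python
import numpy as np

def binaural_pressure_sh(a_nm, h_nm):
    """Eq. (3): p^l(k) = sum over (n, m) of conj(a_nm(k)) * h^l_nm(k).

    a_nm, h_nm : complex arrays of shape ((N+1)**2,) or ((N+1)**2, n_freq),
    holding SH coefficients ordered by the running index n**2 + n + m.
    Returns the ear pressure per frequency bin.
    """
    return np.sum(np.conj(a_nm) * h_nm, axis=0)

# Example with synthetic order-limited coefficients (N = 4, single frequency).
N = 4
rng = np.random.default_rng(0)
a_nm = rng.standard_normal((N + 1) ** 2) + 1j * rng.standard_normal((N + 1) ** 2)
h_nm = rng.standard_normal((N + 1) ** 2) + 1j * rng.standard_normal((N + 1) ** 2)
print(binaural_pressure_sh(a_nm, h_nm))
```

In practice, ãnm(k) would be obtained from a plane-wave decomposition of the array recording, and hlnm(k) from an HRTF database such as [38]; the arrays above are placeholders.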
3. Sampling of a single function on the sphere

Consider a spatially band-limited function, f(Ω), of order Nf, which can be represented in the SH domain by [31]:

$$f(\Omega) = \sum_{n=0}^{N_f} \sum_{m=-n}^{n} f_{nm}\, Y_n^m(\Omega), \qquad (4)$$

where fnm is the SFT of f(Ω) and Ynm(·) are the complex SH functions of order n and degree m. The function is sampled over the sphere at QS sample points, denoted {Ωq}, q = 1, …, QS. This sampling scheme, denoted by S, is designed to be aliasing-free up to order NS. The SH coefficients of f(Ω), fnm, can be estimated from its samples using the discrete SFT [9]:

$$\hat{f}_{nm} = \sum_{q=1}^{Q_S} \alpha_q\, f(\Omega_q)\, [Y_n^m(\Omega_q)]^{*}, \qquad (5)$$

where {αq}, q = 1, …, QS, are the sampling weights. Substituting Eq. (4) and applying the orthogonality property of the SH functions leads to [32]

$$\hat{f}_{nm} = \sum_{n'=0}^{N_f} \sum_{m'=-n'}^{n'} f_{n'm'} \sum_{q=1}^{Q_S} \alpha_q\, [Y_n^m(\Omega_q)]^{*}\, Y_{n'}^{m'}(\Omega_q) = f_{nm} + \sum_{n'=N_S+1}^{N_f} \sum_{m'=-n'}^{n'} f_{n'm'}\, \epsilon(n,m,n',m'), \qquad (6)$$

where

$$\sum_{q=1}^{Q_S} \alpha_q\, [Y_n^m(\Omega_q)]^{*}\, Y_{n'}^{m'}(\Omega_q) = \begin{cases} \delta_{n-n'}\,\delta_{m-m'} & n, n' \leq N_S \\ \epsilon(n,m,n',m') & n' > N_S \end{cases}. \qquad (7)$$

It is clear that ∊ = 0 leads to f̂nm = fnm. Therefore, ∊ represents the aliasing error. To demonstrate the behavior of the aliasing error, the elements ∊(n,m,n′,m′) are plotted in Fig. 1. The figure shows the error as a function of n and n′ for different sampling schemes of order NS = 3 and with Nf = 9, in the range 0 ⩽ n ⩽ NS and 0 ⩽ n′ ⩽ Nf.


Fig. 1. Elements of the aliasing error, ∊(n,m,n′,m′), with NS = 3, Nf = 9, and different sampling schemes: (a) equal-angle (64 samples), (b) Gaussian (32 samples) and (c) Lebedev (26 samples). The y-axis represents the values of (n,m) with a running index n² + n + m, where sections of equal order n are partitioned by a horizontal line. The values of (n′,m′) are represented similarly on the x-axis. The shadowed region represents the aliasing-free range, i.e. n,n′ ⩽ NS.

As reported by Rafaely et al. [32], it can be seen that there is no error for n,n′ ⩽ NS, and that for n′ > NS there is some error, whose pattern is sampling-scheme dependent. To ensure zero aliasing error, the sampling scheme should be designed to be aliasing-free up to at least the order of the sampled function, i.e. NS ⩾ Nf.
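To make the structure of Eqs. (5)–(7) concrete, the following sketch constructs a Gaussian sampling scheme of order NS = 3, evaluates the weighted products of SH functions over the samples, and checks that the block with n,n′ ⩽ NS is the identity while non-zero aliasing elements appear only for n′ > NS, i.e. the pattern shown in Fig. 1(b) for the Gaussian scheme. It is an illustrative sketch, not the authors' code; it assumes NumPy and SciPy (whose sph_harm routine takes the azimuth angle before the polar angle).

```python
import numpy as np
from scipy.special import sph_harm

def gaussian_sampling(N_S):
    """Gaussian sampling scheme of order N_S: (N_S+1) Gauss-Legendre polar
    angles x 2(N_S+1) equally spaced azimuth angles, with quadrature weights."""
    x, w = np.polynomial.legendre.leggauss(N_S + 1)    # nodes in cos(theta)
    theta = np.arccos(x)                               # polar angles
    phi = 2 * np.pi * np.arange(2 * (N_S + 1)) / (2 * (N_S + 1))
    T, P = np.meshgrid(theta, phi, indexing="ij")
    alpha = np.repeat(w, len(phi)) * np.pi / (N_S + 1) # weights, sum to 4*pi
    return T.ravel(), P.ravel(), alpha

def sh_matrix(N, theta, phi):
    """Matrix of Y_n^m(Omega_q): rows = samples, columns ordered by n**2+n+m.
    Note that scipy's sph_harm takes (m, n, azimuth, polar)."""
    cols = [sph_harm(m, n, phi, theta)
            for n in range(N + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

N_S, N_f = 3, 9
theta, phi, alpha = gaussian_sampling(N_S)
Y_lo = sh_matrix(N_S, theta, phi)                      # orders n  <= N_S
Y_hi = sh_matrix(N_f, theta, phi)                      # orders n' <= N_f

# E[(n,m),(n',m')] = sum_q alpha_q * conj(Y_n^m) * Y_n'^m', as in Eqs. (5)-(7)
E = (Y_lo.conj() * alpha[:, None]).T @ Y_hi

blk = (N_S + 1) ** 2
print(np.allclose(E[:, :blk], np.eye(blk), atol=1e-10))  # True: no error for n, n' <= N_S
print(np.max(np.abs(E[:, blk:])) > 1e-3)                 # True: aliasing appears for n' > N_S
```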
4. Joint sampling conditions for binaural reproduction

When sampled PW representations of the sound field and the HRTFs are available, the integral in Eq. (1) can be reformulated using discrete summations, and the binaural signal can be written as:

$$\hat{p}^{l}(k) = \sum_{q=1}^{Q} \alpha_q\, a(k,\Omega_q)\, h^{l}(k,\Omega_q), \qquad (8)$$

where Q is the total number of samples, or PWs. The notation p̂l(k) is used to emphasize the fact that Eqs. (1) and (8) are not necessarily equal. The conditions for the equality between Eqs. (8) and (1) will be formulated in this section.

Substituting the finite-order SH representations of a(k,Ωq) and h(k,Ωq) into the summation in Eq. (8) leads to

$$\hat{p}^{l}(k) = \sum_{q=1}^{Q} \alpha_q\, a(k,\Omega_q)\, h^{l}(k,\Omega_q) = \sum_{n=0}^{N_a} \sum_{m=-n}^{n} \sum_{n'=0}^{N_h} \sum_{m'=-n'}^{n'} \tilde{a}_{nm}^{*}(k)\, h_{n'm'}^{l}(k) \sum_{q=1}^{Q} \alpha_q\, [Y_n^m(\Omega_q)]^{*}\, Y_{n'}^{m'}(\Omega_q). \qquad (9)$$

It is now assumed that the samples in Eq. (9) represent a sampling scheme, S, which is aliasing-free to order NS, as introduced in Section 3, leading to

$$\begin{aligned}
\hat{p}^{l}(k) ={}& \sum_{n=0}^{N_S} \sum_{m=-n}^{n} \tilde{a}_{nm}^{*}(k)\, h_{nm}^{l}(k) && (n, n' \leq N_S) \\
&+ \sum_{n=N_S+1}^{N_a} \sum_{m=-n}^{n} \sum_{n'=N_S+1}^{N_h} \sum_{m'=-n'}^{n'} \tilde{a}_{nm}^{*}(k)\, h_{n'm'}^{l}(k)\, \epsilon_j(n,m,n',m') && (n, n' > N_S) \\
&+ \sum_{n=N_S+1}^{N_a} \sum_{m=-n}^{n} \sum_{n'=0}^{N_S} \sum_{m'=-n'}^{n'} \tilde{a}_{nm}^{*}(k)\, h_{n'm'}^{l}(k)\, \epsilon_j(n,m,n',m') && (n > N_S,\ n' \leq N_S) \\
&+ \sum_{n=0}^{N_S} \sum_{m=-n}^{n} \sum_{n'=N_S+1}^{N_h} \sum_{m'=-n'}^{n'} \tilde{a}_{nm}^{*}(k)\, h_{n'm'}^{l}(k)\, \epsilon_j(n,m,n',m') && (n' > N_S,\ n \leq N_S),
\end{aligned} \qquad (10)$$

where the orthogonality of the SH over the samples is reformulated for joint sampling as

$$\sum_{q=1}^{Q_S} \alpha_q\, [Y_n^m(\Omega_q)]^{*}\, Y_{n'}^{m'}(\Omega_q) = \begin{cases} \delta_{n-n'}\,\delta_{m-m'} & n, n' \leq N_S \\ \epsilon_j(n,m,n',m') & n > N_S \ \text{or}\ n' > N_S \end{cases}, \qquad (11)$$

and ∊j(n,m,n′,m′) is the joint aliasing error. This extends the orthogonality property of Eq. (7) to joint sampling by adding another region of interest, n > NS. This region is added because for joint sampling both functions may have orders higher than the sampling scheme order. To demonstrate the difference between the sampling of a single function and joint sampling, the aliasing error, i.e. the elements of ∊j(n,m,n′,m′) for this case, is presented in Fig. 2. The figure shows ∊j for different sampling schemes, S, which are aliasing-free up to order NS = 3, with Na = 7 and Nh = 9. The figure illustrates the manner in which high orders are aliased into lower orders in the four regions represented by the four terms in Eq. (10): (i) for n,n′ ⩽ NS it can be seen that all values are zero, which leads to zero aliasing error; (ii) for n,n′ > NS there are non-zero errors and both functions are aliased at the high orders; (iii) for n > NS and n′ ⩽ NS, there are non-zero errors derived from high orders in function ãnm being aliased into low orders in function hn′m′; and (iv) is the opposite case, where high orders in hn′m′ are aliased into low orders in ãnm, in the range n ⩽ NS, n′ > NS.

Fig. 2. Elements of the joint aliasing error, ∊j(n,m,n′,m′), with NS = 3, Na = 7, Nh = 9, and different sampling schemes: (a) equal-angle (64 samples), (b) Gaussian (32 samples) and (c) Lebedev (26 samples). The y-axis represents the values of (n,m) with a running index n² + n + m, where sections of equal order n are partitioned by a horizontal line. The values of (n′,m′) are represented similarly on the x-axis. The shadowed region represents the aliasing-free range, i.e. n,n′ ⩽ NS.


Now, if both ãnm and hnm are zero for orders beyond NS, then all three summations in Eq. (10) that contain error terms ∊j will be zero. This leads to an equality between Eqs. (8) and (3), and the equivalence of the discrete PW and SH representations is obtained:

$$\hat{p}^{l}(k) = \sum_{q=1}^{Q_S} \alpha_q\, a(k,\Omega_q)\, h^{l}(k,\Omega_q) = \sum_{n=0}^{N} \sum_{m=-n}^{n} \tilde{a}_{nm}^{*}(k)\, h_{nm}^{l}(k) = p^{l}(k). \qquad (12)$$

From here, the equality between Eqs. (1) and (8) follows directly. In summary, the following are sufficient conditions for (12) to hold:

$$\begin{aligned}
& S = \{\Omega_q\}_{q=1}^{Q_S} \ \text{such that}\ \sum_{q=1}^{Q_S} \alpha_q\, [Y_n^m(\Omega_q)]^{*}\, Y_{n'}^{m'}(\Omega_q) = \delta_{n-n'}\,\delta_{m-m'} \ \text{for}\ n, n' \leq N_S, \\
& N_h, N_a \leq N_S, \ \text{where}\ \begin{cases} a_{nm} = 0 & \forall\, n > N_a \\ h_{nm} = 0 & \forall\, n > N_h \end{cases}, \\
& N \equiv \min(N_a, N_h).
\end{aligned} \qquad (13)$$

These conditions mean that when anm and hnm are order-limited to Na and Nh, respectively, the chosen sampling scheme should be designed to be aliasing-free for orders max(Na,Nh) and below.

Note that these conditions do not place any constraint on the specific distribution of the sampling points. Therefore, any sampling scheme that satisfies (13) will maintain the equality in (12), and the resulting pressure at the ear will be the same. In other words, when (13) is satisfied, the superposition of PWs from entirely different directions will result in the same pressure at the ear. This provides freedom to select different spatial distributions of PWs (sampling schemes), while maintaining a mathematically correct result (ear pressure).
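The equivalence in Eq. (12) can also be verified numerically. The sketch below is illustrative only; it uses synthetic, order-limited coefficients rather than measured data and assumes NumPy/SciPy. It evaluates the ear pressure once with the SH-domain sum of Eq. (3) and once with the space-domain sum of Eq. (8) over a Gaussian scheme with Na = Nh = NS, so that the conditions in (13) hold, and confirms that the two results agree to numerical precision.

```python
import numpy as np
from scipy.special import sph_harm

def gaussian_scheme(N_S):
    # Gaussian sampling scheme of order N_S (same construction as in Section 3).
    x, w = np.polynomial.legendre.leggauss(N_S + 1)
    theta = np.arccos(x)
    phi = 2 * np.pi * np.arange(2 * (N_S + 1)) / (2 * (N_S + 1))
    T, P = np.meshgrid(theta, phi, indexing="ij")
    alpha = np.repeat(w, len(phi)) * np.pi / (N_S + 1)
    return T.ravel(), P.ravel(), alpha

def sh_basis(N, theta, phi):
    # Y_n^m at the sample points; scipy's sph_harm takes (m, n, azimuth, polar).
    return np.stack([sph_harm(m, n, phi, theta)
                     for n in range(N + 1) for m in range(-n, n + 1)], axis=1)

N_a = N_h = N_S = 4                        # joint sampling conditions (13) hold
rng = np.random.default_rng(1)
a_nm = rng.standard_normal((N_a + 1) ** 2) + 1j * rng.standard_normal((N_a + 1) ** 2)
h_nm = rng.standard_normal((N_h + 1) ** 2) + 1j * rng.standard_normal((N_h + 1) ** 2)

p_sh = np.sum(np.conj(a_nm) * h_nm)        # Eq. (3), SH domain

theta, phi, alpha = gaussian_scheme(N_S)
Y = sh_basis(N_S, theta, phi)
a_space = np.conj(Y @ a_nm)                # a(Omega_q); a_nm is the SFT of a*(Omega)
h_space = Y @ h_nm                         # h^l(Omega_q)
p_space = np.sum(alpha * a_space * h_space)  # Eq. (8), space domain

print(np.allclose(p_sh, p_space))          # True: equivalence of Eq. (12)
```

Under these assumptions, any other scheme satisfying (13), e.g. a Lebedev grid of sufficient order, would return the same value of p_space.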
5. Aliasing error when deviating from the joint sampling conditions

In some practical cases of employing a sampling scheme for the computation of Eq. (8), the joint sampling conditions might be violated. One example is when an HRTF database is already available with a sampling scheme S, but the sound field is composed of high orders that do not satisfy the condition in (13) with respect to S, e.g. a sound field derived from room simulations [13,33], or from measurements with a spherical array with a large number of microphones [34]. Another example is when a low-order PW density function is measured with a microphone array with a limited number of microphones. The sound field is then represented by PWs with a distribution S. In contrast, a high-order HRTF is employed, which does not satisfy (13) with the given S [26]. In these cases, when the joint sampling conditions are not satisfied, the non-zero error terms in Eq. (10) will contribute to the construction of the pressure at the ear. This aliasing error may lead to errors in localization and externalization of sound. In this section, the effect of the aliasing error on the reproduced binaural signals is investigated for different sampling schemes.

5.1. Error for the case Na > NS

As an example, the simple case of a spherical array that samples a sound field composed of a single PW incident at Ω0 ≡ (90°,45°) is studied in this section. The PW density function, a(Ω), and the coefficients anm are calculated analytically using the SH representation of a single PW [19], up to SH orders Na that vary from 6 to 14. These relatively high orders could represent PW density functions generated by spherical arrays in simulated sound fields, for example [14,35], or sound fields measured by scanning microphone arrays [36,37]. They provide a relatively high quality of the binaurally reproduced signal that will be used for the hearing test in the following section. Note that spherical arrays recording sound in real time may typically provide lower orders. The left-ear pressure was computed in the space and SH domains, using Eqs. (8) and (3), respectively, with HRTFs from the Cologne HRTF database [38], which were truncated in the SH domain to the desired order Nh = 6. The computation in the space domain was performed using a sampling scheme that was designed to be aliasing-free for order NS = 6. Therefore, when Na > 6 the joint sampling conditions were not met and the sampling process may suffer from aliasing. Three sampling schemes were investigated: equal-angle, Gaussian and Lebedev. Fig. 3(a) shows a comparison of the three computations of the pressure at the ear in the space domain and in the SH domain, when the joint sampling conditions are met. The figure illustrates that no visual differences are evident between the plots in this case. Fig. 3(b), (c) and (d) compare the pressure at the ear when the joint sampling conditions are not met, i.e. when Na ranges from 9 to 14. The figures illustrate that the differences can be on the order of ±10 dB for these examples, and that they become more significant as Na increases and the violation of the joint sampling conditions becomes more prominent.

Fig. 3. Left ear pressure for a single PW incident at (90°,45°) for Nh = 6 and different orders of Na; the calculation in the space domain was performed using three sampling schemes designed to be aliasing-free for order 6. Na and Nh are presented for each panel.

These differences are an example of the spatial aliasing effect on the binaural signal when reproduced in the space domain using a sampling scheme that does not meet the joint sampling conditions. In particular, Fig. 3(d) shows that for a significant violation of the conditions, the aliasing error is evident down to very low frequencies. At low frequencies the pressure at the ear is dominated by the low SH orders [26]. Fig. 3(d) can therefore be explained by variations in the low orders and, in particular, in order zero. In the analysis of aliasing presented in [32], Rafaely et al. showed that for equal-angle and Gaussian sampling schemes that are aliasing-free to order NS, the aliasing contribution to order zero originates from orders of Ncutoff = 2NS + 2 and higher. The same behavior can be observed in Fig. 2 for the joint sampling case with the three presented schemes. Now, applying this analysis to Fig. 3, the sampling schemes are designed to be aliasing-free up to order NS = 6, and so for this case Ncutoff = 14. This explains why only in Fig. 3(d), for which Na = 14, significant variations are notable at low frequencies due to spatial aliasing.
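The Ncutoff = 2NS + 2 behaviour described above can be checked directly for the Gaussian scheme. The short sketch below (illustrative only, assuming NumPy/SciPy) evaluates the aliasing of each order n′ into the zeroth-order coefficient for NS = 6 and shows that the error remains at numerical zero until n′ = 14.

```python
import numpy as np
from scipy.special import sph_harm

N_S = 6                                              # as in Fig. 3
x, w = np.polynomial.legendre.leggauss(N_S + 1)      # Gaussian scheme of order N_S
theta = np.repeat(np.arccos(x), 2 * (N_S + 1))       # polar angles
phi = np.tile(2 * np.pi * np.arange(2 * (N_S + 1)) / (2 * (N_S + 1)), N_S + 1)
alpha = np.repeat(w, 2 * (N_S + 1)) * np.pi / (N_S + 1)

# Aliasing of order n' into the zeroth-order coefficient (n, m) = (0, 0):
# eps(0,0,n',m') = sum_q alpha_q * [Y_0^0(Omega_q)]* Y_n'^m'(Omega_q), cf. Eq. (7).
Y00 = sph_harm(0, 0, phi, theta)
for n_p in range(N_S + 1, 2 * N_S + 5):
    eps = [np.sum(alpha * np.conj(Y00) * sph_harm(m_p, n_p, phi, theta))
           for m_p in range(-n_p, n_p + 1)]
    print(n_p, f"{np.max(np.abs(eps)):.1e}")
# The printed error stays at numerical zero until n' reaches 2*N_S + 2 = N_cutoff = 14.
```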


5.2. Error for the case Nh > NS

This subsection investigates the case in which Nh violates the joint sampling condition. Fig. 4(a)–(d) show plots similar to Fig. 3, but for the case in which Na is fixed at 6 and Nh varies from 6 to 14. Fig. 4 illustrates that the variations in the pressure at the ear due to violations of the joint sampling conditions are smaller compared to Fig. 3. In particular, Fig. 4(d) shows a very small error at the low frequencies. These differences between Figs. 3 and 4 can be explained by the nature of the energy distribution of the HRTFs, where low frequencies do not have much energy at high orders [30]. Now, although the aliasing error, i.e. the term ∊j(n,m,n′,m′) in Eq. (11), is the same for both figures, in Fig. 4 it is multiplied by high orders of lower magnitudes, and so the overall aliasing error is smaller. This explains why in Fig. 4(d), surprisingly, orders that are equal to Ncutoff, i.e. order 14, which is aliased into the zeroth order, do not cause significant aliasing error at low frequencies. The analysis presented in this section shows that the aliasing error may depend on the functions' orders and on frequency, in addition to the dependence on the sampling scheme.

Fig. 4. Left ear pressure for a single PW incident at (90°,45°) for Na = 6 and different orders of Nh; the calculation in the space domain was performed using three sampling schemes designed to be aliasing-free for order 6. Na and Nh are presented for each panel.

6. Listening experiment

So far, aliasing errors were only studied theoretically, effectively showing that when the joint sampling conditions are followed, the pressure at the ear is scheme independent and, therefore, no perceptual differences are expected between different sampling schemes. However, in order to investigate if and when objective differences are perceivable when the joint sampling conditions are not met, a listening test is required, which is presented in this section.

6.1. Methodology

Binaural signals were reproduced using simulated sound fields in a room, generating room impulse responses (RIRs) that are convolved with HRTFs to produce binaural room impulse responses (BRIRs). The simulated room was of dimensions (Lx, Ly, Lz) = (17.33, 11.52, 6.20) m with a frequency-averaged reverberation time T60 = 0.759 s and a critical distance of rd = 2.28 m at 1 kHz. A spherical microphone array, configured for order Na (which depends on the experimental condition), was placed at the center of the room at the average ear level of a sitting listener (1.27 m above the floor). An omni-directional source was placed at a distance of r = 5 m and an angle of Ω0 = (π/2, π/4) relative to the array center. The HRTFs' coordinate system was also positioned at the array center, with (π/2, 0) being the front-looking direction. The RIRs, defined between the source and the array spatial sampling points, or microphones, were produced using the multi-channel room acoustics simulator (MCRoomSim) [35].

The HRTFs were taken from the Cologne HRTF database [38], with a Lebedev sampling configuration of 2702 sampling points, which can produce HRTFs up to a SH order of 44. Depending on the experimental conditions, the HRTFs were truncated in the SH domain to the desired order Nh. A pair of AKG K702 headphones was used, fitted with a Razor IMU sensor for horizontal head-tracking (which is known to enhance spatial perception). The BRIRs were generated for each head rotation angle (up to 360° with a 1° resolution) by rotating the HRTFs, thus keeping the reproduced source direction constant and independent of the head movements. The HRTFs' rotation was performed by multiplying hl/rnm(k) by the respective Wigner-D functions [17].
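For the purely horizontal (yaw) head rotations used here, the Wigner-D rotation reduces to a phase shift of each SH coefficient: rotating a function about the vertical axis by an angle α multiplies its degree-m coefficients by exp(−imα). A minimal sketch of this special case is given below; it is illustrative only, the sign of the exponent depends on the chosen rotation convention, and the coefficient array is a placeholder rather than measured HRTF data.

```python
import numpy as np

def rotate_sh_yaw(f_nm, N, alpha):
    """Rotate an order-N set of SH coefficients about the z-axis by alpha (rad).

    If f(theta, phi) has coefficients f_nm, then f(theta, phi - alpha) has
    coefficients f_nm * exp(-1j * m * alpha); this is the z-rotation special
    case of the Wigner-D matrices. Coefficients are ordered by n**2 + n + m.
    """
    m = np.concatenate([np.arange(-n, n + 1) for n in range(N + 1)])
    return f_nm * np.exp(-1j * m * alpha)

# Example: rotate an order-6 set of coefficients (placeholder values) by 30 degrees.
N = 6
h_nm = np.ones((N + 1) ** 2, dtype=complex)   # stands in for h^l_nm(k) at one k
h_rot = rotate_sh_yaw(h_nm, N, np.deg2rad(30.0))
```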
An anechoic recording of drums, which has a broadband frequency distribution, was convolved with the appropriate BRIRs. The resultant signal was played back to the subjects using the SoundScape Renderer auralization engine [39], implementing segmented convolution in real time with a latency of 5.3 ms. All signals were convolved with a matching headphone compensation filter, which was measured for the AKG K702 headphones on the KU100 dummy head by Bernschütz [38].

Five different conditions were tested, for different combinations of the HRTF and PW density function orders, Nh and Na. The different conditions were chosen in order to test the different cases that were discussed previously in the paper, as detailed in Table 1. In the first condition (test #1) the joint sampling conditions are satisfied; in tests 2 and 3 the conditions are not maintained and the highest function order is 14, which equals Ncutoff. In tests 4 and 5 the highest function order is 9, which is lower than Ncutoff. A total of 16 signals were generated; one signal was reproduced in the SH domain with N = 6 (using Eq. (3)), while 15 were reproduced in the space domain (Eq. (8)), using three different sampling schemes (Lebedev, Gaussian and equal-angle) that are aliasing-free up to order NS = 6, and 5 different order combinations, as detailed in Table 1.

Table 1
A list of the test conditions. Na and Nh are the SH orders of the PW density function and the HRTF, respectively, and the last column denotes whether the joint sampling conditions in Eq. (13) are satisfied ("✓") or not ("×").

Test #  Na  Nh  Sampling condition
1       6   6   ✓
2       14  6   ×
3       6   14  ×
4       9   6   ×
5       6   9   ×

The experiment comprised 17 different paired-comparison tests, as detailed in the second row of Table 2; for example, GSH1 denotes a comparison between reproduction in the space domain using a Gaussian sampling scheme and reproduction in the SH domain with function orders (Na, Nh) = (6,6), when the joint sampling conditions are satisfied. LE3 denotes a comparison between Lebedev and equal-angle sampling schemes with function orders (Na, Nh) = (6,14), when the joint sampling conditions are not satisfied. A triangle test [40] was used to perform the listening experiment. This method is effective for determining whether (i) a perceptible difference exists between two signals (triangle testing for difference), or (ii) a perceptible difference does not exist between two signals (triangle testing for similarity).

30 normal-hearing subjects aged 20–30 years, 21 male and 9 female, participated in the experiment. Each subject performed a total of 68 trials, where each paired-comparison test was repeated 4 times in a random order. In each trial, the listeners were presented with three stimuli, corresponding to the testing conditions shown in Table 2, of which two were identical. Using a GUI, listeners were asked to select the stimulus which sounded different.
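The statistical analysis reported in the next subsection can be expressed in a few lines of code. The sketch below is illustrative only, not the authors' analysis script; the correct-answer count is hypothetical. It computes the chi-squared statistic and P-value for one paired-comparison triangle test against the 1/3 chance level, with one degree of freedom, using the same α = 0.01 criterion as in Table 2.

```python
import numpy as np
from scipy.stats import chi2

def triangle_test_chi2(n_correct, n_trials, p_chance=1.0 / 3.0):
    """Chi-squared test for a triangle test against guessing (1 degree of freedom).

    Compares the observed counts of correct / incorrect identifications with
    those expected by chance (p_chance of picking the odd stimulus).
    Returns the chi-squared statistic and its P-value.
    """
    expected = np.array([n_trials * p_chance, n_trials * (1.0 - p_chance)])
    observed = np.array([n_correct, n_trials - n_correct])
    stat = np.sum((observed - expected) ** 2 / expected)
    p_value = chi2.sf(stat, df=1)
    return stat, p_value

# Hypothetical outcome: 46 correct answers out of 120 judgments for one pair
# (30 subjects x 4 repetitions), tested at a significance level of alpha = 0.01.
stat, p = triangle_test_chi2(46, 120)
print(f"chi2 = {stat:.2f}, P = {p:.3f}, significant: {p < 0.01}")
```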


6.2. Results

Table 2
The listening test results. L/G/E stands for the Lebedev/Gaussian/Equal-angle sampling scheme used for the reproduction in the space domain (in all cases the schemes are aliasing-free to order NS = 6), SH stands for reproduction in the SH domain, χ² is the chi-squared value [41], and P-value is the probability that a value of the χ² distribution was obtained by chance. The last column shows whether P-value < α, for α = 0.01.

Test #  Test ID  χ²      P-value  P-value < α
1       LE1      1.35    0.24     ×
1       GE1      <0.01   >0.99    ×
1       LG1      0.04    0.85     ×
1       GSH1     1.35    0.24     ×
2       LE2      240     <0.01    ✓
2       GE2      222.34  <0.01    ✓
2       LG2      234.04  <0.01    ✓
2       GSH2     139.54  <0.01    ✓
3       LE3      240     <0.01    ✓
3       GE3      12.15   <0.01    ✓
3       LG3      121.84  <0.01    ✓
4       LE4      36.04   <0.01    ✓
4       GE4      0.34    0.56     ×
4       LG4      8.44    <0.01    ✓
5       LE5      31.54   <0.01    ✓
5       GE5      0.15    0.69     ×
5       LG5      8.44    <0.01    ✓

Fig. 5. Listening test results - fraction of correct answers for each tested pair (per test condition: Lebedev - Equal-angle, Gaussian - Equal-angle, Lebedev - Gaussian, and Space(Gaussian) - SH comparisons).

The chi-squared test [41] was used to analyze the results. Table 2 shows the statistical results for each paired comparison test. The last column of the table shows, for each tested pair, whether the P-value is smaller than α, where α is chosen at a confidence level of 1%. Fig. 5 shows the fraction of correct responses for each paired comparison test for the five test conditions.

In test condition 1, where the joint sampling conditions are satisfied, it can be seen in Table 2 that for all four compared pairs the P-value is above the significance level, meaning that the hypothesis of differences between the signals could not be proven. Examining the results of test 1 by observing the correct answers (Fig. 5), which are close to 33%, and the P-values (Table 2), which are much higher than the significance level, it can be concluded that no perceptual differences are observed, as expected from the equivalence in Eq. (12).

In contrast, for tests 2 and 3, where the joint sampling conditions are not satisfied and the spatial aliasing error is significant (because the maximum order equals Ncutoff in both cases), the P-values are generally lower than 0.01, meaning that there are perceptual differences between the sounds. However, in Fig. 5 it can be seen that the fraction of correct answers for test number 3 is smaller than for test number 2 (averages of 0.71 and 0.98 for test conditions 3 and 2, respectively); this suggests that the perceptual differences are less significant when the aliased function is the HRTF and not the PW density function. This could be explained by the effect on the spatial aliasing of the low magnitude of the HRTF at high orders, as discussed in Section 5.

As can be seen in Table 2, the results of tests number 4 and 5 are ambiguous and depend on the sampling schemes. It can be seen in Fig. 5 that even when the P-value is smaller than α, the fraction of correct answers is much smaller compared to test number 2 (averages of 0.47 and 0.45 for test conditions 4 and 5, respectively, compared to 0.98 for test condition 2), which shows that the perceptual differences between the signals are much less significant and may even not be audible. This is because in both tests the aliased function is of a maximum order that is smaller than Ncutoff; therefore, the aliasing error into the zeroth order is less significant and the distortion of the output signal at low frequencies is small, as seen in Figs. 3(b) and 4(b). We note that in all GE (Gaussian-Equiangle) tests there are fewer correct answers, which implies that there is similarity between these two schemes and their aliasing patterns.

7. Conclusion

This paper studies the relation between the PW and the SH representations of binaural signals reproduced from spherical microphone array recordings. Sampling conditions are developed for the mathematical identity between these two representations, leading to conditions on the joint sampling scheme of the PW density function and the HRTFs. Within this framework, two different sampling schemes could yield signals that sound the same if they are both samples of order-limited functions with the maximum order defined by the joint sampling conditions. On the other hand, not adhering to the joint sampling conditions will result in reproduction error. The perceptual discrimination due to this error depends on the extent to which the joint sampling condition is violated. The perceptual differences also depend on the frequency content of the aliased function. For the HRTF, which has low energy at the high orders and low frequencies, the aliasing effect was shown to be relatively small at these orders and frequencies.

Acknowledgments

The research leading to this paper has received funding from the Helmsley Charitable Trust through the Agricultural, Biological and Cognitive (ABC) Robotics Center of Ben-Gurion University of the Negev. This work was also supported by the Israel Science Foundation (ISF) under Grant 146/13. The research was also partially supported by the European Union's Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 609465 as part of the Embodied Audition for RobotS (EARS) project.

References

[1] Lehnert H, Blauert J. Principles of binaural room simulation. Appl Acoust 1992;36(3):259–91.
[2] Vorländer M. Auralization: fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality. Springer Science & Business Media; 2007.
[3] Blauert J. The technology of binaural listening. Springer; 2013.


[4] Begault DR, Trejo LJ. 3-D sound for virtual reality and multimedia. NASA Ames Research Center, Moffett Field, CA, United States; 2000.
[5] Brandstein M, Ward D. Microphone arrays: signal processing techniques and applications. Springer Science & Business Media; 2013.
[6] Duraiswami R, Zotkin DN, Li Z, Grassi E, Gumerov NA, Davis LS. High order spatial audio capture and binaural head-tracked playback over headphones with HRTF cues. In: 119th Convention of the Audio Engineering Society, Preprint 6540. Audio Engineering Society; 2005.
[7] Pomberger H, Zotter F, Sontacchi A. An ambisonics format for flexible playback layouts. In: Proc 1st Ambisonics Symposium; 2009. p. 8.
[8] Rafaely B. Plane-wave decomposition of the sound field on a sphere by spherical convolution. J Acoust Soc Am 2004;116(4):2149–57.
[9] Rafaely B. Analysis and design of spherical microphone arrays. IEEE Trans Speech Audio Process 2005;13(1):135–43.
[10] Avni A, Ahrens J, Geier M, Spors S, Wierstorf H, Rafaely B. Spatial perception of sound fields recorded by spherical microphone arrays with varying spatial resolution. J Acoust Soc Am 2013;133(5):2711–21.
[11] Wu PKT, Epain N, Jin C. A super-resolution beamforming algorithm for spherical microphone arrays using a compressed sensing approach. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2013. p. 649–53.
[12] Fernandez-Grande E, Xenaki A. Compressive sensing with a spherical microphone array. J Acoust Soc Am 2016;139(2):EL45–9.
[13] Lentz T, Schröder D, Vorländer M, Assenmacher I. Virtual reality system with integrated sound field simulation and reproduction. EURASIP J Appl Signal Process 2007;2007(1):187.
[14] Sheaffer J, Van Walstijn M, Rafaely B, Kowalczyk K. Binaural reproduction of finite difference simulations using spherical array processing. IEEE/ACM Trans Audio Speech Lang Process 2015;23(12):2125–35.
[15] Evans MJ, Angus JA, Tew AI. Analyzing head-related transfer function measurements using surface spherical harmonics. J Acoust Soc Am 1998;104(4):2400–11.
[16] Rafaely B, Avni A. Interaural cross correlation in a sound field represented by spherical harmonics. J Acoust Soc Am 2010;127(2):823–8.
[17] Rafaely B, Kleider M. Spherical microphone array beam steering using Wigner-D weighting. IEEE Signal Process Lett 2008;15:417–20.
[18] Jeffet M, Shabtai NR, Rafaely B. Theory and perceptual evaluation of the binaural reproduction and beamforming tradeoff in the generalized spherical array beamformer. IEEE/ACM Trans Audio Speech Lang Process 2016;24(4):708–18.
[19] Rafaely B. Fundamentals of spherical array processing, vol. 8. Springer; 2015.
[20] Ahrens J, Thomas MR, Tashev I. HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data. In: 2012 Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE; 2012. p. 1–5.
[21] Li Z, Duraiswami R. Headphone-based reproduction of 3D auditory scenes captured by spherical/hemispherical microphone arrays. In: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 5. IEEE; 2006. p. V.
[22] Song W, Ellermeier W, Hald J, et al. Binaural auralization based on spherical-harmonics beamforming. J Acoust Soc Am 2008;123(5):3159.
[23] Song W, Ellermeier W, Hald J. Psychoacoustic evaluation of multichannel reproduced sounds using binaural synthesis and spherical beamforming. J Acoust Soc Am 2011;130(4):2063–75.
[24] Spors S, Wierstorf H. Evaluation of perceptual properties of phase-mode beamforming in the context of data-based binaural synthesis. In: 2012 5th International Symposium on Communications, Control and Signal Processing (ISCCSP). IEEE; 2012. p. 1–4.
[25] Spors S, Wierstorf H, Geier M. Comparison of modal versus delay-and-sum beamforming in the context of data-based binaural synthesis. In: Audio Engineering Society Convention 132. Audio Engineering Society; 2012.
[26] Bernschütz B, Giner AV, Pörschmann C, Arend J. Binaural reproduction of plane waves with reduced modal order. Acta Acust United Acust 2014;100(5):972–83.
[27] Noisternig M, Sontacchi A, Musil T, Höldrich R. A 3D ambisonic based binaural sound reproduction system. In: Audio Engineering Society Conference: 24th International Conference: Multichannel Audio, The New Reality. Audio Engineering Society; 2003.
[28] Blauert J. Spatial hearing: the psychophysics of human sound localization. MIT Press; 1997.
[29] mh acoustics. em32 Eigenmike microphone array release notes. 25 Summit Ave, Summit, NJ 07901; February 2009. <http://www.mhacoustics.com/products#eigenmike1>.
[30] Zhang W, Abhayapala TD, Kennedy RA, Duraiswami R. Insights into head-related transfer function: spatial dimensionality and continuous representation. J Acoust Soc Am 2010;127(4):2347–57.
[31] Driscoll JR, Healy DM. Computing Fourier transforms and convolutions on the 2-sphere. Adv Appl Math 1994;15(2):202–50.
[32] Rafaely B, Weiss B, Bachmat E. Spatial aliasing in spherical microphone arrays. IEEE Trans Signal Process 2007;55(3):1003–10.
[33] Allen JB, Berkley DA. Image method for efficiently simulating small-room acoustics. J Acoust Soc Am 1979;65(4):943–50.
[34] Bernschütz B. Microphone arrays and sound field decomposition for dynamic binaural recording. Doctoral Dissertation. Technical University of Berlin; 2016. p. 213–21.
[35] Wabnitz A, Epain N, Jin C, van Schaik A. Room acoustics simulation for multichannel microphone arrays. In: Proceedings of the International Symposium on Room Acoustics; 2010.
[36] Alon DL, Rafaely B. Spindle-torus sampling for an efficient-scanning spherical microphone array. Acta Acust United Acust 2012;98(1):83–90.
[37] Rafaely B, Balmages I, Eger L. High-resolution plane-wave decomposition in an auditorium using a dual-radius scanning spherical microphone array. J Acoust Soc Am 2007;122(5):2661–8.
[38] Bernschütz B. A spherical far field HRIR/HRTF compilation of the Neumann KU 100. In: Proceedings of the 40th Italian (AIA) Annual Conference on Acoustics and the 39th German Annual Conference on Acoustics (DAGA); 2013. p. 29.
[39] Ahrens J, Geier M, Spors S. The SoundScape Renderer: a unified spatial audio reproduction framework for arbitrary rendering methods. In: Audio Engineering Society Convention 124. Audio Engineering Society; 2008.
[40] BS ISO 4120:2004. Sensory analysis - methodology - triangle test; 2004.
[41] Greenwood PE, Nikulin MS. A guide to chi-squared testing, vol. 280. John Wiley & Sons; 1996.
