
PAPERS


Prospects for Transaural Recording*


DUANE H. COOPER AND JERALD L. BAUCK

University of Illinois, Urbana, IL 61801, USA

Transaural stereo, generic for binaural stereo processed for cancellation of loudspeaker-to-ear crosstalk, results from the use of minimum-phase filters in shuffler configuration.
Simplifying the filters further at short wavelengths makes the listener position noncritical.
Full spatial qualities appear in a conventional stereo playback that avoids early reflections.
Inverse shufflers provide precise transaural pan functions for multitrack work.

0 INTRODUCTION

Transaural stereo (generic term) is a stereo-system
plan that, like binaural stereo, takes the end point of
the recording-reproducing chain to be the actual sounds
at the ears. It contrasts with the taking of loudspeaker
sounds as the end point, which is necessarily the plan
of conventional stereo. It differs from binaural in that
the sounds for each ear, rather than being supplied by
direct signal chains ending at earphones, result indirectly, instead, from the preparation of structured
composite signals to be supplied to the loudspeakers.

0.1 Crosstalk Cancellation

The composite-signal structure is subsequently inverted (decomposition) in the intervening loudspeaker-to-ear transmission to produce the intended sounds at
the ears. On the way to the ears, in addition to the
direct transmission, left to left and right to right, there
occur the cross transmissions of left to right and right
to left. The latter are traditionally called crosstalk (from
telephony), and the composition-decomposition
scheme cited is a nonadaptive precancellation of crosstalk. It consists of the "planting" of a crosstalk process,
in advance, that is devised to be the inverse of the
acoustic crosstalk expected to occur subsequently. When
properly done, the net result is the elimination of all
evidence of crosstalk.

0.2 Recording Binaural Signals

Signals representing ear sounds may be recorded
(binaural recording), in advance of crosstalk cancellation, by two pickup methods. One uses microphones
fitted in the ears of an artificial head. The other uses
free-space microphones whose signals have been processed to simulate transmissions around an acoustic
obstacle (human head) to specific points on the obstacle
(ears).
The second of these pickup methods, including its
source-to-ear processing, is known as binaural synthesis, and it may include the processing of as many
different microphone signals as may be suitable for a
given project. It may also include reverberant-field
synthesis as needed. The correspondence with multitrack stereo synthesis is notable: pan functions replaced
by binaural simulation for specific imaging directions
and reverberation units replaced by simulation of spatial
(binaural space) reverberation, such as being developed
by Kendall et al. [1]. After the completion of all binaural
processing, crosstalk canceling is the means of producing the master transaural recording.
For concert-hall recording, an artificial head would
be used, and it and the orchestra would be deployed
for optimal pickup. Under ideal conditions this may
suffice. However, further microphone deployments may
be considered to represent early-reflection and late-reflection hall-sound pickup. The signals from these
latter would be delayed and subjected to binaural synthesis needed to produce the decorrelated ear sounds
deemed suitable for hall-sound representation. The final
step in the production is conversion to transaural.

* Presented at the 85th Convention of the Audio Engineering
Society, Los Angeles, 1988 November 3-6.
J. Audio Eng. Soc., Vol. 37, No. 1/2, 1989 January/February

0.3 Transaural Options

Some recording engineers may wish to use only a
part of the transaural technology. In multitrack work,
for example, it might be decided that only a few of the
tracks require the precise imaging of binaural synthesis,
or that only a portion of the performing ensemble requires the spatial delineation available through artificial-head pickup. Such artistic decisions remain, of course,
with the producing authority, and it is the responsibility
of the engineer to provide incisive imaging, to the extent
possible, where desired. Transaural technology may
be viewed as providing improved options for that purpose, not necessarily a whole new recording style.
A better choice for incisive imaging, however, cannot
be made. In a previous paper, Cooper, using calculations
from Bauck's thesis [2], showed [3, Fig. 8] the required
loudspeaker-signal specifications for two examples of
imaging. None of the conventional stereo methods
produces signals that in any way resemble these specifications, except at low frequencies. Conventional
stereo has not sought to devise loudspeaker signals to
meet imaging-signal specifications at the ears, as was
required in these calculations, except in the low-frequency work of Blumlein [4]. Specifically, none of the
existing pan-pot formulas meet these specifications,
nor do any of the stereo microphone arrays, whether
coincident or spaced, whether using directional elements
or not.
Some recording engineers, seeking a spacious effect,
use widely spaced microphones in a concert-hall setting.
It is known, of course, that the signals so obtained are
highly decorrelated, and it is also a known fact, in
concert-hall acoustics, that highly decorrelated ear
sounds are identified with spacious acoustic impressions. Unfortunately, the interaural correlation will always be greater than the correlation at the loudspeakers,
because of crosstalk. The net result is that the spacious
effect is perceived as confined to an "acoustic stage,"
as in a different space from that of the listener. An
important aspect of the concert-hall experience is lost.
The use of widely spaced microphones with binaural
synthesis and suitable delay, however, will give the
recording engineer much greater control over the representation of the sound of the hall. Thus many more
venues may be exploited to advantage. At the same
time, a full spatial envelopment of the listener can be
provided to the extent desired. Many recording engineers will discover, also, that imaging and spaciousness
are not mutually exclusive, but, as has long been known
in concert-hall acoustics, belong together. Placing them
together is natural in transaural technology.
At first the recording engineer will want to try only
the simplest things from transaural technology. Indeed,
it is likely that only the simpler equipment will become
available at first. Existing techniques will necessarily
continue to be used, and the improvements of transaural
technology will, in some instances, be adapted to that.
For reviews of existing techniques, the writings of Eargle may be consulted [5]. The evolution of such techniques to suit a binaural style of recording is not amenable to detailed prediction, and will not be attempted here.
It is possible, however, to sketch a catalog of specific
kinds of transaural-related equipment, the development
of which may be foreseen. Some of these items are
discussed in a later section.

0.4 Binaural Monitoring

Prospects for binaural monitoring apparently have
advanced substantially in the recent decade. It had long
been the experience that earphones characteristically
produced "in-the-head" (interior) sounds and, with binaural material, sounds that were much more vulnerable
to front-back bias than is the case with natural hearing.
The problem has been traced to a disturbance in the
conch resonance of the human pinna. The conch is the
principal cavity in the pinna, and its resonance involves
its acoustic near field even at some distance from the
ear. Disturbance of this near field causes an "at-the-ear" judgment, as may be easily demonstrated by placing
a hand near the ear. Earphones are ordinarily placed
near enough to disturb this resonance (besides possibly
deforming the pinnas). Equalization to restore the resonance restores natural, exterior hearing. A complication is that a significant part of the resonance effect
varies with direction, so that a direction assignment
for equalization seems necessary.
A way to avoid a directional assignment has been
sought via the use of a diffuse-field reference [6]. An
alternate approach uses a frontal-incidence plane wave
(free field) as the reference. The argument for the latter
is that it avoids a front-back bias while not impairing
back localizations. In a later section we find evidence
to support a variation in this free-field approach.
These issues take on a sharper focus if related to the
style of equalization used for the artificial head. Obviously, a free-field equalization for the head mandates
the corresponding free-field equalization for the earphones to be used with that head. In this case, a large
part of the conch resonance is removed in the head
equalization (a use of natural ear molds in artificial
heads accounts for this resonance being modeled, although modeling of canal resonance has long been
omitted) and then restored in the earphone equalization.
Presumably a similar rationale supports diffuse-field
equalization, both for the head [7] and for the earphones.
We are unable to report any complete experience with
diffuse-field equalization, but we can report remarkably
good experiences with free-field equalization.
The recording engineer should understand that the
equalizations discussed here and below are not matters
to be accommodated with the EQ facilities on a mixing
board. It is appropriate to regard these design equalization requirements as to be met internally, to be inherent characteristics of the device or of an accessory
specific to the device. In the same way, the sometimes
strenuous equalizations undertaken in some highly
valued microphones are of concern to the design engineer, not the recording engineer. It is sufficient if
the recording engineer deems the overall characteristic
as apt for his or her needs.
The advent of binaural monitoring will prove to be
a substantial convenience in comparison to loudspeaker
monitoring, especially for location work or other situations in which access to a proper listening room is
inconvenient. Transaural monitoring (with loudspeakers) can, of course, be made available as needed.

0.5 Beginnings of Transaural Recording

Transaural stereo had its first trials in 1962 by B. S.
Atal and M. R. Schroeder [8]-[12]. They used a powerful (for its day) mainframe computer, an IBM 7090,
to perform digital finite-impulse response (FIR) filtering
for crosstalk "planting" combined with equalization,
using functions derived from testing an artificial head.
They also saw binaural synthesis as a related process,
and they were striving for a spatial synthesis of reverberation. Their transaural trials, however, were designed
to reproduce the actual sounds of known concert halls
and were based on binaural recordings made in those
halls. In those days, earphone listening produced only
interior sounds, so that transaural conversion was
mandatory for their purposes. The Atal-Schroeder results were described [12] as "nothing less than amazing."
The listener experienced authentic, exterior, spatial
envelopment as well as authoritative imaging to the
front and sides, in elevation, and even behind.
Unfortunately the reports of this work left lasting
impressions of a heroic technology producing fragile
results: the listening space had to be anechoic, and the
listener could not move by more than about 10° or
3 in (75 mm) without spoiling the effect. Later work
by Damaske [13], with "90° crosstalk filters," a codeword designation, did little to dispel these disheartening
impressions. He found that reverberant listening spaces
degraded the effect, damaging side imaging and causing
front-back ambiguity. Other work over the past quarter
century, including the Q-Biphonic development [14],
has not significantly advanced the technology nor
changed overall impressions of dim prospects for transaural recording.

0.6 Present Prospects


Brightened prospects are suggested by our work,
reported herewith. By casting the crosstalk-canceling
filters in shuffler form, we are able to greatly simplify
the technology: a handful of operational-amplifier chips,
or the equivalent on a digital signal-processing chip,
suffice. This economy (not having to use FIR filters)
is a consequence of our discovery that the shuffler consists entirely of minimum-phase filters. The simplification also reveals a structure that allows secure control
over the design of equalization as independent of the
crosstalk canceling. Thus we are able to simplify the
crosstalk function, more particularly at short wavelengths, to make the effect of cancellation quite tolerant
of listener movement.
Listeners find that a 30° head rotation produces a
benign, albeit noticeable to some, change in auditory
perspective. Imaging at 90° is less tolerant. Comparable effects are noticed for lateral movement over
a range equal to the loudspeaker spacing, but there is
more tolerance for forward-backward motion. We have
no data for transaural systems designed for a wider
loudspeaker spacing, and we are not entirely satisfied
with explanations we offer in Sec. 1.7. Perhaps some
credit is owed to good equalization.

The significance of equalization has become clear
to us through our experiences with recordings made
with a Neumann KU-80 head, which is equalized to
provide a correct ear-canal entrance signal, and with
the Aachen head (Aachener Kopf, or AK) devised by
Gierlich and Genuit [15], which is equalized for a flat
free-field response for frontal incidence. Recordings
from the KU-80 are unfit for loudspeaker playback (the
KU-81 should do better), showing a poor stereo effect
with a very large "hole in the middle," while the same
playback from the AK shows a stereo excellence unattainable by any other known stereo array.
With equalizations as given, crosstalk cancellation,
besides revealing the qualities noted by Atal and
Schroeder, actually "covers" the hole in the KU-80
presentation, while for the AK, it corrects image
placements, sharpens the images, extends the range of
image placement, removes front-back ambiguity, extends the perception of depth, and completes the spatial
envelopment. Thus the equalization we would have
used (see explanation in Sec. 1.5), based on free-field
incidence at 30° and being nearly the same as in the
AK, was confirmed in its correctness.
As a result of our experience with actual trials of
differing equalizations in different rooms, we are in a
position to be more precise than Damaske could be,
about the significance of listening-space acoustics. It
is, in fact, a misunderstanding that an anechoic space
need be used. Atal and Schroeder regarded listener-space reverberation to be a contaminant in their studies
of concert-hall sound, and they wished to exclude it.
Specifically, we did identify one minor aspect of listener-space acoustics, one easily avoided, that accounts
for the effects noted by Damaske.
The integrity of the crosstalk paths from loudspeakers
to ears can be compromised by competing reflected
paths that differ in delay from the primary paths by
amounts of less than 1 (or perhaps 2) ms. Substantial
contributions from such paths can begin to impair side
imaging and allow some appearance of front-back
ambiguity. Ordinary care taken in the setup to avoid
significant early-reflection paths obviates any deleterious effects. Longer delayed reflections merely appear
as "early" reflections in the concert-hall sense. These
are attributed by the listener to the performing space,
usually as minor augmentations in its reverberance.
Ensuring good equalization guarantees that if a user
is so careless with the setup as to allow early reflections,
the playback of a transaural recording will exhibit a
gradual degradation from a quality that is "nothing less
than amazing" to one that is at least "excellent."
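
To give a feel for the 1 to 2 ms window cited above for troublesome reflections, the following minimal sketch converts a reflection delay into the extra path length it implies. The speed of sound (343 m/s) and the function name are assumptions introduced only for this illustration.

```python
# Assumed value: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0


def extra_path_length_m(delay_ms: float) -> float:
    """Extra distance travelled by a reflection that lags the direct sound by delay_ms."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


# Reflected paths shorter than these excess lengths fall inside the critical window.
for delay_ms in (1.0, 2.0):
    print(f"{delay_ms:.0f} ms delay -> {extra_path_length_m(delay_ms):.3f} m of extra path")
```

Under this assumption, only surfaces that lengthen the loudspeaker-to-ear path by less than a few tenths of a meter fall within the window, which is consistent with the remark that ordinary care in the setup suffices.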

0.7 Summary
The principal purpose of this paper is to report on
improvements we have discovered in a particular signal-processing scheme, the crosstalk-canceling scheme of
Atal and Schroeder. These improvements, which are
largely practical, offer the possibility of a significant
restructuring of stereo recording to make for extraordinary improvements in stereo quality.

We have cited economies in processing following
from the discovery that minimum-phase filters may be
used exclusively, have cited a revealed structure that
allows the shaping of crosstalk filters independently
of equalization filters, and have cited bases for robust
results, in contrast to a previously supposed fragility.
This robustness consists of a tolerance for listener
movement and a tolerance for nonanechoic listening-space acoustics.
In the following sections we review the Atal-Schroeder scheme and show how its lattice-arrayed
filters may be seen to be equivalent to a shuffler array,
develop the corresponding formulas, and illustrate these
with plots based on the spherical-model head. We cast
the shuffler functions in a form that exhibits a factoring
into an equalization part and a crosstalk-canceling part,
and we illustrate the significance of these with plots
taken from old data for the so-called CBS-NASA head
[16].
In so doing, we point out that crosstalk canceling is
a process that is an inverse of the process we have
called binaural synthesis, and we provide a block diagram of a multiple-input binaural synthesizer.
Finally, we turn to aspects of transaural technology
that are less related to recording. We introduce the
concept of virtual loudspeakers, whereby a given pair
of actual loudspeakers may be replaced by a number
of virtual loudspeakers at arbitrary positions. This may
be used to solve the problem of too closely spaced
loudspeakers in stereo television, for example. It also
may be used to present cinema-surround stereo via only
two loudspeakers without loss of surround effect, or
to similarly present full-sphere ambisonic surround.
Also, we can resurrect a contribution by Bauer [17] to
provide binaural-like listening to stereo material,
making for inexpensive, accurate "Bauer boxes."
1 CROSSTALK-CANCELING FILTERS
1.1 Atal-Schroeder Filters
The Atal-Schroeder crosstalk canceler is shown in
Fig. 1(a), adapted from [12]. In Schroeder's notation,
S represents the transfer function from a source (loudspeaker) to a same-side (ipsilateral) ear, while A is the
transfer function to an alternate-side (contralateral) ear.
The acoustic layout is symmetric, so that the transfer
functions from the LF loudspeaker to the ears equal
those for the RF loudspeaker. The notation C = -A/S
is used for the filter in the cross path. Elementary algebra may be used [11] to show that a signal introduced
at the top left does indeed appear, unchanged and uncontaminated, at the left ear of the listener, and so
on.
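
The elementary algebra can be checked numerically at a single frequency. The sketch below assumes the canceler structure suggested by the text (a filter C in each cross path, followed in each channel by 1/(1 - C^2) and 1/S); the complex values chosen for S and A and the function name are hypothetical, picked only for illustration.

```python
import cmath


def atal_schroeder_ear_signals(x_left, x_right, S, A):
    """Pass one frequency component through the canceler, then the acoustic paths."""
    C = -A / S                               # cross-path filter
    common = 1.0 / (S * (1.0 - C * C))       # 1/(1 - C^2) and 1/S, same in both channels
    # Canceler: each loudspeaker feed is its own input plus C times the opposite input.
    speaker_left = common * (x_left + C * x_right)
    speaker_right = common * (x_right + C * x_left)
    # Symmetric acoustic transmission: S to the same-side ear, A to the alternate-side ear.
    ear_left = S * speaker_left + A * speaker_right
    ear_right = S * speaker_right + A * speaker_left
    return ear_left, ear_right


# Hypothetical transfer-function values at one frequency (magnitude, phase in radians).
S = cmath.rect(1.0, -0.3)
A = cmath.rect(0.6, -1.1)
ear_left, ear_right = atal_schroeder_ear_signals(1.0 + 0.0j, 0.0j, S, A)
print(abs(ear_left - 1.0), abs(ear_right))   # both should be at rounding-error level
```

A unit signal applied to the left input arrives at the left ear unchanged while the right ear receives nothing, to within rounding error.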
Schroeder also treats the requirements of causality
(realizability) [11]. It is clear from first principles that
A involves a greater delay than does S, so that C is
causal. This is also seen from Fig. 1(b), adapted from
Møller [18], which shows a plot of the impulse response
c(τ) for the cross filter C(ω), as determined for the
Neumann KU-80i head at 45° incidence. Thus C is
realizable as an FIR digital filter, or a transversal analog
filter, and so also for C². Then 1/(1 - C²) is realized
by placing the C² filter in a recursive loop. The terminal
filter 1/S is not causal on its face, but with its impulse
response padded with sufficient delay, the same in both
channels, causal representations are obtained. These
realizations were signal-processing routines in an IBM
7090 computer.
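
As a minimal sketch of the recursive realization just described, the following places a C² filter in a feedback loop and compares the loop's impulse response with the truncated series 1 + C² + C⁴ + ... The four-tap cross filter is a hypothetical stand-in (delayed, attenuated, sign-inverted), not data from any head measurement, and the helper names are illustrative.

```python
def convolve(a, b):
    """Direct-form linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def recursive_loop(x, c2):
    """y[n] = x[n] + sum over k >= 1 of c2[k] * y[n - k], i.e. Y = X / (1 - C^2)."""
    assert c2[0] == 0.0, "the loop needs at least one sample of delay to be causal"
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k in range(1, min(n, len(c2) - 1) + 1):
            acc += c2[k] * y[n - k]
        y.append(acc)
    return y


def geometric_series(c2, length):
    """Reference: 1 + C^2 + C^4 + ..., truncated to `length` samples."""
    total = [0.0] * length
    term = [1.0]
    for _ in range(length):
        for i, t in enumerate(term):
            total[i] += t
        term = convolve(term, c2)[:length]
    return total


# Hypothetical cross filter c: two samples of delay, then inverted, attenuated taps.
c = [0.0, 0.0, -0.5, -0.2]
c2 = convolve(c, c)
impulse = [1.0] + [0.0] * 31
h_loop = recursive_loop(impulse, c2)
h_reference = geometric_series(c2, len(impulse))
print(max(abs(a - b) for a, b in zip(h_loop, h_reference)))
```

Because C carries the interaural delay, C² has no zero-delay tap, so the loop is computable sample by sample; this is the causality point made in the text.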
The impulse response plotted in Fig. 1(b) is of short
duration, which shows that crosstalk cancellation is
speedily completed, requiring the listening space to be
anechoic for only the first few milliseconds. This is
equivalent to the finding, stated in the Introduction,
that it is sufficient, in the listening setup, to exclude
early reflection paths.
The brevity of this impulse response bears also on
questions of equalization style, as will be seen later.

1.2 Shuffler Filters


The Atal-Schroeder scheme may be seen to be
equivalent to the lattice arrangement of filters shown
in Fig. 2, provided that the filter in the cross path is

$$A' = \frac{-A}{S^2 - A^2} \tag{1a}$$

and that the one in the same-side path is

$$S' = \frac{S}{S^2 - A^2} \tag{1b}$$

These may be seen to be the matrix elements (S' on
the diagonal, A' on the counterdiagonal) of the 2 × 2
matrix that is inverse to the acoustic matrix evident
from Fig. 1.
The shuffler arrangement of filters, also shown in
Fig. 2, may be seen to be equivalent to the lattice.
There the filter for the sum of inputs with both parts
positive is denoted by P', while the filter for the difference (sum with one input negative) has been denoted
by N'. Equivalence demands that

$$N' = \frac{S' - A'}{2} \tag{2a}$$

and

$$P' = \frac{S' + A'}{2} \tag{2b}$$
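
The lattice-shuffler equivalence demanded by Eqs. (2a) and (2b) can be confirmed at a single frequency. In this sketch S' and A' are obtained by inverting the symmetric acoustic matrix, as the text describes; the numerical values of S, A, and the inputs, and all function names, are hypothetical.

```python
import cmath


def inverse_lattice_elements(S, A):
    """Diagonal (S') and counterdiagonal (A') elements of the inverse of [[S, A], [A, S]]."""
    det = S * S - A * A
    return S / det, -A / det


def lattice(x, y, S_prime, A_prime):
    """Lattice array: S' in the same-side paths, A' in the cross paths."""
    return S_prime * x + A_prime * y, A_prime * x + S_prime * y


def shuffler(x, y, N_prime, P_prime):
    """Shuffler array: filter the difference and the sum, then recombine."""
    diff = N_prime * (x - y)
    summ = P_prime * (x + y)
    return summ + diff, summ - diff


# Hypothetical transfer-function values at one frequency (magnitude, phase in radians).
S = cmath.rect(1.0, -0.3)
A = cmath.rect(0.6, -1.1)
S_prime, A_prime = inverse_lattice_elements(S, A)
N_prime = (S_prime - A_prime) / 2    # Eq. (2a)
P_prime = (S_prime + A_prime) / 2    # Eq. (2b)

x, y = 0.8 + 0.1j, -0.3 + 0.5j       # arbitrary input phasors
lattice_out = lattice(x, y, S_prime, A_prime)
shuffler_out = shuffler(x, y, N_prime, P_prime)
print(abs(lattice_out[0] - shuffler_out[0]), abs(lattice_out[1] - shuffler_out[1]))
```

The same values also show that S' - A' equals 1/(S - A), so that each shuffler filter depends on only one of the difference or sum acoustic functions.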

Division by 2 would be omitted for difference-sum
networks designed with uniform 3-dB losses so that,
without loss of generality, we write

$$N' = \frac{1}{S - A} \tag{3a}$$

and

$$P' = \frac{1}{S + A} \tag{3b}$$

Thus the matrix of the shuffler transfer functions, diagonal with elements N' and P', is the inverse of the
diagonal acoustic matrix for difference-and-sum ear
sounds with elements N = S - A and P = S + A.

Fig. 1. (a) Atal-Schroeder crosstalk-canceling filter array. (b) Plot of the impulse response of its cross filter, C = -A/S, against delay (ms).
The filter matrix of (a), adapted from [12], is the inverse of the matrix of acoustic transfer functions, the matrix showing S
for the transmission from a loudspeaker to the same-side ear and A for the transmission to the alternate-side ear. The curve
of (b), adapted from [18], is the impulse response of C and shows the process to be completed in very few milliseconds.

1.3 Minimum-Phase Characteristics
In 1977 Mehrgardt and Mellert [19] showed experimentally that the head-related transfer functions are
of minimum phase to within a frequency-independent
delay, a delay that is incident-angle dependent. They
proceeded via the Hilbert transform of the log-magnitude
response to calculate the minimum-phase part of the
phase response. The remainder, or excess-phase part,
was then identified as being of frequency-independent
slope. The result was experimental in that measured,
and smoothed, response functions were used in the calculations. Thus, S and A have excess-phase parts that
differ in the amount of frequency-independent delay.
Considering S alone, however, Schroeder found the
delay to be ignorable for the purposes of constructing
1/S, as we have seen.
To discuss pairs of filter functions, we introduce the
concept of joint minimum phase. To be of joint minimum
phase, a set of filter functions is to have a common
excess phase, and this excess is to be a (bounded) frequency-independent delay. Then removing the excess
phase to a common factor leaves a delay-normalized
set of filters that are of minimum phase in the ordinary
sense. They are also at least conditionally stable, so
that products, ratios, and reciprocals are in the set.


Thus A and S are not of joint minimum phase, and
neither are A' and S'.
Of course, in the Atal-Schroeder filters, joint minimum phase is not at issue, since the 2 × 2 matrix has
S' along the diagonal and A' along the counterdiagonal.
On the other hand, the shuffler filter has N' and P'
along the same diagonal (the counterdiagonal is zero)
so that it would seem odd if N' and P' were not of joint
minimum phase; odd because then the difference signal
would be required to become more and more out of
step with the sum, as frequency increases. However
that may be, we made the same sort of check that
Mehrgardt and Mellert had made and found that N
and P are indeed of joint minimum phase, so also for
N' and P'.
A practical consequence is that magnitudes alone,
|N| and |P|, or their reciprocals are a sufficient specification (phase is redundant as being calculable by
Hilbert transform), whether for filter synthesis or in
determining the head functions to be measured experimentally. Also, since any common frequency-independent delay may be omitted, the programming and
hardware requirements of an FIR realization are reduced. In fact, a non-FIR (or IIR) filter may be programmed at low cost. Successive fitting of a cascade
of biquadratic forms (ratios of frequency-dependent,
second-order transfer functions) is a natural approach,
and these take scarcely more than a half-dozen lines
of code each in a typical DSP chip. In analog-filter
synthesis, the "biquad" is also a natural choice of synthesis element.
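
The redundancy of phase can be illustrated with a short sketch that rebuilds a complex response from its magnitude alone. It uses the standard cepstral-folding form of the Hilbert-transform relation, which is a common textbook technique and not necessarily the procedure used by the authors; the one-zero test filter, the grid size, and the function names are assumptions made for the illustration. A naive DFT is used only to keep the sketch dependency-free; an FFT routine would replace it in practice.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                    for m, v in enumerate(values))
        for k in range(n)
    ]


def minimum_phase_response(magnitudes):
    """Complex minimum-phase response having the given magnitudes.

    `magnitudes` must cover the full frequency circle on an even-length grid.
    """
    n = len(magnitudes)
    cepstrum = dft([math.log(m) for m in magnitudes], inverse=True)
    # Fold the even (real) cepstrum onto non-negative quefrencies; this is the
    # discrete counterpart of taking the Hilbert transform of the log magnitude.
    folded = [0j] * n
    folded[0] = cepstrum[0]
    for q in range(1, n // 2):
        folded[q] = 2.0 * cepstrum[q]
    folded[n // 2] = cepstrum[n // 2]
    return [cmath.exp(v) for v in dft(folded)]


# Hypothetical test filter known to be minimum phase: H(z) = 1 - 0.5 z^-1.
n_points = 64
target = [1.0 - 0.5 * cmath.exp(-2j * math.pi * k / n_points) for k in range(n_points)]
rebuilt = minimum_phase_response([abs(h) for h in target])
max_error = max(abs(a - b) for a, b in zip(rebuilt, target))
print(f"largest deviation from the true response: {max_error:.2e}")
```

For a response that is truly of minimum phase, the rebuilt phase agrees with the original to within numerical error, so only magnitudes need be measured or specified.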
1.4 Structure of |N| and |P|

Head data are most often measured in the form of
|A|, |S|, and τ, of which the last is the interaural
phase delay, redundant in that part calculable from the
Hilbert transform of log |A/S|. Nevertheless, it is easy
to see that these data are sufficient to determine |N|
and |P|, since the magnitudes of the phasor difference
and sum are, according to the triangle rule, simply

$$|N| = \left(|A|^2 + |S|^2 - 2|AS| \cos \omega\tau\right)^{1/2} \tag{4a}$$

and

$$|P| = \left(|A|^2 + |S|^2 + 2|AS| \cos \omega\tau\right)^{1/2} \tag{4b}$$

Thus it is seen that frequency-response plots of these
functions will show a pattern of interleaved alternations
in curves that swing between an upper envelope of

$$|N, P|_{\max} = |S| + |A| \tag{5a}$$

and a lower envelope of

$$|N, P|_{\min} = |S| - |A| \tag{5b}$$

These alternating curves will intersect one another along
a locus for which the cosine is null, and this locus is

$$|N, P|_{\mathrm{rms}} = \left(|S|^2 + |A|^2\right)^{1/2} \tag{6}$$

Of course, where N and P are equal, there is no crosstalk,
so that |N, P|rms may be referred to as a null-crosstalk
locus. Actually, zero crosstalk requires N and P to be
equal in phase as well as magnitude, and this is approximated only after |N| and |P| have tracked each
other over an extended frequency interval. As defined
by Eq. (6), however, the curve will be used below to
define an equalization reference.
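
The envelope structure can be reproduced with a few lines of arithmetic. The sketch below evaluates the triangle rule of Eqs. (4a) and (4b) over one full cycle of the interaural phase ωτ, holding |S| and |A| at hypothetical constant values (real head data vary with frequency), and confirms the upper envelope, the lower envelope, and the rms crossing level.

```python
import math


def difference_sum_magnitudes(mag_a, mag_s, omega_tau):
    """|N| and |P| by the triangle rule, Eqs. (4a) and (4b)."""
    cross = 2.0 * mag_a * mag_s * math.cos(omega_tau)
    power = mag_a ** 2 + mag_s ** 2
    return math.sqrt(power - cross), math.sqrt(power + cross)


# Hypothetical magnitudes, held constant purely for illustration.
MAG_S, MAG_A = 1.0, 0.5

# Sweep the interaural phase through one full cycle in 1-degree steps.
samples = [difference_sum_magnitudes(MAG_A, MAG_S, math.radians(deg)) for deg in range(361)]
largest = max(max(n, p) for n, p in samples)    # upper envelope, |S| + |A|
smallest = min(min(n, p) for n, p in samples)   # lower envelope, |S| - |A|

# Where the cosine is null the two curves cross at the rms level.
crossing_n, crossing_p = difference_sum_magnitudes(MAG_A, MAG_S, math.pi / 2)
print(largest, smallest, crossing_n, crossing_p)
```

With these values the curves swing between |S| + |A| = 1.5 and |S| - |A| = 0.5, and they cross where both equal the square root of |S|² + |A|².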
1.5 Equalized Head Functions
The form of the null-crosstalk locus, Eq. (6), is that
of total rms transmission,

$$|E(\omega, \theta)| = |N, P|_{\mathrm{rms}} \tag{7}$$

a function dependent on frequency and incidence angle.
We take this to define, for a given artificial head, the
frequency trend jointly for the two ears, for a particular
incidence angle. We take E to be of minimum phase,
and we use it to define a free-field equalization for a
particular reference (incidence) direction $\theta_0$.
The equalized transfer function for the difference
signal is designated ${}^{0}N$:

$${}^{0}N = \frac{N}{E(\theta_0)} \tag{8a}$$

and designated ${}^{0}P$ for the sum:

$${}^{0}P = \frac{P}{E(\theta_0)} \tag{8b}$$

The reference direction has been taken to be 0° for the
AK (Aachen head), but for loudspeakers to be placed
at 30°, a 30° reference would be more appropriate.

Fig. 2. Lattice-array and shuffler-array filters compared. The
transmission from inputs x and y to outputs X' and Y' may be
made the same for the two filters by making P' = (S' + A')/2
and N' = (S' - A')/2.

When plotted with the same incidence angle as the
reference, the frequency-response curves for $|{}^{0}N|$ and
$|{}^{0}P|$ intersect one another at the constant level 0 dB.
Such a plot for a spherical-model head [2], [3] is shown
in Fig. 3, solid line (a). These plots are actually for
the reciprocals $|1/{}^{0}N|$ and $|1/{}^{0}P|$, as would be used in
a crosstalk canceler, but the decibel scale makes it easy
to interpret the plots also for the direct functions $|{}^{0}N|$
and $|{}^{0}P|$. The dashed line shows a possible modification,
to be discussed later. Curve (b) shows the equalization
1/|E|. A crosstalk canceler based on these curves has
been tried, and its performance is extremely satisfying.
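
The 0-dB crossing of the equalized curves follows directly from dividing |N| and |P| by the rms reference |E|. The sketch below works in decibels for the case in which the plotted incidence angle equals the reference direction; the magnitudes are hypothetical, and the function name is introduced only for this illustration.

```python
import math


def equalized_levels_db(mag_a, mag_s, omega_tau):
    """Levels in dB of the equalized difference and sum functions, |N|/|E| and |P|/|E|."""
    cross = 2.0 * mag_a * mag_s * math.cos(omega_tau)
    power = mag_a ** 2 + mag_s ** 2          # |E|^2, the squared rms reference
    n_db = 10.0 * math.log10((power - cross) / power)
    p_db = 10.0 * math.log10((power + cross) / power)
    return n_db, p_db


# Hypothetical magnitudes |A| = 0.5 and |S| = 1.0.
n_cross_db, p_cross_db = equalized_levels_db(0.5, 1.0, math.pi / 2)   # cosine null
n_low_db, p_low_db = equalized_levels_db(0.5, 1.0, 0.0)               # low-frequency limit

# The reciprocals used in the canceler are the same curves with the sign of the dB value reversed.
print(n_cross_db, p_cross_db, -n_low_db, -p_low_db)
```

Because a reciprocal only reverses the sign of a decibel value, one plot serves for both the canceler filters and the direct functions, as noted in the text.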
Plots for more realistic models of the human head
(see Fig. 13) resemble the solid-line plot (a) in Fig. 3
but differ remarkably in equalization from plot (b).
The reason is that the spherical-model head is one whose
functions are quite smooth [20] because pinnas are
omitted. Inclusion of pinnas on a realistic model invokes
the large conch resonance, profoundly altering the 1/|E|
curve.
For example, data [21] for the CBS-NASA head provide the equalization curves of Fig. 4. A curve for 30°
and one for 0° are shown. For clarity, the plot was
made with a 3-dB displacement inserted between the
curves. It will be seen that the curves differ by little
in comparison to the range of variation that they encompass. Thus a 0° equalization could substitute for a
30° one with little effect, but omission of such equalization would be a serious matter.
The seriousness of the matter became evident from

our loudspeaker playback of recordings from the Neumann KU-80 head. As indicated in the Introduction,
the stereo effect was of a "hole in the middle you could
drive a truck through," as one listener said. When converted to transaural, using the crosstalk canceler built
with the functions of Fig. 3, Schroeder's description
of "nothing less than amazing" spatial and imaging
qualities certainly applied, but it was possible to notice
that the equalization was "a little off." Later, the appearance of a "hole" tendency in this recording would
alert us to early reflections in a listening setup. As we
also noted, recordings from the Aachen head (0° equalization) provided stereo of unequaled excellence by
ordinary standards. Certainly, no "hole" was observed,
even without cancellation.
1.6 System Transfer Functions
In the following, M will be used to designate either
N or P. It will be understood to be a function of frequency and incidence angle. Thus for natural directional
hearing, either member of the pair of overall transfer
functions from a source at angle θ to the ears is designated

$$H_n = M_n(\theta) \tag{9}$$

and it is the transfer function for the difference or sum
in ear signals, depending on whether N or P is substituted
for M. The sources are to be considered one at a time,
whether a direct source or one of the many components
of reverberation. Superposition is applicable in linear
acoustics. The subscript n is used to designate a natural
head, the head of the listener. A signal-theoretic basis
for understanding directional hearing would begin at
this point.

Fig. 3. Shuffler filter characteristics in crosstalk canceling
for spherical-model head. (a) Magnitudes of 1/N and 1/P
normalized against curve shown in (b). Because curves (a)
are free of the idiosyncratic detail for specific heads (as in
Fig. 13), such characteristics are tolerant of variations in
listener-head shapes and positions. Dashed curves show a
possible modification of the envelope of the alternations.
Because the filters are of joint minimum phase, the phase
data are redundant and not shown.

Fig. 4. Equalization curves from data [21] for the CBS-NASA
head [16]. Curves for both 30° and 0° reference directions
are shown, displaced from one another by 3 dB. The decibel
range of variation (conch resonance) greatly exceeds any
difference. (Horizontal axis: frequency.)


For listening via loudspeakers to a binaural recording,
the transfer functions (subscript b) are designated as

$$H_b = {}^{0}M_a(\theta)\, M_n(\theta_0) \tag{10}$$

in which the artificial head (or equivalent in binaural
synthesis), subscript a, is designated as being equalized
for the loudspeaker positions $\theta_0$. This equalization
prevents the nondirectional part of the conch resonance,
already present in the listener's ears, from being introduced a second time, a minimum requirement of
equalization.
In this instance, the sounds at the listener's ears will
be drawn from an extremely restricted set, in comparison
to all possible sound combinations. The restriction is
to that of linear combinations of the sounds at the ears
of the artificial head, namely "shuffled" ear sounds,
but combinations that otherwise closely resemble ear
sounds themselves.
For a reasonably apt artificial head the joint angle-dependent spectral magnitudes will be determined by
the artificial head, except for shuffling, to closely resemble those of natural hearing. The result, as confirmed
in listening, is an extremely plausible directional portrayal, much more so than available by any means in
conventional stereo. Even so, listeners do greatly appreciate the improvement they experience in being
provided "unshuffled" ear sounds, those that embody
the alternations in their difference-and-sum spectra that
are characteristics of their own ear sounds. Since these
alternations depend strongly on the cosine of the interaural phase, the significance of this element is confirmed. This unshuffling is provided in the crosstalk
cancellation of transaural stereo.
For transaural listening, the transfer function (subscript t) is

$$H_t = \frac{{}_0M_a(\theta)\,M_n(\theta_0)}{{}_0M_x(\theta_0)} \tag{11}$$

in which subscript x designates transfer functions of the head used to model the crosstalk cancellation. This equation shows the use of equalized functions, for the reference direction $\theta_0$, for both $M_a$ and $M_x$. If these differ in any of their characteristics, each is to be equalized against its own characteristics. The appearance of a conch resonance (for $\theta_0$) is, as in the above, reserved for the listener's head $M_n$.
If $M_x$ is the same as $M_n$, for example, then Eq. (11) describes the simulation of natural directional hearing except for the substitution of the artificial head (and ears) for the listener's own. Clearly, if all three heads are the same, then Eq. (11) is identical to Eq. (9), and an exact simulation of natural hearing would be the result.
In this last statement direct proof is seen of what is generally regarded as the "unquestionable validity" of the transaural plan for stereo recording and reproduction. Of course, this provides the transaural design engineer an extremely strong vantage position from which to undertake departures in the service of practicality. It is one of the strengths of starting from an optimal position that departures from the optimum in design parameters usually produce remarkably small effects.
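The reduction of Eq. (11) to Eq. (9) can be written out explicitly. The sketch below assumes the equalization convention described in Section 2.1, namely that a left subscript 0 denotes division by the joint spectral transmission $E$ of the same head, evaluated at the reference direction $\theta_0$. The head-specific symbols $E_a$ and $E_x$ are notation introduced here for illustration only.

```latex
% Assumed convention: {}_0M_h(\theta) = M_h(\theta) / E_h(\theta_0) for head h.
\begin{align*}
H_t &= \frac{{}_0M_a(\theta)\,M_n(\theta_0)}{{}_0M_x(\theta_0)}
     = \frac{M_a(\theta)}{E_a(\theta_0)}\cdot
       \frac{E_x(\theta_0)}{M_x(\theta_0)}\cdot M_n(\theta_0) \\
% With all three heads identical (a = x = n), E(\theta_0) and M(\theta_0) cancel:
    &= M_n(\theta) = H_n .
\end{align*}
```

Under the same assumption, the intermediate case $M_x = M_n$ leaves $H_t = {}_0M_a(\theta)\,E_x(\theta_0)$, that is, the equalized artificial head followed by the reference-direction coloration of the listener's own head.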
1.7 Practical Design Considerations
Except for a custom-designed crosstalk canceler, it
is not to be expected that $M_x$ will be the same as $M_n$,
and a commercial release of a transaural recording would
have to embody an $M_x$ that would be required to be
satisfactory for a wide range of listener heads, each
with its own $M_n$. Generally this is not a difficult requirement. It has been found, for example, that the
crosstalk canceler based on a spherical-model head [2],
[3] produces immensely satisfying results for a wide
range of listeners' heads. Heads that are somewhat
small may be placed somewhat nearer the loudspeakers,
and those that are somewhat large may be placed at a
somewhat greater distance, as may be seen from the
structure of head functions, but the exact placement
does not seem to be a critical matter for most listeners.
What is probably the case is not that a sphere is
necessarily a best fit, but that it is a "comfortable" fit
for most heads just because of its inexactness. While
the advantages of inexactness merit further exploration,
we have tried another aspect for inexact treatment, the
domain of wavelengths shorter than about 50 mm (frequencies higher than about 6 kHz). The first experimental crosstalk-canceler filters followed, after a
somewhat abrupt transition, the null-crosstalk contour
of Eq. (6) for the shorter wavelengths. We attribute
the tolerance in listener movement to these aspects of
inexactness in Mx filters.
The choice of a rather abrupt "cut" in our first experimental canceler may have been somewhat extreme.
We do notice a tendency for sibilantlike sounds and
clicklike sounds to be mislocated, generally toward
the front. This is a confirmation, extended to short
wavelengths, of the importance of interaural phase.
Although this style of design variation has proved instructive, we are now inclined to rely on a more uniform
distribution of inexactness, of which the spherical
functions are a good example. Another variation of
interest is that of introducing a gradual taper, as shown
in Fig. 3(a), dashed line, wherein the upper and lower
envelopes approach the null-crosstalk contour in a
somewhat less accelerated manner for short wavelengths, replacing the more abrupt cut.
We visualize these styles of inexactness as defining
a volume of space near each ear of the listener, a space
over which cancellation is satisfactorily accurate. We
visualize this volume as being of smaller extent for the
shorter wavelengths, and we suppose that it is appropriate to be less exact at these shorter wavelengths.
We also believe, despite our successes with spherical
functions, that we need to continue to investigate this
problem. Thus the tolerance we have gained for listener
movement, already satisfactory for most purposes, may
be extended.

2 BINAURAL SYNTHESIS
2.1 Synthesis Filters

Shuffler filters based on the direct functions $N$ and $P$ are used to simulate the progress around the head to the ears of two sounds of incidence angles $\pm\theta_i$ at once. Instead of an inverse crosstalk filter, it is the direct crosstalk filter that is to be constructed. Of course, if only one of these signals is desired, one of the inputs to the shuffler may be left silent. Degenerate forms of the shuffler are used for 0° and 180°.
The shuffler synthesizer implements Eq. (9) in effect, but provides equalized ear signals instead, thus actually simulating the use of an equalized artificial head. The transfer function may be written

$$H_s = {}_0M_s(\theta_i) \tag{12}$$

using the same convention that $M$ may stand for either of $N$ or $P$. The subscript s is used to denote head functions used in synthesis, even though these might have been measured for an artificial head, a natural head, or derived from a mathematical model. The transfer function is for a source simulated at position $\theta_i$, where $i$ is a symbol for the indexing over a discrete set of incidence angles.
This transfer function may be written in greater detail as

$$H_s = \frac{E_s(\theta_i)\,M_s(\theta_i)}{E_s(\theta_0)\,E_s(\theta_i)} \tag{13}$$

which may be written in factored form as

$$H_s = {}_0E_s(\theta_i)\,\hat{M}_s(\theta_i) \tag{14}$$

in which the factors are

$${}_0E_s(\theta_i) = \frac{E_s(\theta_i)}{E_s(\theta_0)} \tag{15}$$

and

$$\hat{M}_s(\theta_i) = \frac{M_s(\theta_i)}{E_s(\theta_i)}. \tag{16}$$

Here $\hat{M}$ is to be distinguished from ${}_0M$ in that the latter represents a normalized (equalized) transfer function, normalized against the joint spectral transmission $E$ evaluated for the reference direction $\theta_0$, while $\hat{M}$ is the transfer function normalized against the joint spectral transmission for the same direction as the one for which it is evaluated, in this case $\theta_i$. Also ${}_0E$ is the joint spectral transmission normalized (equalized) with respect to the reference-direction joint transmission.

2.2 Factors in the Synthesis Functions

The factoring shown in the preceding section tends to allocate certain elements of directional hearing to one factor and certain others to the other factor. In particular, "tonal-color" characteristics, for the two ears jointly, are represented by the factor ${}_0E$. It seems that this color is used in a part of the directional hearing process (spectral pattern recognition) at a level below consciousness, but that only the directional result is presented at the level of consciousness. For example, speaking voices from behind would have an extremely "hollow" sound, as will be seen, if the hearing mechanism did not function as indicated.
This hollowness can be heard only under exceptional circumstances, such as a binaural recording played without crosstalk cancellation. In this example, a voice was recorded with the AK while the speaking person moved around the artificial head. Listeners heard the voice move outside the space between the 30° loudspeakers, barely into the side quadrants, in the listener space. While the original movement had been through to the back quadrant, the listeners heard movement that turned forward again into the front quadrant, but with an altered vocal quality, that "hollow" quality. Some listeners, when particularly neutral, transparent-sounding loudspeakers were used, would hear the voice "jump" to the back quadrant before the change in quality had become explicit. The listeners that stayed with the frontal localization presumably did so because of the "visual knowledge" that the loudspeakers were in front and because the equalization of the AK is not exactly suited to 30° presentation. With crosstalk cancellation, the transition to the back quadrant was characterized by continuity.
Changes in tonal color for sounds presented in elevation seem also to be responsible for impressions of elevated localization, and, again, the coloration is prevented from appearing in consciousness. One of us recalls a certain loudspeaker with a flawed crossover network; instead of producing a colored sound, it produced an elevated impression. Under ordinary conditions, experiments with finger snaps show that localization determinations are formed too rapidly to be significantly aided by head motions. To the extent that these color variations are adequately presented in the joint spectral transmission function, ${}_0E$ would represent a signal-theoretic basis for understanding these color-determined directional judgments. We examine horizontal-direction spectral plots for ${}_0E$ below.
The factor $\hat{M}$ may be written in magnitude as

$$|\hat{M}| = \left[1 \pm K\cos\big(\arg(A/S)\big)\right]^{1/2} \tag{17}$$

in which the + sign is for $|\hat{P}|$ and the − sign is for $|\hat{N}|$. The coefficient $K$ is

$$K = \frac{2}{|S/A| + |A/S|}. \tag{18}$$

It is seen that

(19)

Thus $\hat{M}$, defined to be of minimum phase, is a function exclusively of the phasor interaural transmission ratio $A/S$. Its magnitude determines $K$, and its phase is equal to the argument of the cosine in Eq. (17). Thus $\hat{M}$ and $A/S$ are essentially equivalent signal-theoretic bases for their role in directional hearing. This role appears largely to be the determination of the lateral aspect of the localization angle, as distinguished from front-back and elevation aspects. Plots of $|S/A|$ and interaural phase delay may be found in Mertens [22]. These are adapted here as Fig. 5, showing interaural phase delay in microseconds, and Fig. 6, showing the interaural level difference $|S/A|$ in decibel units, both plotted versus incidence angle.

Fig. 5. Plots of interaural difference in phase delay versus angle of incidence for various frequencies (Hz). Diffraction theory indicates a maximum delay at low frequencies that is much less than shown for 140 Hz. A low-frequency mechanical resonance in this papier-mache head is suspected. Adapted from [22].

Fig. 6. Plots of interaural difference in level $|S/A|$ versus angle of incidence for various frequencies (Hz). Adapted from [22].

The plot of interaural phase delay, Fig. 5, shows clearly that a substantial part of that delay is frequency independent, seeming to plot a trend toward a high-frequency-limit curve (lower bound) that is ramplike in increase and decrease. A ray-acoustic approximation [3, Fig. 1(b)] of this limit is $(a/c)(\theta_c - |\theta_c - \theta| + \sin\theta)$, where $a$ is the head radius, for example, 90 mm; $c$ is the speed of sound, for example, 345 m/s; and $\theta_c = \pi/2$. This is 671 µs at $\theta = \theta_c$. Increments in midfrequency delay arise from Hilbert transforms of $\log|S/A|$. The Rayleigh, low-frequency, total-delay limit $3(a/c)\sin\theta$, with $3a/c = 783$ µs, disagrees with the 140-Hz plot. Heads other than this one of papier-mache agree more closely with Rayleigh.
However that may be, it is seen from Fig. 5 that the interaural phase delays for directions less than 90° very nearly mirror those for directions exceeding 90°. Thus it may be supposed that interaural delay is not relied upon, to any great extent, in distinguishing sounds in back from those in front. On the other hand, it is seen that interaural delay is a steep function of direction, for directions in front, and a steep function again, for directions in back. Thus a strong reliance may be placed on interaural phase, from a signal-theoretic point of view, for a precise determination of the lateral component of direction.
In Eq. (17) it is seen that interaural phase appears, for the sum-and-difference ear sounds, as an alternation in their spectral magnitudes because of the cosine. The extent to which the alternation appears does depend on the value of $K$. The upper and lower envelopes of these alternations, the extrema in $|\hat{M}|$, are given by

$$|\hat{M}|_{\mathrm{ex}} = (1 \pm K)^{1/2} \tag{20}$$

in which $K$ is determined from $|S/A|$, as shown in Eq. (18). Thus Fig. 6 may be studied to estimate the directional dependence of these envelope functions. To assist in this estimation, envelope contours for the three highest curves in Fig. 6 have been computed, with the results plotted in Fig. 7.

Fig. 7. Plots of alternation envelopes, square root of $1 \pm K$, versus angle of incidence for three highest frequencies of Fig. 6.

Generally it may be said that Figs. 6 and 7 show that the upper and lower envelope contours lie closer together at higher frequencies and for directions near 90°, conditions making for the deepest head shadow for the contralateral ear. Also, the alternations in magnitude, along the frequency scale, are most rapid near 90°. Toward the front, less than 60°, the alternations are largest in magnitude, as they also are toward the back, beyond 120°. Thus the most reliance, for directional hearing, on interaural phase should lie toward the front or toward the back, and there should be somewhat less reliance on interaural phase near 90°, but only at the higher frequencies. At 4 kHz and below, the reliance would appear to be substantial. Some front-back asymmetry is seen in these curves, but it is not clear whether directional hearing can rely on these asymmetries to resolve front-back determinations. These asymmetries are not consistent with frequency, to be sure, but one should be wary of hearing's potential of making much of seemingly insignificant, even idiosyncratic, detail. Nevertheless it behooves us to look elsewhere for the front-back cues.
We have used head data supplied by Torick [21] to plot ${}_0E(\theta)$ against frequency for a reference direction $\theta_0 = 90°$, a medial angle. Plots for $\theta$ of 120°, 150°, and 180°, a back-angle family, are shown in Fig. 8, while plots for the front-angle family, 0°, 30°, and 60°, are shown in Fig. 9. In Fig. 8 it is seen that differences in spectral transmission between back angles are not very large, and are mostly in the region above 4 kHz. Between front angles, the differences are seen from Fig. 9 to be not very large either, and are mostly in the "presence" region from 1 to about 4 kHz. Thus we come to the conclusion that there is little to rely upon, in terms of spectral color, for distinguishing among the back angles or among the front angles, although we cannot dismiss the possibility entirely that some reliance is placed there. It is clear, however, that the two families are remarkably different from each other.

Fig. 8. Plots of normalized rms transmission to two ears jointly for a family of back angles of incidence. The reference direction for normalization is taken to be 90°. Data are for the CBS-NASA head. The contrast with Fig. 9 is remarkable.

Fig. 9. Plots as in Fig. 8, except that a family of front angles of incidence are shown. The contrast with Fig. 8 is remarkable.
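Returning to the alternation envelopes of Eqs. (18) and (20), a short numerical sketch shows how quickly the two envelopes draw together as the interaural level difference grows. It assumes the readings $K = 2/(|S/A| + |A/S|)$ and $|\hat{M}|_{\mathrm{ex}} = (1 \pm K)^{1/2}$; the function names and the sample level differences are illustrative and are not taken from Fig. 6.

```python
import math

def alternation_coefficient(ild_db: float) -> float:
    """K of Eq. (18) from an interaural level difference |S/A| given in dB."""
    ratio = 10.0 ** (ild_db / 20.0)          # |S/A| as a linear amplitude ratio
    return 2.0 / (ratio + 1.0 / ratio)

def envelope_db(ild_db: float) -> tuple[float, float]:
    """Upper and lower envelopes (1 +/- K)^(1/2) of Eq. (20), in dB."""
    k = alternation_coefficient(ild_db)
    upper = 10.0 * math.log10(1.0 + k)       # 20*log10 of a square root
    # K = 1 (equal ear levels) makes the lower envelope a true null.
    lower = 10.0 * math.log10(1.0 - k) if k < 1.0 else float("-inf")
    return upper, lower

for ild in (0.0, 6.0, 12.0, 20.0):           # illustrative values only
    k = alternation_coefficient(ild)
    up, lo = envelope_db(ild)
    print(f"ILD {ild:4.1f} dB  K = {k:.3f}  envelopes {up:+.2f} / {lo:+.2f} dB")
```

Because $K$ depends only on the magnitude of the ratio, a level difference of either sign gives the same envelopes, and a deep head shadow (large level difference) leaves little room for the phase-dependent alternation.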


A direct indication of front-back difference may be shown as in Fig. 10, plots of ${}_0E(\theta)$ for the reference direction of 0° and $\theta$ equal to 90°, 120°, 150°, and 180°. This is the front-back difference against 0°. These curves indicate the front-back difference as characterized by a marked depression in level in the range from about 1 to 6 kHz, along with elevations that are equally striking in the range from about 6 to 10 kHz. The 90° curve is included as a "back" spectrum because its shape qualifies it as a family member and one whose emphasis on high frequencies tends to elevate interaural level difference to an importance not identified elsewhere. It is not difficult to imagine the "hollow" sound, as discussed above, that these transmission characteristics would cause if they were ordinarily consciously heard. This altered spectral quality does indeed, however, appear to be the principal determinant of back sound in discrimination from front sound.
This review of hearing characteristics allows certain
rules of thumb to be identified, namely, that interaural
phase, represented as amplitude alternations in the
spectra of difference-and-sum ear signals, is the discriminant for the lateral component of image position,
while variations in joint spectral transmission are the
discriminant for the front-back component and, presumably, for the elevation component. However, it
also allows certain areas of uncertainty to be noted.
These various points of observation will be of limited
help in the design of the binaural-synthesis filters because the best rule will doubtlessly prove to be slavish
simulations of the best measured head-related transfer
functions available. At least these observations, together
with those the designer's own experience may develop,
will make for a certain intelligent slavishness.
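The rule that interaural phase governs the lateral component rests on the delay limits quoted earlier in this section, and those figures can be checked by direct arithmetic. The sketch below assumes the ray-acoustic limit reads $(a/c)(\theta_c - |\theta_c - \theta| + \sin\theta)$ with $\theta_c = \pi/2$, and the Rayleigh limit reads $3(a/c)\sin\theta$, using the example values $a = 90$ mm and $c = 345$ m/s. The function names are illustrative.

```python
import math

HEAD_RADIUS_M = 0.090      # a, the example value quoted in the text
SPEED_OF_SOUND = 345.0     # c in m/s, the example value quoted in the text
THETA_C = math.pi / 2

def ray_delay_us(theta_deg: float) -> float:
    """High-frequency (ray-acoustic) interaural delay limit, in microseconds."""
    theta = math.radians(theta_deg)
    ramp = THETA_C - abs(THETA_C - theta)    # rises to pi/2 at 90 deg, then falls
    return 1e6 * (HEAD_RADIUS_M / SPEED_OF_SOUND) * (ramp + math.sin(theta))

def rayleigh_delay_us(theta_deg: float) -> float:
    """Low-frequency (Rayleigh) total interaural delay limit, in microseconds."""
    theta = math.radians(theta_deg)
    return 1e6 * 3.0 * (HEAD_RADIUS_M / SPEED_OF_SOUND) * math.sin(theta)

for angle in (0, 30, 60, 90, 120, 150, 180):
    print(f"{angle:3d} deg  ray {ray_delay_us(angle):6.1f} us"
          f"  Rayleigh {rayleigh_delay_us(angle):6.1f} us")
```

At 90° the two functions round to the 671 µs and 783 µs stated in the text, and both are mirror-symmetric about 90°, which is why delay alone cannot separate front from back.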

2.3 Basic Synthesis Array


An array of filters for the sum and difference signals is shown in Fig. 11. On the right-hand side, several inputs are designated, one for each of the incidence angles to be simulated. For each left-angle input, a matching input is shown for the symmetric right-hand angle, and sum and difference signals are shown as being formed from these symmetric inputs. The signal pairs are then transmitted through ${}_0N$ and ${}_0P$ filters, ${}_0N$ for difference signals and ${}_0P$ for sum signals. Each of these filters is designed to match the specific angle designations for the inputs, ${}_0N(\theta_i)$ and ${}_0P(\theta_i)$ separately for each $\theta_i$ designated at the input. The filtered difference signals are then combined in a common sum, and the filtered sum signals are then combined in a common sum. These common sums are then further combined in difference-and-sum fashion to form simulated binaural outputs.
This basic array may be thought of as a discrete-angle, binaural, panoramic mixer. Variations may include linear mixing arrangements and level adjustments to be provided at each of the inputs. Also, some of the inputs may receive outputs from an array of reverberators to form synthetic binaural-space reverberation systems in the manner of Kendall et al. [1]. Low-cost versions could provide for a very limited set of angles as supplementary pan functions for use with standard mixing boards, and transaural outputs could be provided for productions that are being developed primarily in conventional stereo. Further variations may be conceived. Some of these are described hereafter.

Fig. 10. Plots showing front-back difference as a family of back-angle joint transmissions normalized against transmission for the incidence angle of 0°, taken as the reference direction.

Fig. 11. Basic binaural synthesizer. Inputs for a discrete, symmetric set of simulated incidence angles are shown. A multiplicity of shuffler filters, based on head transmission functions, each specific to the incidence angle to be simulated, are used. The output is binaural.
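The signal flow of the basic synthesis array can be sketched at a single frequency with complex amplitudes. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes $N = S - A$ and $P = S + A$, where $S$ and $A$ are the same-side and alternate-side head responses, places a factor of 1/2 in the final difference-and-sum stage, and omits the equalization denoted by the left subscript 0. All numerical values and names are hypothetical.

```python
import cmath

def shuffler_synthesize(pairs, n_filters, p_filters):
    """Discrete-angle binaural synthesis in sum/difference (shuffler) form.

    pairs:      [(x_left_i, x_right_i), ...] complex input amplitudes for the
                symmetric angles -theta_i and +theta_i at one frequency.
    n_filters:  difference-channel filter value N(theta_i) for each pair.
    p_filters:  sum-channel filter value P(theta_i) for each pair.
    Returns the (left_ear, right_ear) complex amplitudes.
    """
    diff_bus = 0j
    sum_bus = 0j
    for (x_left, x_right), n_f, p_f in zip(pairs, n_filters, p_filters):
        diff_bus += n_f * (x_left - x_right)   # filtered difference signal
        sum_bus += p_f * (x_left + x_right)    # filtered sum signal
    # Final difference-and-sum stage recovers the two ear signals.
    return 0.5 * (sum_bus + diff_bus), 0.5 * (sum_bus - diff_bus)

# Hypothetical same-side (S) and alternate-side (A) responses at one frequency.
S = [1.0 + 0.2j, 0.9 - 0.1j]
A = [0.5 * cmath.exp(-0.8j), 0.3 * cmath.exp(-1.4j)]
pairs = [(1.0 + 0j, 0j), (0.25 + 0j, -0.5 + 0j)]

left, right = shuffler_synthesize(
    pairs,
    n_filters=[s - a for s, a in zip(S, A)],   # assumed N = S - A
    p_filters=[s + a for s, a in zip(S, A)],   # assumed P = S + A
)

# Reference: filter every input directly by S (same side) and A (far side).
direct_left = sum(s * xl + a * xr for (xl, xr), s, a in zip(pairs, S, A))
direct_right = sum(a * xl + s * xr for (xl, xr), s, a in zip(pairs, S, A))
print(abs(left - direct_left) < 1e-12, abs(right - direct_right) < 1e-12)
```

The first input pair leaves its right-hand input silent, which corresponds to simulating a single source at one angle. The shuffler needs two filters per angle pair where the direct form needs four.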
3 PROSPECTIVE PRODUCTION PROCESSORS
3.1 Equalizers

All commercially available artificial heads of quality suitable for professional applications stand in need of further equalization, since neither ear-canal nor diffuse-field equalizations are appropriate in transaural recording. Ear resonances can be allowed only once in the signal chain: in the listener's ear. Also, Fig. 1(b) shows that directional information is received too rapidly for any diffuseness to develop.
Among those heads providing ear-canal signals, we list Neumann KU-80, KEMAR,¹ B&K 4128, and the Aachen head (AK). The extent of the 30° free-field equalization required may be estimated from Fig. 4, although those data are not specific to these heads. The Aachen head is also available with external free-field equalization for 0° incidence. The diffuse-field equalization devised by Killion [7] for KEMAR, and provided by Neumann for their newer KU-81, reduces further equalization needs to moderate corrections, as is also true of the AK. Such further equalization may be provided by the manufacturer or a third party.
All commercially available earphones for binaural monitoring similarly stand in need of 30° free-field equalization. A few manufacturers are currently providing diffuse-field equalization [6], and an (inadvisable) interest in such standardization continues [23]. We are aware of only one earphone set, the Stax ProLambda, that has been accurately equalized against a free-field reference by a third party [15], but for 0°, not 30°. A decision by a third party to supply external equalization for any but a selected few models entails a substantial risk that only professional needs could justify. Volume distribution of earphones suitably equalized by the manufacturer probably lies some distance in the future.

3.2 Monitoring

Facilities for earphone monitoring require 30° free-field equalization as above, if it is not internal to the earphone. If the program material to be monitored is in the form of loudspeaker signals (whether transaural or conventional stereo), there would also be needed a binaural-synthesizer version of a circuit devised by Bauer [17], the so-called Bauer box. The two inputs would be processed to simulate 30°.
Loudspeaker monitoring would require transaural monitor equipment to derive the proper signals from binaural material. It could embody a crosstalk canceler of standard grade adopted for mass distribution. Some means of assurance of adherence to a standard would be needed for full reliance on such monitoring. Also, the prospective availability of postproduction consumer equipment, such as Bauer boxes and loudspeaker-placement compensators described below, requires such standardization.
The acoustic characteristics of a loudspeaker-monitoring facility demand the usual attention. In addition, the most accurate, most transparent, most self-effacing loudspeakers should be chosen for such use and placed to avoid early reflections.

3.3 Hall-Sound-Pickup Synthesizer

An arrangement involving hall-sound-pickup microphones is shown in Fig. 12. Two omnidirectional microphones are flanking the artificial head. The signals from these are delayed and provided to a binaural synthesizer. The latter may need inputs only for 90°, 120°, and 150° to provide sufficient flexibility, especially if more than two hall-sound-pickup microphones are needed in particular halls.
For the flanking microphones not too far back from the orchestra, the hall-sound pickup would enhance early reflections (concert-hall concept) in the 10-20-ms range, and 90° synthesis angles would be suitable, along with a choice of delay only somewhat more than the microphone-head distance. For microphones placed far enough back to represent the whole reverberation field, synthesis would be at 120°, with a delay somewhat more than the microphone-head distance. The 150° synthesis angle would probably be used infrequently. The relative level would follow the usual prescription of several decibels below that for a plainly audible effect. For good concert halls, an almost subliminal contribution, if any, would be sufficient.

Fig. 12. Layout for use of a hall-sound-pickup synthesizer. Flanking-microphone signals are delayed and subjected to binaural synthesis simulating incidence angles from a limited set of back angles. The binaural signals so derived are mixed at reduced level with the signals from the main-pickup artificial head.

3.4 Transaural Panoramic Mixer

A transaural panoramic mixer is meant primarily as a supplement to the pan functions of an ordinary stereo mixer. It would be capable of replacing some of the existing facilities solely to enhance the imaging qualities by accurate synthesis for a limited number of channels, or for special effects. A transaural converter would be a part of the equipment.

¹ KEMAR is a registered trademark of Knowles Electronics.
3.5 Binaural Panoramic Mixer

A binaural panoramic mixer would be a full elaboration of the basic synthesis array discussed in connection with Fig. 11. It would otherwise correspond to a full stereo mixing console, except that binaural pan functions would be used and the signals would be in binaural format. Monitoring would be possible in either binaural or transaural format.

3.6 Transaural Converter

Transaural conversion need be done only once, except for monitoring, in the processing of a complete production, and there are good reasons for doing it only once. The conversion would adhere to standards specified for mass distribution, and it would be executed in an off-line facility capable of providing standards assurance. At present, many producers use an off-line facility for the conversion of digital masters to a final format as release masters. A similar concept applies here.

3.7 Processing Technology

All of these processing concepts may be realized in either digital or analog form. Conversions between analog and digital data streams are, of course, to be kept to a minimum, and this consideration will determine the technology to be used in each instance. Equipment for some of the processing steps should be made available in both technologies.

4 VIRTUAL LOUDSPEAKERS

A virtual loudspeaker is a transaural image synthesized to simulate the effect of a loudspeaker placed at a specified image location. The process involves binaural synthesis followed by transaural conversion. For example, an experimental processor has been constructed that makes a pair of loudspeakers placed at 15° sound as if the loudspeakers had been placed at 30°. Applications are indicated below.

4.1 Correction of Loudspeaker Placement

Some users may find that a loudspeaker placement that is convenient for their listening-room layout, and that avoids early reflections, may make for an inconvenient listening position unless the equal-distance 30° rule is violated. In such cases, virtual-loudspeaker electronics can provide a 30° impression for loudspeakers placed at other angles. An adjustment for unequal distances may also be provided.

4.2 TV Expander

Another example of correction of loudspeaker placement is found in the so-called TV expander. Television receivers usually offer cabinet-mounted loudspeakers that are spaced much too close together (less than 15°) to provide a good stereo effect. Present-day TV expanders, usually involving some kind of ad hoc processing of the difference channel, are too imprecise to preserve the producer's intentions. The virtual-loudspeaker expander is, in principle, exact.
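The sense in which a virtual-loudspeaker processor is exact in principle can be illustrated at a single frequency. The sketch below is a hedged illustration, not the experimental processor mentioned above: it assumes the synthesis head, the canceler head, and the listener's head are identical, takes $P = S + A$ and $N = S - A$, and uses made-up complex values for the same-side ($S$) and alternate-side ($A$) responses at 15° and 30°. All function names are hypothetical.

```python
import cmath

def virtual_speaker_filters(s_real, a_real, s_virt, a_virt):
    """Sum- and difference-channel filters that map loudspeakers at the real
    angle onto virtual loudspeakers at the desired angle (one frequency)."""
    g_sum = (s_virt + a_virt) / (s_real + a_real)     # P_virtual / P_real
    g_diff = (s_virt - a_virt) / (s_real - a_real)    # N_virtual / N_real
    return g_sum, g_diff

def process(x_left, x_right, g_sum, g_diff):
    """Shuffle, filter, and unshuffle one stereo pair of complex amplitudes."""
    sum_sig = g_sum * (x_left + x_right)
    diff_sig = g_diff * (x_left - x_right)
    return 0.5 * (sum_sig + diff_sig), 0.5 * (sum_sig - diff_sig)

def ears(y_left, y_right, s, a):
    """Acoustic paths from a symmetric loudspeaker pair to the two ears."""
    return s * y_left + a * y_right, a * y_left + s * y_right

# Hypothetical head responses at one frequency (not measured data).
S15, A15 = 1.0 + 0.0j, 0.80 * cmath.exp(-0.35j)
S30, A30 = 1.1 + 0.1j, 0.55 * cmath.exp(-0.70j)

g_sum, g_diff = virtual_speaker_filters(S15, A15, S30, A30)
x_left, x_right = 0.7 + 0.2j, -0.3 + 0.5j

# Ear signals with processed signals on the real 15-degree loudspeakers...
heard = ears(*process(x_left, x_right, g_sum, g_diff), S15, A15)
# ...compared with unprocessed signals on loudspeakers actually at 30 degrees.
wanted = ears(x_left, x_right, S30, A30)
print(abs(heard[0] - wanted[0]) < 1e-12, abs(heard[1] - wanted[1]) < 1e-12)
```

The difference-channel filter divides by $S - A$ for the real loudspeaker angle, which becomes small when the loudspeakers are close together and the two paths are nearly alike, so a practical design would presumably need to limit that gain.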

4.3 Centered Virtual Loudspeaker

Sound systems for large-screen television applications often lack a facility found in cinema exhibition, a centered, behind-the-screen loudspeaker, important for a
realistic presentation of dialogue. The substitute phantom image from two loudspeakers unfortunately does
not sound the same as that single loudspeaker. A centered virtual loudspeaker would be a significant improvement.
4.4 Virtual Loudspeakers in Back

Some television sound systems are designed to supply special-effects signals (derived from cinema sound
tracks) to loudspeakers placed behind the viewer. Unfortunately many viewers cannot provide space behind
their favorite viewing position nor bear the expense of
such loudspeakers. Virtual loudspeakers may be substituted. Similarly, certain ambience-enhancing systems
require loudspeakers placed behind the listener. These
also can be virtual.
4.5 Surround Stereo

Ear-sound-oriented, transaural stereo is a full-sphere (includes imaging in elevation) surround-stereo system. While it is most naturally used as a straightforward enhancement of the basic virtues of conventional stereo, it may certainly be used to provide any of the astonishing demonstrations of loudspeaker-oriented quadraphonic systems of a previous era.
An exemplary sound-field-oriented surround-stereo system [3] is the Ambisonic system UHJ, for which a substantial body of program material, in full-sphere B format [24], exists. Some of this may be recast, using virtual-loudspeaker processing, for rerelease in transaural format.
5 INSTRUMENTATION-GRADE CANCELER

A need exists for a crosstalk canceler satisfying the original aspirations of Atal and Schroeder. Accurate
documentation of the subjective experience of a sonic
event requires an instrumentation-grade artificial head
and recording means, together with an acoustic presentation means of equal quality. Loudspeaker presentation through an instrumentation-grade crosstalk canceler is the option that will provide full assurance that
the sounds will be heard as exterior to the listener's
head.
Such a canceler will use head functions as closely
modeled on a replica of a representative head as possible
and, where necessary, will use data taken for a specific
listener. An example of canceler curves for a specific
head is shown in Fig. 13. A digital canceler would be
able to accept data files for different listeners and adjust
the filters accordingly. In any case, the canceler would
be accurately faithful to its head model over the whole
audio-frequency range.
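As a minimal sketch of what such a digital canceler might compute, the following Python fragment applies a shuffler-form canceler at a single frequency. It assumes a left-right symmetric head and loudspeaker layout, described by a same-side response S and an alternate-side response A. The sum channel is filtered by 1/(S + A) and the difference channel by 1/(S - A), which together invert the symmetric acoustic path. The function names and the toy head model (flat same-side path, attenuated and delayed cross path) are hypothetical illustrations, not measured data and not the authors' circuit; in practice S and A would be read from a listener's data file.

```python
import cmath


def shuffler_cancel(left, right, S, A):
    """Apply a shuffler-form crosstalk canceler at one frequency.

    left, right: complex amplitudes of the intended ear signals.
    S, A: complex same-side and alternate-side head responses
          (assumption: symmetric head and loudspeaker layout).
    Returns the two loudspeaker feeds.
    """
    total = left + right           # first shuffle: sum channel
    diff = left - right            # first shuffle: difference channel
    total_f = total / (S + A)      # sum-channel filter
    diff_f = diff / (S - A)        # difference-channel filter
    # Second shuffle; the factor 1/2 makes the chain an exact inverse.
    return 0.5 * (total_f + diff_f), 0.5 * (total_f - diff_f)


def ears_from_speakers(spk_l, spk_r, S, A):
    """Acoustic path: each ear hears its own side via S, the other via A."""
    return S * spk_l + A * spk_r, A * spk_l + S * spk_r


def toy_head(freq_hz, gain=0.7, delay_s=0.00025):
    """Hypothetical head model, standing in for a listener's data file."""
    S = 1.0 + 0.0j
    A = gain * cmath.exp(-2j * cmath.pi * freq_hz * delay_s)
    return S, A


def check(freq_hz, left, right):
    """Run canceler plus acoustic path; return the sounds at the two ears."""
    S, A = toy_head(freq_hz)
    spk_l, spk_r = shuffler_cancel(left, right, S, A)
    return ears_from_speakers(spk_l, spk_r, S, A)
```

With the toy model at 1 kHz, `check(1000.0, 1 + 0j, 0j)` returns a left-ear signal equal to the intended one and a right-ear signal of zero, to within rounding error. A signal meant for one ear alone thus arrives without crosstalk.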
Applications abound in environmental acoustics,
psychoacoustic and otological research laboratories,
and audiometric and otological clinics, to name a few.
In critical applications, replacement of earphones of
dubious characteristics and flawed exteriorization could
prove decisive.
6 CONCLUSIONS
We have shown that crosstalk canceling of well-prepared binaural-stereo program material, to make transaural recordings, can be accomplished with a technology
that is simpler than previously supposed, and can produce recordings that may be played as ordinary stereo
recordings, but that reveal "amazing" natural spatial
and imaging effects that are more robust, with respect
to listener movement and playback acoustics, than previously supposed. The recording of such "well-prepared" binaural material is seen as a crucial starting
point for making a good transaural recording.
Artistic considerations are of major importance, of
course, and we have also shown that recent technical
advances in understanding the importance of correct
equalizations must be implemented to support the artistic
intent. We have argued that this support requires implementation at the equipment-design level. We have
explored the relation of equalization with respect to
the maintenance of an excellent stereo effect under all
conditions of playback, with respect to the prospects
of monitoring with binaural headphones and with respect
to preserving the integrity of localization.
We have provided a brief survey of the variety of
processing that may be accomplished within our conception of transaural-binaural technology. This has included the processing necessary in record production
and a few items that the consumer could use to advantage. We also note instrumentation applications.
The expectation is that some of this transaural-binaural technology would be implemented in the near
future as the industry begins to see how the technology
will help its practitioners reach their goals more directly
and more easily. The eventual outcome of the infusion
of new technology may not be predicted with assurance,
but the prospects for a dramatic improvement in stereo
quality do appear bright.

Fig. 13. Shuffler filter characteristics in crosstalk canceling
for a specific listener head [16] and a loudspeaker placement
of 30°. Solid-line curves show magnitudes of 1/N and
1/P normalized according to solid-line curve of Fig. 4. Dashed
curves show envelopes of alternations. Extensive idiosyncratic
detail indicates that a crosstalk canceler based on curves of
Fig. 3(a) would be more tolerant of variations in listeners'
heads and positions.

7 ACKNOWLEDGMENT
We wish to thank the many persons who have listened
to our experimental transaural recordings and offered
their critical comments. We tried their patience with
recordings of differing equalizations, and some with
not-the-lowest noise floor, and their patience survived.
We would particularly like to thank those who offered
us playback facilities that happened to prove instructive
in regard to early reflections. Their patience was sometimes not rewarded by hearing the merits we claimed.
In other cases, our own ineptness left a bad impression.
We are grateful, also, for those listeners who delighted
us by being entirely enthusiastic.
We owe special thanks to Wade Bray of Jaffe Acoustics for providing us with digital tapes made with the
Aachen head. Our studies of these recordings impressed
us with the importance of reconsidering the whole
question of equalization.
Finally, we wish to thank those colleagues who
worked most directly with us, James Cunningham who
constructed our prototype circuits, and Richard Mintel
who made the first binaural recordings we used.
8 REFERENCES
[1] G. Kendall, W. Martens, and D. Freed, "Image
Model Reverberation from Recirculating Delays,"
presented at the 81st Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol.
34, p. 1032 (1986 Dec.), preprint 2408.
[2] J. L. Bauck, "On Acoustical Specification of
Natural Stereo Imaging," M.Sc. thesis, University of
Illinois, Urbana (1979 Dec.); presented under authorship
of D. H. Cooper and J. L. Bauck at the 65th Convention
of the Audio Engineering Society, J. Audio Eng. Soc.
(Abstracts), vol. 28, pp. 368, 370 (1980 May), preprint
1616; and presented under authorship of J. L. Bauck
and D. H. Cooper at the 66th Convention of the Audio
Engineering Society, J. Audio Eng. Soc. (Abstracts),
vol. 28, p. 552 (1980 July/Aug.), preprint 1649.
[3] D. H. Cooper, "Problems with Shadowless Stereo
Theory: Asymptotic Spectral Status," J. Audio Eng.
Soc., vol. 35, pp. 629-642 (1987 Sept.).
[4] A. D. Blumlein, British patent 394,325 (1933
June 14); reprinted in J. Audio Eng. Soc., vol. 6, pp.
91-98, 130 (1958 Apr.).
[5] J. Eargle, Handbook of Recording Engineering
(Van Nostrand Reinhold, New York, 1986).
[6] G. Theile, "On the Standardization of the Frequency Response of High-Quality Studio Headphones,"
J. Audio Eng. Soc., vol. 34, pp. 956-969 (1986 Dec.).
[7] M. C. Killion, "Equalization Filter for Eardrum-Pressure Recording Using a KEMAR Manikin," J. Audio Eng. Soc., vol. 27, pp. 13-16 (1979 Jan./Feb.).
[8] B. S. Atal and M. R. Schroeder, "Apparent Sound
Source Translator," U.S. patent 3,236,949 (1966 Feb.
22).
[9] M. R. Schroeder and B. S. Atal, "Computer
Simulation of Sound Transmission in Rooms," IEEE
Conv. Rec., pt. 7, pp. 150-155 (1963).
[10] M. R. Schroeder, "Digital Simulation of Sound
Transmission in Reverberant Spaces," J. Acoust. Soc.
Am., vol. 47, pp. 424-431 (1970 Feb.).
[11] M. R. Schroeder, "Computer Models for Concert Hall Acoustics," Am. J. Phys., vol. 41, pp. 461-471 (1973 Apr.).
[12] M. R. Schroeder, "Models of Hearing," Proc.
IEEE, vol. 63, pp. 1332-1350 (1975 Sept.).
[13] P. Damaske, "Head-Related Two-Channel
Stereophony with Loudspeaker Reproduction," J. Acoust.
Soc. Am., vol. 50, pt. 2, pp. 1109-1115 (1971 Oct.).
[14] T. Mori, G. Fujiki, N. Takahashi, and F. Maruyama, "Precision Sound-Image-Localization Technique Utilizing Multitrack Tape Masters," J. Audio
Eng. Soc. (Engineering Reports), vol. 27, pp. 32-38
(1979 Jan./Feb.).
[15] H. W. Gierlich and K. Genuit, "Processing Artificial-Head Recordings," J. Audio Eng. Soc. (Engineering Reports), vol. 37, this issue, pp. 35-40. Also,
W. Bray, private communication (1987 Nov.).
[16] E. L. Torick, A. Di Mattia, A. J. Rosenheck,
L. A. Abbagnaro, and B. B. Bauer, "An Electronic
Dummy for Acoustical Testing," J. Audio Eng. Soc.,
vol. 16, pp. 397-403 (1968 Oct.).
[17] B. B. Bauer, "Stereophonic Earphones and Binaural
Loudspeakers," J. Audio Eng. Soc., vol. 9, pp.
148-151 (1961 Apr.).
[18] H. Møller, "Cancellation of Crosstalk in Artificial-Head Recordings Reproduced through Loudspeakers," J. Audio Eng. Soc., vol. 37, this issue, pp.
31-34.
[19] S. Mehrgardt and V. Mellert, "Transformation
Characteristics of the External Human Ear," J. Acoust.
Soc. Am., vol. 61, pp. 1567-1576 (1977 June).
[20] D. H. Cooper and J. L. Bauck, "Corrections
to L. Schwarz, 'On the Theory of Diffraction of a Plane
Soundwave Around a Sphere' ['Zur Theorie der Beugung einer ebenen Schallwelle an der Kugel,' Akust.
Z., vol. 8, pp. 91-117 (1943)]," J. Acoust. Soc. Am.,
vol. 80, pp. 1793-1802 (1986 Dec.).
[21] E. L. Torick, private communication (1975
Nov.).
[22] H. Mertens, "Directional Hearing in Stereophony: Theory and Experimental Verification," EBU
Rev., pt. A, no. 92, pp. 146-168 (1965 Aug.).
[23] J. S. Russotti, T. P. Santoro, and G. B. Haskell,
"Proposed Technique for Earphone Calibration,"
J. Audio Eng. Soc., vol. 36, pp. 643-650 (1988 Sept.).
[24] M. A. Gerzon, "Ambisonics in Multichannel
Broadcasting and Video," J. Audio Eng. Soc., vol. 33,
pp. 859-871 (1985 Nov.).

THE AUTHORS

D. H. Cooper

J. L. Bauck

Duane H. Cooper was born in 1923. He earned a
Ph.D. in physics at California Institute of Technology
in 1955 and is currently associate professor of physics
and electrical engineering at the University of Illinois.
He teaches circuits, systems, modulation, random
processes, electrodynamics, and acoustics. He contributed to the theory of disk recording, invented the
skew-sampling method of tracing-error correction, and
contributed to the theory of multichannel stereo. He
made the first prototype Cooper Time Cube, and he
invented the first working version (UMX) of the
soundfield stereo system now called Ambisonics. Dr.
Cooper is a member of the American Physical Society,
the Acoustical Society of America, a senior member
of the Institute of Electrical and Electronics Engineers,
and a fellow and honorary member of the Audio Engineering Society. He has served the AES as governor,
vice president, and president. He is now vice president
of the AES Educational Foundation. Dr. Cooper holds
the Society's Emile Berliner Award and Gold Medal.

Jerald L. Bauck was born in 1955. He earned a B.S.
degree in electrical engineering at Kansas State University in 1977 and an M.S. degree in electrical engineering at the University of Illinois in 1979. He is
currently an electrical engineering doctoral candidate
at the University of Illinois. He worked for five years
with Motorola's government electronics group in
Scottsdale, Arizona, where he earned four patents and
the Motorola Engineering Award in 1983. Mr. Bauck
is a member of the Institute of Electrical and Electronics
Engineers and of the Audio Engineering Society. His
current interests include tomographic imaging in synthetic aperture radar and audio imaging.
