IMAGE-BASED SPATIALIZATION

Eric Lyon
School of Creative Arts
Sonic Arts Research Centre
Queen’s University Belfast
Belfast BT7 1NN
United Kingdom

ABSTRACT

An approach to spatialization is described in which the pixels of an image determine both spatial and other attributes of individual elements in a multi-channel musical texture. The application of this technique in the author’s composition Spaced Images with Noise and Lines is discussed in detail. The relationship of this technique to existing image-to-sound mappings is discussed. The particular advantage of modifying spatial properties with image filters is considered.

1. INTRODUCTION

Image-based spatialization involves the mapping of an image to the spatial properties of a sound. Whereas existing literature tends to discuss image-based, or video-based, spatialization systems from the standpoint of introducing compositional tools and software systems, the primary contribution of this paper will be to discuss the specific application of image-based spatialization to the creation of a recent computer music composition.

2. IMAGE-BASED SPATIALIZATION

In the work described here, spatialization is achieved by mapping pixel values to sound grains. Each horizontal line is virtually wrapped around the perimeter of a surround-sound speaker array. The technique for Spaced Images with Noise and Lines targets an eight-channel surround speaker array, with four virtual sound source locations calculated using power panning across each pair of adjacent speakers. Thus, this configuration calls for images with 32 pixels along the x-axis, to be mapped to 32 virtual sound locations circling the audience. The vertical dimension maps to time, and thus may be arbitrarily determined for each image. The taller the image, the more perimeter lines are converted to sonic moments, each with its own spatial profile. Scanning down the lines of the image advances time at a fixed interval. (The time interval could be warped, rather than constant, if desired.) In a given line, the darkness of each pixel represents the amplitude of an event, or grain, at the virtual spatial perimeter location associated with that pixel. An image with a single black line drawn diagonally downward, with a pixel-width of one, would result in a perimeter-panning spatialization effect.

3. IMPLEMENTATION

A Max patch is used to convert an image to a Csound score, which is then compiled into an 8-channel audio file. (See Figure 1.) Each 32-pixel line is transformed into a list of floating point values from 0 to 1, which drive a JavaScript program that writes its output directly to a Csound score.

Figure 1. A Max patch to convert an image to a Csound score (based on a design by Shawn Greenlee).

4. SPACED IMAGES WITH NOISE AND LINES

Spaced Images with Noise and Lines was composed in 2011 and premiered at a Spatial Music Collective concert at The Joinery in Dublin. All passages in the work were generated as described above, using images that the author created with the free image-processing software, The Gimp [1]. The images consist of noise, generated by The Gimp, and freehand, mouse-drawn lines. Prior to audio conversion, the images were subject to various forms of image processing, such as convolution, erasure, and shearing. (See Figure 2.) Most passages were generated from a single image, but in one case, sounds generated from two complementary images were superimposed to create a two-voice spatial trajectory counterpoint.

Figure 2. An image from the opening of Spaced Images with Noise and Lines.

5. MAPPING STRATEGIES

The simplest mapping strategy involves routing a single sound to each virtual location, scaled by the amplitude determined by the darkness of its corresponding pixel. This is the duplication technique described by McGee and Wright [6]. This strategy is applied to the first sound heard in the piece, which is a synthetic noise with a percussive envelope and a resonant peak. The changes in the image, with its erasures, noise and lines, produce a spatial result that fluctuates between a somewhat indeterminate sense of motion, and clear, dramatic spatial trajectories. The use at the outset of a simple, spectrally stable sound focuses the listener’s attention on spatial attributes. Successive mappings introduce greater variety among the sounds. In one instance, the y-axis is mapped to resonant frequency, so that the pitch descends as the corresponding image is scanned downward. Several sections assign fixed resonant frequencies to different virtual locations, while still implementing the darkness-to-amplitude mapping to articulate sounds. The result is a spatialized chord when the image is noisy, and a spatialized melody when the image is line-based. Both visual elements were generally admixed in the source images.

In another section, enveloped sine waves are employed. The frequency for each sinusoidal grain is randomly chosen from a fixed, equal-tempered scale. In homage to Karlheinz Stockhausen, the frequency scale is taken from his Studie II [11]. In this section, the pixel amplitude controls two synthesis parameters: the amplitude and duration of a corresponding sinusoidal grain. This section was generated from an image that superimposes two kinds of noise: a low-amplitude, high-density noise, and a high-amplitude, very low-density noise, with just a few pixels activated. The result is a constant, quiet, sort of thudding background articulation (albeit spatially complex) from the low-amplitude pixels, and a sporadic, bell-like surface melody with clear spatial articulation from the high-amplitude pixels. Although sine waves are often difficult to resolve spatially, the single pixel mapping to a fixed virtual location, combined with the percussive attack envelope, locates the individual sinusoids starkly in different regions of the space.

The second section of the piece focuses on sampled drum sounds. The initial treatment of drums starts with a hi-hat, which is reminiscent of the opening filtered noise sound. Very shortly, a full ensemble drum texture emerges, and then gradually destabilizes as individual attack times are subject to increasing random deviation. Individual samples are assigned to fixed virtual locations, which are then articulated according to pixel amplitudes. Random delays, ring modulation and filtering are gradually introduced, complicating the timbral space. The texture gradually thins until only the cowbell sample remains.

The work closes with two contrasting sections. First comes the only slow section, in which pixels articulate sinusoids of 10 seconds duration, with frequencies that glide slowly downward. The time interval between line scans is approximately 500 ms. The concluding section uses a line drawing to articulate enveloped, filtered noise, with resonance frequencies locked to virtual locations. With only lines used in this image, the spatial trajectories of this concluding passage are exceedingly clear to the ear.

6. RELATED WORK

Relatively little has been published on image-based spatialization, and much of it is focused on automated mapping of video to space. There is also some unpublished work, which I have encountered informally. When I first decided to try these experiments, I discussed the idea with Shawn Greenlee, knowing of his expertise with the image-processing software Jitter. Shawn informed me that not only was what I proposed rather easy to implement, but that he had already done some spatial mapping from video to audio in a stereo field, and reckoned that it would not be difficult to extrapolate his work to a multi-channel context. Shawn graciously shared his patch, which I then modified for the current work [2]. I also recently learned that Augustus Leudar, an MA student at SARC, has worked with video to space in a quadraphonic context, using ambisonics for the spatialization, mapping a single video frame to the geometric square defined by four corner speakers [3].

Another technique considerably closer to the work described here maps image onto parameters other than space, often to frequency. Here could be mentioned Oramics [7] and Iannis Xenakis’s UPIC system [5]. Examples in the digital domain include Christopher Penrose’s HyperUpic [8] and MetaSynth by UI Software [12].

7. ADVANTAGES OF IMAGE-BASED SPATIALIZATION

Working from an image provides many of the advantages of a graphical score, including the ability to easily specify gestures and density at multiple time
scales. Enhancing this basic functionality is the ability to bring the full power of sophisticated image processing software to bear on the creation of spatial properties. Modern spatialization software packages such as Zirkonium [9] nonetheless still conceptualize panning as the linear motion from one location to another. With image-based spatialization, a line could gradually increase in thickness or darkness, or bifurcate. Patterns can be complicated with the addition of noise. Segments of trajectories can be blurred, or subjected to a wide range of other image filtering processes. This allows the composer to shape the evolution of spatial properties in a detailed manner, since it is easy to apply filters to small segments of a larger spatial trajectory. Practically speaking, the relationship between the image and the resulting spatial effect is rarely isomorphic; the cautions about spatial composition provided by Kendall and Cabrera remain in effect [4]. However, often enough the results of the process are predictable, and musically effective. More importantly, the spatial results described here would be difficult or impossible to achieve using more traditional control interfaces for spatialization, which tend to focus on the locations or trajectories of coherent sonic objects within a desired spatial environment.

8. COMPARISONS WITH VIDEO-BASED SPATIALIZATION

Spatialization with video would seem to have certain advantages over image-based spatialization. For example, there is a more direct conceptual mapping between an individual video frame and a surround speaker array, at least for 2D surround sound arrays. Video lends itself to real-time processing in a way that is unavailable for image-based spatialization. However, the temporal association between the video and audio is a mixed blessing. Images used for spatialization are not to be viewed during performance, and therefore their aesthetic qualities need not be considered. (Of course it is possible to create video-based spatial sound in which the video itself is not presented to the audience. However, every example I have encountered of video-based audio spatialization has been presented as an integrated audio-visual composition.) In audio-visual work, the imagery must be as aesthetically strong as the sound, and the two must work well together. This requirement could put constraints on the content of the video. A situation in which both audio and video are presenting the same information introduces the danger of aesthetic redundancy. Applying filters to modify spatial behavior is much easier and more direct for image processing than for video processing. Nonetheless, video-based spatialization can still be quite effective, especially when drawing on the advantages of live performance. In any case, despite some overlap in the manner of data processing, video-based spatialization and image-based spatialization remain two distinctly different approaches to spatialization.

9. AESTHETIC RESULTS

In “Space-form and the Acousmatic Image,” Denis Smalley defines space-form as “An approach to musical form, and its analysis, which privileges space as the primary articulator. Time acts in the service of space” [10]. In the same article, Smalley makes a distinction between interventionist and naturalist approaches. “With the interventionist approach the composer’s hand is in evidence, and the stamp of the technology and techniques is apparent in the kind of material and the way it is manipulated, whereas in the naturalist work there will be some attempt to hide techniques, and avoid exposing technological signifiers” [10].

Spaced Images with Noise and Lines could accurately be described as an interventionist space-form composition. From the outset, the spatial aspect of the work is kept in the compositional foreground through dramatic panning trajectories. Sonic elements are deliberately kept simple throughout in order to highlight the spatial aspect. Different approaches to texture creation, and the admixture of noise and lines in the images, provide a broad variety of spatial experiences to the audience. Perhaps most importantly, the spatial structures are integral to the music. The complexity of the spatial relations is built into the musical structure; it could not have been imposed externally through performed manual diffusion techniques.

10. FUTURE WORK

Only greyscale image processing was used in Spaced Images with Noise and Lines. It would be interesting to use the separate RGB channels to affect different aspects of synthesis, and/or spatialization. The use of image masks could facilitate the creation of various forms of spatial polyphony. The mapping described here makes the most sense in the context of 2D speaker arrays. A rethink of the spatial image mapping strategy may be appropriate for 3D arrays, such as the Klangdom at ZKM, or the Sonic Lab at SARC.

11. CONCLUSIONS

The technique of image-based spatialization was discussed in the context of the author’s composition Spaced Images with Noise and Lines. The creative potentials of the technique were shown to be somewhat different from more traditional trajectory-based spatialization techniques. Given the growing number of multi-channel performance venues, and opportunities for composers and audiences to explore spatial music, techniques such as image-based spatialization may be of increasing interest to composers of computer music.

12. REFERENCES

[1] GNU. “The Gimp.” http://www.gimp.org/
[2] Greenlee, S. Personal email communication. 2011.
[3] Leudar, A. Personal email communication. 2012.
[4] Kendall, G. and Cabrera, A. “Why Things Don’t Work: What You Need to Know About Spatial Audio,” Proceedings of the International Computer Music Conference, Huddersfield, UK, 2011.
[5] Marino, G., Serra, M.-H., and Raczinski, J.-M. “The UPIC System: Origins and Innovations,” Perspectives of New Music 31(1): 258-269. 1993.
[6] McGee, R. and Wright, M. “Sound Element Spatialization,” Proceedings of the International Computer Music Conference, Huddersfield, UK, 2011.
[7] Oram, D. “Oramics.” http://daphneoram.org/oramarchive/oramics/
[8] Penrose, C. “HyperUpic” (1991, no longer publicly available). http://www.music.princeton.edu/~penrose/soft/
[9] Ramakrishnan, R. “Zirkonium.” http://ima.zkm.de/zirkonium/ZirkoniumManual.pdf
[10] Smalley, D. “Space-form and the Acousmatic Image,” Organised Sound 12(2): 107-126. 2007.
[11] Stockhausen, K. Studie II: Nr. 3 Elektronische Studien. Universal Edition, Vienna, 1956.
[12] UISoftware. MetaSynth. http://www.uisoftware.com/MetaSynth
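
As a supplementary illustration, the basic mapping described in Sections 2 and 3 (32 pixels per line, four virtual locations per adjacent speaker pair on an eight-speaker ring, darkness mapped to amplitude, one image line per time step) might be sketched as follows. This is not the Max and JavaScript implementation used for the piece; it is a minimal Python sketch under several assumptions that the paper does not specify: 8-bit greyscale input with 0 as black, virtual locations placed at even quarter steps between adjacent speakers, an equal-power sine/cosine pan law, and a hypothetical Csound instrument 1 that takes eight per-speaker gains in p4 through p11. The function names, the 100 ms line interval and the 50 ms grain duration are likewise illustrative.

```python
import math

NUM_SPEAKERS = 8            # eight-channel ring, as in the piece
LOCATIONS_PER_PAIR = 4      # four virtual locations per adjacent speaker pair
NUM_LOCATIONS = NUM_SPEAKERS * LOCATIONS_PER_PAIR  # 32 pixels per image line


def darkness(pixel):
    """Map an 8-bit greyscale value (assumed: 0 = black) to an amplitude in 0..1."""
    return 1.0 - pixel / 255.0


def speaker_gains(location):
    """Equal-power gains for one of the 32 virtual locations.

    Assumption: locations sit at fractions 0, 1/4, 1/2, 3/4 of the way
    from one speaker to the next, wrapping from speaker 8 back to speaker 1.
    """
    pair, step = divmod(location, LOCATIONS_PER_PAIR)
    angle = (step / LOCATIONS_PER_PAIR) * (math.pi / 2)
    gains = [0.0] * NUM_SPEAKERS
    gains[pair] = math.cos(angle)
    gains[(pair + 1) % NUM_SPEAKERS] = math.sin(angle)
    return gains


def image_to_score(rows, line_interval=0.1, grain_dur=0.05, instr=1):
    """Turn rows of 32 greyscale pixels into Csound score 'i' statements.

    Each row is one time step; each non-white pixel becomes one grain.
    The p-field layout (p4..p11 = per-speaker gains) is hypothetical.
    """
    lines = []
    for y, row in enumerate(rows):
        if len(row) != NUM_LOCATIONS:
            raise ValueError("each image line must have 32 pixels")
        start = y * line_interval
        for x, pixel in enumerate(row):
            amp = darkness(pixel)
            if amp <= 0.0:
                continue  # white pixel: no grain at this location
            gains = " ".join(f"{amp * g:.4f}" for g in speaker_gains(x))
            lines.append(f"i{instr} {start:.3f} {grain_dur:.3f} {gains}")
    return lines


if __name__ == "__main__":
    # A one-pixel-wide black diagonal on a white 32 x 32 image.
    image = [[255] * NUM_LOCATIONS for _ in range(NUM_LOCATIONS)]
    for y in range(NUM_LOCATIONS):
        image[y][y] = 0
    print("\n".join(image_to_score(image)))
```

Run on the diagonal test image, the sketch emits one grain per line, each at the next virtual location around the ring, which corresponds to the perimeter-panning effect described in Section 2. Warping the time interval, as Section 2 suggests is possible, would amount to replacing the constant `line_interval` with a per-line value.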