
Aaron Lieb    02.19.09
Feature Descriptions
This document outlines the core functionalities of the ProZeuxis system, organized into
categories that relate each feature to a physical aspect of live performance: Space,
Movement, Dynamics, Rhythm, Pitch, and Mood. Each group of features is listed in prioritized
order. The first item in each group represents a necessary feature, or "core functionality", while each
subsequent feature may be categorized as "would be nice to have." This list of features can be seen as
the larger picture of what the final system would be capable of, even if not everything listed here can be
implemented within the scope of the thesis. By describing each feature individually, I am also
outlining the overall system architecture as modular components. These components can then be more
easily understood as they relate to their counterparts in the more technical data model.

Space
Composition Projection – The primary feature of this system is the capability of projecting
composited visuals toward the performance area while also presenting the user with
a completely different set of user interface (UI) graphics. Running the visual elements for
both of these purposes within one application would be problematic in a performance
setting. A more efficient way to implement both the table UI and the projected
composition would be to run each as a separate client application. In this design, a
ProZeuxis server application would act as a liaison by providing a common protocol for
communication between the two, as sketched after the component list below. Each of these
components can be defined and abbreviated as:

Console Client (CC):
- Runs the UI for the VJ performer
- Sends and receives messages with a ProZeuxis Server application via socket communication

Presentation Client (PC):
- Processes a visual composition based on messages from the ProZeuxis Server
- Processes the composition using these messages and a predefined Visual Bank (VB) of images, movie clips, and procedural effects
- Sends the final composition to a dedicated projector

ProZeuxis Server (PS):
- Sends messages between the CC and PC via a specified socket
- Acts as a data adapter for incoming camera and audio feeds
- Analyzes these inputs and sends the results as simple messages to all clients listening on the appropriate socket
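
To make the message passing more concrete, the sketch below shows how a Console Client might connect to the ProZeuxis Server over a plain TCP socket, send a couple of parameter messages, and listen for analysis results broadcast by the server. The port number, the address-style message paths, and the message names are illustrative assumptions rather than a finalized protocol; an OSC library could be substituted for the hand-rolled text messages.

    // Minimal Console Client connection sketch (hypothetical message format).
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    public class ConsoleClientSketch {
        public static void main(String[] args) throws Exception {
            // Assumption: the ProZeuxis Server (PS) listens on port 9000 of this machine.
            Socket socket = new Socket("localhost", 9000);
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream()));

            // Tell the PS which Visual Bank element to load and set a node parameter.
            out.println("/vb/load clip_07.mov");
            out.println("/node/emitter1/center 0.42 0.77");

            // Listen for analysis results the PS broadcasts to all connected clients,
            // e.g. a beat detected on the kick-drum channel.
            String msg;
            while ((msg = in.readLine()) != null) {
                System.out.println("from PS: " + msg);   // e.g. "/audio/kick/pulse 1"
            }
            socket.close();
        }
    }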

This client-server architecture will allow the system to better process visuals
through a dedicated Presentation Client. The design also allows the clients to run on
separate machines, opening up a variety of possible configurations. For a setup
where the PC and CC run on independent machines, it will be the PS
application's task to make sure that each client has a copy of the same Visual Bank
elements. A master copy of this Visual Bank, containing the images, clips,
and procedural effects, will be located on the same machine as the PS. As a user
connects to the PS via an instance of the CC, they will have the ability to access the
server's file system and load the VB elements they wish to use for
that performance. The server will then make sure that both the CC and PC machines
(if not the same) have a copy of the VB data. This transaction should occur before a
performance begins. By prepping the system in this way, the clients can send
messages to one another describing how to manipulate this media without having to
send the media itself.
For the prototype system that will be developed for the first test
performance, these three components will most likely reside on the same physical
machine. However, nothing will be lost in terms of development time by designing the
system in this way. The system will still perform more efficiently than if all
three components were running concurrently in one bloated application.
The following are five possible implementations of the architecture as it
supports the functionality of "Composition Projection" via one or more instances of the
Presentation Client. Each example becomes more complex, but also more capable in
terms of allowing the PC to process incoming messages and generate visuals.
1. Simple – Machine A runs all three applications as separate processes (fig. 1.1)
2. Distributed – Machine A runs the Console Client; Machine B runs the ProZeuxis Server and Presentation Client (fig. 1.2)
3. Complex – Machines A, B, and C each run one of the three system applications (fig. 1.3)
4. Multi-Console – Machine A runs Console Client A; Machine B runs Console Client B; Machine C runs the ProZeuxis Server and Presentation Client (fig. 1.4)
5. Multi-Presentation – Machine A runs the Console Client; Machine B runs the ProZeuxis Server and Presentation Client A; Machine C runs Presentation Client B (fig. 1.5)
These examples show the potential for scaling the system's capabilities to
fit the needs of differently sized performances. The larger the demand for visual
output, the more equipment and visual performers may be needed.
It may be necessary to clarify the significance of the Audio and Video Input
components as shown in the diagrams. They were added to emphasize a design
consideration concerning the way the system will process these inputs: all audio and
video inputs are sent to the PS to be analyzed once, so that the results can be
distributed as simple messages to all connected CCs, saving processing power.
The exception to this, as shown in figs. 1.3 and 1.5, is that a PC not residing
on the same physical machine as the PS receives its own split copy of the video
input. The purpose of this input is not to have the video analyzed by the PC, but
rather to make it more quickly accessible for visual effects that project any portion
of the video back toward the stage, such as a "jumbotron" effect.

Positive Space Guide – In many cases, performers and set designers want to be creative and build
sets with elaborate or oddly shaped projection screens. For example, Steely Dan's
"Think Fast Tour 2008" featured a screen comprised of large square sections pieced
together in a haphazard, pixelated arrangement (fig 2.1). A Positive Space Guide
would be a tool the VJ uses to address this issue of custom-shaped screens. It
would function by allowing them to set up regions representative of these surfaces
before the performance begins.
Fig 2.1: Steely Dan performing "Josie", 2008

To set up the guide, the user will need to enter an initial calibration mode in
which the system begins by projecting no light at the stage (fig 2.2). The user will
then use a set of touch-sensitive drawing tools, presented via the reacTable
console, to block out simple polygons of a desired color to represent the positive
space of the projection screens. As they interactively tweak the positions of their
polygons, they will be able to observe the coverage that the polygons are creating as
they are projected toward the screens (fig 2.3).
This process can be repeated to block off different portions of a screen,
designating them as unique regions of positive space. The user could also designate
internal areas of an existing region to mark them as separate regions (fig 2.4). When
the process is complete, and all screens are sufficiently covered by the guide, the
calibration mode will be exited and the guide will be made accessible for the duration
of the performance.
This will allow the user to see on the reacTable interface where their visuals
will be projected, rather than needing to look back and forth between the console and
the stage each time they want to move a visual element's position. The guide would
also be useful because some generative effects could be designed to use these
regions as a parameter. Such effects could be "snapped" to the boundary of a
Positive Space Region so they appear to be contained within it. Also, for less
sophisticated content such as video clips, a Positive Space Region could be applied
as a master alpha channel before it is projected to the stage (fig 2.5).
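
As a rough illustration of the alpha-channel idea, the Processing-style sketch below rasterizes a single hypothetical Positive Space Region, stored simply as polygon vertices from the calibration step, into a grayscale mask and applies it to a frame before projection. The region coordinates and the still image standing in for a video frame are assumptions.

    PImage frame;          // stand-in for the current frame of a Visual Bank clip
    PGraphics regionMask;  // the Positive Space Region rasterized as a mask
    // Hypothetical region defined during calibration, as polygon vertices.
    float[][] region = { {100, 80}, {520, 60}, {560, 300}, {120, 340} };

    void setup() {
      size(640, 480);
      frame = loadImage("clip_frame.jpg");
      frame.resize(width, height);

      // White inside the polygon becomes opaque; black outside stays transparent.
      regionMask = createGraphics(width, height);
      regionMask.beginDraw();
      regionMask.background(0);
      regionMask.fill(255);
      regionMask.noStroke();
      regionMask.beginShape();
      for (float[] v : region) {
        regionMask.vertex(v[0], v[1]);
      }
      regionMask.endShape(CLOSE);
      regionMask.endDraw();
    }

    void draw() {
      background(0);                  // project no light outside the region
      PImage masked = frame.get();    // copy the frame so the original stays intact
      masked.mask(regionMask.get());  // apply the region as a master alpha channel
      image(masked, 0, 0);
    }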
The capabilities gained by sectioning off a performance space using this
feature could allow for some interesting possibilities, especially when combined with
the earlier concept of Composition Projection, in which multiple Presentation Clients
and VJ performers were outlined as possibilities. Perhaps the projection screen is a
normal rectangular surface covering the entirety of the stage behind the performers.
One VJ performer could section off shapes that fall behind the stage performers as
positive space. The second VJ performer could then calibrate their Presentation
Client, with its own projector, to use the first VJ's negative space as their region of
positive space (fig 2.6). The positive space regions of the two performers could even
overlap, causing some areas of the screen to be projected on twice for live
compositing of images using multiple projectors (fig 2.7).
Movement
Live Motion Tracker – A cornerstone component of any system capable of extended reality effects is the
ability to track motion in real time. The system will address this by including a feature
with basic color and planar image tracking capabilities. There are already
existing libraries with techniques worked out for achieving this goal. The unique
feature will be how the tracker can be configured by the user and applied to
visual aspects of the composition. Possible usages of a Live Motion Tracking node
via these libraries include:

Color Tracking Average-to-Point – A color tracker could be configured to detect a
specific color value specified by the user. As an example, let's say the user
creates a tracker to detect a specific shade of blue to match the singer's
outfit. If the tracker is set to the "Average to Point" setting, it will first detect
tracking "blobs" from the live camera feed based on where it sees close matches
to the set color. It will then determine the largest blob and create a tracking
point for its center (fig 3.1).
A tracker created in this way would then be linked to a tangible node located
on the reacTable. This node would have the capability to share its (x, y) point
parameter with any other node parameter that accepts points. For example, let's
say we also have a second node configured as a generative visual that emits
randomly colored bubbles from a center point. By default, this type of effect
would probably take its own (x, y) position on the reacTable to use as the emitter
center point.
We could instead link this node's center point with the center point provided by the
tracker. To accomplish this, the two nodes would be slid next to one another. When
this is done, for any node, all compatible input and output parameters of each will
appear as small selection bubbles floating around the nodes (fig 3.3). The user could
then use their finger to drag the matching node parameters toward one another in
order to link them. In this case, we would be linking the Emitter Node's center
position with the Tracking Node's Color Average Point.
Linking the node parameters together in this way would allow the emitter node's
center point to automatically track the position of the singer (fig 3.4). The musical
performers would not need to worry about being in a specific location in order to
properly hit an effect cue.
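
A simplified sketch of the "Average to Point" idea, assuming the Processing video library provides the live camera feed: instead of full blob detection, it approximates the result by averaging the positions of every pixel that falls within a color-distance threshold of the target color. A production tracker would swap in a proper blob-detection routine from one of the existing computer vision libraries.

    import processing.video.*;

    Capture cam;
    color target;            // the singer's shade of blue (assumed value, set in setup)
    float threshold = 60;    // maximum color distance that counts as a match

    void setup() {
      size(640, 480);
      target = color(40, 90, 200);
      cam = new Capture(this, width, height);
      cam.start();
    }

    void draw() {
      if (cam.available()) cam.read();
      image(cam, 0, 0);

      cam.loadPixels();
      float sumX = 0, sumY = 0;
      int matches = 0;
      for (int y = 0; y < cam.height; y++) {
        for (int x = 0; x < cam.width; x++) {
          color c = cam.pixels[y * cam.width + x];
          float d = dist(red(c), green(c), blue(c),
                         red(target), green(target), blue(target));
          if (d < threshold) {   // pixel is close enough to the target color
            sumX += x;
            sumY += y;
            matches++;
          }
        }
      }

      if (matches > 0) {
        // The Color Average Point that the tracker node would share with other nodes.
        float px = sumX / matches;
        float py = sumY / matches;
        noFill();
        stroke(255, 0, 0);
        ellipse(px, py, 20, 20);
      }
    }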
Color Tracking Vector Shape – The color tracking library is also capable of turning
color-tracked data into a vector shape rather than a single point. Configuring this
kind of tracker would work much in the same way as the Average to Point, but it
would have a vector shape as its output parameter (fig 3.5). A vector shape
can be thought of as just another parameter that can be used to plug into other
effects. Just like the other example, the user would slide other generative effect
nodes toward a Tracking marker set to Vector Shape in order to assign any
compatible parameter.
For example, you might have an effect that is capable of using a vector
shape to draw a colored halo emitting from the shape's edge (fig 3.6). This
effect, too, would then be able to track where the halo should be drawn based
on the movements of the singer wearing bright blue clothing.

Planar Image Tracking – This tracking component is somewhat different from the
previous two color-based tracking examples. It will require more effort to set up,
but will also provide Z-depth parameters for position and rotation in 3D space.
To accomplish this setup, the user would be required to specify an image area
that they wish to track as it is seen on the stage from the tracking camera. To
do the configuration, the tracked object would need to be somewhere on stage
where it can be clearly seen by the tracking camera (fig 3.7). The user would
then capture a still frame from this camera, and use a drawing tool on the
reacTable console to draw a region of interest that they wish to detect with this
tracker (fig 3.8). A user would most likely want to set up tracking markers for
each image plane before the performance. These could include posters held by a
performer, graphic t-shirts, logos on amps, kick drums, guitar bodies, etc.
The benefit of this type of tracking over color trackers, as mentioned, is the
ability to determine x, y, and z position and rotation information from the tracked
objects. The collaboration of the Planar Image Tracking node with other nodes
would, again, work in the same way: sliding the nodes near one another and
connecting compatible parameters. The parameters of these trackers would be
ideal to connect to generative effect nodes that have a corner-pinning capability
or a 3D camera component.
Imagine the band logo on a performer's shirt is set up as a tracking plane. As
the performer moves and pivots in relation to the tracking camera, a
generative effect with a 3D camera could automatically react to these
movements. If this effect were used with a basic, game-engine-style 3D room
effect, it might be possible for the performer to use his body movements to
pantomime a walking action in order to navigate through this virtual space.
This could be a useful way to tell a story using the visuals and song
simultaneously. For example, the singer could guide the audience through
different spaces containing objects that remind him of a lost love, while also
singing about those things as they are seen.
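
A sketch of how such a scene might be driven, assuming the Planar Image Tracking node delivers a position and a rotation for the tracked logo each frame (faked here with mouse input so the sketch runs on its own): the pose simply offsets and rotates a placeholder 3D room drawn with Processing's P3D renderer.

    // Stand-ins for the pose parameters a Planar Image Tracking node might supply.
    float trackX, trackY, trackRotY;

    void setup() {
      size(640, 480, P3D);
    }

    void draw() {
      // Fake tracker input: in ProZeuxis these values would arrive as linked
      // node parameters derived from the tracking camera.
      trackX = map(mouseX, 0, width, -200, 200);
      trackRotY = map(mouseX, 0, width, -PI / 3, PI / 3);
      trackY = map(mouseY, 0, height, -100, 100);

      background(0);
      lights();
      translate(width / 2 + trackX, height / 2 + trackY, -200);
      rotateY(trackRotY);

      // Placeholder "3D room": pivoting past or walking toward these boxes is
      // driven entirely by the performer's movement relative to the tracked plane.
      for (int i = -2; i <= 2; i++) {
        pushMatrix();
        translate(i * 120, 0, -abs(i) * 80);
        box(60);
        popMatrix();
      }
    }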
These 3D rooms could even be designed to switch to a more realistic pre-rendered
video clip of an interactive environment if the performance required a
more intimate interaction with the objects in the virtual space. This could be a
nice addition to techniques currently being employed for large arena concerts.
Roger Waters' "Dark Side of the Moon Tour 2006-2008" featured a living
background environment of a hyper-realistic tabletop setup (fig 3.9: Roger
Waters, "Dark Side of the Moon Tour 2007"). The scene, containing memorabilia
symbolic of post-WWII America, was comprised of pre-rendered sequences of
cigarette smoke and a hand changing the radio dial. As the radio dial changed,
it also triggered corresponding diegetic sounds that acted as an introduction to
several of the songs. The Planar Image Tracking system could allow the position
and movements of a performer to trigger, in a rudimentary way, certain events
contained in this type of living background environment.

Gestural Interaction – Of course, there would be limitations to the Planar Image Trackers. Trackers
are easily lost by the system when lighting and other conditions do not allow for
a clear view of the image region. For example, tracking markers on a performer's
chest would "disappear" each time they bend or turn away from the camera.
Also, some visual effects may work better if manipulated using a more
intelligent form of tracking. Because of these shortcomings, some situations
would probably be better suited to a type of Gestural Interaction.
Implementing a Gestural Interaction node would first require a number of
gesture event listeners to be defined. A gesture, in this sense, can be thought
of as any unique path made by a cursor defined in 2D space. An existing
gesture class listens for a cursor to move in four gestural patterns: clockwise,
counterclockwise, horizontal shake, and vertical shake. Simple gestures such
as these are easily disambiguated from one another, which helps prevent
gestural misfires. Properly implementing just these four would already provide a
fair amount of possibility for a motion-tracked performer.
In order to get this "cursor" data to feed into our Gestural Tracker, however,
we would first need to define a motion capture technique that suits the needs of
the performance. One way would be to use one of the two aforementioned
tracking nodes to provide this x,y data. This could allow a performer with a
uniquely colored glove and an aptly configured color tracker to deliver swirling or
shaking gestures toward the camera.
For a situation where the performer may want to face away from the
tracking camera, or would be otherwise obscured, an alternative live
motion capture technique could be used instead. The existing gesture class,
"ezGestures", also contains a WiiMote implementation that could be used as
an alternative. A setup capturing audio, video, and also OSC data from
a WiiMote would add another layer of complexity to an already complex project
description. This would most likely be outside the scope of this thesis, but it is
worth mentioning that WiiMote data, or any other alternative data source, could
absolutely be included if the need arose.
Assuming a user has successfully set up a Gestural Tracking node, and
completed the steps necessary for defining one of many possible cursor sources,
it would then be ready to be used with a generative patch. The Gestural
Tracking node's output parameter, in this case, would be somewhat different
from the others previously mentioned. The node would be capable of outputting
the currently detected state of its predefined gestural listeners. For short, we can
call this a Gesture State Object (gsObj). The gsObj would contain information
about which gesture is currently being performed and its current state.
A generative patch, "ActorVate", could utilize this data object in many
different ways. Animated sequences could start and stop based on which
gesture is detected. Colors could be instructed to lighten or darken with a
vertical shake gesture. A string of sequentially performed gestures could also
fire off an action. If the performer gestures clockwise, shakes vertically, and
stops within a short amount of time, this sequence may cause some specific,
preprogrammed visual response (fig 3.10).
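
The gesture names above come from the existing "ezGestures" library; the sketch below does not reproduce that library's API, but shows one possible shape for a Gesture State Object and a naive horizontal-shake listener that counts direction reversals of the incoming cursor points over a short window. All class and field names are illustrative.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical Gesture State Object (gsObj) passed to generative patches.
    class GestureState {
      String gesture = "none";   // e.g. "none", "h_shake", "v_shake", "cw", "ccw"
      float progress = 0;        // rough 0..1 state value for the detected gesture
    }

    // Naive horizontal-shake listener fed by any cursor source (color tracker, WiiMote, etc.).
    class HorizontalShakeListener {
      private final Deque<Float> xs = new ArrayDeque<Float>();
      private final int window = 30;   // number of recent cursor samples to keep

      GestureState update(float cursorX) {
        xs.addLast(cursorX);
        if (xs.size() > window) xs.removeFirst();

        // Count left/right direction reversals across the window.
        int reversals = 0;
        float prev = Float.NaN, prevDelta = 0;
        for (float x : xs) {
          if (!Float.isNaN(prev)) {
            float delta = x - prev;
            if (delta * prevDelta < 0) reversals++;   // sign change = reversal
            if (delta != 0) prevDelta = delta;
          }
          prev = x;
        }

        GestureState gs = new GestureState();
        if (reversals >= 4) {            // several quick reversals read as a shake
          gs.gesture = "h_shake";
          gs.progress = Math.min(1f, reversals / 8f);
        }
        return gs;
      }
    }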

Dynamics
Audio Level Detection – For the following three aspects of performance, I will focus on their
relationship to the audio signal that will be coming into the system from the
stage. The first of these, Dynamics, involves techniques for interpreting the
decibel levels of this audio. First, however, it may be necessary to elaborate on
some general facts about how the system will retrieve this signal, as it directly
relates to how the system will be used.
Since real-time capture of an audio signal on most computer systems
comes in the form of one master line-in, we will assume that ProZeuxis will at
least support this one input signal. This will still provide a useful source of
information about the audible qualities of the performance. A simple setup may
derive this audio signal from the final mix coming from the venue's PA system,
as provided by the audio engineer. For more control, and a broader set of
capabilities, however, it would be better to assume that the VJ would be
capturing two or more separate audio signals using a wireless microphone
system and a mixer connected to the ProZeuxis Server (fig 4.1).

It might seem strange for a visual artist to set up their own system for
controlling individual audio signals. However, this would need to be done in
order to filter the audio based on which source we would like to focus our
"visual" attention on. For a performance with multiple instrumentalists, it would
be beneficial to have access to an unmixed input of what each is playing, for a
more precise system of audio detection. Because of the way audio detection
algorithms function, any visual patch attempting to use the audio data would
first need to convert it into a frequency spectrum using a Fast Fourier Transform
(FFT). The FFT is a sampling of the audio signal over time, in which the sound is
represented as a spectrum of sub-bands. With the simplest usage of our Audio
Detection node, with no extra parameters configured, we could send the entire
FFT stream directly to any visual patch expecting this as an incoming parameter.
There are countless visual patches that have their own methodology for using
the entire audio FFT to perform some visual effect. This is a fine approach in
its own right for specially coded effects whose results could not otherwise be
achieved by linking them to our ProZeuxis-style data nodes and providing more
simplistic data.
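
For reference, reading the raw FFT spectrum from the line-in takes only a few lines with the "minim" audio library discussed below; the sketch assumes a Processing environment and simply draws the band energies, which is the kind of data a patch consuming the whole FFT stream would receive.

    import ddf.minim.*;
    import ddf.minim.analysis.*;

    Minim minim;
    AudioInput in;
    FFT fft;

    void setup() {
      size(512, 200);
      minim = new Minim(this);
      in = minim.getLineIn();                           // the single master line-in
      fft = new FFT(in.bufferSize(), in.sampleRate());  // spectrum of sub-bands
    }

    void draw() {
      background(0);
      fft.forward(in.mix);            // transform the current audio buffer
      stroke(255);
      for (int i = 0; i < fft.specSize(); i++) {
        // Each band's amplitude is what a visual patch using the raw FFT would work from.
        line(i, height, i, height - fft.getBand(i) * 4);
      }
    }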
However, if every visual patch we had used its own unique routine
for breaking down the FFT into a usable form, this would quickly become costly
on the processor and slow down the performance in more ways than one.
Instead, it would be better to allow the user to specify some extra parameters
that effectively set up beat detectors for specific ranges of frequencies. First,
you would need to specify which band of the signal you wish to analyze for
beats, or sudden changes in volume. We could always run this detection on the
mixed signal; that kind of detection would react whenever the overall audio
signal exceeds a specified threshold. For homing in on a specific aspect of the
song, however, this simplistic approach may not be accurate enough.
For example, let's say we would like to set up a simple Audio Level Detector
that is configured to listen to a specific range of frequencies. This range could
be explicitly set by the user or selected from a preset. An existing library for
such audio detection, "minim", already defines preset ranges to listen for kick,
snare, and hi-hat. With this, you could detect the kick drum range to feed into a
generative patch that would react to the kicks. However, without first filtering the
drummer's audio using the mixer, signal crosstalk from other musicians playing
sounds in that same "kick drum" frequency range would make your desired
visual effect perform quite miserably (fig 4.2: mixed signal with crosstalk).
With one of the microphones located directly in front of the kick drum,
or even inside it, there should not be a problem isolating this signal and sending
it to our desired visual effect by way of a properly configured Audio Level
Detection node (fig 4.3: kick drum isolated with the mixer). The same
preconditions will apply for the proper linkage of this new type of node and a
visual patch. First, we can identify that this Audio Detection Node will be capable
of supplying a boolean value over time stating "a beat was detected now," and
any generative patch asking for this type of "pulse" as an input parameter would
be capable of linking to it.
The second way this same node could communicate with a visualization
would be to also provide the decibel value that was detected in the specified
range, as opposed to a thresholded, on/off beat. This could tell the patch that
there was a beat in our range, and also provide a parameter indicating how loud
it was. Since the process for arriving at both of these types of data is the
same, it might be best to leave it up to the user to specify which they would
like to send to their visual at the time of linkage. If we are telling the visual
patch that we want to send it the "pulse" data, it might say, "Great, now pick a
MIDI knob that will control the threshold for this connection." Any time the user
wants to adjust the "sensitivity" of this beat detector, they would simply need to
refer back to their configured knob, slider, etc. and tweak it until it reacts to their
liking.
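
As a concrete point of reference, minim's BeatDetect class already provides frequency-energy presets close to what is described above. Note that its sensitivity setting is a minimum time between detections rather than a decibel threshold, so the MIDI-knob threshold described here would be an additional ProZeuxis-level parameter layered on top. A minimal sketch:

    import ddf.minim.*;
    import ddf.minim.analysis.*;

    Minim minim;
    AudioInput in;
    BeatDetect beat;

    void setup() {
      size(200, 200);
      minim = new Minim(this);
      in = minim.getLineIn();
      // Frequency-energy mode: analyzes bands of the spectrum rather than overall volume.
      beat = new BeatDetect(in.bufferSize(), in.sampleRate());
      beat.setSensitivity(200);   // ignore re-triggers within 200 ms
    }

    void draw() {
      beat.detect(in.mix);
      // The boolean "pulse" a configured Audio Level Detection node would publish.
      if (beat.isKick()) {
        background(255);          // flash on a kick-range beat
      } else {
        background(0);
      }
    }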
The second way of connecting the Audio Beat Detector, of course, would be
to supply the numerical data of these beats to a visual patch. With this kind of
connection, you would not necessarily need any other mechanism in place to
control a threshold value. It would not be impossible to imagine, however, that
some patches might be designed to use a threshold controller for this kind of
audio data. In that situation, you could effectively control how much the detected
beat data affects your patch by tweaking the threshold.
At first glance, this approach might not seem capable of handling multiple
Beat Detection Nodes operating at the same time. This is not entirely true,
however. Although the crosstalk issue mentioned above does put a damper on
any chance of absolute accuracy, the proposed setup has a built-in way of
combating it. Since each of our four separate microphone signals is run through
a mixer, the VJ could equalize the individual channels to limit each signal's
dynamic frequency range (fig 4.4: saxophone, kick drum, guitar, and bass
channels on the mixer). By limiting the frequencies that each mixer channel is
capable of producing, you can manipulate each signal's chances of being picked
up by this or that Beat Detection node, even after the signal has been mixed. In
this way, you would be able to limit crosstalk and achieve a separation between
frequencies for each instrument or other audio signal. Once this has been
successfully accomplished, a multitude of Beat Detection Nodes, checking for
beats in different ranges, could be configured and working independently of one
another.
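
Assuming the channels have been equalized apart as described, several independent detectors can then watch disjoint frequency ranges of the one mixed line-in. The sketch below uses minim's FFT averaging to compare the energy in each range against its own threshold; the ranges and threshold values are placeholders a user would tune per performance.

    import ddf.minim.*;
    import ddf.minim.analysis.*;

    Minim minim;
    AudioInput in;
    FFT fft;
    // Placeholder frequency ranges (Hz) and thresholds, one per Beat Detection node.
    float[][] ranges = { {40, 120}, {200, 800}, {2000, 6000} };
    float[] thresholds = { 8, 5, 3 };

    void setup() {
      size(300, 200);
      minim = new Minim(this);
      in = minim.getLineIn();
      fft = new FFT(in.bufferSize(), in.sampleRate());
    }

    void draw() {
      background(0);
      fft.forward(in.mix);
      for (int i = 0; i < ranges.length; i++) {
        // Average spectrum energy between this range's low and high frequencies.
        float energy = fft.calcAvg(ranges[i][0], ranges[i][1]);
        boolean pulse = energy > thresholds[i];   // this node's on/off beat output
        fill(pulse ? 255 : 60);
        rect(i * 100, 0, 100, height);
      }
    }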

Rhythm
Tempo Detection – The function of the previous Audio Detection Node is an essential step
toward any further type of audio analysis, such as Tempo Detection.
Typically, detecting the tempo of a mixed composition is an exceedingly
difficult task, again due to issues involving crosstalk. Since we will have
the ability to somewhat isolate certain key musical signals, however, it
might be possible to achieve a satisfactory tempo detection.
A good place to start would be to have a Beat Detection node isolated
on the kick drum, or on some other element of the composition that
dictates the overall pulse of the music. This initially configured Beat
Detection node could then be used for the secondary purpose of Tempo
Detection. Essentially, this would be an extra output parameter whose
entire purpose is to keep count of the beats provided to it through this
beat detection. Based on the count, it could calculate a relative beats per
minute (bpm) value that could be used in any visual patch requiring this
type of "tempo" parameter.
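
A sketch of the counting approach, assuming the linked Beat Detection node calls beatDetected() each time it fires: recent beat timestamps are kept and a running bpm is derived from their average spacing. The class and method names are illustrative, not part of any existing library.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical tempo output derived from an upstream Beat Detection node.
    class TempoEstimator {
      private final Deque<Long> beatTimes = new ArrayDeque<Long>();
      private final int window = 8;   // how many recent beats to average over

      // Called by the Beat Detection node every time it reports a pulse.
      void beatDetected(long nowMillis) {
        beatTimes.addLast(nowMillis);
        if (beatTimes.size() > window) beatTimes.removeFirst();
      }

      // The "tempo" parameter shared with any visual patch expecting a bpm input.
      float bpm() {
        if (beatTimes.size() < 2) return 0;
        long span = beatTimes.getLast() - beatTimes.getFirst();
        float avgInterval = span / (float) (beatTimes.size() - 1);   // ms per beat
        return 60000f / avgInterval;
      }
    }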
As a note, it would also be possible for a visual patch to have
its own method for determining a tempo or bpm directly from a given Beat
Detection Node connection. However, in keeping with my concerns about
this type of design, it would always be better to do this analysis once and
have the ability to share it many times over.
As an example, this type of parameter could be shared to serve
two visual patches in order to accomplish different effects. The first patch
could be a simple one, called "Clip Looper", that displays a looped video
clip. The bpm could be used to manipulate the overall frame "Rate" at
which the video plays. The second visual, "Small Pollocks", could be a
procedural effect that draws splashes of paint that fade out at a specific,
configurable speed, or "Decay". Both of these nodes, while visually
different, have a parameter that could utilize a "tempo" input (fig 5.1).

Pitch
Tonality Detection – Apart from pulse, velocity, and tempo, there are also tonal qualities of sound
that can be approximately detected by a computer system. Tones, or notes, can be
determined by analyzing individual peaks of an audio signal over time. This process
is most easily accomplished when analyzing a single instrumental voice, and it would
be more difficult, or impossible, to arrive at a satisfactory result when multiple
instrumental voices and percussion are mixed in. Audio isolation using the mixer
would therefore be a necessary step to accomplish any kind of meaningful tone
detection.
The tones or pitches approximated by this function would be converted into the
standard MIDI pitch value mappings. Converting a digitized audio signal into MIDI
values is yet another well-established software capability. Also, the MIDI standard is
already a conducive system for assigning pitch values to accomplish some task. For
our purposes, this would be one more parameter calculated and shared within a
Beat Detection Node.
This Pitch sub-node could provide the user with its own set of menu operations
whose purpose would be to translate pitch into types of parameters other than just
the raw MIDI note numbers. For example, you might want to translate incoming
pitches into 2D points, 3D vectors, or colors. This is one more example of calculating
something once and allowing it to be propagated down to multiple effects, which can
be useful in a variety of circumstances.
Let's say the user has a few visual effects that allow a color to be patched
in as a parameter. In the course of the performance they might want to have more
than one of these visuals in their composition at the same time. They may also want
the color parameters of both to remain in sync with one another. To accomplish this,
they could configure their Pitch value to do color translation. The menu to configure
this may look much like a ramp creator in other graphics packages. They could
choose colors to be mapped along a color ramp, which would then be used to map
each pitch value to a corresponding color (fig 6.1).
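
A sketch of the pitch-to-color translation, assuming incoming pitches arrive as MIDI note numbers (0-127) and the user has chosen a few ramp colors in the Pitch sub-node's menu: each note is mapped onto the ramp with linear interpolation. The ramp colors are placeholder values.

    color[] ramp;   // user-chosen ramp colors from the Pitch sub-node's menu

    void setup() {
      size(640, 120);
      ramp = new color[] { color(120, 170, 230), color(230, 220, 140), color(140, 210, 160) };
    }

    // Translate a MIDI note number (0-127) into a color along the ramp.
    color pitchToColor(int midiNote) {
      float t = constrain(midiNote / 127.0f, 0, 1) * (ramp.length - 1);
      int i = min((int) t, ramp.length - 2);
      return lerpColor(ramp[i], ramp[i + 1], t - i);
    }

    void draw() {
      // Preview the full mapping, one column per MIDI note.
      noStroke();
      for (int n = 0; n < 128; n++) {
        fill(pitchToColor(n));
        rect(n * (width / 128.0f), 0, width / 128.0f, height);
      }
    }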

Mood
Visual Categorization – To streamline the process of selecting colors, video clips, and other
parameterized settings while maintaining a composition that looks cohesive, it might
be useful to have a node in charge of visual categorization. We could call this the
Mood Node. Essentially, the Mood Node would be a place where a user could
configure presets of node configurations that express a specific emotion. For
example, a user might want to create a "Calming" preset. Through the Mood Node's
menus, they will be presented with options representing common parameters found
in the system's other nodes (fig 7.1).
For color, the user could pick a palette of stand-by or default colors that they
find calm. Maybe this would be a scheme of several shades of light blues, yellows,
and greens. Since video clips are also used as pluggable parameters for visual
nodes, these may also need to be marked as "Calming". The next menu could walk
the user through a preview of each clip in the system's Visual Bank so that they may
include or exclude it from this Mood preset. Third, it might also be useful to have the
user set some default values for other parameters' settings. As an example, take the
threshold value for beat detection. For "Calming", this should probably be set to a low
sensitivity. It may seem more calming for a minimal amount of beat detection to
prevent jumpiness in visuals that utilize this parameter.
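
One possible way to represent such a preset in code: a small data object holding the palette, the approved clips, and default parameter values, which other nodes consult when the user leaves a setting blank. All names and values here are illustrative assumptions, not a fixed design.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    // Hypothetical Mood preset consulted by other nodes when a parameter is left unset.
    class MoodPreset {
      final String name;
      final int[] palette;              // packed RGB colors tagged as fitting the mood
      final List<String> allowedClips;  // Visual Bank clips approved for this mood
      final float beatThreshold;        // default sensitivity for Audio Level Detection nodes
      private final Random rng = new Random();

      MoodPreset(String name, int[] palette, List<String> clips, float beatThreshold) {
        this.name = name;
        this.palette = palette;
        this.allowedClips = clips;
        this.beatThreshold = beatThreshold;
      }

      int randomColor() {               // what a color parameter receives by default
        return palette[rng.nextInt(palette.length)];
      }
    }

    // Example "Calming" preset: pale blues, yellows, and greens, slow-moving clips,
    // and a low beat-detection sensitivity to avoid jumpy visuals.
    MoodPreset calming = new MoodPreset(
        "Calming",
        new int[] { 0xFF9FC5E8, 0xFFFFF2B2, 0xFFB6E3C6 },
        Arrays.asList("slow_clouds.mov", "ink_in_water.mov"),
        0.2f);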
As long as the Mood Node is present and configured, each time the user
configures the options for other effects, they will not have to explicitly fill in these
colors, clips, and other parameters. If their Calming preset is selected, a node asking
for a color or ramp would pick a random color from their predefined palette of calming
colors. Similarly, threshold, sensitivity, and easing parameters within the settings of
Audio and Video connection nodes could be "calmed" down by taking on the settings
dictated by the Mood Node. This could occur when a visual node is being set up for
the first time, or it could even affect linked chains of existing nodes.
The addition of this feature would greatly increase the amount of automation
involved when switching between effects and configuring new ones on the fly during
a performance. This could allow a user to easily create a cohesive look for their
performance composition while at the same time aiding their ability to be more
improvisational. This feature would bring to the surface the benefits of using a
tangible user interface for a more tactile and organic user experience.
