Yogi Dandass
Yogi Dandass is a researcher at Mississippi State University's High Performance
Computing Laboratory and a lecturer in the Department of Computer Science. He
designs software libraries in C++ for numerical computing, artificial intelligence, and
real-time communication middleware. He can be reached at yogi@cs.msstate.edu.
Windows isn't a real-time operating system, and the Internet provides no real-time
packet delivery, so how can you handle a real-time task like Internet audio
communication? This article shows how you can attack the problem with
compression and redundancy.
Unfortunately, the Internet does not provide guarantees for timely delivery of data.
Furthermore, while the audio devices in PCs are capable of smooth audio capture
and playback, Windows (which isn't a real-time operating system) does not provide
guarantees that the application's threads will be scheduled in a manner conducive to
producing on-time results. The non-real-time nature of the operating system and
internetworking introduces delays of unpredictable magnitude (jitter) during the
delivery of audio data. Therefore, I implement a simple jitter control mechanism in this
application to reduce the unpleasant gaps that inevitably occur in the audio stream.
Dataflow Overview
Figure 1 depicts the flow of data in the application. The application supplies a number
of data buffers to the audio capture device (waveIn). The device fills each buffer with
digitized data and returns it to the application. The application inserts the filled buffer
into the transmission queue, and after the data is put on the network, the application
returns the emptied buffer to the waveIn device, starting the process all over again.
At the playback computer, packets of audio data are received from the network into
buffers taken from the Free List. The application inserts the filled buffers into the
Playback List. The Playback List is used to assemble the buffers in the proper order
needed for playback. After ordering the buffers, the application hands the buffers to
the playback device, waveOut. After playing the data in the buffers, waveOut returns
the buffers to the application for reinsertion into the Free List.
Figure 1: Streaming audio overview
In this application, on-time delivery of data is more important than error-free delivery,
so I chose to avoid using the connection-oriented Transmission Control Protocol
(TCP). TCP guarantees error-free and ordered delivery of data packets by repeatedly
transmitting a damaged or lost packet before delivering subsequent packets. These
repeated transmissions could cause significant delays in packet delivery, making jitter
control difficult.
The User Datagram Protocol (UDP), on the other hand, makes no guarantees about
the delivery of packets. Packets can be delivered in a damaged condition and out of
sequence. Some packets may not be delivered at all if the network is congested.
Most importantly, each packet is routed independently, and a datagram packet is
transmitted regardless of whether the previous packet was delivered successfully. As
long as only a small percentage of packets are damaged or lost while traversing the
Internet, UDP/IP is well suited for this application.
To provide high-quality audio in the face of unbounded delays and lost datagrams
over the Internet, I send a copy of the previous datagrams audio data with each new
datagram. Under this scheme, the audio is played accurately even when every other
packet is dropped. However, this redundancy does increase the bandwidth
requirements of the application and can be removed when operating under
constrained bandwidth conditions.
mysound.cpp (Listing 1) contains the C++ source for a sample program that can run
simultaneously on two computers for a full-duplex conversation. The executable must
be linked with the Windows sockets library (wsock32.lib or ws2_32.lib) and the
multimedia library (winmm.lib). To test the program on a single computer, you can
have the program send packets to an echo server (port 7) and play back the echoed
packets. soundeco.cpp (Listing 2) contains the source code for an echo server that
drops random packets at a user-specified rate.
Audio Acquisition
You open a waveIn device and request a particular codec and a set of audio
acquisition characteristics (sample rate, resolution, channels, etc.) by passing a
WAVEFORMATEX structure to waveInOpen(). The WAVEFORMATEX structure
holds the attributes common to all codecs. If the codec requires additional
information, the WAVEFORMATEX structure is included as the first member of a
codec-specific wave format structure. GSM, for example, needs the number of
samples per data block and therefore uses the GSM610WAVEFORMAT structure.
See mmreg.h for additional examples of wave format structures.
In mysound.cpp (Listing 1), I initialize the wave input device in the OnConnect()
method of CSoundDialog. I request a GSM codec with a sample rate of 8 kHz, 1
channel (mono), a 1,625 bytes-per-second data rate, and a block alignment of 65. The
block alignment specifies the smallest quantity of data the codec can process at a
time. The GSM-specific field, samples-per-block, is set to 320. This combination of
fields specifies that the codec will produce 65-byte blocks of data, 25 times per
second (1,625/65 = 25) and each 65-byte block will contain 320 samples. The bits-
per-sample field is determined by GSM and therefore is set to zero. Note that each
65-byte block contains 40 milliseconds of audio data.
The set of 25 blocks holds one second of audio, allowing sufficient capacity to handle
any jitter in data acquisition and transmission. In other words, if the transmission of a
filled buffer is delayed for a second, all the other buffers will have audio data ready to
transmit, but the waveIn device will be starved of buffers in which to put new data.
However, delays greater than a few hundred milliseconds will have already reduced
audio quality to such an extent that the loss of data caused by transmission delays
greater than one second will not be a significant factor.
In waveInOpen(), I also specify that the waveIn device shall send messages to the
application's window to report status information and return filled buffers to the
application. (Alternatively, a callback function or thread can also be used for this
purpose.) The window receives MM_WIM_OPEN, MM_WIM_CLOSE, and
MM_WIM_DATA messages when waveIn is opened, closed, and returns filled
buffers to the application, respectively. waveInOpen() returns a handle to the wave
input device that is used in subsequent wave input functions.
After preparing the 25 buffers, I call waveInAddBuffer() repeatedly to add all of the
buffers' WAVEHDRs to the waveIn device's queue. Once the waveIn device fills a
data block, it sets the number of bytes of audio data returned in the associated
WAVEHDR's dwBytesRecorded member and returns the WAVEHDR to the
application's window. The resulting MM_WIM_DATA message is handled in the
application's OnWimData() method.
Data Transmission
OnSocketRead() handles packet reception; it's called when WinSock notifies the
application of incoming datagrams. In this function, I take a buffer (an instance of
class CRecvBuffer) from m_lpFreeBufs (the list of free buffers), receive the data
from the socket into the buffer, and insert the filled buffer into m_lpPlayBufs (the
playback list). A total of 50 instances of class CRecvBuffer are available to the
application.
The code maintains the buffers in the playback list in ascending sequence number
order. If a buffer with the sequence number of the newly received buffer already
exists in the playback list, the new buffer is a delayed duplicate datagram and is
discarded. Because there is a remote chance that a burst of delayed datagrams will
arrive faster than they can be played back, the code discards incoming datagrams
when no free buffers are available.
Since I cannot rely on the non-real-time Internet and Windows to deliver packets to
the waveOut device precisely when they are needed for playback, I wait for 400
milliseconds (i.e., 10 data buffers) before starting to play the audio stream. This way,
I build some laxity into the packet delivery time (i.e., when packet n is being played
back, packet n+10 is being received). If a packet does not arrive 200 milliseconds
before it is required for playback (i.e., packet n+5 has not arrived by the time packet n
is being played), I assume it is lost and prepare the next available buffer for playback.
If the sequence number of the block at the head of the playback list is the expected
sequence number, JitterControl() removes the buffer from the Playback list and
gives it to the waveOut device for playback. If the expected block is missing, but the
next sequential block is available (i.e., the sequence number of the block at the head
of the list is equal to m_dwSeqExp+1), the data for the expected block is recovered
in RecoverPrevData() from the redundant copy in the next data block.
If m_iCountOut is less than five and the sequence number of the block at the head
of the playback list is not equal to m_dwSeqExp (i.e., the expected block has not
arrived yet), the application cannot wait any longer for the late block. The expected
block is assumed to be lost, and the next available buffer in the playback list,
regardless of sequence number, is given to the waveOut device for playback.
If several consecutive buffers are available in the Playback list, they are all processed
for playback in JitterControl(). JitterControl() updates m_dwSeqExp to reflect the
next expected buffer at the head of the Playback list, and increments m_iCountOut
whenever a buffer is given to the waveOut device.
Audio Playback
OnConnect() opens the waveOut device in a manner similar to the waveIn device.
Opening the waveOut device generates a WOM_OPEN message that is handled by
OnWomOpen(); it initializes the m_lpFreeBufs list and m_iCountOut. In
JitterControl(), every playback buffer is prepared by calling its Prepare() method
before being given to the waveOut device. The preparation process of CRecvBuffer
is essentially identical to the preparation of CSendBuffer.
After the waveOut device plays back the data in the buffer, the buffer is returned to
the application in the WOM_DONE message, handled by OnWomDone().
OnWomDone() extracts the pointer to the associated CRecvBuffer instance from the
dwUser member of the returned WAVEHDR structure. It then calls the buffer's
Unprepare() method, inserts the buffer into the list of free buffers, and decrements
m_iCountOut.
Program Termination
Before exiting, the application must stop the audio capture and playback process,
retrieve all enqueued buffers from the waveIn and waveOut devices, and close the
devices. Exiting before the devices are closed can cause the multimedia subsystem
to hang. However, the devices cannot be closed until they have returned all the
buffers to the application. Therefore, in OnCancel(), I record the user's termination
request in m_fExiting and call OnDisconnect().
In OnDisconnect(), I close the socket, reset the devices, and set m_fOutClosing
and m_fInClosing to indicate that the devices are being closed and that
OnWomDone() and OnWimData() should not prepare the returned data buffers for
reuse. OnWomDone() decrements m_iCountOut as each enqueued playback buffer
is returned by waveOut. Once the count reaches zero, the waveOut device is
closed, resulting in the MM_WOM_CLOSE message that is handled by
OnWomClose(). Audio capture is similarly terminated by decrementing m_iCountIn
and closing the device when all of the buffers are returned to OnWimData(). Closing
the waveIn device generates the MM_WIM_CLOSE message that is handled by
OnWimClose(). In OnWomClose() and OnWimClose(), if both devices are marked
as closed, and application termination is requested, EndDialog() is called to exit the
application.
Further Enhancements
Because of limited space, I have used a relatively simple form of jitter control and
packet recovery. A more sophisticated approach would entail sending more than one
redundant copy of audio data over several packets. The redundant copies can be of
reduced quality, created with aggressive lossy compression algorithms, in order to
reduce the network bandwidth required. Sending smaller packets when silence is
detected can further reduce bandwidth requirements. This is a particularly effective
technique because conversation is mostly half-duplex.
In this application, I make very little effort to synchronize the two computers. High-
quality applications can use TCP to send status information to each other in order to
indicate the extent of end-to-end delays, percent of packet loss due to network
congestion, and the termination of the remote end. Also, a complete application will
provide controls to allow users to select an input source (microphone, line-in, or CD-
ROM) and to set the playback volume. In this application, you can use the volume
control multimedia accessory application supplied with Windows for this purpose.
Finally, this application demonstrates that a Windows application can deliver
multimedia content over the Internet with controllable jitter. Furthermore, it is also
possible to devise error recovery techniques that deliver adequate quality. This
application also shows that the end-to-end delay can be kept sufficiently small so as
to make interactive conversation feasible.
Listing 1 (mysound.cpp)
#include <winsock2.h>
#include <windows.h>
#include <windowsx.h>
#include <stdlib.h>
#include <mmsystem.h>
#include <mmreg.h>
#include <list>
#include <queue>
#include "mysndrc.h"
class CSendBuffer {
public:
WAVEHDR m_WaveHeader; // wave header for the buffer
XMITDATA m_Data; // Data block to be transmitted over UDP
class CRecvBuffer {
public:
WAVEHDR m_WaveHeader;
XMITDATA m_Data;
class CSoundDialog {
protected:
HWND m_hWnd; // Dialog handle
bool m_fInClosing; // Stopping wave capture?
bool m_fOutClosing; // Stopping playback?
HWAVEIN m_hWaveIn; // Handle to capture device
HWAVEOUT m_hWaveOut; // Handle to playback device
CSendBuffer m_aInBlocks[NUM_BLOCKS]; // Capture bufs
CRecvBuffer m_aOutBlocks[NUM_BLOCKS*2]; // Playback bufs
T_BSIZE m_nPrevSize; // Size of previous data block
BYTE m_abPrevData[BLOCK_SIZE]; // Copy of block
SOCKET m_Socket; // UDP socket
struct sockaddr_in m_SockAddr; // Remote address
DWORD m_dwOutSeq; // Sequence counter
int m_iCountIn; // Items in capture queue
int m_iCountOut; // Items in playback queue
DWORD m_dwSeqExp; // Sequence of next out buffer
CRecvBufL m_lpPlayBufs; // List of playback buffers
CRecvBufL m_lpFreeBufs; // List of free recv buffers
CSendBufQ m_qpXmitBufs; // Transmission queue
bool m_fDelay; // In delay mode?
bool m_fExiting; // Shutting down?
EnableWindow(GetDlgItem(hWnd, IDC_BUTTON_DISCONNECT),
FALSE);
return TRUE;
}
void OnCancel() {
if ((m_hWaveOut != 0) || (m_hWaveIn != 0)) {
OnDisconnect();// Close socket/devices before exiting
m_fExiting = true; // Set exit indicator
}
else
EndDialog(m_hWnd, 0); // Exit if devices closed
}
void OnConnect() {
char szIPAddress[128];
unsigned long ulAddrIP;
struct hostent *pHostEnt;
GSM610WAVEFORMAT WaveFormatGSM;
MMRESULT mmRC;
ZeroMemory(&m_SockAddr, sizeof(m_SockAddr));
m_nPrevSize = 0; // Initialize size of previous buffer
m_iCountIn--;
pAudioBuffer = (CSendBuffer*)(pHdrWave->dwUser);
// Unlink the buffer from the capture device
pAudioBuffer->Unprepare(m_hWaveIn);
if (!m_fInClosing) {
pXmitData = &(pAudioBuffer->m_Data);
// Set the buffer data size, sequence, redundant data
pXmitData->m_nSize = (T_BSIZE)(pHdrWave->
dwBytesRecorded);
pXmitData->m_dwSeq = m_dwOutSeq++;
pXmitData->m_nSizeP = m_nPrevSize;
memcpy(pXmitData->m_abDataP, m_abPrevData, m_nPrevSize);
// Save a copy of data to send with next packet
m_nPrevSize = pXmitData->m_nSize;
memcpy(m_abPrevData, pXmitData->m_abData, m_nPrevSize);
// add to the transmission queue
m_qpXmitBufs.push(pAudioBuffer);
OnSocketWrite(); // Try to send queued buffers
}
else { // close is requested, don't recycle
// If all buffers have been returned, close the device
if (m_iCountIn == 0)
waveInClose(m_hWaveIn);
}
}
void OnSocketWrite() {
CSendBuffer *pBuffer;
void JitterControl() {
CRecvBuffer *pBuffer;
if (m_fDelay) {
if (m_lpPlayBufs.size() >= THRESHOLD) {
// Start playback if enough buffers received
Report("Delay off\r\n");
m_fDelay = false;
for (int i = 0; i < THRESHOLD; i++) {
pBuffer = m_lpPlayBufs.front();
m_lpPlayBufs.pop_front();
if (pBuffer->m_Data.m_dwSeq == (m_dwSeqExp+1)) {
// Recover from previous if missing buffer
RecoverPrevData(pBuffer);
i++;
pBuffer->Prepare(m_hWaveOut);
pBuffer->Add(m_hWaveOut);
} else {
pBuffer->Prepare(m_hWaveOut);
pBuffer->Add(m_hWaveOut);
}
m_iCountOut++;
m_dwSeqExp = pBuffer->m_Data.m_dwSeq + 1;
}
}
return;
}
if (m_iCountOut == 0) {
// Start delay mode if we run out of buffers
m_fDelay = true;
Report("Delay on\r\n");
return;
}
pBuffer = m_lpPlayBufs.front();
if (pBuffer->m_Data.m_dwSeq == (m_dwSeqExp+1)) {
// Recover missing block
RecoverPrevData(pBuffer);
m_dwSeqExp++;
}
if (pBuffer->m_Data.m_dwSeq == m_dwSeqExp) {
// This is the expected buffer -- playback
pBuffer->Prepare(m_hWaveOut);
pBuffer->Add(m_hWaveOut);
m_iCountOut++;
m_dwSeqExp = pBuffer->m_Data.m_dwSeq + 1;
m_lpPlayBufs.pop_front();
continue;
}
if (m_iCountOut < PLAYBACK_THRESHOLD) {
// Playback next buffer regardless of seq#
// because we are short of data
m_dwSeqExp = pBuffer->m_Data.m_dwSeq;
Report("skipping\r\n");
continue;
}
break;
}
}
if (!m_fOutClosing)
JitterControl(); // Do jitter control if not exiting
else if (m_iCountOut == 0)
waveOutClose(m_hWaveOut);
}
void OnSocketRead() {
CRecvBuffer *pBuffer;
XMITDATA *pData;
if (m_lpFreeBufs.empty()) { // Overflow
XMITDATA Data;
pBuffer = (CRecvBuffer*)(m_lpFreeBufs.front());
pData = &(pBuffer->m_Data);
if (recv(m_Socket, (char*)pData,
sizeof(*pData), 0) == SOCKET_ERROR)
Report("Error receiving data\r\n");
else {
if (pData->m_dwSeq == 0)
m_dwSeqExp = 0; // Reset the expected sequence
void OnWimOpen() {
m_dwOutSeq = 0; // reset sequence for sent blocks
m_iCountIn = 0; // reset count of data blocks in queue
for (int i = 0; i < NUM_BLOCKS; i++) {
// prepare and add blocks to capture device queue
m_aInBlocks[i].Prepare(m_hWaveIn);
m_aInBlocks[i].Add(m_hWaveIn);
m_iCountIn++;
}
}
void OnWimClose() {
m_hWaveIn = 0;
if (m_hWaveOut == 0) { // If both devices are closed
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_DISCONNECT),
FALSE);
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_CONNECT),
TRUE);
if (m_fExiting)
EndDialog(m_hWnd, 0);
}
}
void OnWomOpen() {
m_iCountOut = 0;
m_dwSeqExp = 0;
for (int i = 0; i < NUM_BLOCKS*2; i++) { // Setup free list
m_aOutBlocks[i].Prepare(m_hWaveOut);
m_lpFreeBufs.push_back(&(m_aOutBlocks[i]));
}
WSAAsyncSelect(m_Socket, m_hWnd, WM_USR_SOCKIO,
FD_READ | FD_WRITE); // Non-blocking socket
}
void OnWomClose() {
m_hWaveOut = 0;
if (m_hWaveIn == 0) { // If both devices are closed
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_DISCONNECT),
FALSE);
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_CONNECT),
TRUE);
if (m_fExiting)
EndDialog(m_hWnd, 0);
}
}
public:
static BOOL CALLBACK SoundDialogProc(HWND hWnd, UINT uMsg,
WPARAM wParam, LPARAM lParam) {
CSoundDialog *pSoundDlg;
case WM_COMMAND:
if (GET_WM_COMMAND_CMD(wParam, lParam) == BN_CLICKED) {
switch (GET_WM_COMMAND_ID(wParam, lParam)) {
case IDCANCEL:
pSoundDlg->OnCancel();
break;
case IDC_BUTTON_CONNECT:
pSoundDlg->OnConnect();
break;
case IDC_BUTTON_DISCONNECT:
pSoundDlg->OnDisconnect();
break;
}
}
break;
case MM_WIM_OPEN:
pSoundDlg->OnWimOpen(); break;
case MM_WIM_CLOSE:
pSoundDlg->OnWimClose(); break;
case MM_WOM_OPEN:
pSoundDlg->OnWomOpen(); break;
case MM_WOM_CLOSE:
pSoundDlg->OnWomClose(); break;
case WM_INITDIALOG:
SetWindowLong(hWnd, DWL_USER, lParam);
pSoundDlg = (CSoundDialog *)lParam;
return pSoundDlg->OnInit(hWnd);
}
return 0;
}
};
Listing 2 (soundeco.cpp)
#include <winsock2.h>
#include <windows.h>
#include <stdlib.h>
#include <iostream>
typedef struct {
DWORD dwSeq;
WORD bSize;
WORD bSizeP;
BYTE abData[200000];
} SOUND_BUFFER;
WORD wVersionRequested;
WSADATA wsaData;
if (argc < 2) {
std::cout << "Usage: " << argv[0] <<
" <drop rate %> [<port #>]\n";
return -1;
}
for (;;) {
int iRecvLen;
iSockAddrReadLen = sizeof(SockAddrRead);
iRecvLen = recvfrom(iSockUDP, (char*)&Buffer,
sizeof(Buffer), 0,
(struct sockaddr*)&SockAddrRead,
&iSockAddrReadLen);
if (iRecvLen == SOCKET_ERROR) {
std::cout << "Error receiving data\n";
break;
} else {
// drop some % of the packets
if (rand() < (dDropRate * RAND_MAX)) {
std::cout << "dropping Seq: " << Buffer.dwSeq
<< "\t Size: " << (int)(Buffer.bSize)
<< "\n";
} else {
if (sendto(iSockUDP, (char*)&Buffer, iRecvLen, 0,
(struct sockaddr*)&SockAddrRead,
iSockAddrReadLen) == SOCKET_ERROR) {
std::cout << "Error sending data\n";
break;
}
std::cout << "Seq: " << Buffer.dwSeq << "\t Size: "
<< (int)(Buffer.bSize)
<< "(" << (int)(Buffer.bSizeP) << ")\n";
}
}
}