Yogi Dandass
Yogi Dandass is a researcher at Mississippi State University's High Performance
Computing Laboratory and a lecturer in the Department of Computer Science. He
designs software libraries in C++ for numerical computing, artificial intelligence, and
real-time communication middleware. He can be reached at yogi@cs.msstate.edu.
Windows isn't a real-time operating system, and the Internet provides no real-time
packet delivery, so how can you handle a real-time task like Internet audio
communication? This article shows how you can attack the problem with
compression and redundancy.
Unfortunately, the Internet does not provide guarantees for timely delivery of data.
Furthermore, while the audio devices in PCs are capable of smooth audio capture
and playback, Windows (which isn't a real-time operating system) does not provide
guarantees that the application's threads will be scheduled in a manner conducive to
producing on-time results. The non-real-time nature of the operating system and
internetworking introduces delays of unpredictable magnitude (jitter) during the
delivery of audio data. Therefore, I implement a simple jitter control mechanism in this
application to reduce the unpleasant gaps that inevitably occur in the audio stream.
Dataflow Overview
Figure 1 depicts the flow of data in the application. The application supplies a number
of data buffers to the audio capture device (waveIn). The device fills each buffer with
digitized data and returns it to the application. The application inserts the filled buffer
into the transmission queue, and after the data is put on the network, the application
returns the emptied buffer to the waveIn device, starting the process all over again.
At the playback computer, packets of audio data are received from the network into
buffers taken from the Free List. The application inserts the filled buffers into the
Playback List. The Playback List is used to assemble the buffers in the proper order
needed for playback. After ordering the buffers, the application hands the buffers to
the playback device, waveOut. After playing the data in the buffers, waveOut returns
the buffers to the application for reinsertion into the Free List.
Figure 1: Streaming audio overview
In this application, on-time delivery of data is more important than error-free delivery,
so I chose to avoid using the connection-oriented Transmission Control Protocol
(TCP). TCP guarantees error-free and ordered delivery of data packets by repeatedly
transmitting a damaged or lost packet before delivering subsequent packets. These
repeated transmissions could cause significant delays in packet delivery, making jitter
control difficult.
The User Datagram Protocol (UDP), on the other hand, makes no guarantees about
the delivery of packets. Packets can be delivered in a damaged condition and out of
sequence. Some packets may not be delivered at all if the network is congested.
Most importantly, each packet is routed independently, and a datagram packet is
transmitted regardless of whether the previous packet was delivered successfully. As
long as only a small percentage of packets are damaged or lost while traversing the
Internet, UDP/IP is well suited for this application.
To provide high-quality audio in the face of unbounded delays and lost datagrams
over the Internet, I send a copy of the previous datagrams audio data with each new
datagram. Under this scheme, the audio is played accurately even when every other
packet is dropped. However, this redundancy does increase the bandwidth
requirements of the application and can be removed when operating under
constrained bandwidth conditions.
mysound.cpp (Listing 1) contains the C++ source for a sample program that can run
simultaneously on two computers for a full-duplex conversation. The executable must
be linked with the Windows sockets library (wsock32.lib or ws2_32.lib) and the
multimedia library (winmm.lib). To test the program on a single computer, you can
have the program send packets to an echo server (port 7) and play back the echoed
packets. soundeco.cpp (Listing 2) contains the source code for an echo server that
drops random packets at a user-specified rate.
Audio Acquisition
You open a waveIn device and request a particular codec and a set of audio
acquisition characteristics (sample rate, resolution, channels, etc.) by passing a
WAVEFORMATEX structure to waveInOpen(). The WAVEFORMATEX structure
holds the attributes common to all codecs. If the codec requires additional
information, the WAVEFORMATEX structure is included as the first member of a
codec-specific wave format structure. GSM, for example, needs the number of
samples per data block and therefore uses the GSM610WAVEFORMAT structure.
See mmreg.h for additional examples of wave format structures.
In mysound.cpp (Listing 1), I initialize the wave input device in the OnConnect()
method of CSoundDialog. I request a GSM codec with a sample rate of 8 kHz, 1
channel (mono), a 1,625 bytes-per-second data rate, and a block alignment of 65. The
block alignment specifies the smallest quantity of data the codec can process at a
time. The GSM-specific field, samples-per-block, is set to 320. This combination of
fields specifies that the codec will produce 65-byte blocks of data, 25 times per
second (1,625/65 = 25) and each 65-byte block will contain 320 samples. The bits-
per-sample field is determined by GSM and therefore is set to zero. Note that each
65-byte block contains 40 milliseconds of audio data.
The set of 25 blocks holds one second of audio, allowing sufficient capacity to handle
any jitter in data acquisition and transmission. In other words, if the transmission of a
filled buffer is delayed for a second, all the other buffers will have audio data ready to
transmit, but the waveIn device will be starved of buffers in which to put new data.
However, delays greater than a few hundred milliseconds will have already reduced
audio quality to such an extent that the loss of data caused by transmission delays
greater than one second will not be a significant factor.
In waveInOpen(), I also specify that the waveIn device shall send messages to the
application's window to report status information and return filled buffers to the
application. (Alternatively, a callback function or thread can also be used for this
purpose.) The window receives MM_WIM_OPEN, MM_WIM_CLOSE, and
MM_WIM_DATA messages when waveIn is opened, closed, and returns filled
buffers to the application, respectively. waveInOpen() returns a handle to the wave
input device that is used in subsequent wave input functions.
After preparing the 25 buffers, I call waveInAddBuffer() repeatedly to add all of the
buffers' WAVEHDRs to the waveIn device's queue. Once the waveIn device fills a
data block, it sets the number of bytes of audio data returned in the associated
WAVEHDR's dwBytesRecorded member and returns the WAVEHDR to the
application's window. The resulting MM_WIM_DATA message is handled in the
application's OnWimData() method.
Data Transmission
OnSocketRead() handles packet reception; it's called when WinSock notifies the
application of incoming datagrams. In this function, I take a buffer (an instance of
class CRecvBuffer) from m_lpFreeBufs (the list of free buffers), receive the data
from the socket into the buffer, and insert the filled buffer into m_lpPlayBufs (the
playback list). A total of 50 instances of class CRecvBuffer are available to the
application.
The code maintains the buffers in the playback list in ascending sequence number
order. If a buffer with the sequence number of the newly received buffer already
exists in the playback list, the new buffer is a delayed duplicate datagram and is
discarded. Because there is a remote chance that a burst of delayed datagrams will
arrive faster than they can be played back, the code discards incoming datagrams
when no free buffers are available.
Since I cannot rely on the non-real-time Internet and Windows to deliver packets to
the waveOut device precisely when they are needed for playback, I wait for 400
milliseconds (i.e., 10 data buffers) before starting to play the audio stream. This way,
I build some laxity into the packet delivery time (i.e., when packet n is being played
back, packet n+10 is being received). If a packet does not arrive 200 milliseconds
before it is required for playback (i.e., packet n+5 has not arrived by the time packet n
is being played), I assume it is lost and prepare the next available buffer for playback.
If the sequence number of the block at the head of the playback list is the expected
sequence number, JitterControl() removes the buffer from the Playback list and
gives it to the waveOut device for playback. If the expected block is missing, but the
next sequential block is available (i.e., the sequence number of the block at the head
of the list is equal to m_dwSeqExp+1), the data for the expected block is recovered
in RecoverPrevData() from the redundant copy in the next data block.
If m_iCountOut is less than five and the sequence number of the block at the head
of the playback list is not equal to m_dwSeqExp (i.e., the expected block has not
arrived yet), the application cannot wait any longer for the late block. The expected
block is assumed to be lost, and the next available buffer in the playback list,
regardless of sequence number, is given to the waveOut device for playback.
If several consecutive buffers are available in the Playback list, they are all processed
for playback in JitterControl(). JitterControl() updates m_dwSeqExp to reflect the
next expected buffer at the head of the Playback list, and increments m_iCountOut
whenever a buffer is given to the waveOut device.
Audio Playback
OnConnect() opens the waveOut device in a manner similar to the waveIn device.
Opening the waveOut device generates a WOM_OPEN message that is handled by
OnWomOpen(); it initializes the m_lpFreeBufs list and m_iCountOut. In
JitterControl(), every playback buffer is prepared by calling its Prepare() method
before being given to the waveOut device. The preparation process of CRecvBuffer
is essentially identical to the preparation of CSendBuffer.
After the waveOut device plays back the data in the buffer, the buffer is returned to
the application in the WOM_DONE message, handled by OnWomDone().
OnWomDone() extracts the pointer to the associated CRecvBuffer instance from the
dwUser member of the returned WAVEHDR structure. It then calls the buffer's
Unprepare() method, inserts the buffer into the list of free buffers, and decrements
m_iCountOut.
Program Termination
Before exiting, the application must stop the audio capture and playback process,
retrieve all enqueued buffers from the waveIn and waveOut devices, and close the
devices. Exiting before the devices are closed can cause the multimedia subsystem
to hang. However, the devices cannot be closed until they have returned all the
buffers to the application. Therefore, in OnCancel(), I record the user's termination
request in m_fExiting and call OnDisconnect().
In OnDisconnect(), I close the socket, reset the devices, and set m_fOutClosing
and m_fInClosing to indicate that the devices are being closed and that
OnWomDone() and OnWimData() should not prepare the returned data buffers for
reuse. OnWomDone() decrements m_iCountOut as each enqueued playback buffer
is returned by waveOut. Once the count reaches zero, the waveOut device is
closed, resulting in the MM_WOM_CLOSE message that is handled by
OnWomClose(). Audio capture is similarly terminated by decrementing m_iCountIn
and closing the device when all of the buffers are returned to OnWimData(). Closing
the waveIn device generates the MM_WIM_CLOSE message that is handled by
OnWimClose(). In OnWomClose() and OnWimClose(), if both devices are marked
as closed, and application termination is requested, EndDialog() is called to exit the
application.
Further Enhancements
Because of limited space, I have used a relatively simple form of jitter control and
packet recovery. A more sophisticated approach would entail sending more than one
redundant copy of audio data over several packets. The redundant copies can be of
reduced quality, created with aggressive lossy compression algorithms, in order to
reduce the network bandwidth required. Sending smaller packets when silence is
detected can further reduce bandwidth requirements. This is a particularly effective
technique because conversation is mostly half-duplex.
In this application, I make very little effort to synchronize the two computers. High-
quality applications can use TCP to send status information to each other in order to
indicate the extent of end-to-end delays, percent of packet loss due to network
congestion, and the termination of the remote end. Also, a complete application will
provide controls to allow users to select an input source (microphone, line-in, or CD-
ROM) and to set the playback volume. In this application, you can use the volume
control multimedia accessory application supplied with Windows for this purpose.
Finally, this application demonstrates that a Windows application can deliver
multimedia content over the Internet with controllable jitter. Furthermore, it is also
possible to devise error recovery techniques that deliver adequate quality. This
application also shows that the end-to-end delay can be kept sufficiently small so as
to make interactive conversation feasible.
Listing 1 (mysound.cpp)
#include <winsock2.h>
#include <windows.h>
#include <windowsx.h>
#include <stdlib.h>
#include <mmsystem.h>
#include <mmreg.h>
#include <list>
#include <queue>
#include "mysndrc.h"
class CSendBuffer {
public:
WAVEHDR m_WaveHeader; // wave header for the buffer
XMITDATA m_Data; // Data block to be transmitted over UDP
class CRecvBuffer {
public:
WAVEHDR m_WaveHeader;
XMITDATA m_Data;
class CSoundDialog {
protected:
HWND m_hWnd; // Dialog handle
bool m_fInClosing; // Stopping wave capture?
bool m_fOutClosing; // Stopping playback?
HWAVEIN m_hWaveIn; // Handle to capture device
HWAVEOUT m_hWaveOut; // Handle to playback device
CSendBuffer m_aInBlocks[NUM_BLOCKS]; // Capture bufs
CRecvBuffer m_aOutBlocks[NUM_BLOCKS*2]; // Playback bufs
T_BSIZE m_nPrevSize; // Size of previous data block
BYTE m_abPrevData[BLOCK_SIZE]; // Copy of block
SOCKET m_Socket; // UDP socket
struct sockaddr_in m_SockAddr; // Remote address
DWORD m_dwOutSeq; // Sequence counter
int m_iCountIn; // Items in capture queue
int m_iCountOut; // Items in playback queue
DWORD m_dwSeqExp; // Sequence of next out buffer
CRecvBufL m_lpPlayBufs; // List of playback buffers
CRecvBufL m_lpFreeBufs; // List of free recv buffers
CSendBufQ m_qpXmitBufs; // Transmission queue
bool m_fDelay; // In delay mode?
bool m_fExiting; // Shutting down?
EnableWindow(GetDlgItem(hWnd, IDC_BUTTON_DISCONNECT),
FALSE);
return TRUE;
}
void OnCancel() {
if ((m_hWaveOut != 0) || (m_hWaveIn != 0)) {
OnDisconnect();// Close socket/devices before exiting
m_fExiting = true; // Set exit indicator
}
else
EndDialog(m_hWnd, 0); // Exit if devices closed
}
void OnConnect() {
char szIPAddress[128];
unsigned long ulAddrIP;
struct hostent *pHostEnt;
GSM610WAVEFORMAT WaveFormatGSM;
MMRESULT mmRC;
ZeroMemory(&m_SockAddr, sizeof(m_SockAddr));
m_nPrevSize = 0; // Initialize size of previous buffer
m_iCountIn--;
pAudioBuffer = (CSendBuffer*)(pHdrWave->dwUser);
// Unlink the buffer from the capture device
pAudioBuffer->Unprepare(m_hWaveIn);
if (!m_fInClosing) {
pXmitData = &(pAudioBuffer->m_Data);
// Set the buffer data size, sequence, redundant data
pXmitData->m_nSize = (T_BSIZE)(pHdrWave->
dwBytesRecorded);
pXmitData->m_dwSeq = m_dwOutSeq++;
pXmitData->m_nSizeP = m_nPrevSize;
memcpy(pXmitData->m_abDataP, m_abPrevData, m_nPrevSize);
// Save a copy of data to send with next packet
m_nPrevSize = pXmitData->m_nSize;
memcpy(m_abPrevData, pXmitData->m_abData, m_nPrevSize);
// add to the transmission queue
m_qpXmitBufs.push(pAudioBuffer);
OnSocketWrite(); // Try to send queued buffers
}
else { // close is requested, don't recycle
// If all buffers have been returned, close the device
if (m_iCountIn == 0)
waveInClose(m_hWaveIn);
}
}
void OnSocketWrite() {
CSendBuffer *pBuffer;
void JitterControl() {
CRecvBuffer *pBuffer;
if (m_fDelay) {
if (m_lpPlayBufs.size() >= THRESHOLD) {
// Start playback if enough buffers received
Report("Delay off\r\n");
m_fDelay = false;
for (int i = 0; i < THRESHOLD; i++) {
pBuffer = m_lpPlayBufs.front();
m_lpPlayBufs.pop_front();
if (pBuffer->m_Data.m_dwSeq == (m_dwSeqExp+1)) {
// Recover from previous if missing buffer
RecoverPrevData(pBuffer);
i++;
pBuffer->Prepare(m_hWaveOut);
pBuffer->Add(m_hWaveOut);
} else {
pBuffer->Prepare(m_hWaveOut);
pBuffer->Add(m_hWaveOut);
}
m_iCountOut++;
m_dwSeqExp = pBuffer->m_Data.m_dwSeq + 1;
}
}
return;
}
if (m_iCountOut == 0) {
// Start delay mode if we run out of buffers
m_fDelay = true;
Report("Delay on\r\n");
return;
}
pBuffer = m_lpPlayBufs.front();
if (pBuffer->m_Data.m_dwSeq == (m_dwSeqExp+1)) {
// Recover missing block
RecoverPrevData(pBuffer);
m_dwSeqExp++;
}
if (pBuffer->m_Data.m_dwSeq == m_dwSeqExp) {
// This is the expected buffer -- playback
pBuffer->Prepare(m_hWaveOut);
pBuffer->Add(m_hWaveOut);
m_iCountOut++;
m_dwSeqExp = pBuffer->m_Data.m_dwSeq + 1;
m_lpPlayBufs.pop_front();
continue;
}
if (m_iCountOut < PLAYBACK_THRESHOLD) {
// Playback next buffer regardless of seq#
// because we are short of data
m_dwSeqExp = pBuffer->m_Data.m_dwSeq;
Report("skipping\r\n");
continue;
}
break;
}
}
if (!m_fOutClosing)
JitterControl(); // Do jitter control if not exiting
else if (m_iCountOut == 0)
waveOutClose(m_hWaveOut);
}
void OnSocketRead() {
CRecvBuffer *pBuffer;
XMITDATA *pData;
if (m_lpFreeBufs.empty()) { // Overflow
XMITDATA Data;
pBuffer = (CRecvBuffer*)(m_lpFreeBufs.front());
pData = &(pBuffer->m_Data);
if (recv(m_Socket, (char*)pData,
sizeof(*pData), 0) == SOCKET_ERROR)
Report("Error receiving data\r\n");
else {
if (pData->m_dwSeq == 0)
m_dwSeqExp = 0; // Reset the expected sequence
void OnWimOpen() {
m_dwOutSeq = 0; // reset sequence for sent blocks
m_iCountIn = 0; // reset count of data blocks in queue
for (int i = 0; i < NUM_BLOCKS; i++) {
// prepare and add blocks to capture device queue
m_aInBlocks[i].Prepare(m_hWaveIn);
m_aInBlocks[i].Add(m_hWaveIn);
m_iCountIn++;
}
}
void OnWimClose() {
m_hWaveIn = 0;
if (m_hWaveOut == 0) { // If both devices are closed
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_DISCONNECT),
FALSE);
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_CONNECT),
TRUE);
if (m_fExiting)
EndDialog(m_hWnd, 0);
}
}
void OnWomOpen() {
m_iCountOut = 0;
m_dwSeqExp = 0;
for (int i = 0; i < NUM_BLOCKS*2; i++) { // Setup free list
m_aOutBlocks[i].Prepare(m_hWaveOut);
m_lpFreeBufs.push_back(&(m_aOutBlocks[i]));
}
WSAAsyncSelect(m_Socket, m_hWnd, WM_USR_SOCKIO,
FD_READ | FD_WRITE); // Non-blocking socket
}
void OnWomClose() {
m_hWaveOut = 0;
if (m_hWaveIn == 0) { // If both devices are closed
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_DISCONNECT),
FALSE);
EnableWindow(GetDlgItem(m_hWnd, IDC_BUTTON_CONNECT),
TRUE);
if (m_fExiting)
EndDialog(m_hWnd, 0);
}
}
public:
static BOOL CALLBACK SoundDialogProc(HWND hWnd, UINT uMsg,
WPARAM wParam, LPARAM lParam) {
CSoundDialog *pSoundDlg;
case WM_COMMAND:
if (GET_WM_COMMAND_CMD(wParam, lParam) == BN_CLICKED) {
switch (GET_WM_COMMAND_ID(wParam, lParam)) {
case IDCANCEL:
pSoundDlg->OnCancel();
break;
case IDC_BUTTON_CONNECT:
pSoundDlg->OnConnect();
break;
case IDC_BUTTON_DISCONNECT:
pSoundDlg->OnDisconnect();
break;
}
}
break;
case MM_WIM_OPEN:
pSoundDlg->OnWimOpen(); break;
case MM_WIM_CLOSE:
pSoundDlg->OnWimClose(); break;
case MM_WOM_OPEN:
pSoundDlg->OnWomOpen(); break;
case MM_WOM_CLOSE:
pSoundDlg->OnWomClose(); break;
case WM_INITDIALOG:
SetWindowLong(hWnd, DWL_USER, lParam);
pSoundDlg = (CSoundDialog *)lParam;
return pSoundDlg->OnInit(hWnd);
}
return 0;
}
};
Listing 2 (soundeco.cpp)
#include <winsock2.h>
#include <windows.h>
#include <stdlib.h>
#include <iostream>
typedef struct {
DWORD dwSeq;
WORD bSize;
WORD bSizeP;
BYTE abData[200000];
} SOUND_BUFFER;
WORD wVersionRequested;
WSADATA wsaData;
if (argc < 2) {
std::cout << "Usage: " << argv[0] <<
" <drop rate %> [<port #>]\n";
return -1;
}
for (;;) {
int iRecvLen;
iSockAddrReadLen = sizeof(SockAddrRead);
iRecvLen = recvfrom(iSockUDP, (char*)&Buffer,
sizeof(Buffer), 0,
(struct sockaddr*)&SockAddrRead,
&iSockAddrReadLen);
if (iRecvLen == SOCKET_ERROR) {
std::cout << "Error receiving data\n";
break;
} else {
// drop some % of the packets
if (rand() < (dDropRate * RAND_MAX)) {
std::cout << "dropping Seq: " << Buffer.dwSeq
<< "\t Size: " << (int)(Buffer.bSize)
<< "\n";
} else {
if (sendto(iSockUDP, (char*)&Buffer, iRecvLen, 0,
(struct sockaddr*)&SockAddrRead,
iSockAddrReadLen) == SOCKET_ERROR) {
std::cout << "Error sending data\n";
break;
}
std::cout << "Seq: " << Buffer.dwSeq << "\t Size: "
<< (int)(Buffer.bSize)
<< "(" << (int)(Buffer.bSizeP) << ")\n";
}
}
}