
The Discrete Cosine Transform

110.312 - Introduction to Wavelets




























Aileen Cuddy
Elisabeth Walden
Sarah Zalewski
12/3/01









Introduction to the DCT

The Discrete Cosine Transform (DCT) is a change of basis that takes in real-valued functions and transforms them with respect to an orthonormal cosine basis. The DCT is formally defined as:

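$$a_0 = \frac{1}{\sqrt{N}} \sum_{m=0}^{N-1} F(m), \qquad a_k = \sqrt{\frac{2}{N}} \sum_{m=0}^{N-1} F(m) \cos\left(\frac{k\pi(2m+1)}{2N}\right), \quad k = 1, 2, \ldots, N-1.$$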
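To make the definition concrete, here is a minimal Python sketch that evaluates the two formulas above term by term. The function name `dct` is our own choice, and the direct double loop costs on the order of $N^2$ operations; it illustrates the definition rather than an efficient implementation.

```python
import math


def dct(F):
    """Direct evaluation of the DCT defined above.

    a_0 = (1/sqrt(N)) * sum_m F(m)
    a_k = sqrt(2/N) * sum_m F(m) * cos(k*pi*(2m+1)/(2N)),  k = 1, ..., N-1
    """
    N = len(F)
    coeffs = []
    for k in range(N):
        s = sum(F[m] * math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N))
        scale = 1 / math.sqrt(N) if k == 0 else math.sqrt(2 / N)
        coeffs.append(scale * s)
    return coeffs


# A constant signal puts all of its energy into a_0 (here a_0 = 4 / sqrt(4) = 2).
coeffs = dct([1.0, 1.0, 1.0, 1.0])
print(coeffs)
```

The remaining coefficients come out as zero up to floating-point round-off, an instance of equality (1.1) used in the proof below.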



There are other variants of the DCT definition that rely on different basis vectors. The above definition is in terms of a single variable; a two-dimensional version of the DCT is commonly used in practice.

The basis vectors of the DCT are defined below:

$$C_0^{(N)}(m) = \frac{1}{\sqrt{2}}, \qquad C_k^{(N)}(m) = \cos\left(\frac{k\pi(2m+1)}{2N}\right), \quad k = 1, 2, \ldots, N-1,$$

for $m = 0, 1, \ldots, N-1$.



Theorem 1
For $N \ge 1$, the $N$ vectors in $\mathbb{R}^N$

$$\left\{ \sqrt{\frac{2}{N}}\, C_k^{(N)} : k = 0, 1, \ldots, N-1 \right\}$$

form an orthonormal basis for $\mathbb{R}^N$.

Proof
Recall the following cosine identities:

(1) $2\cos\theta = e^{i\theta} + e^{-i\theta}$
(2) $2\cos^2\theta = 1 + \cos(2\theta)$
(3) $2\cos\alpha\cos\beta = \cos(\alpha + \beta) + \cos(\alpha - \beta)$

This proof uses the equality:

$$\sum_{m=0}^{N-1} \cos\left(\frac{k\pi(2m+1)}{2N}\right) = 0 \qquad \text{for } k = 1, 2, \ldots, 2N-1. \qquad (1.1)$$
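Equality (1.1) is easy to check numerically. Writing the cosines with identity (1) and summing the resulting partial geometric series gives the closed form $\frac{\sin(k\pi)}{2\sin(k\pi/(2N))}$, which vanishes for $k = 1, \ldots, 2N-1$. The sketch below (function names are our own) compares the raw sum with that closed form and shows that the equality fails at $k = 2N$, which is why the range stops at $2N - 1$. A numerical check is not a proof.

```python
import math


def cosine_sum(k, N):
    """Left-hand side of (1.1): sum over m = 0..N-1 of cos(k*pi*(2m+1)/(2N))."""
    return sum(math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N))


def closed_form(k, N):
    """Partial-geometric-series value sin(k*pi) / (2*sin(k*pi/(2N)));
    valid whenever k is not a multiple of 2N."""
    return math.sin(k * math.pi) / (2 * math.sin(k * math.pi / (2 * N)))


N = 8
for k in range(1, 2 * N):
    # Both the raw sum and the closed form vanish for k = 1, ..., 2N-1.
    assert abs(cosine_sum(k, N)) < 1e-9
    assert abs(cosine_sum(k, N) - closed_form(k, N)) < 1e-9

# At k = 2N every term is cos((2m+1)*pi) = -1, so the sum is -N, not 0.
print(cosine_sum(2 * N, N))
```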
Formula (1.1) follows from cosine identity (1) and the formula for a partial geometric series; Hernandez and Weiss provide a proof of (1.1). To prove that the basis vectors are orthonormal, we must show the following:

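(i) The norms of both $\sqrt{2/N}\,C_k^{(N)}$ (for $1 \le k \le N-1$) and $\sqrt{2/N}\,C_0^{(N)}$ are equal to one.
(ii) $\sqrt{2/N}\,C_0^{(N)}$ is orthogonal to $\sqrt{2/N}\,C_k^{(N)}$ for $1 \le k \le N-1$.
(iii) $\sqrt{2/N}\,C_l^{(N)}$ is orthogonal to $\sqrt{2/N}\,C_k^{(N)}$ for $k \neq l$ and $1 \le k, l \le N-1$.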


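Part (i) follows using cosine identity (2). First, $\sqrt{2/N}\,C_k^{(N)}$ has norm 1 for $1 \le k \le N-1$. Since

$$\left\| \sqrt{\frac{2}{N}}\, C_k^{(N)} \right\|^2 = \frac{2}{N} \left\| C_k^{(N)} \right\|^2,$$

it is only necessary to show that $\frac{2}{N} \sum_{m=0}^{N-1} \left(C_k^{(N)}(m)\right)^2 = 1$: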

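$$\frac{2}{N} \sum_{m=0}^{N-1} \cos^2\left(\frac{k\pi(2m+1)}{2N}\right) = \frac{1}{N} \sum_{m=0}^{N-1} \left[1 + \cos\left(\frac{2k\pi(2m+1)}{2N}\right)\right] = 1 + \frac{1}{N} \sum_{m=0}^{N-1} \cos\left(\frac{l\pi(2m+1)}{2N}\right) = 1 + 0 = 1.$$

The cosine term is evaluated by letting $l = 2k$: since $1 \le k \le N-1$, $l$ lies between 2 and $2N - 2$, so formula (1.1) applies.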
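Also, $\sqrt{2/N}\,C_0^{(N)}$ has norm 1:

$$\left\| \sqrt{\frac{2}{N}}\, C_0^{(N)} \right\|^2 = \frac{2}{N} \sum_{m=0}^{N-1} \left(\frac{1}{\sqrt{2}}\right)^2 = \frac{2}{N} \cdot \frac{N}{2} = 1.$$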


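Part (ii) follows from equation (1.1). To show that $\sqrt{2/N}\,C_0^{(N)}$ is orthogonal to $\sqrt{2/N}\,C_k^{(N)}$, it is only necessary to show that

$$\sum_{m=0}^{N-1} C_k^{(N)}(m) = \sum_{m=0}^{N-1} \cos\left(\frac{k\pi(2m+1)}{2N}\right) = 0, \qquad 1 \le k \le N-1,$$

since $C_0^{(N)}$ is a constant. Equation (1.1) shows that this is true.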

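Part (iii) follows from cosine identity (3) and equation (1.1). We must show that $C_k^{(N)}$ is orthogonal to $C_l^{(N)}$ for $k \neq l$ and $1 \le k, l \le N-1$. By identity (3),

$$\begin{aligned}
\sum_{m=0}^{N-1} C_k^{(N)}(m)\, C_l^{(N)}(m) &= \sum_{m=0}^{N-1} \cos\left(\frac{k\pi(2m+1)}{2N}\right) \cos\left(\frac{l\pi(2m+1)}{2N}\right) \\
&= \frac{1}{2} \sum_{m=0}^{N-1} \cos\left(\frac{(k+l)\pi(2m+1)}{2N}\right) + \frac{1}{2} \sum_{m=0}^{N-1} \cos\left(\frac{(k-l)\pi(2m+1)}{2N}\right) = 0.
\end{aligned}$$

The second line can use equation (1.1) because, under the conditions on $k$ and $l$ stated above, both $k + l$ and $|k - l|$ lie between 1 and $2N - 1$; since cosine is even, the sign of $k - l$ does not matter. With all three conditions satisfied, the proof is complete.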
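The three conditions can also be confirmed numerically by forming all pairwise inner products of the scaled vectors $\sqrt{2/N}\,C_k^{(N)}$ and comparing the result with the identity matrix. This minimal Python sketch (function names are ours) does that for a few values of $N$; it is a sanity check on the theorem, not a replacement for the proof.

```python
import math


def basis(N):
    """Rows are the scaled vectors sqrt(2/N) * C_k^(N), k = 0..N-1, with
    C_0(m) = 1/sqrt(2) and C_k(m) = cos(k*pi*(2m+1)/(2N)) for k >= 1."""
    rows = []
    for k in range(N):
        if k == 0:
            row = [1 / math.sqrt(2)] * N
        else:
            row = [math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N)]
        rows.append([math.sqrt(2 / N) * x for x in row])
    return rows


def max_gram_error(N):
    """Largest deviation of the Gram matrix <b_k, b_l> from the identity."""
    B = basis(N)
    err = 0.0
    for k in range(N):
        for l in range(N):
            dot = sum(x * y for x, y in zip(B[k], B[l]))
            err = max(err, abs(dot - (1.0 if k == l else 0.0)))
    return err


for N in (1, 2, 5, 8, 16):
    print(N, max_gram_error(N))   # round-off sized values only
```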

Now that we have an orthonormal basis, the following properties hold:
(1) IDCT (inversion formula):

$$F(m) = \frac{a_0}{\sqrt{N}} + \sqrt{\frac{2}{N}} \sum_{k=1}^{N-1} a_k \cos\left(\frac{k\pi(2m+1)}{2N}\right), \qquad m = 0, 1, \ldots, N-1.$$
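A minimal sketch of the round trip, assuming the normalization of the DCT definition given at the start of this paper: `dct` implements the forward formulas and `idct` the inversion formula, so applying one after the other should return the original data up to floating-point error. Both function names are our own.

```python
import math


def dct(F):
    """Forward DCT: a_0 = sum F / sqrt(N), a_k = sqrt(2/N) * sum F(m) cos(...)."""
    N = len(F)
    return [(1 / math.sqrt(N) if k == 0 else math.sqrt(2 / N))
            * sum(F[m] * math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N))
            for k in range(N)]


def idct(a):
    """Inversion formula: F(m) = a_0/sqrt(N) + sqrt(2/N) * sum_{k>=1} a_k cos(...)."""
    N = len(a)
    return [a[0] / math.sqrt(N)
            + math.sqrt(2 / N) * sum(a[k] * math.cos(k * math.pi * (2 * m + 1) / (2 * N))
                                     for k in range(1, N))
            for m in range(N)]


F = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0]
G = idct(dct(F))
print(max(abs(x - y) for x, y in zip(F, G)))   # floating-point round-off only
```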


(2) Parseval's Relation:
If $C$ is the DCT, then $\langle z, w \rangle = \langle Cz, Cw \rangle$.

(3) Plancherel's Relation:
If $C$ is the DCT, then $\|z\|^2 = \|Cz\|^2$.
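These relations can be spot-checked on random vectors. In this sketch (our own helper names, with the forward DCT written as in the definition above), the inner product and the squared norm are computed before and after the transform.

```python
import math
import random


def dct(F):
    """Orthonormal DCT with the same normalization as the definition above."""
    N = len(F)
    return [(1 / math.sqrt(N) if k == 0 else math.sqrt(2 / N))
            * sum(F[m] * math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N))
            for k in range(N)]


def inner(x, y):
    """Standard inner product on R^N."""
    return sum(a * b for a, b in zip(x, y))


random.seed(0)
z = [random.uniform(-1, 1) for _ in range(10)]
w = [random.uniform(-1, 1) for _ in range(10)]
Cz, Cw = dct(z), dct(w)

print(inner(z, w), inner(Cz, Cw))   # Parseval: the two values agree up to round-off
print(inner(z, z), inner(Cz, Cz))   # Plancherel: ||z||^2 equals ||Cz||^2
```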

Advantages and Disadvantages of the DCT

The Discrete Cosine Transform is similar to the Fourier Transform in that it transforms a signal from the spatial or time domain to the frequency domain, as when preparing an image for compression. Just as the Fourier Transform can be computed with fewer calculations by use of the FFT, the number of calculations needed for the DCT can also be reduced. A method called the Fast Cosine Transform, or FCT, can be used when $N = 2^q$, where $N$ is the number of samples in the vector to be transformed and $q$ is a positive integer; just as with the FFT, the complexity is reduced from $N^2$ calculations to a number of calculations on the order of $N \log_2 N$. The FFT is actually the foundation for calculating the FCT. The process is as follows:
Assume a general $F(m)$ is the function to be transformed, for $m = 0, 1, \ldots, N-1$. For all $m$ in that range, let $X_m = F\left[\frac{2m+1}{2N}\right]$, and extend the values of $m$ to $[-N, N-1]$ by reflection: $X_{-m} = X_{m-1}$ for $m = 1, 2, \ldots, N$, so that the extended data are symmetric about $m = -\tfrac{1}{2}$.

Now consider a specific function $f$ undergoing a similar process, except that in this case $X_L = f\left(e^{-2\pi i kL/(2N)}\right)$ for all $L$ in $[-N, N-1]$.

The DFT of $f$ is then as follows:

$$y_k = \frac{1}{2N} \sum_{L=-N}^{N-1} X_L\, e^{-2\pi i kL/(2N)}.$$

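When $X_L$ is the even extension constructed above, these $y_k$'s from the DFT equal $e^{i\pi k/(2N)}$ times the corresponding DCT values, up to a constant normalization factor; explicitly, $y_k = \frac{1}{N} e^{i\pi k/(2N)} \sum_{m=0}^{N-1} X_m \cos\left(\frac{k\pi(2m+1)}{2N}\right)$. If the FFT is used to compute the $y_k$'s, then the complexity reduction described above holds. This process is the Fast Cosine Transform.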
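The procedure above can be sketched in Python under a few stated assumptions: the input list `X` holds the samples $X_0, \ldots, X_{N-1}$ directly, $N$ is a power of 2, and the result is rescaled to the orthonormal DCT defined at the start of this paper. The even extension $X_{-m} = X_{m-1}$ is stored at array index $2N - m$, which is allowed because $e^{-2\pi i kL/(2N)}$ has period $2N$ in $L$. A textbook recursive radix-2 FFT stands in for a library FFT; `fft` and `fct` are our own names, and practical FCTs use more economical variants than this $2N$-point version.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: returns sum_j x[j] * exp(-2*pi*i*j*k/n).
    len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fct(X):
    """DCT of X (orthonormal scaling) computed from one 2N-point FFT."""
    N = len(X)
    # Even extension over L = -N..N-1: X_{-m} = X_{m-1} lands at index 2N - m.
    extended = list(X) + list(reversed(X))
    Y = fft(extended)
    coeffs = []
    for k in range(N):
        y_k = Y[k] / (2 * N)  # the DFT value y_k from the text
        # y_k = (1/N) e^{i pi k/(2N)} sum_m X_m cos(k pi (2m+1)/(2N)); undo the phase.
        S_k = (N * cmath.exp(-1j * math.pi * k / (2 * N)) * y_k).real
        scale = 1 / math.sqrt(N) if k == 0 else math.sqrt(2 / N)
        coeffs.append(scale * S_k)
    return coeffs


def dct_direct(X):
    """O(N^2) reference implementation of the DCT definition."""
    N = len(X)
    return [(1 / math.sqrt(N) if k == 0 else math.sqrt(2 / N))
            * sum(X[m] * math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N))
            for k in range(N)]


random.seed(1)
X = [random.uniform(-1, 1) for _ in range(16)]
error = max(abs(a - b) for a, b in zip(fct(X), dct_direct(X)))
print(error)   # floating-point round-off only
```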

There are multiple advantages to using the DCT over even the Fast Fourier Transform in applications. The first main advantage of the DCT is its efficiency. As the size of the image increases, the cost of the FFT grows at a much more rapid rate, and it is not efficient for compression. Instead, in transforming to the frequency domain, a type of DCT called the Blocked DCT is used, which performs the same task more efficiently. The transform is applied to n x n arrays, typically 8x8 in image compression. However, computing a blocked DCT does not actually require manually separating the image into blocks as the FFT would; rather, this blocking occurs as an inherent part of the DCT. The fact that the DFT must be computed on each block separately also means that, without any complexity reduction, $N^2$ calculations would be required. Instead, since the DCT is separable across dimensions, rows can be broken into segments of length n and the DCT applied to these segments. However, the blockwise DCT destroys the invariance properties of the system, because the blockwise frequencies do not bear a simple relation to the frequencies obtained by transforming the whole image into the Fourier (or frequency) domain. Hence, a linear scaling factor from the time domain will not carry over into the frequency domain if blocking is used, because linearity is no longer maintained. This problem is noteworthy because certain higher-frequency components tend to be suppressed during the quantization step (discussed below), and multiplying them by a scaling factor to heighten their expression does not help, since the factor is not held constant throughout the process.
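The row-segment idea can be sketched as follows: `blocked_dct_row` (a hypothetical helper) applies the 1-D DCT from the definition above separately to consecutive segments of length $n$. Each segment costs about $n^2$ operations, so a row of length $N$ costs about $Nn$ rather than $N^2$. The example also shows that the blockwise coefficients are not simply related to those of the whole row, which is the loss of invariance described above.

```python
import math


def dct(F):
    """1-D orthonormal DCT, direct evaluation of the definition."""
    N = len(F)
    return [(1 / math.sqrt(N) if k == 0 else math.sqrt(2 / N))
            * sum(F[m] * math.cos(k * math.pi * (2 * m + 1) / (2 * N)) for m in range(N))
            for k in range(N)]


def blocked_dct_row(row, n=8):
    """Apply the 1-D DCT independently to consecutive length-n segments of a row.
    Hypothetical helper; assumes len(row) is a multiple of n."""
    if len(row) % n:
        raise ValueError("row length must be a multiple of the block size n")
    out = []
    for start in range(0, len(row), n):
        out.extend(dct(row[start:start + n]))   # about n^2 operations per segment
    return out


row = [float(x) for x in range(16)]   # a simple ramp signal
blocked = blocked_dct_row(row, n=8)   # two independent 8-point DCTs
whole = dct(row)                      # one 16-point DCT of the entire row

print(blocked[0], blocked[8])   # DC terms of each segment: 28/sqrt(8) and 92/sqrt(8)
print(whole[0])                 # DC term of the whole row: 120/4 = 30
```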

Another advantage of the DCT is that its basis vectors consist entirely of real-valued components. Therefore, in image compression, all pixel values are automatically represented by real numbers. In addition, the pixels themselves do not affect each other. In Fourier analysis, one of the disadvantages is that every pixel affects every other pixel, but if the DCT is used instead of the DFT, the pixel values come directly from the transform of the time-domain values.

The quantization step mentioned above is part of the image compression process and occurs after the DCT has prepared the image for compression. In quantization, the number of values representing a transformed quantity is reduced, which also lowers the number of bits needed to represent it electronically. There are a few ways to carry out this reduction of data. One method is simple rounding: real numbers become integers. A more specific quantization first "weights" each value according to its contribution to the image, multiplying it by a weighting factor before rounding. A third method eliminates the frequencies that contribute least to the pixel values. For example, the highest frequencies are often eliminated because of their small size and small contribution to the signal energy, as in low-pass filtered quantization. Some applications use a predefined quantization matrix of factors that gives the weights for the image pixels.
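The three quantization methods just described can be sketched on a single row of DCT coefficients. The coefficient values and the weights below are made up for illustration, and the function names are hypothetical; Python's `round` rounds exact halves to even, which does not affect these values.

```python
def round_quantize(coeffs):
    """Method 1: simple rounding of real coefficients to integers."""
    return [round(c) for c in coeffs]


def weighted_quantize(coeffs, weights):
    """Method 2: multiply each coefficient by a weighting factor, then round."""
    return [round(c * w) for c, w in zip(coeffs, weights)]


def lowpass_quantize(coeffs, keep):
    """Method 3: keep only the first `keep` (lowest-frequency) coefficients, zero the rest."""
    return [round(c) if k < keep else 0 for k, c in enumerate(coeffs)]


coeffs = [52.3, -12.7, 4.2, -1.6, 0.8, -0.4, 0.2, -0.1]   # made-up DCT output
weights = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]        # hypothetical weights

print(round_quantize(coeffs))              # [52, -13, 4, -2, 1, 0, 0, 0]
print(weighted_quantize(coeffs, weights))  # [52, -10, 3, -1, 0, 0, 0, 0]
print(lowpass_quantize(coeffs, keep=3))    # [52, -13, 4, 0, 0, 0, 0, 0]
```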

Applications of the DCT

Most people outside mathematical fields have probably never heard of the discrete cosine transform (DCT); however, most computer users interact with the DCT indirectly on a regular basis. Even people who are not avid technology enthusiasts are familiar with JPEG image files and MPEG video files. Web pages containing images compressed by JPEG require about 10% of the download time of uncompressed images. MPEG compression allows a single DVD to contain a full-length movie for home viewing at a higher picture quality than is available on VHS. Both of these compression algorithms rely on the frequency separation provided by the DCT.

Below we discuss the basic process of JPEG compression as a practical example of the DCT in action. As general background, it is important to understand that JPEG image compression relies on dividing the image into smaller blocks of 8 by 8 pixels. The Joint Photographic Experts Group (JPEG) adopted this standard during development for two primary reasons. First, processing larger blocks was seen as prohibitively slow for the computer to execute. Second, the experts observed that using larger blocks did not result in appreciably greater compression.

So far we have discussed the 1-dimensional DCT; however, for applications in image compression, we use the 2-dimensional DCT. The two-dimensional DCT can be easily computed from the 1-dimensional DCT and is given by:

where u is contained in the interval [0, n-1] and v is contained in [0, m-1]. Mathematicians generally express the DCT using the formulas above. However, for use in computer programs, it is more efficient to set up a DCT matrix to perform the transform. In fact, computers can perform the DCT by use of this matrix with 1/8 the number of multiplications and 1/4 the number of additions. The DCT matrix is given as:

where i represents the row entry and j represents the column entry. From a programming perspective, this $C_{ij}$ can be expanded into an 8 by 8 matrix for use with all of the 8 by 8 blocks of the image. Applying this DCT matrix converts the image's spatial information into frequency information for use in the quantization phase.
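As a hedged sketch, assume the DCT matrix has entries $C_{ij} = 1/\sqrt{N}$ for $i = 0$ and $C_{ij} = \sqrt{2/N}\,\cos\left(\frac{i\pi(2j+1)}{2N}\right)$ for $i \ge 1$, with $N = 8$; row $i$ is then exactly the scaled basis vector $\sqrt{2/N}\,C_i^{(N)}$ of Theorem 1. Under that assumption the 2-D DCT of a block $B$ is $C B C^T$: multiplying by $C$ transforms the columns and multiplying by $C^T$ transforms the rows. The helper names are our own, and plain lists are used so the sketch needs no external libraries.

```python
import math


def dct_matrix(N=8):
    """Assumed orthonormal DCT matrix: row i is the scaled basis vector sqrt(2/N) C_i^(N)."""
    return [[1 / math.sqrt(N) if i == 0
             else math.sqrt(2 / N) * math.cos(i * math.pi * (2 * j + 1) / (2 * N))
             for j in range(N)]
            for i in range(N)]


def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def dct2(block):
    """2-D DCT of a square block: transform columns with C, then rows with C^T."""
    C = dct_matrix(len(block))
    return matmul(matmul(C, block), transpose(C))


C = dct_matrix(8)
I = matmul(C, transpose(C))   # should be the 8 x 8 identity (Theorem 1)

block = [[128.0] * 8 for _ in range(8)]   # a flat gray 8 x 8 block
D = dct2(block)
print(round(D[0][0], 6))   # 1024.0: all energy in the top-left (DC) entry
```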

The next step in JPEG compression involves dividing the output of the DCT phase by a quantization matrix. At this point, it is important that the image's information has been transformed into the frequency domain, because the quantization matrix acts as a low-pass filter, removing the high-frequency information, which is less important to the recreation of the image. The quantization phase is the lossy part of the JPEG compression process; afterwards, the resulting matrix contains predominantly zero values for high frequencies.
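A minimal sketch of the division step, assuming a hypothetical quantization matrix whose entries grow with frequency ($Q_{ij} = 8 + 4(i + j)$) and made-up DCT coefficients that decay away from the top-left corner ($400/(1 + i + j)^2$); neither is the table from the JPEG standard. Dividing entry by entry and rounding leaves only the low-frequency entries nonzero, and multiplying back by $Q$ shows that the rounding cannot be undone.

```python
N = 8

# Hypothetical quantization matrix: larger divisors for higher frequencies.
Q = [[8 + 4 * (i + j) for j in range(N)] for i in range(N)]

# Made-up DCT output whose magnitude decays with frequency.
D = [[400 / (1 + i + j) ** 2 for j in range(N)] for i in range(N)]

# Quantize: divide entry by entry and round (the lossy step).
quantized = [[round(D[i][j] / Q[i][j]) for j in range(N)] for i in range(N)]

# Dequantize: multiply back; the rounding error is permanent.
restored = [[quantized[i][j] * Q[i][j] for j in range(N)] for i in range(N)]

zeros = sum(row.count(0) for row in quantized)
print(quantized[0])                              # [50, 8, 3, 1, 1, 0, 0, 0]
print(zeros, "of", N * N, "entries are zero")    # 49 of 64
```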

Lastly, the output of the quantization step undergoes lossless compression, which takes advantage of the large number of zero values in the matrix. The matrix is encoded entry by entry in a zigzag fashion from the top left to the bottom right in order to maximize the number of sequential zeros. Long strings of zeros can be encoded using a maximum of 6 bits, and modern encoding allows the nonzero entries to be encoded in a minimum of space. At this point, the image is in JPEG format, consuming roughly 1/10 of its previous space with no appreciable loss of image quality, and it can be reconstituted with equal efficiency.
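The zigzag scan and a simplified run-length code can be sketched as below. `zigzag` visits the anti-diagonals of an $n \times n$ block in alternating directions, which gives the usual JPEG ordering (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), and so on. The `(zero_run, value)` pairs with an end-of-block marker are a simplified stand-in assumed for illustration; actual JPEG entropy coding (Huffman or arithmetic codes, limits on run lengths) is more involved. The example block is made up.

```python
def zigzag(block):
    """Read an n x n block along anti-diagonals, alternating direction (JPEG order)."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)   # even diagonals run bottom-left to top-right
        out.extend(block[i][s - i] for i in rows)
    return out


def run_length(seq):
    """Simplified coding: (number of preceding zeros, nonzero value) pairs,
    with trailing zeros replaced by a single end-of-block marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append("EOB")
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1], block[2][0] = 50, 8, -3, 2, 1
scan = zigzag(block)
print(scan[:6])          # [50, 8, -3, 1, 2, 0]
print(run_length(scan))  # [(0, 50), (0, 8), (0, -3), (0, 1), (0, 2), 'EOB']
```

The 64 entries of the block collapse to six symbols because the zigzag order gathers the high-frequency zeros into one trailing run.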

Conclusion

In conclusion, the Discrete Cosine Transform provides a mathematical and computational method of taking spatial data, dividing it into parts of differing importance with respect to visual quality, and compressing it into an accurate, high-quality image. The DCT shares many of the properties of the Fourier Transform, such as orthonormality and the relations that follow from it, namely Parseval's and Plancherel's. Its inverse, the IDCT, allows reconstruction of an image frame that has been encoded by the DCT, transforming it back to the time domain. Despite its similarities to the Fourier Transform, the DCT has been shown to be much more practical and efficient for applications, and it is commonly associated with image compression for JPEG and MPEG files.
















Bibliography

Hernandez, Eugenio and Guido Weiss. A First Course on Wavelets. Boca Raton, FL:
CRC Press, 1996. pp. 20-30, 432-442.

Lam, Edmund Y. and Joseph W. Goodman. Discrete Cosine Transform domain
restoration of defocussed images. Applied Optics. 37, 6213-6218 (1998).

Mitchell, Joan L. and William B. Pennebaker. MPEG Video Compression Standard.
New York, NY: Chapman and Hall, 1997. pp. 33-49.

Nelson, Mark and Jean-loup Gailly. The Data Compression Book. New York, NY: M&T
Books, 1996. pp. 326-344.

Watson, Andrew. Image Compression Using the Discrete Cosine Transform. NASA
Ames Research Center.

Wickerhauser, Mladen Victor. Adapted Wavelet Analysis from Theory to Software.
New York, NY: AK Peters, Ltd., 1994. pp. 83-101.
