
This tutorial is downloaded from https://sites.google.com/site/enggprojectece

Image Compression Part 1


Discrete Cosine Transform (DCT)
Compiled by: Sivaranjan Goswami, Pursuing M. Tech. (2013-15 batch)
Dept. of ECE, Gauhati University, Guwahati, India
Contact: sivgos@gmail.com
This tutorial covers the fundamentals of image transformation, the concept of energy compaction in
the transformed domain, and an introduction to the DCT and its application in image compression.
Finally, we will look briefly at the standard JPEG image compression scheme.

Image Representation in Transformation Domain


An image in the spatial domain can be represented in a transformed domain given by:
$$T(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, r(x,y,u,v) \qquad (1)$$

Here, f(x,y) is the image in the spatial domain and r(x,y,u,v) is called the forward transformation
kernel.
The original image can be obtained from the transformed image using the relation:
$$f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} T(u,v)\, s(x,y,u,v) \qquad (2)$$

Where s(x,y,u,v) is called the inverse transformation kernel.


The transformation can be represented in matrix form given by:

$$T_{(M\times N)} = R_{1\,(M\times M)}\; f_{(M\times N)}\; R_{2\,(N\times N)} \qquad (3)$$

Where f(M×N) is the image matrix and R1(M×M) and R2(N×N) together represent the transformation
kernel.
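
For a separable transform, equation (3) can be applied directly in MATLAB. Here is a minimal sketch using the DCT as the kernel (dctmtx is from the Image Processing Toolbox; cameraman.tif ships with MATLAB and is used purely for illustration):

f  = double(imread('cameraman.tif')); %M-by-N image in the spatial domain
[M, N] = size(f);
R1 = dctmtx(M);                       %M-by-M matrix acting on the columns
R2 = dctmtx(N)';                      %N-by-N (transposed) matrix acting on the rows
T  = R1 * f * R2;                     %equation (3); equals dct2(f) up to rounding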

Two Dimensional Discrete Fourier Transform


The two-dimensional (2D) DFT is given by:

$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \qquad (4a)$$

Similarly, 2D IDFT is given by:


$$f(x,y) = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \qquad (4b)$$



We know that the Fourier transform gives the image in the frequency domain; hence the independent
variables u and v represent frequency along the x axis and the y axis respectively. In the case of an
image, frequency is nothing but the rate of change of intensity.
Thus F(0,0) carries the information of the pixels where there is no variation. Similarly, F(M-1, N-1)
carries the information of the pixels where this variation is greatest, that is, the pixels where the
intensity varies along both the x axis and the y axis at every neighboring pixel.
In MATLAB, a 2D FFT can be obtained using the function fft2.
Note
It is to be noted that MATLAB computes the (1D) FFT with frequency discretized
in the range [0, 2π]. To plot both the positive and the negative spectrum in
the range [-π, π], we must use the function fftshift.
The same applies to the 2D FFT. When we perform fft2, the low-frequency
portion goes to the four corners of the image and the high-frequency portion
goes to the centre. However, for most calculations we take the frequency to
be 0 at the origin, increasing outwards along the x and y directions. That is why,
before applying any filter or other algorithm, we have to use the function fftshift.
However, do not pass a shifted spectrum to ifft2: undo the shift (with
ifftshift) before inverting to ensure a correct result, because the origin of
the spatial domain is at the upper-left corner of the image and we consider the
image to be present only in the first quadrant (considering the actual Cartesian
plane to be rotated by 90 degrees).
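
A minimal sketch of this workflow (the image file is illustrative; any grayscale image will do):

f  = double(imread('cameraman.tif'));
F  = fft2(f);                    %zero frequency at the four corners
Fs = fftshift(F);                %zero frequency moved to the centre
figure, imshow(log(1 + abs(Fs)), []), title('Log-magnitude spectrum')
%Undo the shift before inverting, so ifft2 sees the original layout:
g  = real(ifft2(ifftshift(Fs))); %matches f up to numerical error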

The Concept of Energy Compaction:


In an image, the low frequencies correspond to the coarser details, where the amount of intensity
variation is small, and the high frequencies correspond to the finer details, where there is more
variation in intensity. In most images, therefore, the high-frequency portion carries less
information than the low-frequency portion.
If the origin refers to the lowest frequency, then most of the information will be concentrated around
the origin. This is called energy compaction.
It can be verified by passing an image through an LPF and an HPF in the transformed domain and
then reconstructing the outputs in the spatial domain, as sketched below.
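
A rough MATLAB sketch of this verification, using an ideal circular low-pass/high-pass mask in the DFT domain (the cutoff radius of 30 is an arbitrary choice, and the image file is illustrative):

f  = double(imread('cameraman.tif'));   %256x256 grayscale image
[M, N] = size(f);
F  = fftshift(fft2(f));                 %centred spectrum
[u, v] = meshgrid(-N/2:N/2-1, -M/2:M/2-1);
lp = sqrt(u.^2 + v.^2) <= 30;           %ideal LPF mask (1 inside radius 30)
flp = real(ifft2(ifftshift(F .* lp)));  %LPF output: still intelligible
fhp = real(ifft2(ifftshift(F .* ~lp))); %HPF output: only fine details
figure
subplot 121, imshow(flp, []), title('LPF output')
subplot 122, imshow(fhp, []), title('HPF output')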


Figure 1 (a): Passing an image through an LPF: more information in the low-frequency region

Figure 1 (b): Passing an image through an HPF: less information in the high-frequency region

From figures 1(a) and 1(b) it can be seen that the low-frequency components of the signal
carry more information than the high-frequency components: the LPF output is intelligible,
while the HPF output contains only some fine details of the image.
Here I have shown the example of the 2D DFT only, but in fact any transformation exhibits
some energy compaction in the transformed domain. We shall discuss other transforms, and
their performance in terms of energy compaction and other factors, in a later part of this
tutorial. Now we will try to understand what image compression actually is and why it is
required.

Overview of Image Compression


Let us consider a 3264×2448 color image (photographs taken by commonly used digital
cameras such as the Nikon Coolpix are of roughly this size; the actual size varies from model to model).
Every pixel requires 3 numbers to represent its color (RGB). We know that in a digital
image the pixel values are in the range 0 to 255, so we need 1 byte (8 bits) to store one
number. Hence, the total amount of memory required to store the image is:
3264 × 2448 × 3 = 23,970,816 bytes ≈ 22.86 megabytes (MB)


But in practice the pictures taken by such cameras are roughly 1.5 MB to 4 MB in size.
This is possible only because of image compression.

How Is an Image Compressed?


An image (like most of the signals we use) contains more data than is actually required to
convey its information. There are basically three types of redundancy:
1. Coding Redundancy: This is caused by an inefficient allocation of bits per symbol.
This type of redundancy is eliminated using efficient coding schemes, such as
Huffman coding and arithmetic coding, that are based on the probabilities of the
symbols.
2. Inter-Sample Redundancy: Sometimes samples of a signal can be predicted from the
previous samples. In that case it is not efficient to store or transmit all the
samples; instead, a code that carries only the data necessary to reconstruct the
information is sufficient. An example is DPCM (Differential Pulse Code Modulation),
where only the difference between two successive samples is encoded (see the sketch
after this list).
3. Perceptual Redundancy: Sometimes an image contains more detail than we can perceive
with our eyes. Such redundancy is called perceptual redundancy; it is also known
as psycho-visual redundancy.
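
A toy MATLAB illustration of the DPCM idea from item 2 (first-order prediction with made-up sample values; a real codec would also quantize the differences):

s = [100 101 103 103 104 106 105 107]; %slowly varying samples
d = [s(1), diff(s)];                   %first sample plus successive differences
%The small differences are cheaper to encode than the raw samples.
r = cumsum(d);                         %decoder reconstructs by accumulation
isequal(r, s)                          %returns true: lossless in this toy case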

Types of Compression
Compression algorithms are characterized by how much information they preserve. There are three
types of compression:
1. Lossless or Information-Preserving: No loss of information (text, legal or medical
applications). However, lossless compression provides only a modest amount of
compression.
2. Lossy Compression: Sacrifices some information for better compression (web
images). In this tutorial we deal mainly with this type.
3. Near-Lossless: No (or very little) perceptible loss of information (increasingly
accepted for legal and medical applications).

Performance Evaluation of Compression


The performance of a compression technique is evaluated by comparing the reconstructed
image after compression with the original uncompressed image. The performance is
expressed in terms of the root-mean-square (RMS) error and the signal-to-noise ratio (SNR),
which can be calculated using equations 5(a) and 5(b) respectively.


$$e_{rms} = \left[\frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\big[\hat{f}(x,y) - f(x,y)\big]^2\right]^{1/2} \qquad (5a)$$

$$SNR_{ms} = \frac{\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\hat{f}(x,y)^2}{\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\big[\hat{f}(x,y) - f(x,y)\big]^2} \qquad (5b)$$

where f(x,y) is the original image and f̂(x,y) is the reconstructed image.
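
A minimal MATLAB sketch of equations 5(a) and 5(b), assuming f is the original image and g the reconstructed one:

f = double(f); g = double(g);        %cast first: uint8 subtraction saturates
err  = g - f;
erms = sqrt(mean(err(:).^2))         %RMS error, equation 5(a)
snr  = sum(g(:).^2) / sum(err(:).^2) %mean-square SNR, equation 5(b)
snr_db = 10*log10(snr)               %often reported in decibels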

Steps in Image Compression and Decompression:

Figure 2: Steps in image compression

Transformer: It transforms the input data into a format that reduces interpixel redundancies in
the input image. Transform coding techniques use a reversible, linear mathematical transform
to map the pixel values onto a set of coefficients, which are then quantized and encoded. The
key factor behind the success of transform-based coding schemes is that many of the resulting
coefficients for most natural images have small magnitudes and can be quantized without
causing significant distortion in the decoded image. For compression purposes, the higher the
capability of compacting information into fewer coefficients, the better the transform; for that
reason, the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) have
become the most widely used transform coding techniques.
Transform coding algorithms usually start by partitioning the original image into
subimages (blocks) of small size (usually 8×8). For each block the transform coefficients are
calculated, effectively converting the original 8×8 array of pixel values into an array of
coefficients in which the coefficients closer to the top-left corner usually contain most of
the information needed to quantize and encode the image (and eventually perform the reverse
process at the decoder's side) with little perceptual distortion. The resulting coefficients are
then quantized, and the output of the quantizer is used by symbol encoding techniques to
produce the output bitstream representing the encoded image. In the image decompression model
at the decoder's side, the reverse process takes place, with the obvious difference that the
dequantization stage can only generate an approximated version of the original coefficient
values: whatever loss was introduced by the quantizer is irreversible.
Quantizer: It reduces the accuracy of the transformer's output in accordance with some
pre-established fidelity criterion, reducing the psychovisual (perceptual) redundancies of the input
image. This operation is not reversible and must be omitted if lossless compression is desired.
The quantization stage is at the core of any lossy image encoding algorithm. Quantization at
the encoder side means partitioning the range of input values into a smaller set of values. There
are two main types of quantizers: scalar quantizers and vector quantizers. A scalar quantizer
partitions the domain of input values into a smaller number of intervals. If the output intervals
are equally spaced, which is the simplest approach, the process is called uniform scalar
quantization; otherwise, for reasons usually related to minimizing total distortion, it is
called non-uniform scalar quantization. One of the most popular non-uniform quantizers is the
Lloyd-Max quantizer. Vector quantization (VQ) techniques extend the basic principles of
scalar quantization to multiple dimensions.
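
A uniform scalar quantizer amounts to a one-line rounding operation. Here is a minimal sketch (the step size of 16 is an arbitrary choice, and the image file is illustrative):

step = 16;                   %larger step = fewer levels, more distortion
x  = double(imread('cameraman.tif'));
xq = step * round(x / step); %quantize (encoder) then dequantize (decoder)
%xq now takes at most ceil(256/step) distinct values instead of 256.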
Symbol (entropy) encoder: It creates a fixed- or variable-length code to represent the
quantizer's output and maps the output in accordance with that code. In most cases a
variable-length code is used. An entropy encoder compresses the quantized values further
to provide more efficient compression. The most important types of entropy
encoders used in lossy image compression techniques are the arithmetic encoder, the Huffman
encoder, and the run-length encoder.

Decompression is the reverse operation of all these steps to recover the image in
spatial domain.


Discrete Cosine Transform (2D)

The 2D DCT is given by:

$$T(u,v) = \alpha(u)\,\alpha(v)\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\cos\left[\frac{(2x+1)u\pi}{2M}\right]\cos\left[\frac{(2y+1)v\pi}{2N}\right] \qquad (6)$$

Where,

$$\alpha(u) = \begin{cases}\sqrt{1/M} & \text{if } u = 0\\ \sqrt{2/M} & \text{if } u = 1, 2, 3, \ldots, M-1\end{cases}$$

and α(v) is defined analogously with N in place of M.

The 2D inverse DCT (IDCT) is given by:

$$f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1}\alpha(u)\,\alpha(v)\,T(u,v)\cos\left[\frac{(2x+1)u\pi}{2M}\right]\cos\left[\frac{(2y+1)v\pi}{2N}\right] \qquad (7)$$

Compression using Quantization


The DCT itself is almost lossless; the actual compression is achieved in the quantization step. As
we have already discussed in the section on energy compaction, in the transformed domain the
high-frequency components contain very little information. It is to be noted that the energy
compaction of the DCT is much greater than that of the DFT. Thus, in the quantization stage, the
image in the DCT domain is divided by a quantization matrix.
The following MATLAB program will give you a brief idea (here, partitioning into 8×8 blocks is
skipped):
Example 1
clear all; close all; clc;
x=imread('lena_gray_256.tif'); %Read the original 256x256 image
X=dct2(x);                     %2D DCT
Y=X(1:150,1:150);              %Keep only the top-left 150x150 DCT coefficients
Y1=round(Y);                   %Quantization - in practice some encoding follows
y1=idct2(Y1,[256,256]);        %Inverse DCT, zero-padding back to 256x256
y=uint8(y1);                   %Round off (and clamp) to integer pixel values
figure
subplot 121
imshow(x), title('Original Image')
subplot 122
imshow(y), title('Compressed Image')
%Calculating the error (cast to double: uint8 subtraction saturates at zero)
mse=1/(256*256)*sqrt(sum(sum((double(x)-double(y)).^2)))


Result:

Figure 3: Output of Example 1: comparison of the original and the compressed image
MSE obtained = 0.0156
It can be seen that the compressed image is similar to the original image and the MSE is also
small, yet we reduced a 256×256 image to just 150×150 in the transformed (DCT) domain.
The encoding step is simply a matter of representing the 150×150 quantized transform
coefficients with some efficient scheme; usually an efficient variable-length code such as
Huffman coding is used.
The encoding part is important because in the transformed domain there are more than 10^4
levels, so approximately 16 bits are needed to represent each coefficient. Thus, although
the image is smaller, the number of bits per value would be double that of a standard 8-bit
image. If we simply reduce the number of quantization levels, the quality suffers.
Here I have used built-in functions to perform the DCT and IDCT. Students are encouraged to
write their own programs for these using equations 6 and 7 respectively; one possible starting
point is sketched below.
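
As a starting point, here is one way to implement equation (6): build the 1D DCT basis matrix from the cosine kernel and apply it in the separable matrix form of equation (3). This is a sketch, not an optimized implementation, and it assumes a square image:

f = double(imread('lena_gray_256.tif'));
[M, N] = size(f);                  %assumes M == N here
u = (0:M-1)'; x = 0:M-1;
C = sqrt(2/M) * cos((2*x + 1) .* u * pi / (2*M)); %row u holds the u-th basis vector
C(1, :) = sqrt(1/M);               %alpha(0) = sqrt(1/M), from equation (6)
T = C * f * C';                    %forward 2D DCT; C' * T * C gives the IDCT
max(max(abs(T - dct2(f))))         %sanity check: should be near machine precision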


JPEG Compression

Figure 4: Block diagram of a JPEG encoder.

JPEG compression is the name given to an algorithm developed by the Joint
Photographic Experts Group, whose purpose is to minimize the file size of photographic
image files. JPEG images are used extensively on the internet, and nowadays most
commercially available digital cameras, including mobile phone cameras, produce
JPEG images. The extension of JPEG image files is .jpeg or, in most cases, .jpg.
1. The RGB image is first converted to a luminance-chrominance format (YIQ or YUV)
and the color (chrominance) components are subsampled.
2. Divide the whole image into 8×8 blocks.
3. Perform the DCT of each 8×8 block.
Note: The results of a 64-element DCT transform are 1 DC coefficient and 63 AC
coefficients. The DC coefficient represents the average color of the 8×8 region. The
63 AC coefficients represent color change across the block. Low-numbered
coefficients represent low-frequency color change, that is, gradual color change across the
region; high-numbered coefficients represent high-frequency color change, that is, color
that changes rapidly from one pixel to another within the block. These 64 results
are written in zig-zag order, with the DC coefficient first, followed by the AC
coefficients in order of increasing frequency.

4. Quantize the image in the DCT domain using a pre-defined quantization table.


5. Encode the DC part of the signal using DPCM (differential pulse code modulation):
only the difference between the DC coefficients of two successive 8×8 blocks is
encoded.
6. Encode the AC parts using RLE (run-length encoding); note that each 8×8 block is
traced in zig-zag order to make the RLE more efficient (a sketch follows this list).
Run-length encoding (RLE) is a very simple form of data compression in which
runs of data (that is, sequences in which the same data value occurs in many
consecutive data elements) are stored as a single count and data value, rather than as
the original run.
For example, the stream
222233333334444555555111
will be encoded as the (count, value) pairs
(4,2) (7,3) (4,4) (6,5) (3,1)
Because the image is traced in zig-zag order, neighboring pixels come together,
and many of them are likely to have the same values after quantization, which
makes RLE more effective.

7. Finally, the entropy encoder uses Huffman coding to bring the average code length
close to the entropy of the symbols.
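
A toy MATLAB sketch of run-length encoding a quantized coefficient stream as (count, value) pairs, using the example stream from step 6 (the zig-zag scan itself is omitted):

seq = [2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 5 1 1 1];
idx = [find(diff(seq) ~= 0), numel(seq)]; %last index of each run
cnt = diff([0, idx]);                     %run lengths
val = seq(idx);                           %run values
rle = [cnt; val]                          %columns read (4,2) (7,3) (4,4) (6,5) (3,1)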

Part 2 will cover image compression using the Discrete Wavelet Transform and JPEG 2000.
