
A Survey on FPGA Hardware Implementation for Image Processing
March 25, 2015

Abstract

Although FPGAs are not new, their logic density has grown steadily over the years, and
they have become a useful parallel platform for image processing. FPGAs also consist of
logic circuits that can be reconfigured, which gives them flexibility.

Novice users of FPGAs often assume that their advantage lies in a very high operating
frequency; in fact, FPGAs are limited to a few hundred MHz, whereas CPUs run at
frequencies on the order of GHz.

The reason an FPGA can outperform other processors is that it is a truly parallel
processor. For example, where a CPU might take 5 operation cycles to finish an addition,
an FPGA can finish the same operation in a single cycle.

This paper is divided into X sections.

Image Processing algorithms implemented in FPGA

Introduction

Image processing algorithms are used to solve many problems these days. From medical to
military applications, image processing has become an interesting area of study for
developing faster and better algorithms. Many of these applications require real-time
processing; as image sizes get larger, software solutions respond more slowly, and that
is where hardware implementation comes in. The CPU is the most popular hardware for image
processing, but real-time applications are harder to realize on it because of image size,
data width or user interruptions. To enhance the performance of a hardware processor, two
aspects need to be considered: parallel operation and a higher operating frequency. DSPs
and GPUs are examples of circuits designed specifically to enhance parallel processing,
but they provide a predefined set of operations and cannot be tailored to a specifically
designed image processing application [1].
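The trade-off between parallel operation and operating frequency can be made concrete with a little arithmetic. The sketch below is only an illustration: the clock rates, the cycles per operation and the number of parallel units are hypothetical values chosen to match the orders of magnitude mentioned in the abstract, not measurements from any of the surveyed papers.

```python
def ops_per_second(clock_hz, cycles_per_op, parallel_units=1):
    """Throughput of a device that finishes one operation every
    `cycles_per_op` cycles on each of `parallel_units` independent units."""
    return clock_hz / cycles_per_op * parallel_units


# Hypothetical CPU: 3 GHz clock, 5 cycles per operation, one operation at a time.
cpu = ops_per_second(3e9, 5)

# Hypothetical FPGA: 200 MHz clock, 1 cycle per operation, 16 parallel units.
fpga = ops_per_second(200e6, 1, parallel_units=16)

speedup = fpga / cpu
print(f"CPU:  {cpu:.2e} ops/s")
print(f"FPGA: {fpga:.2e} ops/s")
print(f"Speedup: {speedup:.2f}x")
```

Under these assumed figures the slower-clocked device still comes out ahead, because the number of operations completed per cycle matters as much as the clock rate.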

1.1 Introduction of FPGAs

1.2 Advantages and limitations of implementing image processing algorithms in hardware

In order to implement a design in hardware, several methods can be chosen. FPGAs, for
example, have hundreds of thousands of logic gates embedded in a single chip. Besides, a
user can program an FPGA design in considerably less time than that required for the
production of other highly integrated systems (such as ASICs). Furthermore, FPGAs can be
fully tested after design and manufacture. Image processing speed has become a bottleneck
to further improvement of real-time processing systems.

FPGAs are programmed in low-level hardware description languages, such as VHDL or
Verilog. Most software developers are not familiar with circuit design, nor with hardware
languages whose functionality relies on synchronization and timing [REFERENCE TO
Accelerated Image Processing on FPGAs (2003)]. Moreover, simulations are slow even when
only a fraction of the operating time is emulated.

1.3 Image processing algorithms based on FPGA hardware

2 Methods

2.1 How Fast is an FPGA in Image Processing? (2008)

The FPGA's high performance comes from:

• High parallelism in image processing applications
• High ratio of 8-bit operations
• A large number of internal memory banks, which can be accessed in parallel

The objective of this work was to reach the best performance by reducing the number of
operations and memory accesses. Three applications were implemented in this paper:
two-dimensional filters, stereo vision and k-means clustering. The performance gain
achieved by the FPGAs was not very large (5 to 15 times). The efficiency of the FPGAs is
limited by their size and by the memory bandwidth, since the image data is too large to
be stored in an FPGA.

2.2 Implementing Image Processing Algorithms in FPGA Hardware (2013)

In this work, the author describes different image processing algorithms, including
filtering and enhancement, implemented in a Xilinx Spartan 6 FPGA on a Nexys3 board.
Several image processing algorithms need to perform dozens of operations on every pixel,
which results in a heavy load for a single serial processor. The algorithms used in this
work can be grouped into a category called windowing operators. These techniques take a
group of neighbouring pixels, called a window, and, depending on the algorithm, calculate
a new value for the pixel in the center.

Using 3x3 and 5x5 windows, the author focuses on developing hardware implementations of
popular image processing algorithms such as:

• Median filter
• Smoothing filter
• Sobel edge detection
• Motion blur
• Emboss filter

The images used in this article were 585x450 pixels, but the author claims that images of
any size can be used with the proper hardware, and that many other algorithms can easily
be added using the window generator described.
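The windowing operators described in Section 2.2 can be illustrated in software. The following is a minimal sketch, assuming a grayscale image stored as a list of rows and a 3x3 window; the function names are hypothetical and the border handling (border pixels are copied unchanged) is a simplifying assumption, not the behaviour of the surveyed hardware.

```python
from statistics import median


def windows_3x3(image):
    """Yield (row, col, window) for every interior pixel of a 2-D list.

    The window is the 9 neighbouring pixel values in row-major order.
    """
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            yield r, c, window


def apply_window_operator(image, operator):
    """Return a new image whose interior pixels are operator(window).

    Border pixels are copied unchanged (assumption made for brevity).
    """
    out = [row[:] for row in image]
    for r, c, window in windows_3x3(image):
        out[r][c] = operator(window)
    return out


def median_filter(image):
    # Median of the 9 window values replaces the center pixel.
    return apply_window_operator(image, lambda w: int(median(w)))


def smoothing_filter(image):
    # Integer mean of the 9 window values replaces the center pixel.
    return apply_window_operator(image, lambda w: sum(w) // len(w))


# A flat image with one impulsive-noise pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
print(median_filter(noisy))
print(smoothing_filter(noisy))
```

Separating the window generator from the per-window operator mirrors the claim that new algorithms can be added by reusing the same window generator; in hardware the windows would be produced in a streaming fashion rather than by nested loops.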

2.3 A Programmable Image Processing System Using FPGA

In this work, a flexible programmable image processing system is proposed. The system
integrates a DSP and an FPGA to deal with the bit-level and arithmetic operations found
in image processing algorithms. The authors describe a systolic system (a pipelined array
architecture, synchronised by a clock signal, that calculates operations). These
characteristics can be achieved with an FPGA such as a Xilinx FPGA (in this case two
Xilinx 2090-100 devices were used). The system needs an IBM PC AT computer working as a
host, which gathers the data in a memory unit (FIFO).

A 1-D median filter with a window size of 5 was implemented for the removal of impulsive
noise from signals. In the results shown in this article, the input is an image corrupted
by salt-and-pepper noise, and the result is an image whose peak signal-to-noise ratio is
improved by 10 dB.

2.4 Ceramic Tiles Failure Detection Based on FPGA Image Processing (2009)

This article takes an industrial approach to image processing algorithms, in which
computer visual diagnosis is used to classify tiles according to surface and edge
defects, implemented in an FPGA-based embedded hardware digital design. The whole system
consists of acquiring an image from a camera that is aligned to the failure detection
line, and marking the faulty tiles for a final inspection.

Normally, visual inspection is performed by humans, but a system that fully automates the
manufacturing process avoids human errors. Visual inspection of ceramic tiles is based on
the presence of chromatic differences in the tonality of the surface, faulty edges, or
cracks.

Line capturing starts when the tile sensor activates the scan camera. Pixels are sent as
3.3 V signals, working with CMOS technology. Then, the scanned image data of the ceramic
tile is transferred to the FPGA's SRAM memory as 1024x8 bits for a single scanned line
(gray pixels are stored as 8-bit data). The data bus is also 8 bits wide; it delivers the
8-bit pixel data to the SRAM controller, from which the data is transferred to an XGA
block used for displaying the image.

Surface defects of the ceramic tile can be determined by detecting anomalies in the
output pixel intensity levels. The thresholds for these levels are defined beforehand,
one for light and one for dark intensity. A simple edge defect detection algorithm is
also considered for images of white tile surfaces (comparing the white color of the tile
with the dark background).

A Xilinx Spartan 3 FPGA development board was used to implement the digital design. The
main part of the design consists of the definition of a finite state machine (FSM). The
author compares the time needed to verify a ceramic tile with the same algorithm written
in C++; that experiment was run on a PC with a T7300 processor under Windows XP. The
FPGA's expected performance was calculated at a frequency of 75 MHz. The resulting defect
detection in tiles is 6 times faster than the standard PC-based algorithm implemented in
C++.
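The threshold-based inspection of a scanned tile line described in Section 2.4 can be sketched as follows. This is an illustration under stated assumptions: the line width of 1024 8-bit pixels comes from the text, but the threshold values, the function names and the way the tile edges are located are hypothetical and are not taken from the paper.

```python
LINE_WIDTH = 1024       # pixels per scanned line (from the text)
BACKGROUND_MAX = 40     # hypothetical: brightest level still counted as background
DARK_LIMIT = 120        # hypothetical lower threshold for the tile surface
LIGHT_LIMIT = 250       # hypothetical upper threshold for the tile surface


def find_tile_edges(line, background_max=BACKGROUND_MAX):
    """Return (first, last) index of pixels brighter than the dark background,
    or None if the line contains no tile pixels."""
    tile = [i for i, pixel in enumerate(line) if pixel > background_max]
    return (tile[0], tile[-1]) if tile else None


def inspect_line(line, dark=DARK_LIMIT, light=LIGHT_LIMIT):
    """Return the tile edges and the indices of surface pixels whose
    intensity falls outside the [dark, light] range."""
    edges = find_tile_edges(line)
    if edges is None:
        return None, []
    first, last = edges
    defects = [i for i in range(first, last + 1)
               if line[i] < dark or line[i] > light]
    return edges, defects


# Synthetic line: dark background, a white tile, and one dark spot on the tile.
line = [20] * 100 + [220] * 824 + [20] * 100
line[500] = 90
edges, defects = inspect_line(line)
print(edges, defects)
```

Each pixel is handled with a few comparisons and no arithmetic, which is the kind of per-pixel work that maps naturally onto a finite state machine processing one pixel per clock.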

2.5 The Platform of Image Acquisition and Processing System Based on DSP and FPGA (2008)

In this paper a hybrid system using an Altera FPGA and a Texas Instruments digital signal
processor is presented. The applications proposed by the author range from image
enhancement to image segmentation.

A large FIFO is designed in the FPGA, using the RAM of the Altera board. An image is
captured by a high-performance charge-coupled device (CCD) sensor; after pre-processing,
the analog data is converted into digital data, which is transferred into the FIFO in the
FPGA and then to the DSP. The DSP is used to run the image processing algorithms in
parallel. In this work, Altera Quartus II was used to design, simulate and synthesize the
VHDL models.

2.6 Design and Implementation of a Pipelined Datapath for High-Speed Face Detection Using FPGA (2012)

In this work an algorithm for face detection is described using cascades of boosted
classifiers, implemented in a pipelined datapath in an FPGA. A 16-level image pyramid is
generated from the input image to simultaneously identify faces of varied sizes. The
image is downscaled and then transferred to the first stage of the cascade classifiers.
With this method, the resource utilisation of the FPGA is reduced to one-eighth of that
of the fully parallel algorithm, without loss of accuracy.

The hardware used for this implementation was the Xilinx Virtex-5 LX330 FPGA. The
performance of this method is 307 frames per second, regardless of the number of faces in
the image, for standard VGA (640x480), with an operating frequency of 125.59 MHz. A face
detection result is guaranteed every clock cycle once the first pass through the pipeline
is complete.

The author compares this work with several others (REFERENCES TO BE ADDED), and claims
that this algorithm is faster.

2.7 FPGA Implementation of the LRU Algorithm for Video Compression (1994)

Image and video compression are typical applications for HDTV, teleconferencing,
multimedia communications, etc. The purpose of video and image compression is to decrease
the number of bits used to represent an image while the quality stays acceptable.

In this work, the author presents an FPGA implementation of the Least Recently Used (LRU)
algorithm in cache-based vector quantization (CVQ) for constant-quality and fixed-bit-rate
video transmission applications. The operating frequency of the chip was 16 MHz, and it
is stated that this frequency is enough for real-time execution of the CVQ algorithm.
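The LRU replacement policy used in the work of Section 2.7 can be sketched in software. The example below is a minimal illustration, assuming a small cache of codeword indices; the class name, the cache capacity and the access sequence are hypothetical, and how the surveyed hardware signals hits and misses to the decoder is not covered here.

```python
from collections import OrderedDict


class LRUCodewordCache:
    """Cache of codeword indices with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Keys are codeword indices, ordered from least to most recently used.
        self._entries = OrderedDict()

    def access(self, codeword):
        """Return True on a hit. On a miss, insert the codeword and, if the
        cache is full, evict the least recently used entry first."""
        if codeword in self._entries:
            self._entries.move_to_end(codeword)   # mark as most recently used
            return True
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)     # drop the LRU entry
        self._entries[codeword] = None
        return False

    def contents(self):
        """Codewords currently cached, least recently used first."""
        return list(self._entries)


cache = LRUCodewordCache(capacity=3)
hits = [cache.access(cw) for cw in [7, 2, 7, 9, 4, 2]]
print(hits)
print(cache.contents())
```

In this sequence the second access to codeword 7 is a hit, while codeword 2 has already been evicted by the time it is requested again, which shows how the policy favours recently used codewords.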

2.8 A Board System for High-Speed Analysis and Neural Networks (1996)

In this paper, the authors implement neural networks of diverse sizes and architectures
in an FPGA controller, for applications that involve text location, character
recognition, and noise removal from an image that contains text.

The system requires an external controller to generate the addresses for the code memory
and to perform the calculations for transferring data to and from the state memory. This
interface controller is made up of four Xilinx 4005PG156 field programmable gate arrays.
In the results, the optical character recognition algorithm reaches a speed of
approximately 1000 characters per second; this is 10 to 100 times faster than an
implementation on a microprocessor (SPARC Station 10).

2.9 A Real-Time Matching System for Large Fingerprint Databases (1996)

Fingerprint databases are characterised by their large size and poor-quality query
images. This work presents a method for indexing large fingerprint databases, implemented
on Splash 2, a field programmable gate array processor, to achieve nearly ASIC speed.
Index-based object recognition has become popular within the computer vision community;
specific characteristics of an image are compared with features in the model of database
objects. Using Splash 2, "the pattern matching under the conditions described earlier can
be executed at the rate of 110,000 matches per second".

2.10 FPGA Implementations of Fast Fourier Transforms for Real-Time and Signal Processing (2005)

Programming FPGAs requires skilled users with detailed knowledge of the architecture of
the device used, and is done at a very low level. In this paper, 1-D and 2-D FFTs are
implemented using Handel-C (a parametrisable structural language similar to VHDL) code on
a Celoxica RC1000 PCI-based FPGA development board. Following the mathematical model, the
algorithm has been implemented for parallel 2-D FFT. The performance of these
implementations was compared with existing solutions, and "high speedup and efficiency
have been attained for the parallel implementation".

2.11 Design and implementation of a high level programming environment for FPGA-based image processing (2000)

Another implementation of image processing algorithms using a high-level programming
environment and an FPGA is described in this paper. On one side, the programming model of
the system is a PC programmed in C++. On the other, the FPGA acts as a coprocessor for
the algebra of the image processing algorithms, carrying out some basic operations
(convolution, neighbourhood operations, etc.).

The basic instructions of the coprocessor can be described by a static window with preset
weights. These instructions include multiplication, accumulation, maximum and minimum, so
that several neighbourhood operations can be performed. The parameters needed to generate
a new image with this system include the dimensions of the image (256x256), a 3x3 window
size, a 16-bit pixel size and the weights of the neighbourhood window.

2.12 Applying an XC6200 to Real-Time Image Processing (1998)

Some FPGAs have an embedded microprocessor and can be partially reconfigured during
operation. Although a two-dimensional discrete cosine transform (2D DCT) is implemented
in this work, the system is able to perform real-time image processing applications. The
design was implemented in a 78x64 block within an FPGA of the XC6200 series from Xilinx,
using 30% of the total chip area (128x128 cells), with a performance of 2 billion
operations per second.

2.13 Combined Line-Based Architecture for the 5-3 and 9-7 Wavelet Transform of JPEG2000 (2003)

Another work that deals with image compression is [REFERENCE]. The author describes a
hardware implementation of a discrete wavelet transform for image compression using the
JPEG2000 standard.

The goal is to implement a fast wavelet transform by processing two lines at a time. This
architecture allows fast calculation and minimum memory requirements. Using a VIRTEX
E1000-8 at 110 MHz, 2 pixels per clock cycle can be decoded.

The authors claim that the main advantages of their system are:

• Minimum area: one third of that of the classical design
• Minimum memory requirement
• Pipelined datapath
• Genericity: the coefficients used for this transform can be replaced by others to
  implement new filters

2.14 Colour histogram content-based image retrieval and hardware implementation (2003)

A pipelined hardware structure was developed to improve the processing of composite
colour image histograms using 4 units: histogram generator, normalisation, FIR filter and
intersection units. Filters and shifters work efficiently in hardware, which helps to
achieve real-time operation. In the authors' FPGA implementation, 50 comparisons per
second were made with a device clocked at 35 MHz. The FPGA implementation makes this
application convenient for industry. The processing unit that consumes the most time is
the histogram generator, because the image must be read in full. This issue is solved by
using external RAM.

2.15 Accelerated Image Processing on FPGAs (2003)

In this work a high-level language is used for hardware design. SA-C is a derivative of
the C programming language designed to achieve parallelism. There are some differences
between standard C and SA-C:

• It finds a fixed-point representation for floating-point operations, taking advantage
  of the FPGA to form more precise circuits.
• It includes some extensions to standard C that provide the FPGA with data-parallel
  mechanisms and "true multi-dimensional arrays".
• It restricts variables to single assignment.

The SA-C language makes FPGAs available to programmers with no experience in hardware
description languages. In the results, the author states that, for edge detectors (Canny
and Prewitt) implemented in an FPGA using SA-C, the hardware implementations outperform
the software running on a Pentium processor.
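The idea of replacing floating-point operations with a fixed-point representation, mentioned for SA-C in Section 2.15, can be illustrated with a short sketch. This is plain Python, not SA-C, and it does not describe how the SA-C compiler performs the conversion; the choice of 8 fractional bits and the function names are assumptions made for the example.

```python
FRAC_BITS = 8  # hypothetical number of fractional bits, not taken from the paper


def to_fixed(value, frac_bits=FRAC_BITS):
    """Quantise a real number to an integer holding `frac_bits` fractional bits."""
    return round(value * (1 << frac_bits))


def to_float(value, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a real number."""
    return value / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point numbers; the raw product carries twice the
    fractional bits, so it is shifted back to the common format."""
    return (a * b) >> frac_bits


gain = to_fixed(0.75)      # exactly representable with 8 fractional bits
pixel = to_fixed(200.0)
scaled = fixed_mul(gain, pixel)
print(to_float(scaled))    # integer-only arithmetic reproduces 0.75 * 200

approx = to_fixed(0.1)     # 0.1 is not exactly representable
print(approx, to_float(approx))
```

Only integer multiplications and shifts are involved, which is what makes the representation attractive in hardware; the price is a quantisation error bounded by the chosen number of fractional bits, as the value 0.1 shows.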

2.16 Design of image acquisition and processing based on FPGA (2003)

The system includes an analog-to-digital interface, a FIFO, the sensor controller and
other modules. One of the challenges in implementing this algorithm is synchronizing the
clock frequency of the FIFO with the image capture.

In this work, white balance processing and image denoising methods are implemented in an
FPGA. The CMOS sensor data is converted into RGB format and stored in SDRAM and, after
processing, is shown on the VGA display.

2.17 FPGA-Based Real-Time Image Segmentation for Medical Systems and Data Processing

A hardware platform is proposed in this work to implement a 3-D image segmentation
algorithm for medical systems. An issue encountered in this kind of algorithm, and in
other highly demanding image processing algorithms, is the large amount of memory needed
and the synchronization of all the parallel processes required to make the system
efficient. DDR SDRAM modules of up to 1 GB were needed to work at 266 MSamples/s.

2.18 Image processing oriented to industrial applications

(Information taken from [2], paraphrase is in development)

Industrial automation requires innovative solutions from machine vision systems. Usually,
quality control and visual inspection are carried out by human experts; while they might
be better than a machine vision system, they are slower at the task. The development of a
machine vision system begins with understanding the application's requirements and
constraints and proceeds with selecting appropriate machine vision software and hardware
to solve the task. Also, industrial vision systems must be fast enough to meet the speed
requirements of their application environment.

The author of this work proposes the following types of inspection used in industrial
applications:

• Inspection of dimensional quality: correct tolerances, correct shape. Inspection and
  classification of solder joints of PCBs.
• Inspection of surface quality: inspecting objects for scratches, cracks, wear, or
  checking for proper finish, roughness and textures.
• Inspection of correct assembling
• Inspection of structural quality: checking for missing components, or for the presence
  of foreign or extra objects.
• Inspection of accurate or correct operation: verification of correct or accurate
  operation of the inspected products according to the manufacturing standards.

This work states that typical industrial vision systems have the structure shown in
Figure 1. The authors say that every decision reduces to confirming that quality
standards are satisfied, which in most cases is a binary (yes/no) outcome.

Figure 1: Typical industrial inspection system

Conclusions

3.1 Possible information in the conclusions

• What kind of image processing algorithms (IPAs) could be used for industrial
  applications
• Why FPGAs are a good choice for implementing IPAs
• Limitations of using FPGAs for IPAs
• What has already been done on implementing IPAs on FPGAs
• Opportunity areas in the field

References
[1] Jian Xiong, Q.M. Jonathan Wu (2010). An Investigation of FPGA Implementation for
Image Processing.
[2] Elias N. Malamas, et al. (2003). A survey on industrial vision systems, applications
and tools. Image and Vision Computing, 21:171-188.
