
A Project report on

VLSI Implementation of High Efficient 64 Tap Fixed-Point DLMS Adaptive Filter
Submitted in partial fulfillment of the requirements
For the degree of
MASTER OF TECHNOLOGY
In
ELECTRONICS AND COMMUNICATION ENGINEERING

With specialization

VLSI

By
V.L.S. Mounika Devi (146M1D5708)

Under the esteemed guidance of

Mr.P.Srinivas, M.Tech.

Associate Professor, Department of ECE

Department of Electronics and Communication Engineering


BONAM VENKATA CHALAMAYYA COLLEGE OF ENGINEERING
(Approved by A.I.C.T.E New Delhi and Affiliated to JNTU, KAKINADA)
PALACHARLA, RAJAHMUNDRY, EAST GODAVARI DIST. (A.P)
2014-2016
DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING
BONAM VENKATA CHALAMAYYA COLLEGE OF ENGINEERING
(Approved by A.I.C.T.E New Delhi and Affiliated to JNTU, KAKINADA)
PALACHARLA, RAJAHMUNDRY, EAST GODAVARI DIST. (A.P)

BONAFIDE CERTIFICATE

This is to certify that this dissertation work entitled VLSI Implementation of
High Efficient 64 Tap Fixed-point DLMS Adaptive Filter, being submitted by
V.L.S. Mounika Devi (Regd. No: 146M1D5708) in partial fulfillment of the
requirements for the award of Master of Technology with VLSI as specialization,
is a record of bonafide work carried out by her during the academic year 2014-2016.

Project Guide                                    Head of the Department

Mr. P. SRINIVAS, M.Tech                          Mr. G. RAVIKANTH, M.Tech, (PhD)
Associate Professor, Dept. of ECE                Associate Professor & HOD, Dept. of ECE
Rajahmundry.                                     Rajahmundry.

External Examiner
DECLARATION BY THE CANDIDATE

I, V.L.S. MOUNIKA DEVI (Reg. No: 146M1D5708), hereby declare
that this project report on VLSI Implementation of High Efficient 64 Tap
Fixed-point DLMS Adaptive Filter, submitted in the Department of Electronics and
Communication Engineering (ECE), Bonam Venkata Chalamayya College of
Engineering, Rajahmundry, in partial fulfillment of the requirement for the award of
the degree of Master of Technology in VLSI is a bonafide record of my own work
carried out under the supervision of Mr. P.SRINIVAS , Associate Professor , Dept of
ECE, Bonam Venkata Chalamayya College of Engineering, Rajahmundry.

Also, I declare that the matter embodied in this project work has not been
submitted for the award of any degree/diploma of any other institution or university
previously.

(Signature of the candidate)


V.L.S.MOUNIKA DEVI
(146M1D5708)
ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of any task
would be incomplete without mentioning the people who made it possible. I
consider it my privilege to express my gratitude and respect to all who guided,
inspired and helped me in the completion of my project work.
I extend my heartfelt gratitude to the Almighty for giving me strength in
proceeding with this project titled VLSI implementation of High Efficient 64 Tap
Fixed-point DLMS Adaptive filter.
I am extremely thankful to my project guide Mr. P. SRINIVAS, Associate
Professor, ECE Department, Bonam Venkata Chalamayya College of Engineering,
Rajahmundry, for his cooperation and for providing expert guidance and endless
support throughout this complex project.
I am also extremely thankful to Mr. G. RAVIKANTH, Head of the Department
of ECE, Bonam Venkata Chalamayya College of Engineering, Rajahmundry, for his
cooperation and endless support throughout this complex project.

I express my heartfelt thanks to Dr. M.ANJAN KUMAR, Principal, Bonam


Venkata Chalamayya College of Engineering, Rajahmundry, for giving me this
opportunity for the successful completion of my degree.
I owe my special thanks to the Management of our college for providing
necessary arrangements to carry out this project.
I would like to express my profound sense of gratitude to all the faculty
members for their cooperation and encouragement throughout my course.
Above all, I thank my parents; I feel a deep sense of gratitude for my family,
who formed part of my vision. Finally, I thank one and all who have contributed
directly or indirectly to this thesis.

V.L.S.MOUNIKA DEVI
(146M1D5708)
Abstract
Adaptive filters are widely used in digital signal processing systems, in
applications such as adaptive beamforming, adaptive noise cancellation, system
identification and channel equalization. In this thesis, we propose an efficient
architecture for a delayed least mean square (DLMS) adaptive filter. In order to
achieve a lower adaptation delay and an area-delay-power-efficient implementation,
the proposed adaptive filter architecture consists of two main computing blocks,
namely the error-computation block and the weight-update block. We also propose a
fixed-point implementation scheme for the architecture with bit-level clipping.
From synthesis results, we find that the proposed design offers a lower
area-delay product (ADP) and a lower energy-delay product (EDP) than the best of
the existing structures. The proposed system is implemented in VHDL, synthesized
with Xilinx ISE and simulated using ModelSim.

Tools:
Xilinx ISE14.2
Modelsim 6.4b

Language:
VHDL
CONTENTS
ACKNOWLEDGEMENT

ABSTRACT

LIST OF FIGURES
LIST OF TABLES

CHAPTER Page No
1: INTRODUCTION 1-7
1.1 Introduction 1
1.2 Literature review 4
1.3 Limitation of previous work 6
1.4 Motivation and Scope 6
1.5 Problem Definition 7
1.6 Organisation of the thesis 7

2: DIGITAL FILTERS 8-18


2.1 Introduction 8
2.2 Types of Filters 10
2.2.1 FIR Filter 10
2.2.2 IIR Filter 11
2.3 Adaptive Filter 13
2.3.1 Introduction 13
2.3.2 LMS Algorithm 14
2.3.3 DLMS Algorithm 18

3: DESIGN METHODOLOGY 19-26


3.1 DLMS Algorithm 19
3.2 Proposed DLMS Algorithm 20
3.3 Error Computation Block 21
3.3.1 Structure of PPG 21
3.3.2 Structure of AOC 22
3.3.3 Structure of Adder-Tree 23
3.4 Weight- Update Block 24
3.5 Adaptation Delay 26

4: DESIGN IMPLEMENTATION 27-37


4.1 Fixed Point Implementation 27
4.2 Blocks Requirement 29
4.3 VLSI Tools 31
4.3.1 VHDL 31
4.3.2 Xilinx ISE 33
4.3.3 Design Flow 34
4.4 Simulation flow 35
4.5 Simulation library compilation wizard 35
4.6 ModelSim 36

5: SIMULATION RESULTS 38-42


5.1 Simulation Output 38
5.2 Synthesis Report 39
5.3 Power Report 39
5.4 Area Report 40
5.5 Delay Report 40
5.6 RTL View 41
5.7 RTL Schematic 42

6: ADVANTAGES & APPLICATIONS 43-45


6.1 Advantages 43
6.2 Applications 43

CONCLUSION AND FUTURE WORK 46

REFERENCES 47-48

APPENDIX (A): SOURCE CODE 49-53

APPENDIX (B): COPY OF PUBLISHED PAPER 54


LIST OF FIGURES

Figure No Name of the Figure Page No

2.1 FIR filter 10


2.2 IIR filter 11
2.3 Adaptive filter 13
2.4 Adaptive filter with LMS algorithm 15
2.5 LMS filter algorithm 16
2.6 Conventional delayed LMS adaptive filter 18
3.1 DLMS adaptive filter 19
3.2 Modified DLMS adaptive filter 20
3.3 Proposed structure of error computation block 21
3.4 Proposed structure of PPG 22
3.5 Structure & function of AND OR cell 23
3.6 Adder structure of filtering unit 24
3.7 Proposed structure of Weight-update block 25
4.1 Fixed point representation 27
4.2 Internal component structure 30
4.3 Summary of VHDL design flow 33
4.4 Xilinx design flow 34
4.5 Simulation flow 35
4.6 ModelSim simulator window 37
5.1 Simulation waveform 38
5.2 Synthesis report 39
5.3 RTL view 41
5.4 RTL schematic 42
6.1 LMS adaptive beamforming network 44
6.2 Digital transmission using channel equalization 45
LIST OF TABLES

Table list Page No

Table 4.1: Fixed-Point representation of DLMS adaptive filter 35


Table 5.1: Comparison of power consumption 39
Table 5.2: Comparison of area report with slices 40
Table 5.3 : Comparison of results 41
VLSI Implementation of High Efficient 64 Tap Fixed-point DLMS Adaptive Filter

CHAPTER 1

INTRODUCTION
1.1 INTRODUCTION

Historically, analog chip design yielded smaller die sizes, but with the noise
associated with modern sub-micrometer processes, digital designs can now often be
much more densely integrated than analog designs. This yields compact, low-power
and low-cost digital designs, which is why the modern programmable DSP was
developed with increasingly sophisticated functions.

VLSI: VLSI stands for "Very Large Scale Integration", the field concerned with
packing more and more logic devices into smaller and smaller areas. With VLSI,
circuits that would once have taken boards full of space can now be put into a
space a few millimetres across. VLSI circuits are everywhere: in your computer,
your car, your state-of-the-art digital camera, your cell phone. All this involves
a lot of expertise on many fronts within the same field, which we will look at in
later sections.

The way normal blocks like latches and gates are implemented differs from what
students have seen so far, but the behaviour remains the same. All this
miniaturization introduces new considerations: a lot of thought has to go into the
actual implementation as well as the design.

Circuit Delays: Large, complicated circuits running at very high frequencies face
one big problem: delays in the propagation of signals through gates and wires,
even over areas only a few micrometers across. Operating frequencies are now so
high that, as these delays add up, they can become comparable to the clock period.

Power: Another effect of high operating frequencies is increased power
consumption. This has a two-fold effect: devices drain batteries faster, and heat
dissipation increases. Coupled with the fact that surface areas have decreased,
heat poses a major threat to the stability of the circuit itself.

Layout: Laying out circuit components is a task common to all branches of
electronics. What is special in our case is that there are many possible ways to
do this: there can be multiple layers of different materials on the same silicon,
different arrangements of the smaller parts for the same component, and so on. The
choice among these is

Dept. of ECE BVC College of Engineering, Palacharla Page 1



determined by the way we choose to lay out the circuit components. Layout can also
affect the fabrication of VLSI chips, making the components easier or harder to
implement on the silicon.

Electronics was born in 1897 when J. A. Fleming developed the vacuum diode. Useful
electronics came in 1906 when the vacuum triode was invented by Lee De Forest,
which made electrical amplification of weak radio and audio signals possible with
a non-mechanical device. Later, around 1925, tetrode and pentode vacuum tubes were
developed.
These tubes dominated the field of electronics till the end of World War II. Until 1950 this
field was called "radio technology" because its principal application was the design and
theory of radio transmitters, receivers, and vacuum tubes. The era of semiconductor
electronics began with the invention of the junction transistor in 1948 at Bell Laboratories.
Soon, the transistors replaced the bulky vacuum tubes in different electronic circuits.
When engineers tried to build complex circuits using the vacuum tube, they
quickly became aware of its limitations. The first digital computer ENIAC, for example,
was a huge monster that weighed over thirty tons, and consumed 200 kilowatts of
electrical power. It had around 18,000 vacuum tubes that constantly burned out, making it
very unreliable.
When building a circuit, it is very important that all connections are intact. If not,
the electrical current will be stopped on its way through the circuit, making the circuit
fail. Before the integrated circuit, assembly workers had to construct circuits by hand,
soldering each component in place and connecting them with metal wires. Engineers soon
realized that manually assembling the vast number of tiny components needed in, for
example, a computer would be impossible, especially without generating a single faulty
connection. Another problem was the size of the circuits. A complex circuit, like a computer,
was dependent on speed. If the components of the computer were too large or the wires
interconnecting them too long, the electric signals couldn't travel fast enough through the
circuit, thus making the computer too slow to be effective. Advanced circuits contained
so many components and connections that they were virtually impossible to build. This
problem was known as the tyranny of numbers.
It was Kilby's idea to make all the components and the chip out of the same block
(monolith) of semiconductor material. Kilby presented his new idea to his superiors. He was
allowed to build a test version of his circuit. In September 1958, he had his first integrated
circuit ready.


Moore's Law: It states that the number of transistors in an integrated circuit
doubles approximately every two years. Integrated circuits are much smaller and
consume less power than the discrete components used to build electronic systems
before the 1960s.
Integration allows us to build systems with many more transistors, allowing much more
computing power to be applied to solving a problem. Integrated circuits are much easier to
design and manufacture and are more reliable than discrete systems. Integrated circuits
improve system characteristics in several ways.

Size: Integrated circuits are much smaller; both transistors and wires shrink to
micrometer sizes, compared to the millimeter or centimeter scales of discrete
components. Small size leads to advantages in speed and power consumption, since
smaller components have smaller parasitic resistances, capacitances, and
inductances.

Speed: Signals can be switched between logic 0 and logic 1 much more quickly
within a chip than between chips. Communication within a chip can occur hundreds
of times faster than communication between chips on a printed circuit board. The
high speed of on-chip circuits is due to their small size: smaller components and
wires have less parasitic capacitance to slow down the signal.

Power: Logic operations within a chip also take much less power. Once again, the
low power consumption is largely due to the small size of on-chip circuits:
smaller parasitic capacitances and resistances require less power to drive them.

Introduction to signal filtering concepts: A filter is a device or process that
removes unwanted components from a signal. Filtering is a class of signal
processing; the defining feature of filters is the complete or partial
suppression of some aspect of the signal.

There are many different bases for classifying filters, and these overlap in many
ways; there is no simple hierarchical classification. Filters may be:

Linear or Non-linear
Time invariant or Time Variant
Analog or digital
Passive or active etc...

Filters are essential to the operation of most electronic circuits. Filters used for direct
filtering can be either Fixed or Adaptive. Adaptive filters are widely used in Communication


and digital signal processing applications. Adaptive filtering can be used
strictly for analysis or for synthesis of a system. An adaptive filter
self-adjusts its coefficients according to an adaptive algorithm, and is useful
where complete knowledge of the environment is not available. Adaptive filters are
commonly classified as linear, where the estimate of a quantity is computed as a
linear combination of the observations applied to the filter input, or nonlinear,
e.g., neural networks.

There are two types of linear filter structures: 1. FIR 2. IIR

Most adaptive filters are implemented as FIR filters, because they are inherently
stable. The adaptation algorithms generally used are:

a. Least Mean Square (LMS)
b. Delayed Least Mean Square (DLMS)
c. Recursive Least Square (RLS)

a. LMS: Least mean square algorithms are a class of adaptive filter used to mimic
a desired filter by finding the filter coefficients that produce the least mean
square of the error signal, where the error signal is the difference between the
desired and the actual signal.
b. DLMS: This is a modified form of the LMS algorithm in which the weight update
uses a delayed version of the error. Introducing this delay makes a pipelined
implementation possible, and the resulting algorithm is called the delayed LMS
algorithm.
c. RLS: The recursive least squares algorithm is an adaptive filter which
recursively finds the coefficients that minimize a weighted linear least squares
cost function relating to the input signals. Here the input signals are considered
deterministic. It exhibits extremely fast convergence.

In this thesis, a digital adaptive filter is implemented using the DLMS algorithm
with a fixed-point representation.

1.2 LITERATURE REVIEW

Before starting the thesis work, the first step is to study the research
previously carried out by other authors; papers related to this work were chosen
and studied.


In a modular pipelined implementation of a delayed LMS transversal adaptive
filter [3], various systolic architectures were implemented using the DLMS
algorithm. These are mainly concerned with increasing the maximum usable
frequency. The problem with these architectures is that they involve a large
adaptation delay, of about N cycles for a filter of length N, which is quite high
for large-order filters.

In a Virtex FPGA implementation of a pipelined adaptive LMS predictor for
electronic support measures receivers [11], Ting et al. proposed a fine-grained
pipelined design. Pipelining is applied to the multipliers to reduce the critical
path. The rich register architecture of an FPGA allows pipelining at the CLB
level, i.e., fine-grained pipelining, so Virtex FPGA technology is used. Each CLB
acts as a 1-bit adder, and dedicated carry logic allows ripple-carry adders of
various sizes. This design limits the critical path to at most one addition time
and hence supports a high sampling frequency. However, because a large number of
pipeline latches is used, it incurs a lot of area overhead for pipelining and
higher power consumption; the FPGA routing also adds a very large delay.

In an efficient systolic architecture for the DLMS adaptive filter and its
applications, tree methods enhance the performance of the adaptive filter, but
they lack modularity and local connectivity, and the critical path grows with the
number of tree stages. To achieve a lower adaptation delay, Van and Feng proposed
a systolic architecture using relatively large processing elements (PEs). Each PE
combines the systolic architecture and the tree structure to reduce the
adaptation delay, but it still involves a critical path of one MAC operation.

The existing work on the DLMS adaptive filter does not discuss fixed-point
implementation issues, e.g., the location of the radix point, the choice of word
length, and quantization at various stages of computation, although these directly
affect the convergence performance, particularly due to the recursive behaviour of
the LMS algorithm. Therefore, fixed-point implementation issues are given adequate
emphasis in this thesis. Besides, we present the optimization of our previously
reported design to reduce the number of pipeline delays along with the area,
sampling period, and energy consumption. The proposed design is a 64-tap
fixed-point implementation, and it is found to be more efficient in terms of the
area-delay product (ADP) and energy-delay product (EDP) compared to the existing
structures.


The block diagram of the DLMS adaptive filter is shown in Chapter 2, where the
adaptation delay of m cycles amounts to the delay introduced by the whole adaptive
filter structure, consisting of finite impulse response (FIR) filtering and the
weight-update process. It is shown there that the adaptation delay of conventional
LMS can be decomposed into two parts: one part is the delay introduced by the
pipeline stages in FIR filtering, and the other is due to the delay involved in
pipelining the weight-update process.
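The effect of that pipeline delay can be seen in a small behavioural sketch (a
Python model, not the VHDL design; the delay m, step size mu and toy signals are
assumptions for illustration): the error-computation block produces an error every
cycle, while the weight-update block consumes the error and input vector from m
cycles earlier.

```python
import numpy as np

def dlms(x, d, num_taps, mu, m):
    """Delayed LMS: w(n+1) = w(n) + mu * e(n-m) * x(n-m).
    The m-cycle delay models the pipeline between the error-computation
    block and the weight-update block."""
    w = np.zeros(num_taps)
    pipeline = []                                # holds (e, x_vec) for m cycles
    for n in range(num_taps - 1, len(x)):
        x_n = x[n - num_taps + 1:n + 1][::-1]
        e = d[n] - np.dot(w, x_n)                # error-computation block
        pipeline.append((e, x_n))
        if len(pipeline) > m:                    # weight-update block (delayed)
            e_old, x_old = pipeline.pop(0)
            w = w + mu * e_old * x_old
    return w

# Same toy identification as plain LMS; convergence just starts m cycles later
rng = np.random.default_rng(1)
h = np.array([0.5, -0.3, 0.2, 0.1])              # hypothetical unknown system
x = rng.standard_normal(3000)
d = np.convolve(x, h)[:len(x)]
w = dlms(x, d, num_taps=4, mu=0.02, m=5)
print(np.round(w, 2))                            # converges toward h
```

The delayed update still converges provided the step size is kept small enough
for the given delay, which is why the convergence analysis of DLMS differs from
that of plain LMS.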

1.3 LIMITATIONS OF PREVIOUS WORKS


From the above discussion, it is clear that the DLMS algorithm is well suited to
low-power design, and it is therefore used here for the design of the adaptive
filter. In the conventional DLMS structure, two conventional computation blocks
are used, and the design consumes comparatively more power. The purpose of
designing a DLMS adaptive filter is to give high throughput and power-efficient
computation with minimal delay through the combinational blocks. Owing to its
significance, several DLMS adaptive filter circuits have already been proposed,
but the existing designs have limited accuracy.
In the proposed work, the filter length is increased to 64 taps, which increases
throughput and achieves higher accuracy. It is found that the area, delay and
power can be further reduced by the adaptive filter design proposed in this work.

1.4 MOTIVATION AND SCOPE

Motivation: The primary motivation for adopting this filter design lies in the
fact that it can provide a logic design methodology for ultra-low-power circuits
beyond the k*T*ln2 limit, for emerging nanotechnologies in which the energy
dissipated due to information destruction becomes a significant factor in the
overall heat dissipation. To achieve low adaptation delay and an
area-delay-power-efficient implementation, we use combinational blocks.

The motivation of our research is to develop an area-delay-power-efficient
adaptive filter circuit with improvements in both the area-delay product and the
energy-delay product.


Scope
Research is also being carried out on digital signal processors that use adaptive
filters for computation, and certain filters have been realized for the same
purpose. Similar development has taken place in digital signal processing for
channel equalization. In differential power analysis, cryptographic machines that
dissipate little heat prevent intruders from performing power-analysis or timing
attacks. VLSI signal processing is an upcoming field in communication techniques,
promising to increase the speed of computation while reducing area and power
dissipation. Designing circuits in this domain will therefore be promising in the
near future.

1.5 PROBLEM DEFINITION


The problems identified concern designing the basic adaptive filter circuit,
which forms the foundation for designing the filter with adaptation delay.
Challenges include designing power-efficient standard logic gate structures that
satisfy the filter characteristics, reducing the power dissipated in bit recall
and storage, and lowering the area occupancy of the realized circuit in a
pipelined implementation.

1.6 ORGANISATION OF THE THESIS


Chapter 1 introduces the reader to the work done so far and its limitations.
Chapter 2 deals with the basics of digital filters and their types, and explains
the theory behind the project. The actual design methodology, using selected
features of the tools, is given in Chapter 3. The design implementation is
explained in Chapter 4. The results are presented and discussed in Chapter 5,
which shows the improvements achieved. Chapter 6 concludes the project report,
and finally the references for the project are provided.

Chapter 1 provides the introduction of the thesis, which is about filters and
VLSI circuits.

Chapter 2 provides information about basic digital filters, mainly the LMS
adaptive filter.

Chapter 3 gives the basic design methodology of the project.

Chapter 4 gives complete information about the implementation of the project.

Chapters 5 and 6 explain the results and applications of the thesis.


CHAPTER 2

DIGITAL FILTERS
2.1 Introduction

Digital filters are typically used to modify or alter the attributes of a signal
in the time domain or frequency domain. The most common digital filter is the
linear time-invariant (LTI) filter. An LTI filter interacts with its input signal
through a process called linear convolution, denoted y = f * x, where f is the
filter's impulse response, x is the input signal, and y is the convolved output.
The linear convolution process is formally defined by

y[n] = x[n] * f[n] = Σk f[k] x[n-k]

LTI digital filters are generally classified as finite impulse response (FIR) or
infinite impulse response (IIR). An FIR filter involves a finite number of sample
values, reducing the above convolution sum to a finite sum per output sample
instant. An IIR filter, however, requires that an infinite sum be performed. An
FIR design and implementation methodology is discussed in this thesis.

The study of digital filters is motivated by their growing popularity as a
primary DSP operation. Digital filters are rapidly replacing classic analog
filters, which were implemented using RLC components and operational amplifiers.
Analog filters were mathematically modelled using ordinary differential equations
and Laplace transforms, and were analyzed in the time or s domain. Analog
prototypes are now used only in IIR design, while FIR filters are typically
designed using digital computer specifications and algorithms.

In this thesis, it is assumed that a digital filter has been designed and
selected for implementation. The major components of a digital filter are
identified below.

The Input x(n)

The input to a digital filter is a series of discrete samples obtained by
sampling the input waveform. The sampling rate must meet the Nyquist criterion:
the sampling frequency must be at least twice the highest frequency of the input
signal. The term x(n) means the input at time (n).


Z-1
Z-1 represents a time delay equal to the sampling period, also called a unit
delay. Each z box therefore delays the samples by one sampling period. In the
diagram, this is shown by the input going into the delay box as x(n) and coming
out as x(n-1): x(n) means the input at time (n), and x(n-1) means the input at
time (n-1). In practice, x(n-1) is the previous input that has been saved in the
memory of the DSP.
Filter Taps and Weights
The output of each delay box is called a tap. Taps are usually fed into scalars which scale
the value of the delayed sample to the required value by multiplying the input (or delayed
input) by a coefficient. In the diagram, these are marked as b0, b1 and b2. The scaling factor
is called the weight. In mathematical terms, the weight is multiplied by the delayed input, so
the output of the first tap is b0*x(n). The next tap output will be b1*x(n-1), and the output of
the last tap is b2*x(n-2).
Summing Junctions
The outputs of the weights are fed into summing junctions, which add the
weighted, delayed, feed-forward outputs from the taps. In this example, the output
of the first summing junction is b0*x(n) + b1*x(n-1). At the next summing
junction, this is added to the output of the final tap, giving
b0*x(n) + b1*x(n-1) + b2*x(n-2), which is the output.
The Output y(n)
The output of a digital filter is a combination of a number of delayed and weighted samples,
and is usually called y(n).
The Operation of Digital Filters
In summary, the output is y(n) and the present sample is x(n). The previous samples would
then be: x(n-1) = one unit time delay
x(n-2) = two unit time delay
When x(n) arrives at the input, the taps are feeding the delayed samples to
weights b1 and b2. Therefore, at any sampling instant, the value of the output
can be calculated as the weighted sum of the current sample and the two previous
samples:
y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2)
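The three-tap structure above can be written directly in code. This short Python
sketch (illustrative only, not from the thesis) implements
y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2), with samples before n = 0 taken as zero:

```python
def fir3(x, b0, b1, b2):
    """Direct-form 3-tap FIR; the filter starts at rest (zero delay line)."""
    xm1 = xm2 = 0.0              # delayed samples x(n-1) and x(n-2)
    y = []
    for xn in x:
        y.append(b0 * xn + b1 * xm1 + b2 * xm2)
        xm2, xm1 = xm1, xn       # shift the tapped delay line
    return y

# Feeding in a unit impulse returns the impulse response, i.e. the tap weights:
print(fir3([1, 0, 0, 0], 0.5, 0.3, 0.2))   # [0.5, 0.3, 0.2, 0.0]
```

The impulse response dies out after the last tap, which is exactly the "finite
impulse response" property discussed in the next section.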
Tools: Before we consider more complex digital filters, let us first learn about
some mathematical tools used in digital filtering. This will solidify our
understanding of digital filters and provide a foundation for learning more
complex subjects.


Impulse Function
An impulse is defined as an idealized rectangular pulse of area 1.0, zero width,
and infinite amplitude. It is typically expressed by an integral, as shown in the
diagram; this is a general formula that allows us to calculate the area under any
pulse.
Weighted Impulse Function
Consider a pulse with an amplitude of 3 and a width of 2, as shown in the figure.
Using the same integral to calculate the area under it, we find that it equals 6.
A weighted impulse function is similar: it has an area of A and an amplitude of
infinity, and is represented by the integral shown in the diagram. Obviously this
is impossible in the real world, but the weighted impulse function is used
extensively in digital signal processing to help explain DSP techniques. For
example, a sampled analog waveform can be represented as the multiplication of
the analog signal with a periodic weighted impulse function whose frequency
equals the sampling frequency.

2.2 Types of Filters


2.2.1 FIR Filter
The type of filter just discussed is classified as a finite impulse response
(FIR) filter. It is called this because its response to a single impulse is
finite: after a defined period of time (determined by the number of taps)
following the impulse, the output of the filter will be zero. This type of filter
is also referred to as a transversal, feed-forward, all-zero, or moving average
(MA) filter.

Fig 2.1: FIR Filter


An FIR filter with constant coefficients is an LTI digital filter. The output of
an FIR filter of order (length) L, for an input time series x[n], is given by a
finite version of the convolution sum:

y[n] = x[n] * f[n] = Σ (k = 0 to L-1) f[k] x[n-k]

where f[0] through f[L-1], all nonzero, are the filter's L coefficients; they
also correspond to the FIR impulse response. The Lth-order LTI filter is
graphically interpreted in the figure shown above. It can be seen to consist of a
tapped delay line, adders and multipliers. One of the operands presented to each
multiplier is an FIR coefficient, referred to as a tap weight; hence the filter
is also called a transversal filter or tapped delay line.

2.2.2 IIR Filter


An IIR filter is one whose impulse response is infinite. With a simple addition,
the FIR filter can be transformed into an IIR filter: in addition to the b
coefficients from the FIR filter, a set of a coefficients and unit delays are
added to feed back the filter's output. The result is an IIR or autoregressive
moving average (ARMA) filter. One can view the FIR filter configurations
discussed previously as ARMA filters with the a coefficients set to zero. It is
also possible to have an IIR filter with all the b coefficients set to zero; this
type of filter is referred to as an all-pole, feedback, or autoregressive (AR)
filter.

Fig 2.2: IIR Filter


The impulse response of an IIR filter has infinite length, hence the name infinite impulse
response. The feedback loop makes it possible for an IIR filter to be unstable. It is possible to
check for this instability during the design process, but sometimes a filter that is stable on
paper may become unstable in practice due to round off and truncation in the DSP hardware.
It is important to examine stability issues closely when working with IIR filters. In some
cases, the instability conditions of an IIR filter can be used to advantage in designing
oscillators.
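The feedback path that distinguishes an IIR filter can likewise be sketched in Python (an illustrative model with made-up names and coefficients, not part of the project's VHDL):

```python
# Behavioral sketch of a direct-form IIR filter: feed-forward b
# coefficients plus feedback a coefficients acting on past outputs.
def iir_filter(b, a, x):
    """y[n] = sum_k b[k]*x[n-k] - sum_m a[m]*y[n-m-1]; a lists feedback taps."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[m] * y[n - m - 1] for m in range(len(a)) if n - m - 1 >= 0)
        y.append(acc)
    return y

# One feedback tap a1 = -0.5 gives y[n] = x[n] + 0.5*y[n-1]: the impulse
# response 1, 0.5, 0.25, ... decays but never reaches exactly zero.
print(iir_filter([1.0], [-0.5], [1, 0, 0, 0]))
```

With one feedback tap of magnitude less than 1 the impulse response decays geometrically but never reaches zero; a feedback tap of magnitude greater than 1 would make the same structure unstable, which is the stability issue discussed above.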

Comparison between FIR and IIR filters


Some comparisons between FIR and IIR filters are warranted. It is easy to design an FIR
filter that has linear phase response in the pass-band; all that is required is that the impulse
response be symmetric. By definition, a stable IIR filter cannot have exactly linear phase.
FIR filters are always stable, while IIR filters can be unstable. FIR filters generally require
more elements than an IIR filter for a given frequency-response specification, assuming
that linear phase is unimportant.

FIR Vs IIR

1. IIR filters are used for applications where linear-phase characteristics are not a concern.
2. FIR (finite impulse response) filters are required where linear-phase characteristics are
needed.
3. An IIR filter can meet a given specification with a lower order (fewer taps), whereas an
FIR filter typically needs a higher order.
4. FIR filters are preferred over IIR filters where stability matters, since no feedback is
involved and they are always stable.
5. IIR filters are recursive and serve as the alternative in cases where an equivalent FIR
filter would become too long for practical use.
Most adaptive filters are therefore implemented as FIR filters, because they are inherently
stable.

Advantages of Digital filters


One of the major advantages of digital filters is that they are programmable. To change
the cut-off frequency, the roll-off rate, or the phase response, all one must do is change a
few coefficients. One can quite easily make major changes, such as converting a low-pass
filter into a high-pass filter.
The idea of changing filter characteristics by changing a few coefficients opens


up even wider possibilities. It is possible to design adaptive filters that adapt themselves
to changing conditions. For adaptive filters, a mechanism must be designed to change the
coefficients of the filter in accordance with changing conditions. Such filters are very
useful in modems. Since the properties of telephone lines change continuously, adaptive
filters offer the ideal solution for these environments.

2.3 ADAPTIVE Filter


2.3.1 Introduction
The filters discussed so far have been used for applications where the requirements
for the optimal coefficients do not change over time, i.e., they were LTI systems. However,
many real-world signals found in typical DSP fields such as speech processing,
communications, radar, sonar, seismology, and biomedicine require that the optimal filter
or system coefficients be adjusted over time depending on the input signal. If the
parameters change slowly compared with the sampling frequency, we can compute a better
estimate of the optimal coefficients and adjust the filter appropriately.
In general, any filter structure, FIR or IIR, with many architectural variations may be
used as an adaptive digital filter (ADF).
Adaptive filters are now an established part of the DSP field. An adaptive filter
self-adjusts its coefficients according to an adaptive algorithm, which is very useful when
complete knowledge of the environment is not available.

Fig 2.3: Adaptive filter


In Fig. 2.3:
x[n] = input of the adaptive filter
y[n] = output of the adaptive filter
d[n] = desired response of the adaptive filter
e[n] = d[n] − y[n] = estimation error

With the continuing spread of adaptive algorithms in digital signal processing,
several issues have attracted attention, notably the large amount of computation involved
and the difficulty of achieving high-speed, real-time operation. For a long time, adaptive
filtering algorithms were implemented on DSP chips through assembly or high-level-
language programming. This meets the requirements in less demanding real-time
situations, but where real-time constraints are tight or the electromagnetic environment is
harsh, it cannot deliver the required processing speed and robustness. Field-programmable
gate arrays (FPGAs), with their high flexibility and integration, provide a new method for
the hardware implementation of adaptive algorithms.
When the signal characteristics are unknown, an adaptive recursive algorithm can
start from initial conditions derived from the known part of the signal characteristics and,
after a certain number of recursions, converge statistically toward the optimal solution.
When the statistical characteristics of the input signal are unknown, or change over time,
an adaptive filter can automatically adjust its parameters at each iteration to meet some
criterion and thereby achieve near-optimal filtering. The adaptive filter thus has the ability
to self-regulate and to track, so in non-stationary environments adaptive filtering can
follow the changes of the signal well.

2.3.2. LMS algorithm


Least mean squares (LMS) algorithms are a class of adaptive filters used to mimic a
desired filter by finding the filter coefficients that produce the least mean square of the
error signal (the difference between the desired signal and the actual signal). LMS is a
stochastic gradient-descent method in which the filter is adapted based on the error at the
current time.


Fig 2.4: Adaptive filter with LMS algorithm

The basic idea behind the LMS filter is to update the filter weights so that they converge to
the optimum values. The algorithm starts by assuming small weights (zero in most cases);
at each step, the gradient of the mean-square error is found and the weights are updated. If
the MSE gradient is positive, the error would keep increasing if the same weights were
used for further iterations, which means the weights need to be reduced. If the gradient is
negative, the weights need to be increased. Hence, the basic weight-update equation during
the nth iteration is

W(n+1) = W(n) − μ·∇ξ[n]

where ξ represents the mean-square error, μ is the step size, and W(n) is the weight vector.
The negative sign indicates that the weights are changed in the direction opposite to the
gradient slope.
The mean-square error, as a function of the filter weights, is a quadratic function with only
one extremum, which minimizes the mean-square error; the weight vector there is the
optimal weight. The LMS algorithm thus approaches this optimal weight by descending
along the mean-square-error versus filter-weight curve.
The least-mean-square (LMS) adaptive filter is the most popular and most widely
used adaptive filter, not only because of its simplicity but also because of its satisfactory
convergence performance. The direct-form LMS adaptive filter, however, involves a long
critical path due to the inner-product computation needed to obtain the filter output.


An adaptive filter is a computational device that iteratively models the relationship
between its input and output signals. It self-adjusts its filter coefficients according to an
adaptive algorithm. The figure shows the diagram of a typical adaptive filter.
The linear filter can be different filter types such as finite impulse response (FIR) or
infinite impulse response (IIR). An adaptive algorithm adjusts the coefficients of the
linear filter iteratively to minimize the power of e(n).
The LMS algorithm is one adaptive algorithm among others that iteratively adjusts the
coefficients of an FIR filter. Other adaptive algorithms include the recursive least squares
(RLS) algorithm. The method of steepest descent is a recursive algorithm for computing
the Wiener filter when the statistics of the signals are known (knowledge of R and p). The
problem is that this information is often unknown. LMS is a method based on the same
principles as the method of steepest descent, but in which the statistics are estimated
continuously. Since the statistics are estimated continuously, the LMS algorithm can adapt
to changes in the signal statistics; the LMS algorithm is thus an adaptive filter.
Because the statistics are only estimated, the gradient becomes noisy. The LMS algorithm
belongs to a group of methods referred to as stochastic gradient methods, while the method
of steepest descent belongs to the group of deterministic gradient methods.

Fig 2.5. LMS filter algorithm


Relationship to the least squares filter


The realization of the causal Wiener filter looks a lot like the solution to the least
squares estimate, except in the signal processing domain. The FIR least mean squares filter is
related to the Wiener filter, but minimizing the error criterion of the former does not rely on
cross-correlations or auto-correlations. Its solution converges to the Wiener filter solution.
Most linear adaptive filtering problems can be formulated using the block diagram above.

LMS algorithm steps

1. Filter output:

y[n] = Σ wk*[n]·u[n−k], summed over k = 0, 1, …, N−1

2. Estimation error:

e[n] = d[n] − y[n]

3. Tap-weight adaptation:

wk[n+1] = wk[n] + μ·u[n−k]·e*[n]

The aim is an algorithm that minimizes E{|e(n)|²}, just like steepest descent (SD), but
based on unknown statistics. One strategy is to use estimates of the autocorrelation matrix
R and the cross-correlation vector p. If the instantaneous estimates

R(n) = u(n)·u^H(n)
p(n) = u(n)·d*(n)

are chosen, the resulting method is the least-mean-squares algorithm.
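The three steps above can be sketched as a real-valued LMS system-identification loop in Python (an illustrative sketch: for real signals the conjugates drop out, and the function name, step size, and the 3-tap "unknown" system are assumptions made up for the demo, not part of the project):

```python
import random

def lms_identify(x, d, N, mu):
    """Real-valued LMS: y[n] = w . x_vec, e[n] = d[n] - y[n],
    w <- w + mu * e[n] * x_vec (conjugates drop out for real data)."""
    w = [0.0] * N
    for n in range(len(x)):
        xv = [x[n - k] if n - k >= 0 else 0.0 for k in range(N)]
        y = sum(wk * xk for wk, xk in zip(w, xv))        # 1. filter output
        e = d[n] - y                                     # 2. estimation error
        w = [wk + mu * e * xk for wk, xk in zip(w, xv)]  # 3. tap-weight update
    return w

# Identify a made-up "unknown" 3-tap FIR system from input/output data.
random.seed(1)
h = [0.4, -0.2, 0.1]
x = [random.uniform(-1, 1) for _ in range(2000)]
d = [sum(h[k] * (x[n - k] if n - k >= 0 else 0.0) for k in range(3))
     for n in range(len(x))]
w = lms_identify(x, d, N=3, mu=0.1)
print([round(wk, 3) for wk in w])   # converges close to h
```

With a noiseless desired signal the weights converge essentially to the unknown system's coefficients, illustrating the stochastic-gradient behavior described above.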


2.3.3 DLMS algorithm


The DLMS algorithm, rather than using the most recent error e(n) corresponding
to the nth iteration for updating the filter weights, uses the delayed error e(n−m), i.e., the
error corresponding to the (n−m)th iteration, for updating the current weights. The
weight-update equation of the DLMS algorithm is given by

W(n+1) = W(n) + μ·e(n−m)·x(n−m)

where m is the adaptation delay. The structure of the conventional delayed LMS adaptive
filter is shown in Fig. 2.6. It can be seen that the adaptation delay m is the number of cycles
required for the error corresponding to any given sampling instant to become available to
the weight-adaptation circuit.

Fig 2.6. Conventional delayed LMS adaptive filter

In the delayed LMS algorithm, the assumption is that the gradient of the error, ∇[n] =
e[n]x[n], does not change much if the coefficient update is delayed by a few samples, i.e.,
∇[n] ≈ ∇[n−D]. It has been shown that, as long as the delay is less than the system order,
i.e., the filter length, this assumption holds well and the delayed update does not degrade
the convergence speed. Long's original DLMS algorithm considered pipelining only the
adder tree of the adaptive filter, assuming that the multiplication and the coefficient update
can be done in one clock cycle; in an FPGA implementation, however, the multiplier and
the coefficient update require additional pipeline delays. With delay D1 in the error path
and D2 in the coefficient-update path, the LMS algorithm becomes

e[n−D1] = d[n−D1] − f^T[n−D1]·x[n−D1]
f[n+1] = f[n−D1−D2] + μ·e[n−D1−D2]·x[n−D1−D2]
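The effect of using a delayed error can be sketched in Python (an illustrative behavioral model with made-up names and parameters, not the hardware): the weights at time n are updated with the error and input vector from m cycles earlier, and for a delay smaller than the filter length the algorithm still converges.

```python
import random

def dlms_identify(x, d, N, mu, m):
    """Delayed LMS: w(n+1) = w(n) + mu * e(n-m) * x(n-m)."""
    w = [0.0] * N
    errs, xvs = [], []            # pipeline of past errors / input vectors
    for n in range(len(x)):
        xv = [x[n - k] if n - k >= 0 else 0.0 for k in range(N)]
        y = sum(wk * xk for wk, xk in zip(w, xv))
        errs.append(d[n] - y)
        xvs.append(xv)
        if n >= m:                # error from m cycles ago is now available
            e_d, xv_d = errs[n - m], xvs[n - m]
            w = [wk + mu * e_d * xk for wk, xk in zip(w, xv_d)]
    return w

# Same made-up system-identification setup, now with adaptation delay m = 2.
random.seed(2)
h = [0.4, -0.2, 0.1]
x = [random.uniform(-1, 1) for _ in range(4000)]
d = [sum(h[k] * (x[n - k] if n - k >= 0 else 0.0) for k in range(3))
     for n in range(len(x))]
w = dlms_identify(x, d, N=3, mu=0.05, m=2)
print([round(wk, 3) for wk in w])
```

Despite the two-cycle-old error being used for every update, the weights still converge to the unknown system, consistent with the gradient-invariance assumption stated above.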


CHAPTER 3

DESIGN METHODOLOGY
3.1 DLMS algorithm

The weights of the LMS adaptive filter during the nth iteration are updated according to
the equation

w[n+1] = w[n] + μ·e[n]·x[n]

where e[n] = d[n] − y[n] and y[n] = w[n]^T·x[n]. Here the input vector x[n] and the
weight vector w[n] at the nth iteration are

x[n] = [x[n], x[n−1], …, x[n−N+1]]^T
w[n] = [w0[n], w1[n], …, w(N−1)[n]]^T

d[n] is the desired response, y[n] is the filter output, and e[n] denotes the error computed
during the nth iteration; μ is the step size, and N is the number of weights used in the
LMS adaptive filter.
In the case of pipelined designs with m pipeline stages, the error e[n] becomes available
after m cycles, where m is called the adaptation delay. The DLMS algorithm therefore uses
the delayed error e[n−m], i.e., the error corresponding to the (n−m)th iteration, for
updating the current weights instead of the most recent error.
The weight-update equation of the DLMS adaptive filter is given by

w[n+1] = w[n] + μ·e[n−m]·x[n−m]

Fig 3.1: DLMS Adaptive filter


The block diagram of the DLMS adaptive filter is depicted in Fig. 3.1, where the
adaptation delay of m cycles amounts to the delay introduced by the whole adaptive filter
structure, consisting of the FIR filtering and the weight-update process. It is shown in [12]
that the adaptation delay of conventional LMS can be decomposed into two parts: one part
is the delay introduced by the pipeline stages in FIR filtering, and the other part is due to
the delay involved in pipelining the weight-update process.

3.2 Proposed DLMS Algorithm


In the conventional DLMS algorithm, the adaptation delay of m cycles amounts
to the delay introduced by the whole adaptive filter structure, consisting of the FIR filtering
and the weight-update process. This adaptation delay can instead be decomposed into two
parts: one part is the delay introduced by the FIR filtering, and the other is due to the delay
involved in the weight update. Based on such a decomposition of the delay, the proposed
structure of the DLMS adaptive filter is derived. The computation of the filter output and
the final subtraction to compute the feedback error are merged in the error-computation
unit to reduce the latency of the error-computation path. If the latency of the error
computation is n1 cycles, the error computed by the structure at the nth cycle is e(n − n1),
which is used with the input samples delayed by n1 cycles to generate the weight-
increment term. The weight-update equation of the proposed delayed LMS algorithm is,
accordingly, given by

w[n+1] = w[n] + μ·e[n−n1]·x[n−n1]

Fig 3.2. Modified DLMS Adaptive filter


There are two main computing blocks in the adaptive filter architecture:
1) The error-computation block
2) Weight-update block.
In this, we discuss the design strategy of the proposed structure to minimize the adaptation
delay in the error-computation block, followed by the weight-update block.

3.3 Error Computation Block


The proposed structure of the error-computation unit of an N-tap DLMS adaptive filter is
shown in Fig. 3.3. It consists of N 2-b partial product generators (PPGs) corresponding to
N multipliers, a cluster of L/2 binary adder trees, and a single shift-add tree. Each
sub-block is described in detail below.

Fig 3.3: Proposed structure of error computation block

3.3.1 Structure of PPG


The structure of each PPG is shown in Fig. 3.4. It consists of L/2 2-to-3 decoders and the
same number of AND/OR cells (AOCs). Each 2-to-3 decoder takes a 2-b digit (u1u0) as
input and produces three outputs b0 = u0·(NOT u1), b1 = (NOT u0)·u1, and b2 = u0·u1,
such that b0 = 1 for (u1u0) = 1, b1 = 1 for (u1u0) = 2, and b2 = 1 for (u1u0) = 3. The
decoder outputs b0, b1, and b2, along with w, 2w, and 3w, are fed to an AOC, where w,
2w, and 3w are in 2's-complement representation and sign-extended to (W + 2) bits each.
To take care of the sign of the input samples while computing the partial product
corresponding to the most significant digit (MSD), i.e., (u(L−1)u(L−2)) of the input
sample, the AOC (L/2 − 1) is fed with w, −2w, and −w as inputs, since (u(L−1)u(L−2))
can take the four possible values 0, 1, −2, and −1.

Fig 3.4: Proposed structure of PPG

3.3.2 Structure of AOC


The structure and function of an AOC are depicted in Fig. 3.5. Each AOC consists of
three AND cells and two OR cells; the structure and function of the AND cells and OR
cells are depicted in Fig. 3.5(b) and (c), respectively. Each AND cell takes an n-bit input D
and a single-bit input b, and consists of n AND gates. It distributes all n bits of input D to
its n AND gates as one of the inputs; the other input of each of the n AND gates is the
single-bit input b. As shown in Fig. 3.5, each OR cell similarly takes a pair of n-bit input
words and has n OR gates; a pair of bits in the same bit position in B and D is fed to the
same OR gate.
The output of an AOC is w, 2w, or 3w, corresponding to the decimal value 1, 2, or 3 of the
2-b input (u1u0), respectively. The decoder together with the AOC thus performs a
multiplication of the input operand w by a 2-b digit (u1u0), so that the PPG of Fig. 3.4
performs L/2 parallel multiplications of the input word w by 2-b digits to produce the L/2
partial products of the product word wu.
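The decoder-plus-AOC behavior can be checked against ordinary two's-complement multiplication with a short bit-level model in Python (an illustrative sketch with a made-up function name, not the VHDL): each 2-b digit selects 0, w, 2w, or 3w, while the MSD selects 0, w, −2w, or −w to handle the sign.

```python
def ppg_multiply(w, u, L=8):
    """Emulate one PPG: split the L-bit two's-complement word u into L/2
    two-bit digits, select a partial product per digit, then shift-add."""
    digits = [(u >> (2 * i)) & 3 for i in range(L // 2)]
    total = 0
    for i, dg in enumerate(digits):
        if i == L // 2 - 1:   # MSD: digit values are 0, 1, -2, -1
            pp = {0: 0, 1: w, 2: -2 * w, 3: -w}[dg]
        else:                 # other digits: 0, w, 2w, 3w
            pp = {0: 0, 1: w, 2: 2 * w, 3: 3 * w}[dg]
        total += pp << (2 * i)
    return total

# 0b10011100 represents -100 in 8-bit two's complement.
print(ppg_multiply(7, 0b10011100))   # 7 * (-100) = -700
```

For L = 8, the bit pattern 10011100 represents −100, and the digit-wise partial products sum to −100w, confirming that the signed MSD handling reproduces two's-complement multiplication.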


Fig 3.5: Structure and function of the AND/OR cell. The binary operators · and + in (b)
and (c) are implemented using AND and OR gates, respectively.

3.3.3. Structure of Adder Tree


Conventionally, we would perform the shift-add operation on the partial
products of each PPG separately to obtain each product value and then add all the N
product values to compute the desired inner product. However, the shift-add operation to
obtain each product value increases the word length, and consequently increases the adder
sizes of the N − 1 additions of the product values. To avoid such an increase in adder word
size, we add all the N partial products of the same place value from the N PPGs with a
single adder tree, so that the L/2 partial products generated by each of the N PPGs are
added by L/2 binary adder trees.
The outputs of the L/2 adder trees are then added by a shift-add tree according to their
place values. Each binary adder tree requires log2 N stages of adders to add N partial
products, and the shift-add tree requires log2 L − 1 stages of adders to add the L/2 outputs
of the L/2 binary adder trees. The addition scheme of the error-computation block for a
four-tap filter and input word size L = 8 is shown in Fig. 3.6. For N = 4 and L = 8, the
adder network requires four binary adder trees of two stages each and a two-stage
shift-add tree. In this figure, we


have shown all possible locations of pipeline latches by dashed lines, to reduce the critical
path to one addition time.
If we introduced pipeline latches after every addition, it would require L(N − 1)/2 +
L/2 − 1 latches in log2 N + log2 L − 1 stages, which would lead to a high adaptation delay
and introduce a large overhead of area and power consumption for large values of N and
L. On the other hand, some of those pipeline latches are redundant in the sense that they
are not required to maintain a critical path of one addition time. The final adder in the
shift-add tree contributes the maximum delay to the critical path. Based on that
observation, we have identified the pipeline latches that do not contribute significantly to
the critical path and excluded them without any noticeable increase of the critical path.
The locations of pipeline latches for filter lengths N = 8, 16, and 32 and input size L = 8
are shown in Table I. The pipelining is performed by a feed-forward cut-set retiming of
the error-computation block.
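The column-wise addition order can be mimicked in Python (a behavioral check with made-up names, not the hardware): partial products of the same place value from all N PPGs are summed first, and a final shift-add over the L/2 column sums yields the same inner product as summing full products.

```python
def error_block_inner_product(weights, samples, L=8):
    """Sum same-place-value partial products across all N taps first
    (binary adder trees), then combine the L/2 column sums with one
    shift-add tree -- the addition order used in the error block."""
    LD = L // 2
    col = [0] * LD                       # one column sum per 2-b place value
    for w, u in zip(weights, samples):   # samples are raw L-bit 2's-comp words
        for i in range(LD):
            dg = (u >> (2 * i)) & 3
            if i == LD - 1:              # signed MSD: 0, 1, -2, -1
                pp = {0: 0, 1: w, 2: -2 * w, 3: -w}[dg]
            else:
                pp = {0: 0, 1: w, 2: 2 * w, 3: 3 * w}[dg]
            col[i] += pp
    # shift-add tree over place values
    return sum(c << (2 * i) for i, c in enumerate(col))

# 5 and 0b10011100 (= -100) as input samples, weights 3 and -2:
print(error_block_inner_product([3, -2], [5, 0b10011100]))  # 3*5 + (-2)*(-100) = 215
```

Because addition is associative, reordering the partial-product sums this way changes only the adder word lengths, not the result, which is exactly the point of the column-wise scheme described above.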

Fig 3.6: Adder structure of the filtering unit for N = 4 and L= 8

3.4 Weight update Block


The proposed structure of the weight-update block is shown in Fig. 3.7. It performs N
multiply-accumulate operations of the form (μ × e) × xi + wi to update the N filter
weights. The step size μ is taken as a negative power of 2, so that the multiplication with
the recently available error reduces to a shift operation. Each MAC unit therefore performs
the multiplication of the shifted value of the error with the delayed input samples xi,
followed by the addition with the corresponding old weight value wi. All the N
multiplications for the MAC operations are performed by N PPGs, followed by N
shift-add trees. Each PPG generates L/2 partial products corresponding to the product of
the recently shifted error value μe with the L/2 2-b digits of the input word xi, where the
subexpression 3μe is shared within the multiplier. Since the scaled error μe is multiplied
with all the N delayed input values in the weight-update block, this subexpression can be
shared across all the multipliers as well, which leads to a substantial reduction of the adder
complexity. The final outputs of the MAC units constitute the desired updated weights, to
be used as inputs to the error-computation block as well as the weight-update block for the
next iteration.
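Because μ is a power of 2, scaling the error costs only a shift, not a multiplier. A brief Python sketch of this step of the MAC (the shift amount and sample values are made-up illustrations):

```python
# mu = 2^-k realized as an arithmetic right shift of the fixed-point
# error word -- no multiplier is needed for the scaling.
def scale_error_by_mu(e, k):
    """Return e * 2^-k; Python's >> on ints is an arithmetic shift."""
    return e >> k

def mac_update(w_i, x_i, e, k):
    """One weight-update MAC: w_i + (mu * e) * x_i with mu = 2^-k."""
    return w_i + scale_error_by_mu(e, k) * x_i

print(scale_error_by_mu(256, 4))   # 256 * 2^-4 = 16
print(mac_update(10, 3, 256, 4))   # 10 + 16 * 3 = 58
```

The arithmetic shift preserves the sign of a negative error word, matching the behavior of shifting a 2's-complement register in hardware.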

Fig 3.7: Proposed structure of the weight-update block

Note that during weight adaptation, the error delayed by n1 cycles is used, while the
filtering unit uses the weights delayed by n2 cycles. By this approach, the adaptation delay
is effectively reduced by n2 cycles. In the next section, we show that the proposed
algorithm can be implemented efficiently with very low adaptation delay, which is not
substantially affected by an increase in filter order.


3.5 Adaptation Delay


As discussed above, the adaptation delay is decomposed into n1 and n2. The
error-computation block generates the error delayed by n1 − 1 cycles, which is fed to the
weight-update block of Fig. 3.7 after scaling by μ; the input is then delayed by 1 cycle
before the PPG, making the total delay introduced by the FIR filtering equal to n1. The
weight-update block generates w(n−1−n2), and the weights are delayed by n2 + 1 cycles.
However, it should be noted that the 1-cycle delay is due to the latch before the PPG,
which is already included in the delay of the error-computation block, i.e., n1; therefore,
the delay generated in the weight-update block is n2. If the locations of the pipeline latches
are decided as in Table I, n1 becomes 5, where three latches are in the error-computation
block, one latch is after the subtraction, and the other latch is before the PPG. Also, n2 is
set to 1, from a latch in the shift-add tree of the weight-update block.


CHAPTER 4

DESIGN IMPLEMENTATION
4.1 Fixed point implementation

In this section, we discuss the fixed-point implementation and optimization of the
proposed DLMS adaptive filter. A bit-level pruning of the adder tree is also proposed to
reduce the hardware complexity without noticeable degradation of the steady-state MSE.

Fixed point design considerations

For a fixed-point implementation, the word lengths and radix-point positions of the
input samples, weights, and internal signals need to be decided. Fig. 4.1 shows the
fixed-point representation of a binary number. Let (X, Xi) be a fixed-point representation
of a binary number, where X is the word length and Xi is the integer length. The word
lengths and radix-point locations of xn and wn need to be predetermined by the hardware
designer, taking design constraints such as the desired accuracy and the hardware
complexity into consideration.

Fig 4.1: fixed point representation

Xi = integer word length, Xf = fractional word length

Assuming (L, Li) and (W, Wi), respectively, as the representations of the input signals and
the filter weights, all the other signals can be decided as shown in Table 4.1. The signals x,
w, p, q, y, d, and e appear in the error-computation block, while r and s are defined in the
weight-update block of Fig. 3.7. Note that all subscripts and time indices of signals are
omitted for simplicity of notation.
The signal pij, which is the output of a PPG block (shown in Fig. 3.4), has at most three
times the value of the input coefficients. Thus, two more bits are added to the word length
and to the integer length of the coefficients to avoid overflow. The output of each stage in the


adder tree in Fig. 3.6 is one bit wider than its input signals, so that the fixed-point
representation of the output of the adder tree with log2 N stages becomes
(W + log2 N + 2, Wi + log2 N + 2). Accordingly, the output of the shift-add tree has the
form (W + L + log2 N, Wi + Li + log2 N), assuming that no truncation of least significant
bits (LSBs) is performed in the adder tree or the shift-add tree. However, the output of the
shift-add tree is designed to have W bits: the most significant W bits of the
(W + L + log2 N) bits are retained, which results in the fixed-point representation
(W, Wi + Li + log2 N) for y, as shown in Table 4.1. The desired signal d is given the same
representation as y, even though its quantization is usually given at the input; for this
purpose, specific scaling/sign extension and truncation/zero padding are required. Since
the LMS algorithm performs learning so that y has the same sign as d, the error signal e
can also be given the same representation as y without overflow after the subtraction.
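The effect of choosing (X, Xi) can be modeled in Python (illustrative helper names; the (8, 2) format below is only an example, not a project parameter):

```python
def to_fixed(val, X, Xi):
    """Quantize val into an (X, Xi) word: X total bits (2's complement),
    Xi integer bits (including sign), X - Xi fractional bits; saturate
    on overflow instead of wrapping."""
    scale = 1 << (X - Xi)
    q = int(round(val * scale))
    lo, hi = -(1 << (X - 1)), (1 << (X - 1)) - 1
    return max(lo, min(hi, q))

def from_fixed(q, X, Xi):
    """Recover the real value represented by the (X, Xi) word q."""
    return q / (1 << (X - Xi))

# (8, 2): 2 integer bits (including sign) and 6 fractional bits.
print(to_fixed(0.75, 8, 2))                     # 48 = 0.75 * 2^6
print(from_fixed(to_fixed(0.75, 8, 2), 8, 2))   # round trip: 0.75
print(from_fixed(to_fixed(10.0, 8, 2), 8, 2))   # saturates just below +2
```

Moving the radix point (changing Xi) trades range against precision at a fixed word length, which is exactly the trade-off the representations in Table 4.1 manage for each internal signal.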

Table 4.1: Fixed-point representation of the signals of the proposed DLMS adaptive filter
(μ = 2^−(Li + log2 N))

It is shown in the literature that the convergence of an N-tap DLMS adaptive filter with an
adaptation delay of n1 cycles is ensured if


where σx^2 is the average power of the input samples. Furthermore, if the value of μ is
chosen as a power of 2, μ = 2^−n, where n ≤ Wi + Li + log2 N, the multiplication by μ is
equivalent to a change of location of the radix point. Since the multiplication by μ does not
need any arithmetic operation, it does not introduce any truncation error. If a smaller step
size is needed, i.e., n > Wi + Li + log2 N, some of the LSBs of en need to be truncated. If
we assume n = Li + log2 N, i.e., μ = 2^−(Li + log2 N), as in Table 4.1, the representation
of μen is (W, Wi) without any truncation.
The weight-increment term s (shown in Fig. 3.7), which is equivalent to μenxn, would
nominally require the fixed-point representation (W + L, Wi + Li). However, only Wi
MSBs of the integer part are retained in the computation of the shift-add tree of the
weight-update circuit, while the more significant bits beyond them are discarded. This is in
accordance with the assumption that, as the weights converge toward the optimal value,
the weight-increment terms become smaller and the MSB end of the error term contains a
growing number of zeros. Also, in our design, the L − Li LSBs of the weight-increment
terms are truncated so that the terms have the same fixed-point representation as the
weight values. We also assume that no overflow occurs during the addition for the weight
update; otherwise, the word length of the weights would have to be increased at every
iteration, which is not desirable. The assumption is valid since the weight-increment terms
are small once the weights have converged. Moreover, if overflow occurs during the
training period, the weight update is not appropriate and will lead to additional iterations
to reach convergence. Accordingly, the updated weight can be computed in the truncated
form (W, Wi) and fed into the error-computation block.

4.2 Blocks Requirement


In this project we use a novel partial-product generator and propose a strategy for
optimized, balanced pipelining across the time-consuming combinational blocks of the
structure.
1. Error-computation block
   Partial-product generator
   Adder tree
   Shift-add tree
2. Weight-update block
   Partial-product generator
   Shift-add tree


We also use AOCs (AND/OR cells). Together with the error-computation block and the
weight-update block, these are the basic circuits used to design the LMS adaptive filter.
1. Error-computation block: this module contains the sub-blocks PPG, adder tree, and
shift-add tree. In the proposed design, VHDL code is written for 64 coefficients, with 64
PPGs, the adder trees, and the shift-add tree as sub-blocks.
2. Weight-update block: it contains the PPGs and a shift-add tree.
The figure below shows the internal architecture of the proposed design and the
component structure present in it. These components were developed in the VHDL
language.

DLMS Adaptive Filter
    Error Computation Block
        PPG (Partial Product Generator)
            Decoder (2*3)
            AOC
        Adder Tree
        Shift Add Tree
    Weight Update Block
        PPG (Partial Product Generator)
            Decoder (2*3)
            AOC
        Shift Add Tree

Fig 4.2: Internal Component Structure

The VHDL code for the various components of the proposed design was written in Xilinx
ISE 14.2, synthesized using that IDE, and simulated using the ModelSim simulator. These
tools are discussed in detail below.


4.3 VLSI Tools


4.3.1. VHDL

A digital system can be described at different levels of abstraction and from different
points of view. An HDL should faithfully and accurately model and describe a circuit,
whether already built or under development, from either the structural or behavioral views, at
the desired level of abstraction. Because HDLs are modelled after hardware, their semantics
and use are very different from those of traditional programming languages.

Limitations of traditional programming languages

There is a wide variety of computer programming languages, from Fortran to C to
Java. Unfortunately, they are not adequate to model digital hardware. To understand their
limitations, it is beneficial to examine the development of a language. A programming
language is characterized by its syntax and semantics. The syntax comprises the grammatical
rules used to write a program, and the semantics is the meaning associated with language
constructs. When a new computer language is developed, the designers first study the
characteristics of the underlying processes and then develop syntactic constructs and their
associated semantics to model and express these characteristics.

Most traditional general-purpose programming languages, such as C, are modeled


after a sequential process. In this process, operations are performed in sequential order, one
operation at a time. Since an operation frequently depends on the result of an earlier
operation, the order of execution cannot be altered at will. The sequential process model has
two major benefits. At the abstract level, it helps the human thinking process to develop an
algorithm step by step. At the implementation level, the sequential process resembles the
operation of a basic computer model and thus allows efficient translation from an algorithm
to machine instructions.

The characteristics of digital hardware, on the other hand, are very different from
those of the sequential model. A typical digital system is normally built of smaller parts, with
customized wiring that connects the input and output ports of these parts. When a signal
changes, the parts connected to the signal are activated and a set of new operations is initiated
accordingly. These operations are performed concurrently, and each operation will take a
specific amount of time, which represents the propagation delay of a particular part, to
complete. After completion, each part updates the value of the corresponding output port. If


the value is changed, the output signal will in turn activate all the connected parts and initiate
another round of operations. This description shows several unique characteristics of digital
systems, including the connections of parts, concurrent operations, and the concept of
propagation delay and timing. The sequential model used in traditional programming
languages cannot capture the characteristics of digital hardware, and there is a need for
special languages (i.e., HDLs) that are designed to model digital hardware.

VHDL includes facilities for describing logical structure and function of digital
systems at a number of levels of abstraction, from system level down to the gate level. It is
intended, among other things, as a modelling language for specification and simulation. We
can also use it for hardware synthesis if we restrict ourselves to a subset that can be
automatically translated into hardware.

VHDL arose out of the United States government's Very High Speed Integrated
Circuits (VHSIC) program. In the course of this program, it became clear that there was a
need for a standard language for describing the structure and function of integrated circuits
(ICs). Hence the VHSIC Hardware Description Language (VHDL) was developed. It was
subsequently developed further under the auspices of the Institute of Electrical and Electronics
Engineers (IEEE) and adopted in the form of the IEEE Standard 1076, Standard VHDL
Language Reference Manual, in 1987. This first standard version of the language is often
referred to as VHDL-87.

After the initial release, various extensions were developed to facilitate various design and
modelling requirements. These extensions are documented in several IEEE standards:

i. IEEE standard 1076.1-1999, VHDL Analog and Mixed Signal Extensions (VHDL-
AMS): defines the extension for analog and mixed-signal modelling.

ii. IEEE standard 1076.2-1996, VHDL Mathematical Packages: defines extra


mathematical functions for real and complex numbers.

iii. IEEE standard 1076.3- 1997, Synthesis Packages: defines arithmetic operations over
a collection of bits.

iv. IEEE standard 1076.4-1995, VHDL Initiative Towards ASIC Libraries (VITAL):
defines a mechanism to add detailed timing information to ASIC cells.


v. IEEE standard 1076.6-1999, VHDL Register Transfer Level (RTL) Synthesis:


defines a subset that is suitable for synthesis.

vi. IEEE standard 1164-1993, Multivalue Logic System for VHDL Model
Interoperability (std_logic_1164): defines new data types to model multivalued logic.

vii. IEEE standard 1029.1-1998, VHDL Waveform and Vector Exchange to Support
Design and Test Verification (WAVES): defines how to use VHDL to exchange
information in a simulation environment.

Fig 4.3: Summary of VHDL Design Flow

4.3.2 Xilinx ISE


Xilinx ISE Simulator is a test bench and test fixture creation tool integrated into the
Project Navigator framework. It includes a waveform editor which can be used to
graphically enter stimuli and the expected response, and then generate a VHDL test bench or
Verilog test fixture. ISE controls all aspects of the design flow. Through the Project Navigator
interface, we can access all of the design entry and design implementation tools, as well as
the files and documents associated with a project.

Starting the ISE Software:


To start ISE, double-click the ISE Project Navigator icon on the desktop or select Start >
All Programs > Xilinx ISE 14.2 > Project Navigator.


Creating New Project


1. Create a Verilog source file for the project as follows:
2. Click the New Source button in the New Project Wizard.
3. Select Verilog Module as the source type.
4. Type in the file name counter.
5. Verify that the Add to project checkbox is selected.
6. Click Next.

4.3.3 Design Flow

The first step in implementing a design on an FPGA is the system specification.
Specifications define the kinds of inputs and outputs and the range of values that the device
can take, based on these specifications. The next step is the architecture, which describes the
interconnections between all the blocks involved in the design. Each block in the architecture,
along with its interconnections, is modelled in either VHDL or Verilog, depending on ease.
All these blocks are then simulated and the outputs are verified for correct functioning.

Fig 4.4: Xilinx Design Flow


4.4. Simulation Flow

The following diagram shows the basic steps for simulating a design in Xilinx.

Fig 4.5: Simulation flow

4.5 Simulation Library Compilation Wizard

This method is highly recommended because library compilation can be performed
for more than one device family and/or language.
1. Open the Simulation Library Compilation Wizard. It can be accessed from Start
Menu > Programs > Xilinx ISE Design Suite 13.x > ISE Design Tools > 32-bit Tools.
Note: If you are using the 64-bit version of ModelSim, use the Simulation Library
Compilation Wizard from 64-bit Tools. ModelSim PE Student Edition 10.x is only available
in the 32-bit version.
2. The Select Simulator window opens. Select the appropriate simulator (for the Student
Edition, select ModelSim PE); enter c:\modeltech64_10.0c\win64 for the executable
location, compxlib.cfg for the Compxlib configuration file, and compxlib.log for the
Compxlib log file.


3. Next, select the HDL used for simulation. If you are unsure, select Both VHDL and
Verilog. However, this will increase the compilation time and the disk space required.
4. Then select all the device families that you will be working with. Again, the more
devices you select, the more compilation time and disk space are required. Remember
that you can always run the compilation wizard at a later time for additional devices.
5. The next window is for selecting libraries for functional and timing simulation.
Different libraries are required for different types of simulation (behavioral, post-
route, etc.). We suggest that you select All Libraries as the default option. Interested
users can refer to Chapter 6 of the Xilinx Synthesis and Simulation Design Guide for
additional information.
6. Finally, the window for the output directory for compiled libraries is shown. We
suggest leaving the default values that Xilinx picks. Then select Launch Compile Process.
7. Be patient, as the compilation can take a long time depending on the options that you
have chosen.
8. The compile process may produce a lot of warnings but should be error-free. We
have not explored the reasons behind these warnings, but they do not appear to
affect the simulation of any of our designs.
9. Once the process is completed, open c:\modeltech64_10.0c\modelsim.ini and verify
that there are libraries pointing to the output directory entered in step 6. This will
happen only if you have set the environment variables.

Library compilation is now complete. If you have not set the environment variables,
the wizard creates a modelsim.ini in the output directory entered in step 6. By default,
this location is c:\Xilinx\13.x\ISE_DS\ISE. Open this file and verify that it contains the
location of the libraries that were just compiled. This file should be copied into every project
you create.

4.6 ModelSim
ModelSim is a multi-language HDL simulation environment by Mentor Graphics for
simulation of hardware description languages such as VHDL, Verilog and SystemC, and
includes a built-in C debugger. ModelSim can be used independently, or in conjunction
with Altera Quartus or Xilinx ISE. Simulation is performed using the graphical user
interface (GUI), or automatically using scripts.


ModelSim is offered in multiple editions such as Modelsim PE, ModelSim XE, and
Modelsim SE. Modelsim SE offers high-performance and advanced debugging capabilities,
while Modelsim PE is the entry-level simulator for hobbyists and students. Modelsim SE is
used in large multi-million gate designs, and is supported on Microsoft Windows and Linux,
in 32-bit and 64-bit architectures.

Modelsim XE stands for Xilinx Edition, and is specially designed for integration with Xilinx
ISE. Modelsim XE enables testing of HDL programs written for Xilinx Virtex/Spartan series
FPGAs without needing physical hardware.

Modelsim uses a unified kernel for simulation of all supported languages, and the method of
debugging embedded C code is the same as VHDL or Verilog.
Modelsim enables simulation, verification and debugging for the following languages:

VHDL
Verilog
Verilog 2001
SystemVerilog
PSL
SystemC

Mentor Graphics was the first to combine single kernel simulator (SKS) technology with a
unified debug environment for Verilog, VHDL, and SystemC. The combination of industry-
leading, native SKS performance with the best integrated debug and analysis environment
makes ModelSim the simulator of choice for both ASIC and FPGA designs. The best
standards and platform support in the industry make it easy to adopt in the majority of
process and tool flows.

Fig 4.6: Modelsim Simulator Window


CHAPTER 5

SIMULATION RESULTS
5.1 Simulation Output

Simulation is the process of verifying the functional characteristics of models at any
level of abstraction. We use simulators to simulate hardware models. To test whether the
RTL code meets the functional requirements of the specification, we must see if all the RTL
blocks are functionally correct. To achieve this, we write a test bench, which generates the
clock, reset, and the required test vectors. Normally, 60-70% of design time is spent in
verification. We use the waveform output from the simulator to see if the DUT (Device
Under Test) is functionally correct. Most simulators come with a waveform viewer. As
designs become complex, we write a self-checking test bench, where the test bench applies
the test vectors and then compares the output of the DUT with the expected values.
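The self-checking idea can be sketched outside VHDL as well. Below is a hedged Python analogy in which a golden reference model, rather than a person reading waveforms, checks the device under test; the 4-tap averaging "DUT" is invented purely for illustration:

```python
# Sketch of a self-checking testbench in Python: apply test vectors to
# a device-under-test model and compare each response against an
# independent golden reference, instead of inspecting waveforms by
# hand. The 4-tap moving-average "DUT" is an invented example.

def dut_filter(samples):
    """Device under test: 4-sample moving average (integer arithmetic)."""
    out, taps = [], [0, 0, 0, 0]
    for s in samples:
        taps = [s] + taps[:3]          # shift register of the last 4 samples
        out.append(sum(taps) // 4)
    return out

def golden_filter(samples):
    """Independent reference model used to check the DUT."""
    padded = [0, 0, 0] + list(samples)
    return [sum(padded[i:i + 4]) // 4 for i in range(len(samples))]

# Self-checking loop: the testbench, not the engineer, flags mismatches.
vectors = [8, 16, 24, 32, 40, 48]
dut_out = dut_filter(vectors)
ref_out = golden_filter(vectors)
mismatches = [(i, a, b) for i, (a, b) in enumerate(zip(dut_out, ref_out)) if a != b]
print("PASS" if not mismatches else f"FAIL: {mismatches}")
```

A VHDL self-checking test bench follows the same pattern: a stimulus process drives the DUT, a reference process computes expected values, and an assertion reports any mismatch.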

The proposed system architecture is simulated in ModelSim, and the area, power, and
delay of the proposed system are analyzed for a Spartan-6 device using the Xilinx software.
The simulation result for the adaptive filter is shown in Fig 5.1, and the synthesis report of
the adaptive filter is shown in Fig 5.2. Finally, the comparison of the proposed system is
detailed in the tables that follow.

Fig 5.1: Simulation Waveform


5.2 Synthesis Report

Fig 5.2: Synthesis report

5.3 Power Report

[Bar chart comparing the power consumption (W) of the existing and proposed systems; vertical axis 0 to 0.3]

Table 5.1: Comparison of Power Consumption

The power consumption of the proposed and existing systems is compared in Table 5.1
above. The existing system consumes around 0.24 mW, whereas the proposed system
consumes around 0.14 mW. The proposed system is a 64-tap fixed-point filter. From this


we can calculate the energy per sample and also the energy-delay product (EDP) using
different compilers. The differences are shown in the table.

5.4 Area Report

[Bar chart comparing the number of slices occupied by the existing and proposed systems; vertical axis 0 to 10000]

Table 5.2: Comparison of Area Report with Slices

The area occupied by the filter is obtained from the synthesis results. The proposed hardware
for the test circuit has been described in VHDL and synthesized using Xilinx ISE 14.2. From
the chart above we can observe the number of slices required by the existing and proposed
systems. The existing system has 32 taps while the proposed system has 64 taps, so the
proposed system requires more slices than the existing one; its power consumption,
however, is lower.

5.5 Delay Report

The proposed design is described in VHDL and synthesized for different filter orders.
The word lengths of the input samples and weights can be chosen as 8, 16, 32, or 64, and
the multiplication is realized without any additional circuitry. The table shows the synthesis
results of the existing and proposed designs in terms of data arrival time, power
consumption, etc.

The table below also shows the energy-delay product (EDP), obtained as the product of the
data arrival time and the energy for a particular filter length. The proposed design shows
better results than the existing one. Apart from this, the area occupied by the filter can be
estimated from the number of slices, from which the area-delay product can also be
calculated.


The results are shown in the table below.

                                   Existing system       Proposed system
Filter length                            32                    64
DAT (Data Arrival Time) (ns)            1.15                  1.15
Power Consumption (mW)                  0.24                  0.132
Energy per Sample (EPS) (mW*ns)    0.24*1.15 (0.276)    0.132*1.15 (0.1518)

Table 5.3: Comparison of Results
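The EPS row of Table 5.3 is plain arithmetic: power consumption multiplied by data arrival time. As a quick check of those numbers (values taken from the table; the helper function name is ours):

```python
# Reproduce the EPS column of Table 5.3: energy per sample is power
# consumption multiplied by data arrival time (DAT). The EDP would
# additionally multiply by the delay once more.

def energy_per_sample(power_mw, dat_ns):
    """EPS in mW*ns = power (mW) x data arrival time (ns)."""
    return power_mw * dat_ns

existing = energy_per_sample(0.24, 1.15)    # existing 32-tap design
proposed = energy_per_sample(0.132, 1.15)   # proposed 64-tap design
print(round(existing, 4), round(proposed, 4))   # 0.276 0.1518

saving = 1 - proposed / existing
print(f"EPS reduction: {saving:.1%}")
```

With equal data arrival times, the EPS ratio reduces to the power ratio, so the proposed design saves 45% in energy per sample.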

5.6 RTL View

Fig 5.3 RTL View


5.7 RTL Schematic

Fig 5.4: RTL Schematic


CHAPTER 6

ADVANTAGES & APPLICATIONS

6.1 Advantages

Reduce the critical path to support high input-sampling rates. Hence, throughput
increases.

Reduce Power consumption.

Less adaptation Delay.

Faster performance.

Less area occupied on hardware implementation.

It gives higher performance results of EDP (Energy delay Product) and ADP (Area
Delay Product).

The steady-state error is also evaluated for this filter.

6.2 Applications

1. Echo Cancellation:

Echo suppression and echo cancellation are methods in telephony to improve voice
quality by preventing echo from being created or removing it after it is already present. In
addition to improving subjective quality, this process increases the capacity achieved
through silence suppression by preventing echo from traveling across a network.

These methods are commonly called acoustic echo suppression (AES) and acoustic echo
cancellation (AEC), and more rarely line echo cancellation (LEC). In some cases, these terms
are more precise, as there are various types and causes of echo with unique characteristics,
including acoustic echo (sounds from a loudspeaker being reflected and recorded by a
microphone, which can vary substantially over time) and line echo (electrical impulses
caused by, e.g., coupling between the sending and receiving wires, impedance mismatches,
electrical reflections, etc., which varies much less than acoustic echo). In practice, however,
the same techniques are used to treat all types of echo, so an acoustic echo canceller can
cancel line echo as well as acoustic echo. "AEC" in particular is commonly used to refer to


echo cancellers in general, regardless of whether they were intended for acoustic echo, line
echo, or both.
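An acoustic echo canceller is essentially an LMS adaptive filter identifying the echo path and subtracting its echo estimate from the microphone signal. The following is a minimal floating-point sketch (toy 3-tap echo path and step size invented for illustration, far from a production AEC):

```python
# Toy LMS echo canceller: the adaptive filter identifies the echo path
# from the far-end signal and subtracts its echo estimate from the
# microphone signal. The 3-tap echo path, step size, and signal length
# are invented for illustration.
import random

random.seed(0)
echo_path = [0.6, 0.3, -0.1]           # unknown "room" response (toy)
taps, mu, n = 3, 0.05, 5000
far_end = [random.uniform(-1, 1) for _ in range(n)]
# Microphone picks up only the echo here (no near-end speech).
mic = [sum(echo_path[k] * (far_end[i - k] if i >= k else 0.0)
           for k in range(taps)) for i in range(n)]

w = [0.0] * taps
residual = []
for i in range(n):
    x_vec = [far_end[i - k] if i >= k else 0.0 for k in range(taps)]
    e = mic[i] - sum(wk * xk for wk, xk in zip(w, x_vec))   # residual echo
    residual.append(e)
    w = [wk + mu * e * xk for wk, xk in zip(w, x_vec)]      # LMS update

print([round(wk, 2) for wk in w])   # learned taps approach echo_path
```

Once the adaptive taps match the echo path, the residual sent back over the network contains essentially no echo, which is the behavior an AEC is built to achieve.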

2. Adaptive Beam Forming

An adaptive beam former is a system that performs adaptive spatial signal


processing with an array of transmitters or receivers. The signals are combined in a manner
which increases the signal strength to/from a chosen direction. Signals to/from other
directions are combined in a benign or destructive manner, resulting in degradation of the
signal to/from the undesired direction. This technique is used in both radio frequency and
acoustic arrays, and provides for directional sensitivity without physically moving an array
of receivers or transmitters.

Adaptive beam forming was initially developed in the 1960s for military applications in
sonar and radar. There are several modern applications of beam forming, one of the most
visible being commercial wireless networks such as LTE. Initial applications of adaptive
beam forming were largely focused on radar and electronic countermeasures to mitigate the
effect of signal jamming in the military domain.

Radar applications include phased array radar. Although not strictly adaptive, these
radar applications make use of either static or dynamic (scanning) beam forming.
Commercial wireless standards such as 3GPP Long Term Evolution (LTE) and IEEE 802.16
WiMAX rely on adaptive beam forming to enable essential services within each standard.

An example of an adaptive beam forming network is shown in Fig 6.1 below.

Fig 6.1. LMS Adaptive beam forming network


3. System Identification

The field of system identification uses statistical methods to build mathematical


models of dynamical systems from measured data. System identification also includes
the optimal design of experiments for efficiently generating informative data for fitting such
models as well as model reduction.

4. Channel Equalization

In telecommunication, equalization is the reversal of distortion incurred by a signal
transmitted through a channel. Equalizers are used to render the frequency response (for
instance, of a telephone line) flat from end to end. When a channel has been equalized,
the frequency-domain attributes of the signal at the input are faithfully reproduced at the
output. Telephones, DSL lines and television cables use equalizers to prepare data signals
for transmission.

Equalizers are critical to the successful operation of electronic systems such as analog
broadcast television. In this application the actual waveform of the transmitted signal must be
preserved, not just its frequency content. Equalizing filters must cancel out any group delay
and phase delay between different frequency components.

An adaptive equalizer is typically a linear equalizer or a DFE. It updates the equalizer
parameters (such as the filter coefficients) as it processes the data. Typically, it uses the MSE
cost function; it assumes that it makes the correct symbol decisions, and uses its estimate of
the symbols to compute the error.

An example of a digital transmission system using channel equalization is shown in fig 6.2
below.

Fig 6.2: Digital transmission using channel equalization.


5. Image Processing, video, multimedia applications etc.


CONCLUSION AND FUTURE WORK

Conclusion

We proposed an area-delay-power-efficient, low-adaptation-delay architecture for
fixed-point implementation of the DLMS adaptive filter. We used a novel PPG for efficient
implementation of general multiplications and inner-product computation by common
subexpression sharing. Besides, we proposed an efficient addition scheme for inner-product
computation to reduce the adaptation delay significantly, in order to achieve faster
convergence performance and to reduce the critical path to support high input-sampling
rates. Aside from this, we proposed a strategy of optimized balanced pipelining across the
time-consuming blocks of the structure to reduce the adaptation delay and power
consumption as well. The proposed structure involves significantly less adaptation delay
and provides significant savings in ADP and EDP compared to the existing structures, and
we presented a fixed-point implementation of the proposed architecture.

Future Work

For low sampling rates, the clock of the proposed design runs slower than the maximum
usable frequency, and the design has been verified only at a low operating voltage. The
design can be further extended to floating-point implementations.


REFERENCES

[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ, USA:
Prentice-Hall, 1985.
[2] S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ, USA:
Wiley, 2003.
[3] M. D. Meyer and D. P. Agrawal, "A modular pipelined implementation of a delayed LMS
transversal adaptive filter," in Proc. IEEE Int. Symp. Circuits Syst., May 1990, pp. 1943-1946.
[4] G. Long, F. Ling, and J. G. Proakis, "The LMS algorithm with delayed coefficient
adaptation," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 9, pp. 1397-1405,
Sep. 1989.
[5] G. Long, F. Ling, and J. G. Proakis, "Corrections to 'The LMS algorithm with delayed
coefficient adaptation'," IEEE Trans. Signal Process., vol. 40, no. 1, pp. 230-232, Jan. 1992.
[6] H. Herzberg and R. Haimi-Cohen, "A systolic array realization of an LMS adaptive filter
and the effects of delayed adaptation," IEEE Trans. Signal Process., vol. 40, no. 11, pp.
2799-2803, Nov. 1992.
[7] M. D. Meyer and D. P. Agrawal, "A high sampling rate delayed LMS filter architecture,"
IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 40, no. 11, pp. 727-729,
Nov. 1993.
[8] S. Ramanathan and V. Visvanathan, "A systolic architecture for LMS adaptive filtering
with minimal adaptation delay," in Proc. Int. Conf. Very Large Scale Integr. (VLSI) Design,
Jan. 1996, pp. 286-289.
[9] Y. Yi, R. Woods, L. K. Ting, and C. F. N. Cowan, "High speed FPGA-based
implementations of delayed-LMS filters," J. Very Large Scale Integr. (VLSI) Signal Process.,
vol. 39, nos. 1-2, pp. 113-131, Jan. 2005.
[10] L. D. Van and W. S. Feng, "An efficient systolic architecture for the DLMS adaptive
filter and its applications," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol.
48, no. 4, pp. 359-366, Apr. 2001.
[11] L.-K. Ting, R. Woods, and C. F. N. Cowan, "Virtex FPGA implementation of a
pipelined adaptive LMS predictor for electronic support measures receivers," IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 86-99, Jan. 2005.
[12] P. K. Meher and M. Maheshwari, "A high-speed FIR adaptive filter architecture using a
modified delayed LMS algorithm," in Proc. IEEE Int. Symp. Circuits Syst., May 2011.


[13] P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-I:
Introducing a novel multiplication cell," in Proc. IEEE Int. Midwest Symp. Circuits Syst.,
Aug. 2011, pp. 1-4.
[14] P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-II: An
optimized architecture," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[15] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New
York, USA: Wiley, 1999.
[16] C. Caraiscos and B. Liu, "A roundoff error analysis of the LMS adaptive algorithm,"
IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 1, pp. 34-41, Feb. 1984.
[17] R. Rocher, D. Menard, O. Sentieys, and P. Scalart, "Accuracy evaluation of fixed-point
LMS algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, pp.
237-240.
[18] Xilinx 14.2, Synthesis and Simulation Design Guide, UG626 (v14.2).


APPENDIX (A): SOURCE CODE

AOC :

library ieee;
use ieee.std_logic_1164.all;

entity AOC is
port (
b0 : in std_logic;
b1 : in std_logic;
b2 : in std_logic;
w : in std_logic_vector(9 downto 0);
w2 : in std_logic_vector(9 downto 0);
w3 : in std_logic_vector(9 downto 0);
w_out : out std_logic_vector(9 downto 0)
);
end;

architecture rtl of AOC is

signal bb0 : std_logic_vector(9 downto 0);


signal bb1 : std_logic_vector(9 downto 0);
signal bb2 : std_logic_vector(9 downto 0);

signal ac1 : std_logic_vector(9 downto 0);


signal ac2 : std_logic_vector(9 downto 0);
signal ac3 : std_logic_vector(9 downto 0);
signal oc1 : std_logic_vector(9 downto 0);
signal oc2 : std_logic_vector(9 downto 0);

begin

bb0 <= (others => '0') when b0 = '0' else (others => '1');
bb1 <= (others => '0') when b1 = '0' else (others => '1');
bb2 <= (others => '0') when b2 = '0' else (others => '1');

ac1 <= bb0 and w;


ac2 <= bb1 and w2;
ac3 <= bb2 and w3;

oc1 <= ac1 or ac2;


oc2 <= oc1 or ac3;

w_out <= oc2;

end;


Decoder:

library ieee;
use ieee.std_logic_1164.all;

entity decoder2_3 is
port (
u0 : in std_logic;
u1 : in std_logic;
b0 : out std_logic;
b1 : out std_logic;
b2 : out std_logic
);
end;

architecture rtl of decoder2_3 is


begin

b0 <= u0 and (not u1);


b1 <= u1 and (not u0);
b2 <= u0 and u1;
end;
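The decoder above maps each 2-bit slice of the input word to one-hot select lines for 1x, 2x, and 3x the weight (digit 0 selects nothing, so the AOC outputs zero). A quick Python model of the same Boolean equations, for checking the truth table only:

```python
# Model of the 2-to-3 decoder inside the PPG: a 2-bit radix-4 digit
# (u1 u0) is decoded into one-hot selects b0 (digit 1), b1 (digit 2),
# and b2 (digit 3); digit 0 asserts none of them.

def decoder2_3(u0, u1):
    b0 = u0 & ~u1 & 1     # b0 <= u0 and (not u1)
    b1 = u1 & ~u0 & 1     # b1 <= u1 and (not u0)
    b2 = u0 & u1          # b2 <= u0 and u1
    return b0, b1, b2

# Exhaustive truth table: digit value 2*u1 + u0 vs. one-hot selects.
for u1 in (0, 1):
    for u0 in (0, 1):
        print(2 * u1 + u0, decoder2_3(u0, u1))
```

The AOC then uses these selects to gate w, 2w, or 3w onto its output, which is why each decoder/AOC pair produces one radix-4 partial product.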

PPG :

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
library fir; use fir.fir_fix_types.all;

entity PPG is
port (
clk : in std_logic;
reset : in std_logic;
x : in std_logic_vector(7 downto 0);
w_coff : in std_logic_vector(7 downto 0);
p00 : out std_logic_vector(9 downto 0);
p01 : out std_logic_vector(9 downto 0);
p02 : out std_logic_vector(9 downto 0);
p03 : out std_logic_vector(9 downto 0)
);

end;

architecture rtl of PPG is

signal K_NEG_1 : std_logic_vector(7 downto 0):= (r2b(1.0,8));

component AOC
port (
b0 : in std_logic;
b1 : in std_logic;
b2 : in std_logic;
w : in std_logic_vector(9 downto 0);


w2 : in std_logic_vector(9 downto 0);


w3 : in std_logic_vector(9 downto 0);
w_out : out std_logic_vector(9 downto 0)
);
end component;

component decoder2_3
port (
u0 : in std_logic;
u1 : in std_logic;
b0 : out std_logic;
b1 : out std_logic;
b2 : out std_logic
);
end component;

signal w : std_logic_vector(9 downto 0);


signal w2 : std_logic_vector(9 downto 0);
signal w3 : std_logic_vector(9 downto 0);
signal neg_w2 : std_logic_vector(9 downto 0);
signal neg_w : std_logic_vector(9 downto 0);
signal mult : std_logic_vector(9 downto 0);

signal aoc_0 : std_logic_vector(2 downto 0);


signal aoc_1 : std_logic_vector(2 downto 0);
signal aoc_2 : std_logic_vector(2 downto 0);
signal aoc_3 : std_logic_vector(2 downto 0);

begin

w <= ("00" & w_coff) when w_coff(7) = '0' else ("11" & w_coff);
w2 <= w(8 downto 0) & "0";
w3 <= w2+w;
mult <= (not w) + "1";
neg_w <= mult;
neg_w2 <= mult(8 downto 0) & "0";

decode1: decoder2_3
port map(
u0 => x(0),
u1 => x(1),
b0 => aoc_0(0),
b1 => aoc_0(1),
b2 => aoc_0(2)
);

decode2: decoder2_3
port map(
u0 => x(2),
u1 => x(3),
b0 => aoc_1(0),
b1 => aoc_1(1),
b2 => aoc_1(2)


);

decode3: decoder2_3
port map(
u0 => x(4),
u1 => x(5),
b0 => aoc_2(0),
b1 => aoc_2(1),
b2 => aoc_2(2)
);

decode4: decoder2_3
port map(
u0 => x(6),
u1 => x(7),
b0 => aoc_3(0),
b1 => aoc_3(1),
b2 => aoc_3(2)
);

AOC0: AOC
port map(
b0 => aoc_0(0),
b1 => aoc_0(1),
b2 => aoc_0(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p00
);

AOC1: AOC
port map(
b0 => aoc_1(0),
b1 => aoc_1(1),
b2 => aoc_1(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p01
);

AOC2: AOC
port map(
b0 => aoc_2(0),
b1 => aoc_2(1),
b2 => aoc_2(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p02
);

AOC3: AOC
port map(
b0 => aoc_3(0),
b1 => aoc_3(1),
b2 => aoc_3(2),
w => w,
w2 => w2,
w3 => w3,


w_out => p03
);

end;

Shift-Add Tree:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;

entity shift_add_tree is
generic ( n : in integer := 16);
port (
q0 : in std_logic_vector(n-1 downto 0);
q1 : in std_logic_vector(n-1 downto 0);
q2 : in std_logic_vector(n-1 downto 0);
q3 : in std_logic_vector(n-1 downto 0);
yn : out std_logic_vector(n-1 downto 0)
);
end;

architecture rtl of shift_add_tree is

signal adder1 : std_logic_vector(n+5 downto 0);


signal adder2 : std_logic_vector(n+5 downto 0);
signal adder3 : std_logic_vector(n+5 downto 0);

begin

adder1 <= ("000000" & q0) + ("0000" & q1(n-1 downto 0) & "0");
adder2 <= ("000000" & q2) + ("0000" & q3(n-1 downto 0) & "0");
adder3 <= adder1 + (adder2(n-5 downto 0) & "0000");
yn <= adder3(n-1 downto 0);

end;
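The concatenations in the code above implement shifts, so the tree computes q0 + 2*q1 + 16*(q2 + 2*q3), truncated to n bits. A hedged unsigned Python model of that bit-level arithmetic (the VHDL vectors may carry two's-complement values, which this sketch ignores):

```python
# Unsigned model of the shift_add_tree entity: the "&" concatenations
# act as shifts, so
#   adder1 = q0 + 2*q1
#   adder2 = q2 + 2*q3
#   adder3 = adder1 + 16 * (low n-4 bits of adder2)
#   yn     = low n bits of adder3

def shift_add_tree(q0, q1, q2, q3, n=16):
    mask = (1 << n) - 1
    adder1 = q0 + (q1 << 1)                     # q0 + 2*q1
    adder2 = q2 + (q3 << 1)                     # q2 + 2*q3
    truncated = adder2 & ((1 << (n - 4)) - 1)   # adder2(n-5 downto 0)
    adder3 = adder1 + (truncated << 4)          # append "0000" = shift by 4
    return adder3 & mask                        # adder3(n-1 downto 0)

print(shift_add_tree(1, 1, 1, 1))   # 1 + 2 + 16*3 = 51
```

Tracing the arithmetic this way makes it easy to confirm that the shift-add tree needs only wiring and adders, never a hardware multiplier.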


APPENDIX (B): PUBLISHED PAPER



VLSI Implementation of High Efficient 64 Tap
Fixed-Point DLMS Adaptive Filter
V.L.S. Mounika Devi and P. Srinivas
M.Tech (VLSI), Department of ECE, BVC College of Engineering, Rajahmundry, A.P, India
Associate Professor, Department of ECE, BVC College of Engineering, Rajahmundry, A.P, India

Abstract: This paper discusses an FPGA implementation of an efficient architecture for a delayed least mean square (DLMS) adaptive filter. The architecture achieves low adaptation delay along with reduced area, power, and delay. We propose a fixed-point implementation scheme with bit-level clipping. From the synthesis results, the area, delay, and power of the proposed system are analyzed and optimized. Finally, we design an efficient adaptive filter architecture.

Index Terms: adaptive filter, area efficient, FIR filter.

I. INTRODUCTION

Nowadays, one of the most widely used adaptive filters is the LMS adaptive filter, because of its simplicity and convergence performance.

In [1], a comparison between the adaptive filtering algorithms least mean square (LMS) and normalized least mean square (NLMS) is described, covering implementation aspects such as SNR and computational complexity analysis. These algorithms use little input and output delay. In this comparison, LMS is the more efficient one in terms of SNR.

The convergence characteristics of the least mean square (LMS) algorithm are analyzed in order to establish a range for the convergence factor that ensures stability. The convergence speed of the LMS algorithm depends on the eigenvalue spread of the input signal correlation matrix [2].

In [3], a fine-grained pipelined design of an adaptive filter is proposed; it supports a high sampling frequency, but at the cost of pipeline depth. The unfortunate effects of this architecture are that the adaptation delay increases, the convergence performance degrades, and the power dissipation increases.

In [4], an efficient architecture for the DLMS adaptive digital filter based on a new tree-systolic processing element is proposed. This architecture maintains the best convergence performance, has the same lowest critical period as the conventional circuit, and offers high degrees of modularity and locality at no extra area cost.

The rest of this paper is organized as follows. The existing system is discussed in Section II. In Section III, the proposed adaptive filter is discussed. Section IV presents the simulation results. Finally, Section V presents the conclusion of the paper.

II. RELATED WORK

The existing work on the DLMS adaptive filter does not discuss the fixed-point implementation issues, e.g., location of the radix point, choice of word length, and quantization at various stages of computation, although they directly affect the convergence performance, particularly due to the recursive behavior of the LMS algorithm. Therefore, fixed-point implementation issues are given adequate emphasis in this paper. Besides, we present here the optimization of our previously reported design to reduce the number of pipeline delays along with the area, sampling period, and energy consumption. The proposed design is found to be more efficient in terms of the power-delay product (PDP) and energy-delay product (EDP) compared to the existing structures.

The block diagram of the DLMS adaptive filter is shown in Fig. 1, where the adaptation delay of m cycles amounts to the delay introduced by the whole adaptive filter structure, consisting of finite impulse response (FIR) filtering and the weight-update process. It is shown that the adaptation delay of conventional LMS can be decomposed into two parts: one part is the delay introduced by the pipeline stages in FIR filtering, and the other part is due to the delay involved in pipelining the weight-update process.

Figure 1: Structure of the conventional delayed LMS adaptive filter.

III. PROPOSED SYSTEM

We propose a new architecture for the 64-tap delayed LMS adaptive filter. Based on the decomposition of the delay in the existing system, the delayed LMS adaptive filter can be implemented by the proposed structure shown in Fig. 2.

Figure 2: Structure of the modified delayed LMS adaptive filter.

Here, dn is the desired response, yn is the filter output, and en denotes the error computed during the nth iteration; μ is the step size, and N is the number of weights used in the LMS adaptive filter.

The proposed architecture has two main components:

1. Error-computation block
2. Weight-update block

A. Error-computation block

The proposed architecture of the error-computation block for the delayed LMS adaptive filter is shown in Fig. 3. This architecture consists of a delay element D, a 2-bit PPG unit, an adder tree with log2(N) stages, and a shift-add tree with log2(L) - 1 stages.

Figure 3: Proposed structure of the error-computation block.

1) 2-bit PPG unit:

This unit generates the partial product values of the error-computation block. The PPG consists of a 2-to-3 decoder and AOCs. The 2-to-3 decoder generates the values 1, 2, and 3 when its inputs u1 and u0 are 01, 10, and 11, respectively. The 2-bit PPG unit is shown in Fig. 4.

Figure 4: Proposed structure of PPG. AOC stands for AND/OR cell.

The AOC is the AND/OR cell design. It is used to multiply the input data by the coefficient value. The architecture of the AOC is shown in Fig. 5. The structure and function of the AND cells and OR cells are depicted in Fig. 5(b) and 5(c), respectively.

Figure 5: Structure and function of AND/OR cell. Binary operators · and + in (b) and (c) are implemented using AND and OR gates, respectively.

2) Architecture of the adder tree:

Conventionally, we would have performed the shift-add operation on the partial products of each PPG separately to obtain each product value and then added all the N product values to compute the desired inner product. However, the shift-add operation to obtain the product value increases the word length, and consequently increases the adder size of the N - 1 additions of the product values. To avoid such an increase in the word size of the adders, we add all the N partial products of the same place value from all the N PPGs with one adder tree. The addition scheme of the error-computation block for a four-tap filter and input word size L = 8 is shown in Fig. 6.

Figure 6: Adder structure of the filtering unit.
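The 2-bit digit decomposition behind the PPG can be illustrated numerically. The sketch below (a hypothetical Python model with an assumed word size L = 8, not the report's VHDL) splits an input into 2-bit digits, forms the 0/w/2w/3w partial products that the decoder-plus-AOC pair selects, and shows that shift-adding them reconstructs the full product:

```python
def ppg_partial_products(w, x, L=8):
    """Split L-bit x into L/2 two-bit digits; each digit selects 0, w, 2w, or 3w."""
    digits = [(x >> (2 * i)) & 0b11 for i in range(L // 2)]
    table = {0: 0, 1: w, 2: 2 * w, 3: 3 * w}   # 2-to-3 decoder drives the AOC selection
    return [table[d] for d in digits]

def inner_product_term(w, x, L=8):
    """Shift-add of the partial products reconstructs w * x."""
    pp = ppg_partial_products(w, x, L)
    return sum(p << (2 * i) for i, p in enumerate(pp))
```

For instance, w = 5 and x = 200 (binary 11001000) give the digits 00, 10, 00, 11, hence partial products [0, 10, 0, 15], and the shift-add yields 5 × 200 = 1000. This keeps per-PPG adders small: only L/2 partial products per tap, combined place-value by place-value across the N taps.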
Figure 7: Proposed structure of the weight-update block.

B. Weight-update block:

The proposed structure for the weight-update block is shown in Fig. 7. It performs N multiply-accumulate operations of the form (μ × e) × xi + wi to update the N filter weights. The step size μ is taken as a negative power of 2, so that the multiplication with the recently available error is realized by a shift operation only. Each of the MAC units therefore performs the multiplication of the shifted value of the error with the delayed input samples xi, followed by the additions with the corresponding old weight values wi. All the N multiplications for the MAC operations are performed by N PPGs, followed by N shift-add trees. Each of the PPGs generates L/2 partial products corresponding to the product of the recently shifted error value μ × e with L/2, the number of 2-bit digits of the input word xi, where the sub-expression 3μ × e is shared within the multiplier. Since the scaled error (μ × e) is multiplied with all the N delayed input values in the weight-update block, this sub-expression can be shared across all the multipliers as well. This leads to a substantial reduction of the adder complexity. The final outputs of the MAC units constitute the desired updated weights, to be used as inputs to the error-computation block as well as the weight-update block in the next iteration.

IV. SIMULATION AND RESULT

The proposed system architecture is simulated in ModelSim, and the area, power, and delay of the proposed system are analyzed on a Spartan-6 device using the Xilinx software. The simulation result for the adaptive filter is shown in Figure 8. The synthesis report of the adaptive filter is shown in Figure 9. Finally, the comparison of the proposed system with the existing one is detailed in Table 1.

Figure 8: Simulation result.

Figure 9: Synthesis report.
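The per-weight MAC of the weight-update block described above can be modeled compactly. Because μ is a negative power of 2 and the fixed-point scheme clips on overflow, one update reduces to a shift, a multiply, an add, and a saturation step. The Python sketch below is illustrative only; μ = 2^-k and the word lengths are assumed values, not taken from the report:

```python
def mac_weight_update(w, x, e, k=3, word=16):
    """One weight-update MAC: w + (mu*e)*x with mu = 2**-k, clipped to signed `word` bits."""
    prod = (e >> k) * x               # mu*e realized as an arithmetic right shift, then multiply
    acc = w + prod                    # accumulate with the old weight
    lo, hi = -(1 << (word - 1)), (1 << (word - 1)) - 1
    return max(lo, min(hi, acc))      # bit-level clipping (saturation) on overflow
```

For example, with k = 3 (μ = 1/8), w = 100, x = 2, e = 16 gives 100 + (16 >> 3) × 2 = 104, while an accumulation that would exceed the 16-bit signed range saturates at 32767 instead of wrapping.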

Table 1: Comparison

                          Existing system   Proposed system
Filter length             32                64
Number of slices          4060              8631
Power consumption (mW)    0.24              0.132

Figure 10: Number of slices.

Figure 11: Power consumption.

V. CONCLUSION

We proposed an efficient fixed-point adaptive filter with low adaptation delay. We used a novel partial product generator for the multiplications and the inner product, and proposed a fixed-point implementation scheme with bit-level clipping. From the synthesis, we also analyzed the area, delay, and power of the proposed system. The proposed design gives efficient output results compared with the existing ones. Further, we applied pipelining with the partial product generator across the time-consuming combinational blocks of the filter structure.

REFERENCES

[1]. Jyoti Dhiman, Shadab Ahmad, and Kuldeep Gulia, "Comparison between adaptive filter algorithms (LMS, NLMS and RLS)," International Journal of Science, Engineering and Technology Research (IJSETR), vol. 2, issue 5, May 2013.
[2]. B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filters," Proceedings of the IEEE, vol. 64, pp. 1151-1162, Aug. 1976.
[3]. Y. Yi, R. Woods, L.-K. Ting, and C. F. N. Cowan, "High speed FPGA-based implementations of delayed LMS filters," J. Very Large Scale Integr. (VLSI) Signal Process., vol. 39, nos. 1-2, pp. 113-131, Jan. 2005.
[4]. L. D. Van and W. S. Feng, "An efficient systolic architecture for the DLMS adaptive filter and its applications," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 48, no. 4, pp. 359-366, Apr. 2001.
[5]. H. Herzberg and R. Haimi-Cohen, "A systolic array realization of an LMS adaptive filter and the effects of delayed adaptation," IEEE Trans. Signal Process., vol. 40, no. 11, pp. 2799-2803, Nov. 1992.
[6]. M. D. Meyer and D. P. Agarwal, "A high sampling rate delayed LMS filter architecture," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 40, no. 11, pp. 727-729, Nov. 1993.
[7]. S. Ramanathan and V. Visvanathan, "A systolic architecture for LMS adaptive filtering with minimal adaptation delay," in Proc. Int. Conf. Very Large Scale Integr. (VLSI) Design, Jan. 1996, pp. 286-289.
[8]. Y. Yi, R. Woods, L.-K. Ting, and C. F. N. Cowan, "High speed FPGA-based implementations of delayed-LMS filters," J. Very Large Scale Integr. (VLSI) Signal Process., vol. 39, nos. 1-2, pp. 113-131, Jan. 2005.
[9]. L. D. Van and W. S. Feng, "An efficient systolic architecture for the DLMS adaptive filter and its applications," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 48, no. 4, pp. 359-366, Apr. 2001.
[10]. L.-K. Ting, R. Woods, and C. F. N. Cowan, "Virtex FPGA implementation of a pipelined adaptive LMS predictor for electronic support measures receivers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 86-99, Jan. 2005.
[11]. P. K. Meher and M. Maheshwari, "A high-speed FIR adaptive filter architecture using a modified delayed LMS algorithm," in Proc. IEEE Int. Symp. Circuits Syst., May 2011, pp. 121-124.
[12]. P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-I: Introducing a novel multiplication cell," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[13]. P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-II: An optimized architecture," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[14]. K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, USA: Wiley, 1999.
[15]. C. Caraiscos and B. Liu, "A roundoff error analysis of the LMS adaptive algorithm," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 1, pp. 34-41, Feb. 1984.
