With specialization
VLSI
By
V.L.S. Mounika Devi (146M1D5708)
Mr.P.Srinivas, M.Tech.
BONAFIDE CERTIFICATE
External Examiner
DECLARATION BY THE CANDIDATE
Also, I declare that the matter embodied in this project work has not been
submitted for the award of any degree/diploma of any other institution or university
previously.
V.L.S.MOUNIKA DEVI
(146M1D5708)
Abstract
Adaptive filters are widely used in digital signal processing systems, in applications such as adaptive beam forming, adaptive noise cancellation, system identification, and channel equalization. In this thesis, we propose an efficient architecture for a delayed least mean square (DLMS) adaptive filter. In order to achieve a lower adaptation delay and an area-delay-power-efficient implementation, the proposed adaptive filter architecture consists of two main computing blocks, namely the error-computation block and the weight-update block. We also propose a fixed-point implementation scheme for the architecture with bit-level clipping. From the synthesis results, we find that the proposed design offers a lower area-delay product (ADP) and a lower energy-delay product (EDP) than the best of the existing structures. The proposed system is implemented in VHDL, synthesized using Xilinx ISE, and simulated using ModelSim.
Tools:
Xilinx ISE14.2
Modelsim 6.4b
Language:
VHDL
CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER
1: INTRODUCTION
1.1 Introduction
1.2 Literature review
1.3 Limitation of previous work
1.4 Motivation and Scope
1.5 Problem Definition
1.6 Organisation of the thesis
REFERENCES
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Historically, analog chip design yielded smaller die sizes, but now, with the noise associated with modern sub-micrometer designs, digital designs can often be much more densely integrated than analog designs. This yields compact, low-power, and low-cost digital designs, and modern programmable DSPs have accordingly been developed with more sophisticated functions.
VLSI: VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and more logic devices into smaller and smaller areas. With VLSI, circuits that would once have taken boards full of space can now be put into a space a few millimetres across. VLSI circuits are everywhere: in your computer, your car, your brand new state-of-the-art digital camera, your cell phone, and so on. All this involves a lot of expertise on many fronts within the same field, which we will look at in later sections.
The way normal blocks like latches and gates are implemented is different from what
students have seen so far, but the behaviour remains the same. All the miniaturization
involves new things to consider. A lot of thought has to go into actual implementations as
well as design.
Circuit Delays: Large, complicated circuits running at very high frequencies have one big problem to tackle: the delays in the propagation of signals through gates and wires, even for areas only a few micrometers across. The operating speed is so high that, as the delays add up, they can actually become comparable to the clock period.
Power: Another effect of high operating frequencies is increased power consumption. This has a two-fold effect: devices consume batteries faster, and heat dissipation increases. Coupled with the fact that surface areas have decreased, heat poses a major threat to the stability of the circuit itself.
Layout: Laying out the circuit components is a task common to all branches of electronics. What is special in our case is that there are many possible ways to do this; there can be multiple layers of different materials on the same silicon, there can be different arrangements of the smaller parts for the same component, and so on. The choice among them is determined by the way we choose to lay out the circuit components. Layout can also affect the fabrication of VLSI chips, making it either easy or difficult to implement the components on the silicon.
Electronics took birth in 1904 when J. A. Fleming developed the vacuum diode. Useful electronics came in 1906 when the vacuum triode was invented by Lee De Forest, which made electrical amplification of weak radio and audio signals possible with a non-mechanical device. Later, around 1925, the tetrode and pentode vacuum tubes were developed.
These tubes dominated the field of electronics till the end of World War II. Until 1950 this
field was called "radio technology" because its principal application was the design and
theory of radio transmitters, receivers, and vacuum tubes. The era of semiconductor
electronics began with the invention of the junction transistor in 1948 at Bell Laboratories.
Soon, the transistors replaced the bulky vacuum tubes in different electronic circuits.
When engineers tried to build complex circuits using the vacuum tube, they
quickly became aware of its limitations. The first digital computer ENIAC, for example,
was a huge monster that weighed over thirty tons, and consumed 200 kilowatts of
electrical power. It had around 18,000 vacuum tubes that constantly burned out, making it
very unreliable.
When building a circuit, it is very important that all connections are intact. If not,
the electrical current will be stopped on its way through the circuit, making the circuit
fail. Before the integrated circuit, assembly workers had to construct circuits by hand,
soldering each component in place and connecting them with metal wires. Engineers soon
realized that manually assembling the vast number of tiny components needed in, for
example, a computer would be impossible, especially without generating a single faulty
connection. Another problem was the size of the circuits. A complex circuit, like a computer,
was dependent on speed. If the components of the computer were too large or the wires
interconnecting them too long, the electric signals couldn't travel fast enough through the
circuit, thus making the computer too slow to be effective. Advanced circuits contained
so many components and connections that they were virtually impossible to build. This
problem was known as the tyranny of numbers.
It was Kilby's idea to make all the components and the chip out of the same block
(monolith) of semiconductor material. Kilby presented his new idea to his superiors. He was
allowed to build a test version of his circuit. In September 1958, he had his first integrated
circuit ready.
Moore's Law: It states that the number of transistors in an integrated circuit doubles approximately every two years. Integrated circuits are much smaller and consume less power than the discrete components used to build electronic systems before the 1960s.
Integration allows us to build systems with many more transistors, allowing much more
computing power to be applied to solving a problem. Integrated circuits are much easier to
design and manufacture and are more reliable than discrete systems. Integrated circuits
improve system characteristics in several ways.
Size: Integrated circuits are much smaller; both transistors and wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales of discrete components. The small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.
Speed: Signals can be switched between logic 0 and logic 1 much more quickly within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitance to slow down the signal.
Power: Logic operations within a chip also take much less power. Once again, the low power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.
Introduction of signal filtering concepts: A filter is a device or process that removes some
unwanted components from a signal. Filtering is a class of signal processing, the defining
feature of filters being complete or partial suppression of some aspect of the signal.
There are many different bases of classifying filters and these overlap in many
different ways. There is no simple hierarchical classification. Filters may be
Linear or Non-linear
Time invariant or Time Variant
Analog or digital
Passive or active etc...
Filters are essential to the operation of most electronic circuits. Filters used for direct filtering can be either fixed or adaptive. Adaptive filters are widely used in communication and digital signal processor applications. Adaptive filtering can be used strictly for analysis or for synthesis of a system. An adaptive filter adjusts its filter coefficients automatically according to an adaptive algorithm; it is useful where complete knowledge of the environment is not available. Adaptive filters are commonly classified as linear, where the estimate of a quantity is computed as a linear combination of the observations applied to the filter input, or non-linear, such as neural networks.
Most adaptive filters are implemented as FIR filters, because they are inherently stable. Generally, the filter algorithms used are the LMS, NLMS, and RLS algorithms. In this thesis, the digital filter is implemented using the DLMS algorithm with a fixed-point representation.
In order to start the thesis, the first step is to study the research papers previously published by other researchers; papers related to this work were chosen and studied.
An efficient systolic architecture for the DLMS adaptive filter and its applications [10]: Tree methods enhance the performance of the adaptive filter, but they lack modularity and local connectivity, and the critical path grows as the number of tree stages increases. In order to achieve a lower adaptation delay, Van and Feng proposed a systolic architecture in which relatively large processing elements (PEs) are used. The PE combines the systolic architecture and the tree structure to reduce the adaptation delay, but it still involves a critical path of one MAC operation.
The existing work on the DLMS adaptive filter does not discuss the fixed-point implementation issues, e.g., the location of the radix point, the choice of word length, and quantization at various stages of computation, although these directly affect the convergence performance, particularly due to the recursive behaviour of the LMS algorithm. Therefore, fixed-point implementation issues are given adequate emphasis in this thesis. Besides, we present here the optimization of our previously reported design to reduce the number of pipeline delays along with the area, sampling period, and energy consumption. The proposed design is a 64-tap fixed-point implementation. The proposed design is found to be more efficient in terms of the power-delay product (PDP) and the energy-delay product (EDP) compared to the existing structures.
The block diagram of the DLMS adaptive filter is shown in Chapter 2, where the adaptation delay of m cycles amounts to the delay introduced by the whole adaptive filter structure, consisting of finite impulse response (FIR) filtering and the weight-update process. It is shown in [12] that the adaptation delay of conventional LMS can be decomposed into two parts: one part is the delay introduced by the pipeline stages in FIR filtering, and the other part is due to the delay involved in pipelining the weight-update process.
Motivation
The primary motivation for adopting this filter design lies in the fact that it can provide a logic design methodology for designing ultra-low-power circuits beyond the k*T*ln2 limit for emerging nanotechnologies, in which the energy dissipated due to information destruction will be a significant factor of the overall heat dissipation. For achieving low power, a low adaptation delay, and an area-delay-power-efficient implementation, we use combinational blocks.
Scope
Research is also being carried out in developing digital signal processors which use adaptive filters for computational purposes, and certain filters have already been realized for the same. Similar development has also taken place in digital signal processing for channel equalization. In differential power analysis, cryptographic machines that do not dissipate heat prevent intruders from performing power analysis or timing attacks. VLSI signal processing is an upcoming field in communication techniques, which promises to increase the speed of computation and reduce area and power dissipation. Designing circuits in this domain will therefore be promising in the near future.
Chapter 1 provides the introduction of the thesis, covering filters and VLSI circuits.
Chapter 2 provides information about basic digital filters, mainly the LMS adaptive filter.
Chapter 3 describes the design methodology of the proposed DLMS adaptive filter, and Chapter 4 describes its fixed-point design implementation.
Chapters 5 and 6 explain the results and applications of this thesis.
CHAPTER 2
DIGITAL FILTERS
2.1 Introduction
Digital filters are typically used to modify or alter the attributes of a signal in the time domain or the frequency domain. The most common digital filter is the linear time-invariant (LTI) filter. An LTI filter interacts with its input signal through a process called linear convolution, denoted by y = f * x, where f is the filter's impulse response, x is the input signal, and y is the convolved output. The linear convolution process is formally defined by
y[n] = Σ_k f[k] x[n-k]
LTI Digital filters are generally classified as being finite impulse response (FIR) or
Infinite impulse response (IIR). The FIR filter consists of a finite number of sample values,
reducing the above convolution sum to finite sum per output sample instant. An IIR filter
however requires that an infinite sum be performed. An FIR design and implementation
methodology is discussed in this thesis.
The motivation for studying digital filters lies in their growing popularity as a primary DSP operation. Digital filters are rapidly replacing classic analog filters, which were implemented using RLC components and operational amplifiers. Analog filters were mathematically modelled using ordinary differential equations or Laplace transforms and were analyzed in the time or s domain. Analog prototypes are now only used in IIR design, while FIR filters are typically designed using digital computer specifications and algorithms.
In this thesis, it is assumed that a digital filter has been designed and selected for implementation. The major components of a digital filter are identified below.
The input of a digital filter is a series of discrete samples obtained by sampling the input waveform. The sampling rate must meet the Nyquist criterion: the sampling frequency must be at least twice the highest frequency of the input signal. The term x(n) means the input at time (n).
Z^-1
Z^-1 represents a time delay that is equal to the sampling period; this is also called a unit delay. Therefore, each z^-1 box delays the samples by one sampling period. In the diagram, this is shown by the input going into the delay box as x(n) and coming out as x(n-1). We see this because x(n) means the input at time (n), and x(n-1) means the input at time (n-1). What actually happens is that x(n-1) is the previous input that has been saved in the memory of the DSP.
Filter Taps and Weights
The output of each delay box is called a tap. Taps are usually fed into scalars which scale
the value of the delayed sample to the required value by multiplying the input (or delayed
input) by a coefficient. In the diagram, these are marked as b0, b1 and b2. The scaling factor
is called the weight. In mathematical terms, the weight is multiplied by the delayed input, so
the output of the first tap is b0*x(n). The next tap output will be b1*x(n-1), and the output of
the last tap is b2*x(n-2).
Summing Junctions
The outputs of the weights are fed into summing junctions, which add the weighted, delayed outputs fed forward from the taps. So in this example, the output of the first summing junction is b0*x(n) + b1*x(n-1). At the next summing junction, this is added to the output of the final tap, giving b0*x(n) + b1*x(n-1) + b2*x(n-2), which is the output.
The Output y(n)
The output of a digital filter is a combination of a number of delayed and weighted samples,
and is usually called y(n).
The Operation of Digital Filters
In summary, the output is y(n) and the present sample is x(n). The previous samples would
then be: x(n-1) = one unit time delay
x(n-2) = two unit time delay
When x(n) arrives at the input, the taps are feeding the delayed samples to weights b1 and b2. Therefore, at any sampling instant, the value of the output can be calculated as the weighted sum of the current sample and the two previous samples as follows:
y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2)
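To make the operation concrete, the following is a minimal behavioural VHDL sketch of such a three-tap FIR filter. The 8-bit sample width, the 18-bit output width, and the coefficient values are assumptions chosen only for this illustration; they are not taken from the proposed design.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal 3-tap FIR sketch: y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2).
-- Word lengths and coefficient values are illustrative assumptions only.
entity fir3 is
  port (
    clk : in  std_logic;
    x   : in  signed(7 downto 0);   -- present sample x(n)
    y   : out signed(17 downto 0)   -- weighted sum of three samples
  );
end;

architecture rtl of fir3 is
  -- tap weights (example values only)
  constant b0 : signed(7 downto 0) := to_signed(32, 8);
  constant b1 : signed(7 downto 0) := to_signed(64, 8);
  constant b2 : signed(7 downto 0) := to_signed(32, 8);
  signal x1, x2 : signed(7 downto 0) := (others => '0');  -- x(n-1), x(n-2)
begin
  -- tapped delay line: each register is one z^-1 unit delay
  process(clk)
  begin
    if rising_edge(clk) then
      x2 <= x1;
      x1 <= x;
    end if;
  end process;

  -- summing junctions: add the weighted, delayed samples
  y <= resize(b0 * x, 18) + resize(b1 * x1, 18) + resize(b2 * x2, 18);
end;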
Tools: Before we consider more complex digital filters, let us first learn about some mathematical tools used in digital filtering. This will solidify our understanding of digital filters and provide a foundation for future learning of more complex subjects.
Impulse Function
An impulse is defined as an idealized rectangular pulse of area 1.0, zero width, and
infinite amplitude. It is typically expressed by an integral as shown on the above diagram.
This is a general formula that allows us to calculate the area under any pulse.
Weighted Impulse Function
Consider a pulse with an amplitude of 3 and a width of 2, as shown in the figure. Using the same integral to calculate the area under it, we find that it equals 6. A weighted impulse function is similar to this: it has an area of A and an amplitude of infinity. It is represented by the integral shown in the diagram. Obviously this is impossible in the real world, but the
weighted impulse function is extensively used in digital signal processing to help explain
DSP techniques. For example, an analog waveform can be represented as a multiplication of
the analog signal with a periodic weighted impulse function whose frequency is equal to the
sampling frequency.
An FIR filter with constant coefficients is an LTI digital filter. The output of an FIR filter of order (length) L, for an input time-series x[n], is given by a finite version of the convolution sum:
y[n] = x[n] * f[n] = Σ f[k] x[n-k],  k = 0, 1, ..., L-1
where f[0] through f[L-1], all non-zero, are the filter's L coefficients; they also correspond to the FIR impulse response. The Lth-order LTI filter is interpreted graphically in the figure shown above. It can be seen to consist of a tapped delay line, adders, and multipliers. One of the operands presented to each multiplier is an FIR coefficient, referred to as a tap weight. Such a filter is also called a transversal filter or a tapped delay line.
The impulse response of an IIR filter has infinite length, hence the name infinite impulse
response. The feedback loop makes it possible for an IIR filter to be unstable. It is possible to
check for this instability during the design process, but sometimes a filter that is stable on
paper may become unstable in practice due to round off and truncation in the DSP hardware.
It is important to examine stability issues closely when working with IIR filters. In some
cases, the instability conditions of an IIR filter can be used to advantage in designing
oscillators.
FIR vs. IIR
1. IIR filters have an infinite impulse response and are used in applications where linear-phase characteristics are not a concern.
2. FIR filters have a finite impulse response and are required where linear-phase characteristics are needed.
3. An IIR filter can meet a given specification with a lower order, whereas an FIR filter typically needs a higher order (more taps) for the same specification.
4. FIR filters are preferred over IIR filters because they are more stable and no feedback is involved.
5. IIR filters are recursive and are used as an alternative when an equivalent FIR filter would become too long and cause problems in various applications.
So, most adaptive filters are implemented as FIR filters, because they are inherently stable. Digital filtering also opens up even wider possibilities: it is possible to design adaptive filters that adapt themselves to changing conditions. For adaptive filters, a mechanism must be designed to change the coefficients of the filter in accordance with changing conditions. Such filters are very useful in modems; since the properties of telephone lines change continuously, adaptive filters offer the ideal solution for these environments.
From fig. 2.3 above:
x[n] = input of the adaptive filter
y[n] = output of the adaptive filter
d[n] = desired response of the adaptive filter
e[n] = d[n] - y[n] = estimation error
With the continuous development of adaptive-algorithm applications in the digital signal processing field, several issues have attracted attention, including the large amount of computation and the difficulty of achieving high-speed, real-time operation. For a long time, adaptive filtering algorithms were based on DSP chips and realized through assembly or high-level language programming. This can meet the requirements in less demanding real-time situations, but in applications with higher real-time requirements and harsh electromagnetic environments, it is unable to meet the required processing speed, robustness, and so on. A Field Programmable Gate Array (FPGA) can provide a new method for the hardware implementation of adaptive algorithms through its high flexibility and integration.
Under unknown signal characteristic conditions, starting from initial conditions derived from the known part of the signal characteristics and based on some adaptive recursive algorithm, the filter converges, after a certain number of recursions, to a statistical approximation of the optimal solution. When the statistical characteristics of the input signal are unknown, or the statistical properties of the input signal change, the adaptive filter can automatically adjust its filter parameters in each iteration to meet certain criteria and achieve optimal filtering. Thus, the adaptive filter has the ability to self-regulate and track, so in non-stationary environments adaptive filtering can track the changes of the signal well.
The basic idea behind the LMS filter is to update the filter weights so that they converge to the optimum filter weights. The algorithm starts by assuming small weights (zero in most cases), and at each step the gradient of the mean square error with respect to the weights is found and the weights are updated. If the MSE gradient is positive, the error would keep increasing if the same weights were used for further iterations, which means we need to reduce the weights; if the gradient is negative, the weights need to be increased. Hence, the basic weight-update equation during the nth iteration is
Wn+1 = Wn - (μ/2) ∇ε[n]
where ε represents the mean-square error, μ is the step size, and Wn is the weight vector. The negative sign indicates that we need to change the weights in a direction opposite to that of the gradient slope.
The mean-square error, as a function of the filter weights, is a quadratic function, which means it has only one extremum; the weight that minimizes the mean-square error is the optimal weight. The LMS algorithm thus approaches this optimal weight by descending along the mean-square-error versus filter-weight curve.
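For completeness, the gradient can be written in terms of the instantaneous error, which leads directly to the LMS recursion used in the rest of this chapter. With ε[n] approximated by e^2[n] and e[n] = d[n] - Wn^T x[n],
∇ε[n] = 2 e[n] ∇e[n] = -2 e[n] x[n]
Wn+1 = Wn - (μ/2) ∇ε[n] = Wn + μ e[n] x[n]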
The LEAST MEAN SQUARE (LMS) adaptive filter is the most popular and most
widely used adaptive filter, not only because of its simplicity but also because of its
satisfactory convergence performance. The direct-form LMS adaptive filter involves a long
critical path due to an inner-product computation to obtain the filter output.
1. Filter output: y[n] = w^H[n] u[n]
2. Estimation error: e[n] = d[n] - y[n]
3. Tap-weight adaptation: w[n+1] = w[n] + μ u[n] e*[n]
The aim is to create an algorithm that minimizes E{|e(n)|^2}, just like steepest descent (SD), but based on unknown statistics. A strategy that can then be used is to use estimates of the autocorrelation matrix R and the cross-correlation vector p. If instantaneous estimates are chosen,
R(n) = u(n) uH(n)
p(n) = u(n) d*(n)
the resulting method is the least mean squares algorithm. With an adaptation delay, the weight update becomes
Wn+1 = Wn + μ e(n-m) x(n-m)
where m is the adaptation delay. The structure of conventional delayed LMS adaptive filter is
shown in figure 2.3. It can be seen that the adaptation-delay m is the number of cycles
required for the error corresponding to any given sampling instant to become available to the
weight adaptation circuit.
In the delayed LMS algorithm, the assumption is that the gradient of the error, ∇[n] = e[n] x[n], does not change much if we delay the coefficient update by a couple of samples, i.e., ∇[n] ≈ ∇[n-D]. It has been shown that, as long as the delay is less than the system order (i.e., the filter length), this assumption holds well and the delayed update does not degrade the convergence speed. Long's original DLMS algorithm only considered pipelining the adder tree of the adaptive filter, assuming that the multiplication and the coefficient update can be done in one clock cycle; but for an FPGA implementation, the multiplier and the coefficient update require additional pipeline delays. With a delay D1 in the error-computation path and D2 in the coefficient-update path, the LMS algorithm becomes
e[n-D1] = d[n-D1] - f^T[n-D1] x[n-D1]
f[n+1] = f[n-D1-D2] + μ e[n-D1-D2] x[n-D1-D2]
CHAPTER 3
DESIGN METHODOLOGY
3.1 DLMS algorithm
The weights of the LMS adaptive filter during the nth iteration are updated according to the following equations:
w[n+1] = w[n] + μ e[n] x[n]
where e[n] = d[n] - y[n] and y[n] = w^T[n] x[n]. Here, the input vector x[n] and the weight vector w[n] at the nth iteration are given by
x[n] = [x[n], x[n-1], ..., x[n-N+1]]^T
w[n] = [w0[n], w1[n], ..., wN-1[n]]^T
d[n] is the desired response, y[n] is the filter output, and e[n] denotes the error computed during the nth iteration. μ is the step size, and N is the number of weights used in the LMS adaptive filter.
In the case of pipelined designs with m pipeline stages, the error e[n] becomes available after m cycles, where m is called the adaptation delay. The DLMS algorithm therefore uses the delayed error e[n-m], i.e., the error corresponding to the (n-m)th iteration, for updating the current weight instead of the most recent error.
The weight-update equation of the DLMS adaptive filter is given by
w[n+1] = w[n] + μ e[n-m] x[n-m]
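As a purely behavioural illustration of this update (not the shift-add hardware of the proposed design, which is described in the following sections), the sketch below updates a single weight using an m-cycle-delayed error and input. The word lengths, the delay value, and the power-of-two step size μ = 2^-6 are assumptions made only for this example.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural sketch of w[n+1] = w[n] + mu*e[n-m]*x[n-m] for one tap.
-- Widths, the delay M, and mu = 2^-6 are assumptions for illustration only.
entity dlms_tap is
  generic (M : positive := 5);        -- adaptation delay in cycles (assumed)
  port (
    clk   : in  std_logic;
    x     : in  signed(7 downto 0);   -- input sample x[n]
    err   : in  signed(7 downto 0);   -- error e[n]
    w_out : out signed(15 downto 0)   -- current weight w[n]
  );
end;

architecture behav of dlms_tap is
  type line_t is array (0 to M-1) of signed(7 downto 0);
  signal x_d, e_d : line_t := (others => (others => '0'));
  signal w        : signed(15 downto 0) := (others => '0');
begin
  process(clk)
    variable prod : signed(15 downto 0);
  begin
    if rising_edge(clk) then
      -- shift registers provide the M-cycle-delayed x[n-m] and e[n-m]
      for i in M-1 downto 1 loop
        x_d(i) <= x_d(i-1);
        e_d(i) <= e_d(i-1);
      end loop;
      x_d(0) <= x;
      e_d(0) <= err;
      -- weight update; mu = 2^-6 is applied as an arithmetic right shift
      prod := e_d(M-1) * x_d(M-1);
      w    <= w + shift_right(prod, 6);
    end if;
  end process;
  w_out <= w;
end;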
The block diagram of DLMS adaptive filter is depicted in Fig. 1, where the adaptation-delay
of m cycles amounts to the delay introduced by the whole of adaptive filter structure
consisting of FIR filtering and weight-update process. It is shown in [12] that the adaptation-
delay of conventional LMS can be decomposed into two parts, where one part is the
delay introduced by the pipeline stages in FIR filtering and the other part is due to the delay
involved in pipelining of weight update process.
There are two main computing blocks in the adaptive filter architecture:
1) The error-computation block
2) Weight-update block.
In this chapter, we discuss the design strategy of the proposed structure to minimize the adaptation delay in the error-computation block, followed by the weight-update block.
For the most significant digit (MSD), i.e., (u_{L-1} u_{L-2}), of the input sample, the AOC (L/2 - 1) is fed with w, -2w, and -w as inputs, since (u_{L-1} u_{L-2}) can take the four possible values 0, 1, -2, and -1.
Fig 3.5. Structure and function of the AND/OR cell. The binary operators · and + in (b) and (c) are implemented using AND and OR gates, respectively.
All possible locations of pipeline latches are shown by dashed lines, to reduce the critical path to one addition time.
If we introduce pipeline latches after every addition, it would require L(N - 1)/2 + L/2 - 1 latches in log2 N + log2 L - 1 stages, which would lead to a high adaptation delay and introduce a large overhead of area and power consumption for large values of N and L. On the other hand, some of those pipeline latches are redundant in the sense that they are not required to maintain a critical path of one addition time. The final adder in the shift-add tree contributes the maximum delay to the critical path. Based on that observation, we have identified the pipeline latches that do not contribute significantly to the critical path and could exclude them without any noticeable increase of the critical path. The locations of pipeline latches for filter lengths N = 8, 16, and 32 and for input word size L = 8 are shown in Table I. The pipelining is performed by a feed-forward cut-set retiming of the error-computation block.
The MAC operations are performed by N PPGs, followed by N shift-add trees. Each of the PPGs generates L/2 partial products corresponding to the product of the most recently shifted error value μe with the L/2 2-bit digits of the input word xi, where the sub-expression 3μe is shared within the multiplier. Since the scaled error (μe) is multiplied with all the N delayed input values in the weight-update block, this sub-expression can be shared across all the multipliers as well. This leads to a substantial reduction of adder complexity. The final outputs of the MAC units constitute the desired updated weights, to be used as inputs to the error-computation block as well as the weight-update block for the next iteration.
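To see how the decoder, AOC, and shift-add tree together realize each multiplication, note that an input word xi can be written in radix-4 (2-bit digit) form. As an illustration (with the MSD treated as a signed digit, as described above),
xi (μe) = Σ dj (μe) 4^j,  j = 0, 1, ..., L/2 - 1,  dj in {0, 1, 2, 3}
Each term dj (μe) is selected by the decoder and the AOC from {0, μe, 2μe, 3μe}, and the shift-add tree applies the 4^j weighting through shifts and additions.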
Note that during weight adaptation the error with n1 delays is used, while the filtering unit uses the weights delayed by n2 cycles. By this approach the adaptation delay is effectively reduced by n2 cycles. In the next section, we show that the proposed algorithm can be implemented efficiently with a very low adaptation delay, which is not affected substantially by an increase in the filter order.
CHAPTER 4
DESIGN IMPLEMENTATION
4.1 Fixed point implementation
For a fixed-point implementation, the choice of word lengths and radix points for the input samples, weights, and internal signals needs to be decided. Fig. 4.1 shows the fixed-point representation of a binary number. Let (X, Xi) be a fixed-point representation of a binary number, where X is the word length and Xi is the integer length. The word length and location of the radix point of xn and wn in Fig. 4.1 need to be predetermined by the hardware designer, taking design constraints such as desired accuracy and hardware complexity into consideration.
Assuming (L, Li) and (W, Wi), respectively, as the representations of the input signals and the filter weights, all other signals in Fig. 4.1 can be decided as shown in Table 4.1. The signals x, w, p, q, y, d, and e are found in the error-computation block, while r and s are defined in the weight-update block. It is to be noted that all subscripts and time indices of signals are omitted for simplicity of notation.
The signal p_ij, which is the output of the PPG block, has at most three times the value of the input coefficient. Thus, we can add two more bits to the word length and to the integer length of the coefficients to avoid overflow. The output of each stage in the adder tree of Fig. 3.6 is one bit more than the size of its input signals, so that the fixed-point representation of the output of the adder tree with log2 N stages becomes (W + log2 N + 2, Wi + log2 N + 2). Accordingly, the output of the shift-add tree would be of the form (W + L + log2 N, Wi + Li + log2 N), assuming that no truncation of any least significant bits (LSBs) is performed in the adder tree or the shift-add tree. However, the output of the shift-add tree is designed to have W bits. The most significant W bits need to be retained out of the (W + L + log2 N) bits, which results in the fixed-point representation (W, Wi + Li + log2 N) for y, as shown in Table 4.1. Let the representation of the desired signal d be the same as that of y, even though its quantization is usually given as the input. For this purpose, specific scaling/sign extension and truncation/zero padding are required. Since the LMS algorithm performs learning so that y has the same sign as d, the error signal e can also be set to have the same representation as y without overflow after the subtraction.
Table 4.1: Fixed-point representation of the signals of the proposed DLMS adaptive filter (μ = 2^-(Li + log2 N))
It is shown in the literature that the convergence of an N-tap DLMS adaptive filter with an adaptation delay of n1 is ensured if the step size is kept within a bound determined by σx², the average power of the input samples. Furthermore, if the value of μ is chosen as a power of 2, μ = 2^-n, where n ≤ Wi + Li + log2 N, the multiplication with μ is equivalent to a change of the location of the radix point. Since the multiplication with μ does not need any arithmetic operation, it does not introduce any truncation error. If we need to use a smaller step size, i.e., n > Wi + Li + log2 N, some of the LSBs of e_n need to be truncated. If we assume that n = Li + log2 N, i.e., μ = 2^-(Li + log2 N), as in Table 4.1, the representation of μe_n should be (W, Wi) without any truncation.
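To make the radix-point argument concrete, the short VHDL sketch below scales an error word by an assumed step size μ = 2^-6 and retains only the most significant W bits of a wider word; the word lengths used here are illustrative assumptions, not the exact representations of Table 4.1.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Scaling by a power-of-two step size and retaining the W MSBs.
-- Widths (W = 16, L = 8) and the shift amount are assumptions for illustration.
entity mu_scale is
  port (
    e_in    : in  signed(15 downto 0);   -- error word with representation (W, Wi)
    prod_in : in  signed(23 downto 0);   -- wider (W + L)-bit shift-add tree output
    mu_e    : out signed(15 downto 0);   -- mu*e with mu = 2^-6
    y_trunc : out signed(15 downto 0)    -- W most significant bits retained
  );
end;

architecture rtl of mu_scale is
begin
  -- multiplication by mu = 2^-6 needs no arithmetic; the equivalent bit
  -- movement is an arithmetic right shift
  mu_e    <= shift_right(e_in, 6);
  -- keep the 16 MSBs of the 24-bit word, discarding the LSBs
  y_trunc <= prod_in(23 downto 8);
end;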
The weight-increment term s, which is equivalent to μe_n x_n, is required to have the fixed-point representation (W + L, Wi + Li). However, only Wi MSBs in the computation of the shift-add tree of the weight-update circuit are to be retained, while the rest of the more significant bits need to be discarded. This is in accordance with the assumption that, as the weights converge toward the optimal value, the weight-increment terms become smaller and the MSB end of the error term contains more zeros.
Also, in our design, L - Li LSBs of the weight-increment terms are truncated so that the terms have the same fixed-point representation as the weight values. We also assume that no overflow occurs during the addition for the weight update; otherwise, the word length of the weights would need to be increased at every iteration, which is not desirable. The assumption is valid since the weight-increment terms are small when the weights have converged. Also, when overflow occurs during the training period, the weight update is not appropriate and will lead to additional iterations to reach convergence. Accordingly, the updated weight can be computed in truncated form (W, Wi) and fed into the error-computation block.
In this design we also use AND/OR cells (AOCs). These are the basic cells used to build the LMS adaptive filter, together with the error-computation block and the weight-update block.
1. Error-computation block: this module contains the sub-blocks PPG, adder tree, and shift-add tree. In the proposed VHDL design, the code is written for 64 coefficients, with 64 PPG sub-blocks together with the adder tree and shift-add tree.
2. Weight-update block: it contains the PPGs and a shift-add tree.
The figure below shows the internal architecture of the proposed design and the components present in it. These components were developed in the VHDL language.
Decoder (2*3)
AOC
Adder Tree
Decoder (2*3)
Shift Add Tree
The VHDL code of the various components of the proposed design was written in Xilinx ISE 14.2, synthesized using that IDE, and simulated using the ModelSim simulator. These tools are discussed in detail below.
A digital system can be described at different levels of abstraction and from different
points of view. An HDL should faithfully and accurately model and describe a circuit,
whether already built or under development, from either the structural or behavioral views, at
the desired level of abstraction. Because HDLs are modelled after hardware, their semantics
and use are very different from those of traditional programming languages.
The characteristics of digital hardware, on the other hand, are very different from
those of the sequential model. A typical digital system is normally built by smaller parts, with
customized wiring that connects the input and output ports of these parts. When signal
changes, the parts connected to the signal are activated and a set of new operations is initiated
accordingly. These operations are performed concurrently, and each operation will take a
specific amount of time, which represents the propagation delay of a particular part, to
complete. After completion, each part updates the value of the corresponding output port. If
the value is changed, the output signal will in turn activate all the connected parts and initiate
another round of operations. This description shows several unique characteristics of digital
systems, including the connections of parts, concurrent operations, and the concept of
propagation delay and timing. The sequential model used in traditional programming
languages cannot capture the characteristics of digital hardware, and there is a need for
special languages (i.e., HDLs) that are designed to model digital hardware.
VHDL includes facilities for describing logical structure and function of digital
systems at a number of levels of abstraction, from system level down to the gate level. It is
intended, among other things, as a modelling language for specification and simulation. We
can also use it for hardware synthesis if we restrict ourselves to a subset that can be
automatically translated into hardware.
VHDL arose out of the United States government's Very High Speed Integrated Circuits (VHSIC) program. In the course of this program, it became clear that there was a
need for a standard language for describing the structure and function of integrated circuits
(ICs). Hence the VHSIC Hardware Description Language (VHDL) was developed. It was
subsequently developed further under the auspices of the Institute of Electrical and Electronic
Engineers (IEEE) and adopted in the form of the IEEE Standard 1076, Standard VHDL
Language Reference Manual, in 1987. This first standard version of the language is often
referred to as VHDL-87.
After the initial release, various extensions were developed to facilitate various design and
modelling requirements. These extensions are documented in several IEEE standards:
i. IEEE Standard 1076.1-1999, VHDL Analog and Mixed-Signal Extensions (VHDL-AMS): defines the extension for analog and mixed-signal modelling.
iii. IEEE Standard 1076.3-1997, Synthesis Packages: defines arithmetic operations over a collection of bits.
iv. IEEE Standard 1076.4-1995, VHDL Initiative Towards ASIC Libraries (VITAL): defines a mechanism to add detailed timing information to ASIC cells.
vi. IEEE Standard 1164-1993, Multivalue Logic System for VHDL Model Interoperability (std_logic_1164): defines new data types to model multivalued logic.
vii. IEEE Standard 1029.1-1998, VHDL Waveform and Vector Exchange to Support Design and Test Verification (WAVES): defines how to use VHDL to exchange information in a simulation environment.
The following diagram shows the basic steps for simulating a design in Xilinx.
3. Next select the HDL used for simulation. If you are unsure select Both VHDL and
Verilog. However, this will increase the compilation time and the disk space required.
4. Then select all the device families that you will be working with. Again, the more devices you select, the more compilation time and disk space are required. Remember that you can always run the compilation wizard at a later time for additional devices.
5. The next window is for selecting libraries for Functional and Timing Simulation.
Different libraries are required for different types of simulation (behavioral, post-
route, etc.). We suggest that you select All Libraries as the default option. Interested
users can refer to Chapter 6 of the Xilinx Synthesis and Simulation Design Guide for
additional information.
6. Finally the window for Output directory for compiled libraries is shown. We suggest
leaving the default values that Xilinx picks. Then select Launch Compile Process.
7. Be patient as the compilation can take a long time depending on the options that you
have chosen.
8. The compile process may have contained a lot of warnings but should be error-free.
We have not explored the reasons behind these warnings, but they do not appear to
affect the simulation of any of our designs.
9. Once the process is completed, open c:\modeltech64_10.0c\modelsim.ini and verify that there are libraries pointing to the output directory entered in step 6. This will happen only if you have set the environment variables.
Library compilation is now complete. If you have not set the environment variables, then the wizard creates a modelsim.ini file in the output directory entered in step 6. By default this location is c:\Xilinx\13.x\ISE_DS\ISE. Open this file and verify that it contains the location of the libraries that were just compiled. This file should be copied into every project you create.
4.6 ModelSim
ModelSim is a multi-language HDL simulation environment by Mentor Graphics,[1] for
simulation of hardware description languages such as VHDL, Verilog and SystemC, and
includes a built-in C debugger. ModelSim can be used independently, or in conjunction
with Altera Quartus or Xilinx ISE. Simulation is performed using the graphical user
interface (GUI), or automatically using scripts.
ModelSim is offered in multiple editions such as Modelsim PE, ModelSim XE, and
Modelsim SE. Modelsim SE offers high-performance and advanced debugging capabilities,
while Modelsim PE is the entry-level simulator for hobbyists and students. Modelsim SE is
used in large multi-million gate designs, and is supported on Microsoft Windows and Linux,
in 32-bit and 64-bit architectures.
ModelSim XE stands for Xilinx Edition, and is specially designed for integration with Xilinx ISE. ModelSim XE enables testing of HDL programs written for Xilinx Virtex/Spartan series FPGAs without needing physical hardware.
Modelsim uses a unified kernel for simulation of all supported languages, and the method of
debugging embedded C code is the same as VHDL or Verilog.
Modelsim enables simulation, verification and debugging for the following languages:
VHDL
Verilog
Verilog 2001
SystemVerilog
PSL
SystemC
Mentor Graphics was the first to combine single kernel simulator (SKS) technology with a unified debug environment for Verilog, VHDL, and SystemC. The combination of industry-leading, native SKS performance with the best integrated debug and analysis environment makes ModelSim the simulator of choice for both ASIC and FPGA designs. The best standards and platform support in the industry make it easy to adopt in the majority of process and tool flows.
CHAPTER 5
SIMULATION RESULTS
5.1 Simulation Output
The proposed system architecture is simulated in ModelSim, and the area, power, and delay of the proposed system are analysed for a Spartan-6 device using the Xilinx software. The simulation result for the adaptive filter is shown in figure 8, and the synthesis report of the adaptive filter is shown in figure 5.2. Finally, the comparison of the proposed system with the existing one is detailed in Table 5.1.
Figure: Power consumption (W) of the existing and proposed systems.
The power consumption of the proposed and existing systems is compared in Table 5.1. The existing system's power consumption is around 0.24 mW, whereas that of the proposed system is around 0.14 mW. The proposed system is a 64-tap fixed-point filter. From this we can calculate the energy per sample and also find the energy-delay product (EDP) using different compilers. The differences are shown in the table.
Figure: Number of slices of the existing and proposed systems.
The area occupied by the filter is obtained from the synthesis results. The proposed hardware for the test circuit has been described in VHDL and synthesized using Xilinx ISE 14.2. From the figure above we can observe the number of slices required by the existing and proposed systems. The existing system contains 32 tap coefficients and the proposed system contains 64 tap coefficients, so the number of slices required is higher than for the existing system, but the power consumption of the proposed system is lower.
The proposed design is described in VHDL and synthesized for different filter orders. Filter lengths of 8, 16, 32, and 64 were considered, with the word lengths of the input samples and weights chosen so that the multiplication by the step size requires no additional circuitry. The table shows the synthesis results of the existing and proposed designs in terms of data arrival time, power consumption, etc.
The table below also shows the energy-delay product (EDP), which comes from the product of the data arrival time and the energy for the particular filter length. The proposed design shows better results compared to the existing one. Apart from this, based on the number of slices we also calculate the area occupied by the filter, and from these results the area-delay product is calculated as well.

Table 5.1: Comparison of the existing and proposed systems
                          Existing system    Proposed system
Filter length             32                 64
Number of slices          4060               8631
Power consumption (mW)    0.24               0.132
CHAPTER 6
6.1 Advantages
Reduce the critical path to support high input-sampling rates. Hence, throughput
increases.
Faster performance.
It gives higher performance results of EDP (Energy delay Product) and ADP (Area
Delay Product).
6.2 Applications
1. Echo Cancellation:
Echo suppression and echo cancellation are methods in telephony to improve voice
quality by preventing echo from being created or removing it after it is already present. In
addition to improving subjective quality, this process increases the capacity achieved
through silence suppression by preventing echo from traveling across a network.
These methods are commonly called acoustic echo suppression (AES) and acoustic echo
cancellation (AEC), and more rarely line echo cancellation (LEC). In some cases, these terms
are more precise, as there are various types and causes of echo with unique characteristics,
including acoustic echo (sounds from a loudspeaker being reflected and recorded by a
microphone, which can vary substantially over time) and line echo (electrical impulses
caused by, e.g., coupling between the sending and receiving wires, impedance mismatches,
electrical reflections, etc., which varies much less than acoustic echo). In practice, however,
the same techniques are used to treat all types of echo, so an acoustic echo canceller can
cancel line echo as well as acoustic echo. "AEC" in particular is commonly used to refer to
echo cancellers in general, regardless of whether they were intended for acoustic echo, line
echo, or both.
2. Adaptive Beam Forming:
Adaptive beam forming was initially developed in the 1960s for the military applications of sonar and radar.[1] There exist several modern applications for beam forming, one of the most visible being commercial wireless networks such as LTE. Initial applications of adaptive beam forming were largely focused on radar and electronic countermeasures to mitigate the effect of signal jamming in the military domain.
Radar uses can be seen in phased-array radar. Although not strictly adaptive, these radar applications make use of either static or dynamic (scanning) beam forming.
Commercial wireless standards such as 3GPP Long Term Evolution (LTE
telecommunication) and IEEE 802.16 WiMax rely on adaptive beam forming to enable
essential services within each standard.
3. System Identification
4. Channel Equalization
Equalizers are critical to the successful operation of electronic systems such as analog
broadcast television. In this application the actual waveform of the transmitted signal must be
preserved, not just its frequency content. Equalizing filters must cancel out any group delay
and phase delay between different frequency components.
An example of a digital transmission system using channel equalization is shown in fig 6.2 below.
Conclusion
We proposed an efficient fixed-point DLMS adaptive filter with low adaptation delay. A novel partial product generator is used for the multiplications and inner products, and a fixed-point implementation scheme with bit-level clipping is adopted. From the synthesis results, the area, delay, and power of the proposed system were analysed and found to be optimized; the proposed design gives more efficient results than the existing structures. Further, a pipelined implementation with the partial product generator is carried out across the time-consuming combinational blocks of the filter structure.
Future Work
For low sampling rates, the clock of the proposed design is slower than the usable frequency, and it is maintained only under a low operating voltage. This design can be further extended to floating-point considerations as well.
REFERENCES
[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ, USA:
Prentice-Hall, 1985.
[2] S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ, USA:
Wiley, 2003.
[3] M. D. Meyer and D. P. Agrawal, "A modular pipelined implementation of a delayed LMS transversal adaptive filter," in Proc. IEEE Int. Symp. Circuits Syst., May 1990, pp. 1943-1946.
[4] G. Long, F. Ling, and J. G. Proakis, "The LMS algorithm with delayed coefficient adaptation," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 9, pp. 1397-1405, Sep. 1989.
[5] G. Long, F. Ling, and J. G. Proakis, "Corrections to 'The LMS algorithm with delayed coefficient adaptation'," IEEE Trans. Signal Process., vol. 40, no. 1, pp. 230-232, Jan. 1992.
[6] H. Herzberg and R. Haimi-Cohen, "A systolic array realization of an LMS adaptive filter and the effects of delayed adaptation," IEEE Trans. Signal Process., vol. 40, no. 11, pp. 2799-2803, Nov. 1992.
[7] M. D. Meyer and D. P. Agrawal, "A high sampling rate delayed LMS filter architecture," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 40, no. 11, pp. 727-729, Nov. 1993.
[8] S. Ramanathan and V. Visvanathan, "A systolic architecture for LMS adaptive filtering with minimal adaptation delay," in Proc. Int. Conf. Very Large Scale Integr. (VLSI) Design, Jan. 1996, pp. 286-289.
[9] Y. Yi, R. Woods, L.-K. Ting, and C. F. N. Cowan, "High speed FPGA-based implementations of delayed-LMS filters," J. Very Large Scale Integr. (VLSI) Signal Process., vol. 39, nos. 1-2, pp. 113-131, Jan. 2005.
[10] L. D. Van and W. S. Feng, "An efficient systolic architecture for the DLMS adaptive filter and its applications," IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 48, no. 4, pp. 359-366, Apr. 2001.
[11] L.-K. Ting, R. Woods, and C. F. N. Cowan, "Virtex FPGA implementation of a pipelined adaptive LMS predictor for electronic support measures receivers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 86-99, Jan. 2005.
[12] P. K. Meher and M. Maheshwari, "A high-speed FIR adaptive filter architecture using a modified delayed LMS algorithm," in Proc. IEEE Int. Symp. Circuits Syst., May 2011, pp. 121-124.
[13] P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-I: Introducing a novel multiplication cell," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[14] P. K. Meher and S. Y. Park, "Low adaptation-delay LMS adaptive filter part-II: An optimized architecture," in Proc. IEEE Int. Midwest Symp. Circuits Syst., Aug. 2011, pp. 1-4.
[15] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, USA: Wiley, 1999.
[16] C. Caraiscos and B. Liu, "A roundoff error analysis of the LMS adaptive algorithm," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 1, pp. 34-41, Feb. 1984.
[17] R. Rocher, D. Menard, O. Sentieys, and P. Scalart, "Accuracy evaluation of fixed-point LMS algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, pp. 237-240.
[18] Xilinx14.2, Synthesis and Simulation Design Guide, UG626 (v14.2)
AOC :
library ieee;
use ieee.std_logic_1164.all;
entity AOC is
port (
b0 : in std_logic;
b1 : in std_logic;
b2 : in std_logic;
w : in std_logic_vector(9 downto 0);
w2 : in std_logic_vector(9 downto 0);
w3 : in std_logic_vector(9 downto 0);
w_out : out std_logic_vector(9 downto 0)
);
end;
architecture rtl of AOC is
-- fanned-out copies of the select lines (declarations reconstructed)
signal bb0, bb1, bb2 : std_logic_vector(9 downto 0);
begin
bb0 <= (others => '0') when b0 = '0' else (others => '1');
bb1 <= (others => '0') when b1 = '0' else (others => '1');
bb2 <= (others => '0') when b2 = '0' else (others => '1');
-- AND/OR cell: gate each candidate input with its select line and OR the
-- results, as described in Fig. 3.5
w_out <= (w and bb0) or (w2 and bb1) or (w3 and bb2);
end;
Decoder:
library ieee;
use ieee.std_logic_1164.all;
entity decoder2_3 is
port (
u0 : in std_logic;
u1 : in std_logic;
b0 : out std_logic;
b1 : out std_logic;
b2 : out std_logic
);
end;
architecture rtl of decoder2_3 is
begin
-- 2-to-3 decoder for the 2-bit digit (u1 u0); the mapping below is assumed
-- from the AOC description: digit 1 selects w, 2 selects 2w, 3 selects 3w,
-- and digit 0 selects none
b0 <= (not u1) and u0;
b1 <= u1 and (not u0);
b2 <= u1 and u0;
end;
PPG :
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
library fir; use fir.fir_fix_types.all;
entity PPG is
port (
clk : in std_logic;
reset : in std_logic;
x : in std_logic_vector(7 downto 0);
w_coff : in std_logic_vector(7 downto 0);
p00 : out std_logic_vector(9 downto 0);
p01 : out std_logic_vector(9 downto 0);
p02 : out std_logic_vector(9 downto 0);
p03 : out std_logic_vector(9 downto 0)
);
end;
architecture structural of PPG is
-- architecture declarative part reconstructed to complete the listing
component AOC
port (
b0 : in std_logic;
b1 : in std_logic;
b2 : in std_logic;
w : in std_logic_vector(9 downto 0);
w2 : in std_logic_vector(9 downto 0);
w3 : in std_logic_vector(9 downto 0);
w_out : out std_logic_vector(9 downto 0)
);
end component;
component decoder2_3
port (
u0 : in std_logic;
u1 : in std_logic;
b0 : out std_logic;
b1 : out std_logic;
b2 : out std_logic
);
end component;
-- internal signals (widths assumed to match the 10-bit AOC ports)
signal w, w2, w3, mult, neg_w, neg_w2 : std_logic_vector(9 downto 0);
signal aoc_0, aoc_1, aoc_2, aoc_3 : std_logic_vector(2 downto 0);
begin
-- sign-extend the 8-bit coefficient to 10 bits, then form 2w and 3w
w <= ("00" & w_coff) when w_coff(7) = '0' else ("11" & w_coff);
w2 <= w(8 downto 0) & "0";
w3 <= w2 + w;
-- two's-complement negatives (-w and -2w), used for the most significant digit
mult <= (not w) + "1";
neg_w <= mult;
neg_w2 <= mult(8 downto 0) & "0";
decode1: decoder2_3
port map(
u0 => x(0),
u1 => x(1),
b0 => aoc_0(0),
b1 => aoc_0(1),
b2 => aoc_0(2)
);
decode2: decoder2_3
port map(
u0 => x(2),
u1 => x(3),
b0 => aoc_1(0),
b1 => aoc_1(1),
b2 => aoc_1(2)
);
decode3: decoder2_3
port map(
u0 => x(4),
u1 => x(5),
b0 => aoc_2(0),
b1 => aoc_2(1),
b2 => aoc_2(2)
);
decode4: decoder2_3
port map(
u0 => x(6),
u1 => x(7),
b0 => aoc_3(0),
b1 => aoc_3(1),
b2 => aoc_3(2)
);
AOC0: AOC
port map(
b0 => aoc_0(0),
b1 => aoc_0(1),
b2 => aoc_0(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p00
);
AOC1: AOC
port map(
b0 => aoc_1(0),
b1 => aoc_1(1),
b2 => aoc_1(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p01
);
AOC2: AOC
port map(
b0 => aoc_2(0),
b1 => aoc_2(1),
b2 => aoc_2(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p02
);
AOC3: AOC
port map(
b0 => aoc_3(0),
b1 => aoc_3(1),
b2 => aoc_3(2),
w => w,
w2 => w2,
w3 => w3,
w_out => p03
);
end;
Shift-Add Tree:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
entity shift_add_tree is
generic ( n : in integer := 16);
port (
q0 : in std_logic_vector(n-1 downto 0);
q1 : in std_logic_vector(n-1 downto 0);
q2 : in std_logic_vector(n-1 downto 0);
q3 : in std_logic_vector(n-1 downto 0);
yn : out std_logic_vector(n-1 downto 0)
);
end;
architecture rtl of shift_add_tree is
-- intermediate sums; widths assumed wide enough to hold the shifted additions
signal adder1, adder2, adder3 : std_logic_vector(n+5 downto 0);
begin
adder1 <= ("000000" & q0) + ("0000" & q1(n-1 downto 0) & "0");
adder2 <= ("000000" & q2) + ("0000" & q3(n-1 downto 0) & "0");
adder3 <= adder1 + (adder2(n-5 downto 0) & "0000");
-- the lower n bits form the filter output
yn <= adder3(n-1 downto 0);
end;
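For reference, the sketch below shows one way the listed components might be wired together for a single tap of the error-computation block. The entity name, the 16-bit tree width, and the sign-extension of the partial products are assumptions made for illustration; the actual top-level structure of the proposed design is not part of the listings above.

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative wiring of one tap: a PPG produces four partial products
-- that the shift-add tree combines. Names and widths are assumptions.
entity error_tap is
  port (
    clk, reset : in  std_logic;
    x_in       : in  std_logic_vector(7 downto 0);
    w_in       : in  std_logic_vector(7 downto 0);
    y_tap      : out std_logic_vector(15 downto 0)
  );
end;

architecture structural of error_tap is
  signal p00, p01, p02, p03 : std_logic_vector(9 downto 0);
  signal q0, q1, q2, q3     : std_logic_vector(15 downto 0);
begin
  -- one PPG generates the partial products for the four 2-bit digits of x_in
  ppg_i : entity work.PPG
    port map (clk => clk, reset => reset, x => x_in, w_coff => w_in,
              p00 => p00, p01 => p01, p02 => p02, p03 => p03);

  -- sign-extend each 10-bit partial product to the assumed 16-bit tree width
  q0(9 downto 0) <= p00;  q0(15 downto 10) <= (others => p00(9));
  q1(9 downto 0) <= p01;  q1(15 downto 10) <= (others => p01(9));
  q2(9 downto 0) <= p02;  q2(15 downto 10) <= (others => p02(9));
  q3(9 downto 0) <= p03;  q3(15 downto 10) <= (others => p03(9));

  -- the shift-add tree combines the partial products into the tap output
  tree_i : entity work.shift_add_tree
    generic map (n => 16)
    port map (q0 => q0, q1 => q1, q2 => q2, q3 => q3, yn => y_tap);
end;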