DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
AMINA INSTITUTE OF TECHNOLOGY
(Approved by AICTE, New Delhi; Affiliated to JNTU, Hyderabad)
Babaguda Road, Shamirpet, R.R. Dist.
2011-2012
I, G.ANUDEEP, hereby declare that the work described in this dissertation, entitled CALCULATION OF LOG VALUES USING LOOK-UP TABLE AND INTERPOLATION, which is being submitted by me in partial fulfillment of the requirements for the award of the Master of Technology in Embedded Systems in the Dept. of Electronics and Communication Engineering at Jawaharlal Nehru Technological University, Hyderabad, is the result of investigations carried out by me under the guidance of Prof. N.PAPARAO. The work is original and has not been submitted for any Degree/Diploma of this or any other university.
External Examiner
ACKNOWLEDGEMENTS
Any accomplishment requires the efforts of many people, and this work is no different. First and foremost, I would like to express my deep sense of gratitude to Prof. N.PAPARAO (M.Tech) for his valuable suggestions and support, which were instrumental in completing this task.
I also thank Prof. R.M.NOORULLAH (MS, M.Tech, PhD), Principal, for his kind cooperation and constant encouragement throughout this work.
I extend my sincere thanks to the non-teaching staff who cooperated with me, directly and indirectly, in the completion of my work. I gratefully acknowledge my parents for their fostering love and blessings.
My heartfelt thanks to my friends for their help and encouragement in making this work a success.
G.ANUDEEP 08N11A0427
K.SINDHU 08N11A0446
T.V.R DIVAKAR 08N11A04A6
Abstract
The realization of functions such as log() and antilog() in hardware is of considerable relevance, due to their importance in several computing applications. Our approach is based on a table look-up, followed by an interpolation step.
The interpolation step is implemented in combinational logic, in a field-programmable gate array (FPGA), resulting in an area-efficient, fast design. Our method performs both the log() and antilog() operations using the same hardware architecture.
This approach results in significantly lower memory resource utilization for the same approximation error. The method also scales very well with an increase in the required accuracy, compared to existing techniques.
APPLICATIONS: The log() and antilog() functions find uses in many areas such as digital signal processing (DSP), 3-D computer graphics, scientific computing, artificial neural networks, logarithmic number systems (LNS), and other multimedia applications.
BLOCK DIAGRAM: Inputs -> Look-up tables -> Interpolation -> Log() / Antilog()
Introduction to VLSI
Microelectronics has been the enabling technology for the development of hardware and software systems in recent decades. The continuously increasing level of integration of electronic devices on a single substrate has led to the fabrication of increasingly complex systems. Integrated circuit technology, based on the use of semiconductor materials, has progressed tremendously. While a handful of devices were integrated on the first circuits in the 1960s, circuits with over one million devices were successfully manufactured by the late 1980s. Such circuits are called Very Large Scale Integration (VLSI) circuits.

At present, many electronic systems require dedicated integrated components that are specialized to perform one task or a limited set of tasks. These are called Application Specific Integrated Circuits, or ASICs. Some circuits in this class may not be produced in large volume because of the specificity of their application.

Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistor-based circuits into a single chip. This is the field which involves packing more and more logic devices into smaller and smaller areas. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. Circuits that would once have taken boards full of space can now be put into a small space a few millimeters across. This has opened up a big opportunity to do things that were not possible before. VLSI circuits are everywhere: your computer, your car, your brand-new state-of-the-art digital camera, cell phones, and so on. All this involves a lot of expertise on many fronts within the same field. VLSI has been around for a long time; there is nothing new about it. But as a side effect of advances in the world of computers, there has been a dramatic proliferation of tools that can be used to design VLSI circuits.
The combined effect of these two advances is that people can now put diverse functionality into ICs, opening up new frontiers. Examples are embedded systems, where intelligent devices are put inside everyday objects, and ubiquitous computing, where small computing devices proliferate to such an extent that even the shoes you wear may actually do something useful, like monitoring your heartbeat. These two fields are related, and describing them could easily fill another article.
The generation of elementary functions such as log() and antilog() finds uses in many areas such as digital signal processing (DSP), 3-D computer graphics, scientific computing, artificial neural networks, logarithmic number systems (LNS), and other multimedia applications [1]. Our approach provides a good solution for field-programmable gate array (FPGA)-based applications that require high accuracy at a low cost in terms of required lookup table (LUT) size. Such applications include LNS, DSP cores, etc. In fact, the fast generation of these functions is critical to performance in many of these applications. Using software algorithms to generate these elementary functions [2], [3] is often not fast enough, as stated in [1]. Hence, the use of dedicated hardware to compute log() and antilog() is of great value. Over the past few decades, many authors have proposed various hardware approaches to approximate these elementary functions in an area-efficient manner, while maintaining high speed and accuracy.
Two methods that are well researched and widely used for the generation of the logarithm function are digit-recurrence algorithms and LUT-based approaches. Of these, the digit-recurrence methods are efficient from an area and accuracy perspective, but have longer latencies and convergence problems. The LUT-based methods are widely used to approximate the logarithm and antilogarithm functions. Some of the previous works involving LUT-based methods include LUTs combined with
The main objective of all these works is to utilize minimum circuit area while retaining the accuracy of the approximation. The main idea of our approach is to use LUTs along with linear or quadratic interpolation, and to approximate the multiplication required for interpolation using approximate log() and antilog() functions while computing a more accurate log() and antilog(). We show that the most cost-effective implementation is a LUT with a linear interpolation, implemented in a manner that optimizes the area and delay while providing good accuracy. We apply our method to generate the logarithm of a number and also show that a similar methodology can be used to generate the antilogarithm of a number. In this paper, the number format used is similar to the IEEE 754 single-precision floating-point format, which has 32 bits. The leading bit is the sign bit, followed by an 8-bit exponent and a 23-bit mantissa.
The value of a number represented in this format is given by (-1)^S x 1.M x 2^(E-127), where S is the sign bit, E the biased exponent, and M the mantissa. We use a similar number format representation, but assume the number of bits in the mantissa to be variable. We also assume that the number is positive, since the logarithm of a negative number does not exist. We target 10 or more bits of accuracy in this work. One of the earliest approaches to approximate the binary logarithm of a number was given by Mitchell [9]. In his method, the logarithm of a number is found by attaching the mantissa part of the number to the exponent part. This method is extremely easy to implement but gives an absolute error as high as 0.086, which corresponds to only 3.5 bits of accuracy. Various authors have reduced this error by using error-correction techniques implemented with simple logic gates, without involving multiplications or divisions [10]-[13].
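Mitchell's approximation can be stated in a few lines. The sketch below (plain Python floats and `math.frexp`, not the paper's hardware; all names are illustrative) also reproduces the 0.086 worst-case error quoted above, which occurs near a mantissa fraction of 1/ln(2) - 1 ≈ 0.4427:

```python
import math

def mitchell_log2(x: float) -> float:
    """Mitchell's approximation of log2(x): the integer part is the
    exponent of x, and the fractional part is simply the mantissa
    fraction (log2(1+m) is approximated by m)."""
    m, e = math.frexp(x)                 # x = m * 2**e, with m in [0.5, 1)
    return (e - 1) + (2.0 * m - 1.0)     # exponent plus mantissa fraction

# Worst-case error of the approximation over m in [0, 1):
worst = max(math.log2(1.0 + i / 4096.0) - i / 4096.0 for i in range(4096))
```

For example, `mitchell_log2(1.5)` returns 0.5 while the true value is about 0.585, an error of 0.085; `worst` evaluates to roughly 0.086.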
Although these methods are better than Mitchell's approach, they all give fewer than seven accurate bits (which may be adequate for some applications). Compared to these methods, our approach is applicable to applications that require much higher accuracy. [14] gives an approach for logarithmic multiplication, but the results are based on random input vectors and report the average error; we find the worst-case error over all possible inputs. In recent times, most elementary functions are generated using LUTs. This approach, initially proposed by Brubaker in [15], involves the computation of a function using a single LUT. The accuracy of the approach depends solely on the size of the LUT used. Another method, given in [16], improves Brubaker's method by concatenating the lower-order bits of the mantissa to the value looked up from the table. However, this gives only a small error improvement of one bit. Kmetz proposes to store the error that occurs due to the Mitchell approximation in the table and add it to the mantissa of the number. This results in a further improvement in error (by 3 bits), with the overhead of an add operation. In our approach, we follow the same scheme of storing the error values of the Mitchell approximation in a table. Other methods, like [7] and [8], make use of bigger LUTs to give more accurate results, with a good speed of computation. In our approach, we try to find the optimum table size to use for a required accuracy. Apart from these simple methods, there are several other, more complex methods for the generation of these elementary functions. References [3] and [6] use a LUT combined with a polynomial approximation to interpolate the function between many small intervals, and also present a tradeoff between the table sizes and the degree of the polynomial to use while interpolating. The problem with these methods is that there are multiplications and divisions involved in the computation.
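The Kmetz-style scheme described above (store samples of the Mitchell error in a table, add the looked-up value to the Mitchell estimate) can be sketched as follows. The table size `K = 6` and all names are illustrative assumptions, not taken from the paper:

```python
import math

K = 6                                          # table indexed by top 6 mantissa bits
# Mitchell error E(m) = log2(1+m) - m, sampled at m = i / 2**K.
ERR = [math.log2(1.0 + i / 2**K) - i / 2**K for i in range(2**K)]

def lut_log2(x: float) -> float:
    """Mitchell estimate plus a table-based error correction."""
    m, e = math.frexp(x)                       # x = m * 2**e, m in [0.5, 1)
    frac = 2.0 * m - 1.0                       # mantissa fraction in [0, 1)
    idx = int(frac * 2**K)                     # top K bits select the error sample
    return (e - 1) + frac + ERR[idx]
```

With a 64-entry table this yields roughly seven accurate bits (worst-case error a little under 0.007), since within each interval the residual error is bounded by the largest table-entry difference.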
Our approach focuses on using a smaller table along with a simple linear function to interpolate between the table values.
This also allows us to pipeline the implementation and achieve a higher throughput. The multipliers in our target FPGA are restricted to a speed of 135 MHz, and by avoiding their use we can achieve much greater speeds, as our results indicate. There are many papers on LNS. LNS systems also require the computation of log() quantities while finding the approximate value of some functions like addition and subtraction. Some of these works use only ROMs to compute log() without any interpolation; others use ROMs with linear tangent and/or secant interpolation. We explore linear least-squares-based interpolation, which gives us one extra bit of accuracy over our own linear secant implementation. Also, our method lends itself elegantly to performing the antilog() operation using the same hardware architecture and accuracy as log(); we therefore do not discuss antilog() separately.
This work uses a LUT-based approach combined with a linear interpolation to generate the logarithm of a number. The multiplication required in this linear interpolation is avoided, resulting in an area and delay reduction.
The error curve shown in Fig. 1 is sampled at a number of points (depending on the size of the LUT required). These values are rounded, depending on the word width required, and stored in the LUT. The LUT is addressed by the leading bits of the mantissa portion of the number. We now investigate the option of interpolating between the values stored in the table. This is done by the following equation:
Here the first term is the mantissa part, the second is the error value from the table addressed by the leading mantissa bits, and the adjacent table entry is the next stored value. The interpolation weight depends on the total number of MSBs of the mantissa used to address the table and on the decimal value of the remaining LSBs of the mantissa. Essentially, we find an error value from the table based on the leading bits and interpolate between this value and the next, based on the remaining bits.
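The symbols of (4) were lost in extraction; under assumed notation (m the n-bit mantissa fraction, k the number of MSBs addressing the table, i the integer value of those k bits, e_i and e_{i+1} the adjacent stored error samples, and m_L the integer value of the remaining n-k bits), a plausible reconstruction of the interpolation equation is:

```latex
\log_2(1+m) \;\approx\; m + e_i + \left(e_{i+1}-e_i\right)\,\frac{m_L}{2^{\,n-k}}
```

This is consistent with the surrounding text: the table supplies e_i, the adjacent entry supplies e_{i+1}, and the remaining LSBs weight the interpolation.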
The third term in (4) requires a multiplication. In order to circumvent this multiplication (which is expensive in terms of area and delay), we investigate the option of interpolating repeatedly between any two adjacent values stored in the table. This is done using Algorithm 1.
Algorithm 1: Recursive Bi-partitioning
STEP 1: The first bits of the mantissa address the table to obtain the stored left value and the adjacent right value.
STEP 2: If more than one mantissa bit remains:
Bisect the two current values and find the middle value.
If the current mantissa bit is 1, replace the left value with the middle value;
else, replace the right value with the middle value.
Go to STEP 2.
Else:
Choose the left or right value, based on whether the last bit is a 0 or a 1, respectively.
End if
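A software sketch of Algorithm 1 follows, under assumed parameters (16-bit mantissa, 64-entry table; all constants and names are illustrative). Repeated bisection replaces the interpolation multiply with shift-and-add-style halving; the measured worst-case error is on the order of 2^-14, consistent with the 14 bits of accuracy reported below:

```python
import math

K = 6                                        # table index width (bits)
N = 16                                       # mantissa width (bits)
# Mitchell-error samples, one extra entry for the right edge of the last interval.
ERR = [math.log2(1.0 + i / 2**K) - i / 2**K for i in range(2**K + 1)]

def bipartition_err(mantissa_bits: int) -> float:
    """Interpolate the error table by recursive bi-partitioning:
    each remaining mantissa bit halves the [left, right] interval."""
    idx = mantissa_bits >> (N - K)           # top K bits address the table
    left, right = ERR[idx], ERR[idx + 1]
    rest = mantissa_bits & ((1 << (N - K)) - 1)
    for b in range(N - K - 1, 0, -1):        # walk remaining bits, MSB first
        mid = (left + right) / 2.0
        if (rest >> b) & 1:
            left = mid                       # bit = 1: keep the right half
        else:
            right = mid                      # bit = 0: keep the left half
    return right if rest & 1 else left      # last bit picks an endpoint

def log2_approx(x: float) -> float:
    m, e = math.frexp(x)
    frac = 2.0 * m - 1.0
    return (e - 1) + frac + bipartition_err(int(frac * 2**N))
```

As the text notes, the drawback is the number of sequential steps: here 10 bisection steps per input, one per remaining mantissa bit.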
The error performance of Algorithm 1 is shown in Fig. 3. The maximum error gives us 14 bits of accuracy. The only problem with this approach is that there are too many steps involved, as all the mantissa bits are considered.
Trying another approach, we investigate the case where a limited number of interpolations is done. We tabulate the maximum error incurred when the previous algorithm is implemented using only the leading mantissa bits, ignoring the rest. This is the same as doing different levels of interpolation from 0 to 8. The maximum error for this approach is shown in Table I. In this case, the size of the LUT used is 64 words and the width of each word is 16 bits. The width of each word in the table is chosen in such a way that the accuracy is not reduced due to rounding.
From Table I, we see that 1 or 2 interpolations are not enough to give good error performance. Reasonable accuracy is obtained for either 7 or 8 bits, but this requires as many interpolations, and therefore results in larger delays in computing the logarithm. In order to obtain better accuracy, we need to implement the multiplication in (4). However, implementing a multiplier is expensive in terms of area and delay. Therefore, we approximate the multiplication, so as to obtain good error performance as well as low delay and area utilization. We will show in the following sections that our approach gives error performance similar to the 7-bit interpolation, but with lower delay.
In this section, we propose a more efficient approach to perform the interpolation without the multiplication in (4). The essential idea is that the multiplication of two quantities is simplified by taking the antilogarithm of the sum of their logarithms. In order to perform this operation with a small delay, we consider the following options.
1) The logarithm may be approximated by either of the following two options: a) Mitchell approximation; b) LUT. For the LUT option, the constants for each of the intervals are stored in the original LUT. Recall that we stored the error values obtained by using a Mitchell approximation for the log function. These constants are stored along with the error values and are indexed by the same address lines.
2) The antilogarithm may be approximated by either of the following two options: a) Mitchell approximation; b) LUT. To obtain the antilogarithm of a number by this method, we need to construct another LUT. The error due to the Mitchell approximation of the antilogarithm function is stored in this LUT, as shown in Section IV. This antilog LUT is utilized to compute the antilogarithm of a number, since the multiplication is performed by taking the antilogarithm of the sum of the two logarithms. The maximum error in the logarithm of a number incurred by using each of these options, along with the number of accurate bits in the result, is shown in Table II. From Table II we see that the combination of 1b) and 2b) has the best error performance. Therefore, to perform the interpolation described in (4), it makes sense to find the antilogarithm using a LUT. The additional advantage of this is that the same LUT can be used while computing the antilogarithm of a number as well. Table II also shows that our approach allows scalability of the system, by making a tradeoff between the accuracy and the number of values stored in the table.
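The core trick above, replacing a multiplication by antilog(log a + log b), can be illustrated with the Mitchell forms of both functions. This is a minimal sketch of the idea (option 1a/2a), not the paper's LUT-based variant, and the names are illustrative:

```python
import math

def mitchell_log2(x: float) -> float:
    """Mitchell log: exponent plus mantissa fraction."""
    m, e = math.frexp(x)                 # x = m * 2**e, m in [0.5, 1)
    return (e - 1) + (2.0 * m - 1.0)

def mitchell_exp2(y: float) -> float:
    """Mitchell-style antilog: 2**y approximated as (1 + frac) * 2**int."""
    e = math.floor(y)
    return (1.0 + (y - e)) * 2.0 ** e

def approx_mul(a: float, b: float) -> float:
    """a*b ~= antilog2(log2 a + log2 b): an add replaces the multiplier."""
    return mitchell_exp2(mitchell_log2(a) + mitchell_log2(b))
```

For example, `approx_mul(3.0, 5.0)` gives 14.0 instead of 15.0; the approximation always slightly underestimates the true product (by at most about 11 percent), which is why the paper corrects it with table-stored error values.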
Architecture of Implementation
Fig. 4 shows the block diagram of the log() engine, which is essentially an implementation of (4). The architecture of the interpolator block is shown in Fig. 5. The exponent part of the input number trivially becomes the integer part of the logarithm; this follows from the number format in (1). Also, we assume that the number we operate on (expressed as in (1)) is positive. Here we only show the operations on the mantissa. The implementation is pipelined, with 12 stages in the pipeline. We use a 16-bit mantissa and a 64-word by 16-bit LUT as an example. The width of each word in the table is chosen as 16 bits so that the error due to rounding does not dominate the overall error. Rounding of a number to a given number of bits is done as given in (5).
Recall from Table II that the accuracy in this case was 13.74 bits. One of the adders is a three-input fixed-point adder, as shown in Fig. 5; it is implemented as two adders. The width of the mantissa bits processed by each block in the architecture is shown in the diagrams. Since the difference between adjacent table entries takes both negative and positive values, the values stored in the lookup tables are actually the logarithms of the absolute values of these differences. It is found that the difference changes sign from positive to negative for m > 0.4427.
This is equivalent to comparing the decimal value of the first six bits of the mantissa with 28, as shown in Fig. 5. Hence, if the first six bits have a value greater than 28, the comparator block sends a control signal to the ADD/SUB block instructing it to perform a subtract operation.

The leading-one detector (LOD) block detects the first bit that has a value 1. It then uses the remaining bits as the mantissa portion to access the lookup table. Since 7 bits are given as input to the LOD in this example, the LOD finds the first position that has a value 1, and the remaining bits (which can be as wide as 6 bits) are used to access the LUT. The integer part, which is indicated by the position of the first bit with value 1, is sent directly to the three-input fixed-point adder to be added to the integer part of the result. The output of the adder after the antilog stage has to be shifted to the right or left depending on the integer output of the fixed-point adder. Also, there is a power-of-two term in the denominator of (4), and this is accounted for by a constant right shift of 7 bits at the output of the adder of the antilog LUT.

In Fig. 5, the two LUT blocks are identical and can be implemented using a dual-port RAM. Also note that the address of each of the three LUTs shown in Fig. 5 has a width of 6 bits. In each case, the 6-bit address word is obtained by rounding off the value of the next LSB in the word from which the address word is derived. The round() operation is implemented using an adder, as shown in (5). All the quantities that are rounded off in this manner are annotated as such in Fig. 5.

Error Analysis

The expression for the error due to a Mitchell approximation of the logarithm is given by (3). As mentioned before, the error curve due to the Mitchell approximation is sampled at various points, and these samples are stored in the lookup table.
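The sign-change claim is easy to verify numerically. The sketch below (illustrative code, not from the paper) confirms that the differences between adjacent 64-entry table samples of the Mitchell error first turn negative at interval index 28, matching the comparator threshold on the top six mantissa bits:

```python
import math

def mitchell_err(m: float) -> float:
    """Mitchell error curve E(m) = log2(1+m) - m, sampled by the LUT."""
    return math.log2(1.0 + m) - m

# E'(m) = 0 at m = 1/ln(2) - 1, the peak of the error curve.
peak = 1.0 / math.log(2.0) - 1.0       # ~0.4427

# Differences between adjacent 64-entry table samples:
diffs = [mitchell_err((i + 1) / 64.0) - mitchell_err(i / 64.0) for i in range(63)]
sign_flip = next(i for i, d in enumerate(diffs) if d < 0)
```

`sign_flip` comes out as 28, so intervals addressed by six-bit values greater than 28 have negative slope, and the ADD/SUB block must subtract there, exactly as the comparator in Fig. 5 arranges.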
It is observed that, while interpolating between any two adjacent values in this table, the maximum error is bound to occur when the difference between these two adjacent error values is largest.
The largest difference in error values occurs for the first pair of points in the lookup table. The magnitude of this largest difference (assuming a 64-word table with a width of 16 bits) is given by
The expression for the error due to our approximation follows from (4) as
There are three sources of error in our method. An error upper bound is obtained by adding the maximum errors due to all these sources. In other words,
where the first term is the error due to rounding of the values stored in the lookup table, the second is the error due to interpolation, and the third is the error incurred due to the use of the antilogarithm function during interpolation. The input mantissa is split into three different ranges in terms of bit width, in order to analyze the error. For example, a mantissa of bit width 5 implies that the remaining LSBs take on a value of 0.
Substituting these values into (9), we get the interpolation term in the first interval as
If the error due to the antilog is assumed to be zero, the error expression due to interpolation can be found using the following equation:
The maximum error is found by differentiating this equation and setting the derivative to zero. The antilog error depends on the values stored in the table. We find the error due to the antilog by simulating the lookup-table-based antilog approximation for these particular values and finding the maximum antilog error. The results of the simulation are shown in Fig. 6 for all possible input values ranging from 0 to 1.
Case 3, Mantissa Width Greater Than 13 bits: The rounding and antilog errors are the same as above. As for the interpolation error, we proceed in a fashion similar to Case 2. Here the interpolation argument is rounded off to the closest integer, as given by (5). The error function for this case is given by
Plotting this expression in Fig. 7, we find the maximum error. Of all the cases, the third case has the worst error due to interpolation, while the errors due to rounding and the antilog approximation remain the same. Hence, the error bound, obtained by plugging the maximum values of each of the error components into (8), is
POWERING (X^Y), logarithm (log X), and exponential (2^X) are important operations in 3-D computer graphics, digital signal processing (DSP), scientific computing, artificial neural networks, logarithmic number systems (LNS), and multimedia applications. Like other elementary functions, such as square root, reciprocal square root, and the trigonometric functions, they have traditionally been computed by software routines. These routines provide very accurate results, but are often too slow for numerically intensive or real-time applications. The timing constraints of these applications have led to the development of dedicated hardware for the computation of elementary functions, including the implementation of table-based algorithms. Accurately computing the floating-point powering function is considered difficult, and the prohibitive hardware requirements of a table-based implementation (note that X^Y is a two-variable function) have led only to partial solutions, such as powering algorithms for a
constant exponent p. A direct implementation of a digit-recurrence algorithm for powering computation is not feasible due to its high intrinsic complexity. In this paper, we give a detailed description of an optimized composite iterative algorithm for the computation of the powering function X^Y, for a floating-point input operand X = Mx * 2^Ex and an integer b-bit operand Y, and extend it to powering operations with exponents of the type Y = 1/q, with q an integer. An abbreviated previous version of our algorithm with integer exponent was presented in [23]. The final result, X^Y, was computed as Z = Mz * 2^Ez = e^(Y ln Mx) * 2^(Y Ex), through a sequence of overlapped operations. The first step consisted of computing ln Mx by using a high-radix (r = 2^b) digit-recurrence algorithm with selection by rounding [22]. An intermediate computation YL * ln Mx, with YL = Y * log2(e), was carried out by a high-radix left-to-right carry-free (LRCF) multiplication operation. Another LRCF multiplication, by ln 2, was performed to guarantee the convergence of the algorithm and, as the last step, the exponential of the resulting product was computed by an online high-radix algorithm, with an online delay of 2 and selection by rounding. In the optimized version of our algorithm, the final result X^Y can be computed directly as Z = Mz * 2^Ez = 2^(Y log2 Mx) * 2^(Y Ex), which avoids the computation of YL = Y * log2(e) and eliminates the need for a second LRCF multiplication, reducing the overall latency by one cycle. The expressions of the recurrences in the computations of the logarithm and the exponential must be slightly modified, and extra lookup tables storing the logarithm and exponential constants divided by ln 2 are now required.
In the stages computing these operations (logarithm and exponential), selection by table look-up is still performed in the first iteration to guarantee the convergence of both algorithms, with the online delay of two cycles in the exponential scheme allowing the initial tables to be addressed one cycle in advance, reducing the delay of the critical path in this stage.
A sequential architecture for powering computation was proposed with radix r = 128. Such an architecture can actually be implemented with any radix r >= 8, although the overall hardware requirements increase with r, and an analysis of the trade-offs between area and speed is necessary for determining which radix values result in the most efficient implementations. We perform such an analysis in this paper for our optimized algorithm. The analysis is based on estimates obtained for single- and double-precision computations and for radix values from r = 8 to r = 1,024, according to an approximate model for the delay and area cost of the main logic blocks employed in the proposed architecture. The main results of our analysis are that a fast implementation can be obtained when using r = 128, but an implementation with r = 32 is more suitable for applications with tighter area constraints. There is no advantage in using values r > 128, since similar or slower execution times are achieved, with much higher hardware requirements. Since the computations of the logarithm and exponential are included in our algorithm for powering computation, some minor changes can be made to the architecture to allow for the independent computation of the logarithm and exponential, with lower latencies than powering, making the architecture an interesting alternative when implementing a dedicated unit for elementary function computation.
Consider a floating-point input operand X = Mx * 2^Ex, with Mx the n-bit significand and Ex the exponent, and an integer b-bit input operand Y:
According to (2), the powering function can be calculated by a sequence of operations consisting of the logarithm of the significand Mx, a multiplication by Y, and the exponential of the resulting product. The form of the result is a significand 2^(Y log2 Mx) and an exponent Y*Ex, typical of a floating-point operand. For an efficient implementation of the powering function, the computation of the operations involved must be overlapped, which requires a left-to-right most-significant-digit-first (MSDF) mode of operation and the use of a redundant number system. A problem of the proposed algorithm is the range of the digit-recurrence exponential, which is [-ln 2, ln 2], while the argument of the exponential here is Y log2 Mx, with Y an integer. To extend the range of convergence of this algorithm and thus guarantee the convergence of the overall proposed method, we extract the integer part I and fractional part F of the product serially, which requires Y to be a b-bit (or 2b-bit) integer. The exponential becomes
with I and F the integer and fractional parts of Y log2 Mx, resulting in a bounded argument for the exponential. In summary, as illustrated in Fig. 1 for single-precision computations with r = 128, our algorithm for the computation of the powering function consists of three steps.
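The three-step flow (logarithm of the significand, multiplication by Y, exponential of the product, with the integer/fractional split of the product) can be sketched in plain floating point, ignoring the digit-recurrence and LRCF details. Function and variable names here are illustrative, not from the paper:

```python
import math

def powering(x: float, y: int) -> float:
    """x**y via the log-multiply-exp decomposition:
    x = Mx * 2**Ex, so x**y = 2**(y*log2(Mx)) * 2**(y*Ex)."""
    mx, e = math.frexp(x)                      # x = mx * 2**e, mx in [0.5, 1)
    mx, ex = 2.0 * mx, e - 1                   # normalize to Mx in [1, 2)
    p = y * math.log2(mx)                      # step 1 + 2: Y * log2(Mx)
    i = math.floor(p)                          # integer part I ...
    f = p - i                                  # ... and fractional part F
    return math.ldexp(2.0 ** f, int(i) + y * ex)  # step 3: 2**F, shifted by I + Y*Ex
```

Because F is confined to [0, 1), the exponential argument stays bounded, which is the point of extracting I and F serially in the hardware algorithm.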
The use of redundancy results in 2^F lying in [0.5, 2) and, therefore, a normalization of the final result may be necessary. However, the condition F < 0 can be determined in advance of the last iterations of the exponential, and the final normalization can be performed with no extra delay. The overall latency of the algorithm, as shown in Fig. 1, can be estimated as
Error Analysis
The final error in the algorithm is the accumulation of the errors due to the cascaded implementation of a set of operations, and must be bounded by 2^(-n-1) before the final rounding. Let εl be the error in the computation of the logarithm of the input significand, ln Mx. When the LRCF multiplication is performed, the associated error is:
with εm the error due to the LRCF multiplication scheme in the computation of the product. The integer part I and fractional part F of the product Y log2 Mx are extracted serially, with no error affecting the integer part. The next operation to be performed is the exponential 2^F. The obtained result is:
with εe the error in the computation of the exponential. The difference between the exact result and the obtained result must be bounded:
For a precision of n bits and a radix r = 2^b, a set of minimum values for εl, εm, and εe must be determined to guarantee a final result accurate to n bits.
The values of the error parameters set the precision to be reached in each stage: nl, ne, and nm. These parameters determine a minimum number of iterations of the logarithm (Nl) and the exponential (Ne) to be performed. As shown in Fig. 1, the number of iterations of the LRCF multiplication to be performed must be the same as that of the logarithm, Nl, because all the information must reach the exponential stage. The parameters Nl, Ne, nl, ne, and nm set the size of the look-up tables, adders, and multipliers to be used. Moreover, gl, ge, and gm guard bits must be employed to guarantee that, in each stage of the powering algorithm, the iteration errors do not affect the achievement of the required precisions nl, ne, and nm. The critical parameter to be minimized first is εe, since it is directly related to the required precision ne to be reached in the exponential stage and, therefore, to Ne, the number of iterations of the online exponential algorithm to be performed. As an example, we show in Table 1 the set of minimum values for the considered parameters for single-precision (n = 24) and double-precision (n = 53) computations, when the radix is r = 128. Ne and Nl are given in cycles, and ne, nl, and nm are given in number of bits.
IMPLEMENTATION
A sequential architecture is proposed for the implementation of our high-radix powering algorithm. We outline here the main computations involved: 1) the high-radix logarithm, 2) the high-radix LRCF multiplication, and 3) the online high-radix exponential,
and then describe the main features of the logic blocks employed. Fig. 2 shows the block diagram of the proposed architecture. Single thick lines denote long-word (around n bits) operands/variables in parallel form, single thin lines denote short-word (up to 11 bits) operands/variables in parallel form, and double lines denote single-digit (b-bit) variables (Rj, I, and Fj).
High-Radix Logarithm
A high-radix digit-recurrence algorithm for this computation has been described in detail elsewhere. Some modifications and optimizations have been made in the algorithm used here, according to the operation flow in the optimized algorithm for powering computation, and a slightly different notation is employed.
Here r is the radix and lj is a radix-r digit. This form of the normalization factors allows the use of a shift-and-add implementation. The recurrences for performing this multiplicative normalization and computing the logarithm digits are
with j >= 1, Wl[1] = r(Mx - 1), and R[1] = 0. For a result precision of nl bits, Nl = ceil(nl/b) iterations are necessary. The scaled residual Rj has been defined as
in order to extract a radix-r digit Rj per iteration from the same bit positions in all iterations. The block diagram of the high-radix logarithm stage is shown in Fig. 3, with double lines for SD operands, thin ones for single-digit operands, and thick lines for parallel operands. TABrl1 and TABlog,l1 are the look-up tables storing r*l1 and -log2(1 + l1*r^-1), respectively, addressed by the b+1 most significant bits of the input operand Mx.
The selection of the digits lj in iterations j >= 2 is done by rounding an estimate of the residual. This estimate is obtained by truncating the signed-digit representation of Wl[j] to t fractional bits. The selection function is
The sign of the digit lj is defined as the opposite of the sign of Wl[j] in order to satisfy a bound on the residual, thus assuring convergence. The digit set for lj is {-(r-1), ..., -1, 0, 1, ..., r-1}. Iteration j = 1 does not converge with selection by rounding, so the selection of l1 is performed by table look-up. This table is addressed by the b+1 most significant bits of the input operand Mx, and the selection is done in such a way that the value of |l2| is bounded according to the convergence conditions. However, this results in an over-redundant digit l1 (b+1 bits), increasing the size of the multiplier operand by one bit.
A multiply-add unit is used for the computation of the residual recurrence, unlike in the previously proposed algorithm, where a separate SD multiplier and SD adder were used. The digit l1 is stored in the look-up table TABrl1 already in SD-4 recoded form, to reduce the delay of the path containing the table and the multiply-add unit. The logarithm constants are stored in a look-up table whose size grows exponentially with the radix. However, the approximation lj*r^-j/ln 2 can be used in iterations j >= ceil(Nl/2) + 1 in order to reduce the overall hardware requirements of the algorithm, with a look-up table storing lj/ln 2.
The use of redundant representation is mandatory in our algorithm for powering computation due to the left-to-right operation flow and results in faster execution times by making the additions independent of the precision.
REFERENCES
1. Microprocessors and Microcontrollers, by A.K. Ray
2. The 8051 Microcontroller and Embedded Systems, by Muhammad Ali Mazidi
3. Fundamentals of Embedded Software, by Daniel W. Lewis
4. Programming and Customizing the 8051 Microcontroller, by Myke Predko
5. Programming and Customizing the AVR Microcontroller, by Dhananjay V. Gadre
6. www.electronicsforu.com
7. www.futurelec.com
8. www.uctros.com