Monday, August 5, 2019

Implement Synthesizable Square Root Algorithm On Fpga Engineering Essay

Implement Synthesizable Square Root Algorithm On Fpga Engineering Essay The main objective of this paper is to implement synthesizable square root algorithm on FPGA. As square root function is not synthesizable on Silicon, this paper proposes optimized non restoring square root algorithm for unsigned 8 bit number on ED2C20F484C7 device in Cyclone II family. This algorithm is implemented in gate level abstraction of Verilog HDL. The basic building block of the design is CSM (Controlled Subtract Multiplex) block. It makes use of only subtract operation and append 01 which is an improvement over restoring algorithm. Keyword: FPGA,CSM,Verilog HDL,fixed point Introduction The square root function is a basic operation in computer graphic and scientific calculation application. Due to its algorithm complexity, the square root operation is hard to be designed in digital system. As known, digital system has been used in daily life or industrial purpose that may have been in need of square root operation to fully execute its functions. Scientists have developed various algorithms for square root calculation. But the implementation of algorithms is difficult because of their complexities and thus results into long delays for its completion. There are two main families of algorithms that can be used to extract square roots. The first family is that of digit recurrence, which provides one digit (often one bit) of the result at each iteration[6]. Each iteration consists of additions and digit-by-number multiplications (which have comparable cost) Such algorithms have been widely used in microprocessors that didnt include hardware multipliers. Most of the FPGA implementations in vendor tools or in the literature use this approach. Second family of algorithms uses multiplications. It includes quadratic convergence recurrences derived from the Newton-Raphson iteration [5]. The digit recurrence approaches allow one to build minimal hardware, while multiplicative approaches allow one to make the best use of available resources when these include multipliers. Also there are estimation method and digit-by-digit method. Digit-by-digit method is classified into two distinct classes: restoring and non- restoring algorithm [1]. In restoring algorithm, remainder is restored in the regular flow. So its implementation needs more hardware. Compared to the restoring algorithm, the non restoring algorithm does not restore the remainder, which can be implemented with fewest hardware resource and the result is hardware simple implementation. It is most suitable for FPGA implementation. Restoring and non restoring square root calculation Restoring Algorithm Step 1: If it is a 2n bit number then divide it in a group of 2 bits Step2: Subtract 1 from the first 2 digits (starting from MSB) Step3: Whenever the result of the subtraction is positive then the developed root is 1 otherwise 0 Step4: Whenever the result is negative, write it as it is. We have to restore the wrong guess by appending 01 and guessed square root. Step5: Now take the next two digits Step6: Append 01 (to be subtracted from next two digits of dividend) and guessed square root to subtract from the remainder. Step7: If the result of subtraction is negative then restore previous remainder by adding wrong guess by appending 01 and guessed square root. Step8: Every time guessed square root has to be updated while appending 01. Step9: Continue the steps until the group of two digits end 1 0 0 1.1 0 1 0 01 01 11 01.00 00 00 00 01 00 01 take next two digits from dividend 1 01 Append 01 Negative value 11 00 + 1 01 0 0 01 11 -10 01 11 10 Negative value + 10 01 01 11 01 10 00 01 11 00 00 10 01 01 01 01 1 00 10011 01 1011111 + 1001101 010110 00 010011001 000010111 00 100110101 1100100111 Figure 1: The example of restoring algorithm to solve square root B. Proposed Modified Non Restoring Algorithm A little modification in non restoring algorithm makes calculation faster. It uses only subtract operation and appends 01. It uses n stage pipelining to find square root of 2n bit number. The following algorithm describes the modified non restoring square root algorithm. Step1: Start Step2: Initialize the radicand (p) which is 2n bit number. Divide the radicand in two bits beginning at binary point in both directions. Step3: Beginning on the left (most significant), select the first group of one or two digit (If n is odd then first group is one digit ,else two bits) Step4: Select the first group of bits and subtract 01 from it. If borrow is zero, result is positive and quotient is 1 else it is 0. Step5: Append 01(to be subtracted next two digits of dividend) and guessed square root to subtract from remainder of previous stage Step6: If result of subtraction is negative, write previous remainder as it is and quotient is considered as 0, else write the difference as remainder and quotient as 1. Step7: Repeat step 5 and step 6 until end group of two digits. Step8: End 1 0 0 1.1 0 1 0 01 01 11 01.00 00 00 00 00 01 00 01 take next two digits from dividend 1 01 Append 01 11 10 01 01 11 01 100 01 11 00 00 1001 01 001011 00 1001101 00101100 00 10011001 0000010111 00 100110101 001011100 Figure 2: The example of modified non restoring algorithm to solve square root Basic Building Block for Non restoring algorithm Inputs of the building block are x,y,b and u while d and b0(borrow) are outputs. If b0=0, then d b0=( ~ x .y)+(b.~x)+(by); d= (~x.y.~b.~u)+(~x.~y.b.~u)+(x~y.~b)+(x.u)+(x.y.b); csmblock.jpg Figure 3: RTL schematic of CSM block The generalization of simple implementation of non restoring digit by digit algorithm for unsigned 6 bit square root by array structure is shown in Fig.4. Each row of the circuit executes one iteration of non restoring digit by digit square algorithm, where it only uses subtract operation and appends 01. Figure 4: Pipelined structure of 6 bit unsigned square root number The design can be optimized by minimizing the logic expressions and can be implemented by modifying CSM block. The specialized entities A,B,C,D,E,F,G and H are derived from CSM block and are defined as follows: For csmA, ybu = 100 b0 = ~x d = ~x For csmB, yu = 00 b0 = ~x.b d = ~x.b + ~b.x For csmC, u = 0 b0 = ~x.y + ~x.b + y.b d = ~x.y.~b + ~x.~y.b + x.~y.~b + x.y.b For csmD, yb = 10 b0 = ~x d = ~x.~u + x.u For csmE, y = 0 b0 = ~x.b d = ~x.b.~u + ~b.x + x.u For csmF, xy = 00 b0 = b d = b.~u For csmG, xyb = 010 b0 = ~x d = ~u For csmH, xyu = 000 b0 = b Figure 5: Optimized Pipelined structure of 8 bit unsigned square root number Results and analysis The Non Restoring algorithm can be implemented with least hardware resources and the result will be the faster than restoring square rooting techniques. The source code is implemented in such a way that it can be extended according to users requirement to calculate complicated square root in FPGA. Figure 6: Simulation result of 8 bit square root using non restoring algorithm The DE1 kit has 4 seven segment displays only so the maximum number which can be displayed is 9999d and also it doesnt have a decimal point. Hence output obtained is less precise if one of the displays is considered as a decimal point. Table 1 shows the list of Logic Elements usage for 8 bit implementation. This indicates the size of the implemented circuit hardware resource. Table 1: Comparison of LEs usage in 8 bit implementation No Implementation of non restoring algorithm for 8 bit LEs 1 8 bit (with seven segment) 85 2 8 bit (without seven segment) 71 3 optimized 8 bit (with seven segment) 64 4 optimized 8 bit (without seven segment) 50 Table 2: PowerPlay Power Analyzer Status No PowerPlay Power Analyzer Status 8 bit with optimization (mW) 8 bit without optimization (mW) Low Medium Low 1 Total Thermal Power Dissipation 71.65 447.96 72.84 2 Core Dynamic Thermal Power Dissipation 0 190.47 0 3 Core Static Thermal Power Dissipation 47.36 48.06 47.36 4 I/O Thermal Power Dissipation 24.29 209.44 25.48 Conclusion: This implementation and analysis shows that proposed method is most efficient of hardware resource. This is reasonable, because it only uses subtract operation and append 01. The result shows that the proposed algorithm is easy to implement and also uses less resources. The result is extended for square root implementation of 8 bit floating point number and also it can be expanded to larger numbers to solve complicated square root problem in FPGA implementation.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.