at40k-fft ATMEL Corporation, at40k-fft Datasheet - Page 6

Manufacturer Part Number: at40k-fft

Description: Fast Fourier Transform Intellectual Property Core for AT40K FPGAs

Manufacturer: ATMEL Corporation

Datasheet: 1.AT40K-FFT.pdf (8 pages)

Design Analysis

Design Performance

The design requires N log₂ N clock cycles to compute the FFT, where N is the transform length. In this design N is fixed at 256, requiring a total of 2048 clock cycles. Timing analysis of the design (see section Timing Analysis) indicates a maximum clock rate of 21 MHz. This gives a total processing time for the FFT of approximately 98 µs.
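
The figures above can be checked with a short calculation. The following is a minimal Python sketch, not part of the core; the function names are illustrative assumptions.

```python
def fft_cycles(n: int) -> int:
    """Clock cycles for an N-point transform at N*log2(N) cycles per FFT."""
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("transform length must be a power of two")
    return n * stages


def fft_time_us(n: int, f_clk_mhz: float) -> float:
    """Processing time in microseconds (cycles divided by MHz gives us)."""
    return fft_cycles(n) / f_clk_mhz


if __name__ == "__main__":
    print(fft_cycles(256))                     # cycle count for N = 256
    print(round(fft_time_us(256, 21.0), 1))    # time at the 21 MHz limit
```

For N = 256 this gives 2048 cycles and about 97.5 µs, which the text rounds to 98 µs.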

It should be noted that the design does not support concurrent data IO and processing. Consequently, in real applications the user must also allow time to transfer data in and out. 256 clock cycles are required to write data into the internal RAM, yielding a minimum data input time of 12 µs at 21 MHz. Determination of the time required to read data out from the device is somewhat more difficult, partly because read accesses to the on-chip RAM are asynchronous, and partly because the user may not require the complete output data set. However, it is not unrealistic to assume a similar data transfer time to the input. This gives a likely total data IO time of approximately 24 µs at 21 MHz.
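
As a rough budget for one complete transform, the IO and processing times can be added together. The sketch below assumes, as the text does, that reading out takes as long as writing in; the variable names are illustrative only.

```python
F_CLK_MHZ = 21.0        # maximum clock rate from the timing analysis
N = 256                 # transform length
COMPUTE_CYCLES = 2048   # N * log2(N) for N = 256


def cycles_to_us(cycles: float, f_clk_mhz: float = F_CLK_MHZ) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / f_clk_mhz


write_us = cycles_to_us(N)            # one word written per clock cycle
read_us = write_us                    # assumption: read-out mirrors write-in
io_us = write_us + read_us
compute_us = cycles_to_us(COMPUTE_CYCLES)
total_us = io_us + compute_us         # IO and processing cannot overlap

if __name__ == "__main__":
    print(f"data IO : {io_us:.1f} us")
    print(f"compute : {compute_us:.1f} us")
    print(f"total   : {total_us:.1f} us")
```

Under these assumptions roughly a fifth of each transform period is spent moving data rather than computing.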

Suggested Techniques to Improve Performance

Three main methods exist to increase the rate at which AT40K FPGAs can perform FFTs, namely:

• Increase the design clock speed.

• Double buffer data to hide data IO time.

• Consider other FFT architectures.

The simplest method of increasing performance is to increase the clock frequency of the design. Timing analysis (see section Timing Analysis) indicates that the limiting path is through the asynchronous ROM block. Conversion of the ROM block to synchronous operation would improve performance somewhat.

In general the clock rate of the design is limited by two factors: the loading on the internal address and data buses, and the speed of the multipliers. The first factor is affected by both the length of the transform and its precision. Decreasing these reduces the size of the dual-ported memory, leading to reduced bus loadings and increased data IO bandwidth between the butterfly and memory. The second factor is affected only by the precision of the transform. Increasing the precision increases the size of the multipliers, reducing the system’s performance. It should be noted that increasing either the data precision or the transform length will lead to a reduction in performance. For large transform lengths the use of an external dual-ported memory should be considered; this may also provide faster data transfer times by reducing on-chip bus loadings.

System performance could potentially be improved by providing suitable buffering to permit concurrent data processing and IO. Double buffering designs would most probably have to use external memory devices.
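
The potential gain from double buffering can be estimated by comparing the transform period with and without overlapped IO. This sketch assumes that IO overlaps processing completely and that the clock rate stays at 21 MHz; neither is guaranteed for a real double-buffered design, and the function name is hypothetical.

```python
def transform_period_us(compute_cycles: int, io_cycles: int,
                        f_clk_mhz: float, double_buffered: bool) -> float:
    """Time between successive transforms, in microseconds."""
    if double_buffered:
        # Assumption: IO for the next block is fully hidden behind compute.
        cycles = max(compute_cycles, io_cycles)
    else:
        # Current design: IO and processing happen one after the other.
        cycles = compute_cycles + io_cycles
    return cycles / f_clk_mhz


single = transform_period_us(2048, 512, 21.0, double_buffered=False)
double = transform_period_us(2048, 512, 21.0, double_buffered=True)
speedup = single / double

if __name__ == "__main__":
    print(f"single buffer: {single:.1f} us per transform")
    print(f"double buffer: {double:.1f} us per transform")
    print(f"speedup      : {speedup:.2f}x")
```

With 512 IO cycles and 2048 processing cycles the best case is a 1.25 times increase in throughput, so double buffering helps but does not change the order of magnitude.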

The only technique to radically improve performance is to consider other FFT architectures, probably involving multiple FPGAs. The most probable architecture is the “pipeline FFT” processor described in [4]. This requires log₂ N butterflies arranged in a line and interspersed with delay lines. Data is then fed continuously through the pipeline to compute the FFT. Using the current butterfly design such an FFT processor would require multiple FPGAs, each containing one or two butterflies and their associated delay lines. For a transform length of N such an architecture would produce a log₂ N fold performance increase compared to the current design (i.e. an 8 times improvement for a 256 point FFT). An alternative strategy would be to attempt to reduce the complexity of the butterfly by using bit serial arithmetic, thus permitting more butterflies to be implemented on a single FPGA.
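
The scaling of the pipeline architecture with transform length can be tabulated as follows. The sketch assumes that a full pipeline accepts one sample per clock in steady state, so that one transform completes every N cycles; this is consistent with the log₂ N fold figure quoted above but is an assumption, not a measured result.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power-of-two n, otherwise raise ValueError."""
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("transform length must be a power of two")
    return stages


def iterative_cycles(n: int) -> int:
    """Single shared butterfly: N*log2(N) cycles per transform."""
    return n * log2_exact(n)


def pipeline_butterflies(n: int) -> int:
    """Pipeline FFT: one butterfly per stage."""
    return log2_exact(n)


def pipeline_speedup(n: int) -> float:
    """Assumed steady state: one transform every N cycles."""
    return iterative_cycles(n) / n


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, pipeline_butterflies(n), pipeline_speedup(n))
```

The speedup equals the number of butterflies, so the cost in FPGAs grows in step with the gain.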

Recommendations to Improve Functionality

Two potential improvements to the design are suggested:

• Use of block floating point.

• Reduction of the size of the twiddle factor ROM.

Currently the design uses fixed scaling by ½ at the output of the butterfly to prevent numeric overflow. However, this can significantly reduce the dynamic range of the output data when the input signals are weak. An alternative approach is to use a block floating point strategy [4]. In this case the scaling by ½ is only included in each FFT column of calculations if the results from the previous column are likely to cause overflow in the current column. This leads to an improvement in the dynamic range of the output data.

The additional logic required in the butterfly to implement block floating point is a conditional divide by 2, i.e. a simple shifter. Inclusion of this function in the butterfly should neither significantly increase its size nor degrade its performance. In addition to changes to the butterfly, some extra logic is required in the controller unit to operate the shifter.
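
The difference between fixed scaling and block floating point can be illustrated with a small software model. This is a behavioural sketch only, not the core's implementation: it assumes a 16-bit signed word, uses floating point twiddle factors with rounding to integers after each butterfly, and decides on the shift from the peak magnitude of the previous column. All names are hypothetical.

```python
import cmath


def bit_reverse_order(values):
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, v in enumerate(values):
        out[int(format(i, "0{}b".format(bits))[::-1], 2)] = v
    return out


def quantize(z):
    """Round real and imaginary parts to integers (models a fixed word)."""
    return complex(round(z.real), round(z.imag))


def fft_scaled(samples, word_bits=16, block_fp=True):
    """Radix-2 FFT model. Returns (data, exponent); spectrum ~ data * 2**exponent."""
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two >= 2")
    limit = 1 << (word_bits - 1)
    data = bit_reverse_order([complex(s) for s in samples])
    exponent = 0
    half = 1
    while half < n:
        peak = max(abs(z) for z in data)
        # A butterfly a +/- w*b can at most double the peak magnitude,
        # so scale this column only if doubling could reach the limit.
        shift = (2 * peak >= limit) if block_fp else True
        scale = 0.5 if shift else 1.0
        exponent += 1 if shift else 0
        size = 2 * half
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                a = data[start + k]
                b = data[start + k + half] * w
                data[start + k] = quantize((a + b) * scale)
                data[start + k + half] = quantize((a - b) * scale)
        half = size
    return data, exponent


if __name__ == "__main__":
    weak = [3] + [0] * 255                      # weak impulse input
    print(fft_scaled(weak, block_fp=True)[1])   # shifts applied with block FP
    print(fft_scaled(weak, block_fp=False)[1])  # shifts applied with fixed scaling
```

For the weak impulse, fixed scaling halves the data in every one of the eight columns and the signal is rounded away entirely, whereas the block floating point model applies no shifts and keeps the value 3 in every output bin. A full-scale input still triggers the shift in every column, so overflow protection is retained.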

The overall size of the design could be reduced by investigating techniques to decrease the size of the twiddle factor ROM. Currently this stores half of the unit circle. By including logic to manipulate the input address and sign of the output data it should be possible to reduce the size of the ROM to store only a quarter or eighth of the unit circle. This is, however, only likely to be of significant benefit for large FFTs.
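
One way the quarter-circle reduction might work is sketched below. The model stores cosine and sine for the first quarter of the circle only and reconstructs the second quarter by offsetting the address, swapping the two outputs and changing their signs. Floating point values stand in for the fixed point ROM words, and the function names are assumptions for illustration.

```python
import cmath
import math


def build_quarter_rom(n):
    """(cos, sin) pairs for angles 2*pi*k/n, k = 0 .. n/4 - 1."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n // 4)]


def twiddle(rom, n, k):
    """Return W = exp(-2j*pi*k/n) for 0 <= k < n/2 using the quarter ROM."""
    quarter = n // 4
    if not 0 <= k < n // 2:
        raise ValueError("k must lie in the first half of the unit circle")
    if k < quarter:
        c, s = rom[k]
        return complex(c, -s)          # first quarter: read directly
    # Second quarter: W^k = (-j) * W^(k - n/4), i.e. swap and negate.
    c, s = rom[k - quarter]
    return complex(-s, -c)


if __name__ == "__main__":
    rom = build_quarter_rom(256)
    print(len(rom))                    # entries stored for a 256 point FFT
    print(twiddle(rom, 256, 64))       # should be close to -1j
```

For a 256 point transform this halves the table from 128 entries to 64, at the cost of a comparator, a subtractor on the address and a swap and negate stage on the data.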
