at40k-fft ATMEL Corporation, at40k-fft Datasheet - Page 6

Manufacturer Part Number: at40k-fft
Description: Fast Fourier Transform Intellectual Property Core for AT40K FPGAs
Manufacturer: ATMEL Corporation
Design Analysis
Design Performance
The design requires N log₂ N clock cycles to compute the
FFT, where N is the transform length. In this design N is
fixed at 256, requiring a total of 2048 clock cycles. Timing
analysis of the design (see section Timing Analysis) indicates
a maximum clock rate of 21 MHz. This gives a total
processing time for the FFT of 98 µs.
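The cycle count and processing time can be checked with a short calculation. This is a sketch only: the function names are illustrative and are not part of the core.

```python
def fft_cycles(n: int) -> int:
    """Clock cycles for an N-point transform at N * log2(N) cycles."""
    stages = n.bit_length() - 1          # log2(N) for a power of two
    if n < 2 or 2 ** stages != n:
        raise ValueError("transform length must be a power of two")
    return n * stages


def processing_time_us(n: int, clock_hz: float) -> float:
    """Processing time in microseconds at the given clock rate."""
    return fft_cycles(n) / clock_hz * 1e6


print(fft_cycles(256))                            # 2048
print(round(processing_time_us(256, 21e6), 1))    # 97.5, i.e. about 98 µs
```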
It should be noted that the design does not support concurrent
data IO and processing. Consequently, in real applications
the user must also allow time to transfer data in and
out. 256 clock cycles are required to write data into the
internal RAM, yielding a minimum data input time of 12 µs
at 21 MHz. Determination of the time required to read data
out from the device is somewhat more difficult, partly
because read accesses to the on chip RAM are asynchronous,
and partly because the user may not require the
complete output data set. However, it is not unrealistic to
assume a similar data transfer time to the input. This gives
a likely total data IO time of approximately 24 µs at 21 MHz.
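Because IO and processing cannot overlap, the time per transform is the sum of the three phases. The sketch below adds them up, assuming (as the text suggests, but does not guarantee) that reading the results takes as long as writing the input. The variable names are illustrative.

```python
N = 256                  # transform length
CLOCK_HZ = 21e6          # maximum clock rate from timing analysis
STAGES = N.bit_length() - 1   # log2(N) = 8

input_us = N / CLOCK_HZ * 1e6               # 256 write cycles
output_us = input_us                        # assumption: read-out matches input
processing_us = N * STAGES / CLOCK_HZ * 1e6

total_us = input_us + processing_us + output_us
transforms_per_second = 1e6 / total_us

print(round(input_us + output_us, 1))   # total IO time in µs
print(round(total_us, 1))               # time per transform in µs
print(round(transforms_per_second))     # sustained transform rate
```

Under these assumptions IO accounts for roughly a fifth of the time spent on each transform.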
Suggested Techniques to Improve Performance
Three main methods exist to increase the rate at which
AT40K FPGAs can perform FFTs, namely:
• Increasing the design clock speed.
• Double buffering data to hide data IO time.
• Considering other FFT architectures.
The simplest method of increasing performance is to
increase the clock frequency of the design. Timing analysis
(see section Timing Analysis) indicates that the limiting
path is through the asynchronous ROM block. Conversion
of the ROM block to synchronous operation would improve
performance somewhat.
In general the clock rate of the design is limited by two factors:
the loading on the internal address and data busses
and the speed of the multipliers. The first factor is affected
by both the length of the transform and its precision.
Decreasing these reduces the size of the dual ported memory,
leading to reduced bus loadings and increasing the
data IO bandwidth between the butterfly and memory. The
second factor is affected only by the precision of the transform.
Increasing the precision increases the size of the
multipliers, reducing the system’s performance. It should
be noted that increasing either the data precision or the
transform length will lead to a reduction in performance.
For large transform lengths the use of an external dual
ported memory should be considered; this may also provide
faster data transfer times by reducing on chip bus
loadings.
System performance could potentially be improved by providing
suitable buffering to permit concurrent data processing
and IO. Double buffering designs would most probably
have to use external memory devices.
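The potential gain from double buffering can be estimated in clock cycles. The sketch below assumes an idealised double buffer in which IO for one data set is completely hidden behind the processing of another, and assumes N cycles each for input and output. Neither assumption is confirmed for a real implementation.

```python
def cycles_per_transform(n: int, overlapped: bool) -> int:
    """Cycles between successive transforms, with or without IO overlap."""
    stages = n.bit_length() - 1     # log2(N) for a power of two
    compute = n * stages            # N * log2(N) processing cycles
    io = 2 * n                      # assumption: N cycles in, N cycles out
    if overlapped:
        return max(compute, io)     # the slower activity sets the pace
    return compute + io             # current design: strictly sequential


sequential = cycles_per_transform(256, overlapped=False)   # 2560
buffered = cycles_per_transform(256, overlapped=True)      # 2048
print(sequential / buffered)                               # 1.25
```

On these figures double buffering gives at most a 1.25 times improvement for a 256 point transform, since processing rather than IO dominates.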
The only technique to radically improve performance is to
consider other FFT architectures, probably involving multiple
FPGAs. The most probable architecture is the “pipeline
FFT” processor described in [4]. This requires log₂ N butterflies
arranged in a line and interspersed with delay lines.
Data is then fed continuously through the pipeline to compute
the FFT. Using the current butterfly design such an
FFT processor would require multiple FPGAs, each containing
one or two butterflies and their associated line
delays. For a transform length of N such an architecture
would produce a log₂ N fold performance increase compared
to the current design (i.e. an 8 times improvement
for a 256 point FFT). An alternative strategy would be to
attempt to reduce the complexity of the butterfly by using
bit serial arithmetic, thus permitting more butterflies to be
implemented on a single FPGA.
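The quoted speed-up can be written out as follows. This assumes that both architectures run at the same clock rate f_clk, that each pipeline butterfly handles one FFT column at the same rate as the existing butterfly, and that the pipeline is kept full so that a new transform completes every N cycles. These are simplifying assumptions, not measured results.

```latex
\begin{align*}
T_{\text{single}} &= \frac{N \log_2 N}{f_{\text{clk}}}, &
T_{\text{pipeline}} &\approx \frac{N}{f_{\text{clk}}}, \\
\frac{T_{\text{single}}}{T_{\text{pipeline}}} &\approx \log_2 N
  = \log_2 256 = 8 .
\end{align*}
```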
Recommendations to Improve Functionality
Two potential improvements to the design are suggested:
• Use of block floating point.
• Reduction of the size of the twiddle factor ROM.
Currently the design uses fixed scaling by ½ at the output
of the butterfly to prevent numeric overflow. However, this
can significantly reduce the dynamic range of the output
data when the input signals are weak. An alternative
approach is to use a block floating point strategy [4]. In this
case the scaling by ½ is only included in each FFT column
of calculations if the results from the previous column are
likely to cause overflow in the current column. This leads to
an improvement in the dynamic range of the output data.
The additional logic required in the butterfly to implement
block floating point is a conditional divide by 2, i.e. a simple
shifter. Inclusion of this function in the butterfly should neither
significantly increase its size nor degrade its performance.
In addition to changes to the butterfly, some extra
logic is required in the controller unit to operate the shifter.
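The difference between fixed scaling and block floating point can be illustrated with a small software model. This is a minimal sketch, not the core's implementation: the 16-bit full-scale value, the overflow threshold (peak magnitude above half of full scale) and all names are assumptions, and floating point twiddle factors stand in for the ROM.

```python
import cmath

FULL_SCALE = 2 ** 15 - 1   # assumed 16-bit signed data path


def bit_reverse(values):
    """Reorder a power-of-two length list into bit-reversed index order."""
    bits = len(values).bit_length() - 1
    out = [0] * len(values)
    for i, v in enumerate(values):
        out[int(format(i, f"0{bits}b")[::-1], 2)] = v
    return out


def quantize(z):
    """Round both parts to integers to mimic fixed-point storage."""
    return complex(round(z.real), round(z.imag))


def scaled_fft(samples, fixed=False):
    """Radix-2 decimation-in-time FFT with per-column scaling by 1/2.

    fixed=True  : every column is scaled (fixed scaling).
    fixed=False : a column is scaled only if the previous results could
                  overflow in it (block floating point).
    Returns (data, exponent) where the DFT is roughly data * 2**exponent.
    """
    n = len(samples)
    data = bit_reverse([complex(s) for s in samples])
    exponent = 0
    span = 1
    while span < n:
        # One decision per column, taken by the controller in hardware.
        peak = max(abs(z) for z in data)
        shift = fixed or peak > FULL_SCALE / 2   # assumed threshold
        scale = 0.5 if shift else 1.0
        exponent += 1 if shift else 0
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                a = data[start + k]
                b = w * data[start + k + span]
                data[start + k] = quantize((a + b) * scale)
                data[start + k + span] = quantize((a - b) * scale)
        span *= 2
    return data, exponent


weak = [3, 1, -2, 5, 0, 4, -1, 2]
print(scaled_fft(weak, fixed=True)[1])    # 3: three columns, all scaled
print(scaled_fft(weak, fixed=False)[1])   # 0: no scaling was needed
```

With the weak input the fixed scheme discards three bits of every result, while the block floating point scheme keeps them and reports an exponent of zero. A strong input, such as eight samples of 30000, is scaled in every column under both schemes.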
The overall size of the design could be reduced by investigating
techniques to decrease the size of the twiddle factor
ROM. Currently this stores half of the unit circle. By including
logic to manipulate the input address and sign of the
output data it should be possible to reduce the size of the
ROM to store only a quarter or eighth of the unit circle. This
is, however, only likely to be of significant benefit for large
FFTs.
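The quarter circle variant can be sketched as follows. The table holds only cos(2πk/N) for k = 0 to N/4, and the cosine and sine for the rest of the half circle are recovered by mirroring the address and changing the sign. The table layout and function names are assumptions made for illustration, and floating point values stand in for fixed-point ROM words.

```python
import cmath
import math


def build_quarter_rom(n: int) -> list:
    """cos(2*pi*k/N) for k = 0 .. N/4 (N/4 + 1 entries)."""
    return [math.cos(2 * math.pi * k / n) for k in range(n // 4 + 1)]


def twiddle(rom: list, n: int, k: int) -> complex:
    """Twiddle factor exp(-2j*pi*k/N) for 0 <= k < N/2 from the quarter ROM."""
    quarter = n // 4
    if k <= quarter:
        cos_part = rom[k]                 # first quadrant: direct lookup
        sin_part = rom[quarter - k]       # sin(x) = cos(pi/2 - x)
    else:
        cos_part = -rom[n // 2 - k]       # cos(x) = -cos(pi - x)
        sin_part = rom[k - quarter]       # sin(x) = cos(x - pi/2)
    return complex(cos_part, -sin_part)


rom = build_quarter_rom(256)
print(len(rom))   # 65 real entries instead of 128 complex ones
```

The extra logic amounts to one address subtraction and one conditional negation per lookup.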