AN2094: ITU-T G.729 Implementation on StarCore SC140

Manufacturer: Freescale Semiconductor / Motorola
Manufacturer part number: AN2094
Datasheet: 1.AN2094.pdf (52 pages), current page 30 of 52

Implementation Strategies

The results in Table 12 show that the C compiler generates efficient code when optimization techniques are used. The code generated for the inner loop is especially efficient: four macs and two moves appear in the same execution set. However, the compiler does not create the best code if two maximum values are used. The code generated for the comparisons is similar to the first assembly version and takes eight cycles to compute all four comparisons. Even so, the generated code, although not optimal, is very efficient and performs well.
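As a rough check on these results, the cycle counts reported for Lag_max() in Table 12 can be turned into speedup factors. The sketch below uses only the three cycles-per-frame figures from that table; the helper name `speedup` is introduced here for illustration.

```python
# Cycles per frame for Lag_max(), copied from Table 12.
CYCLES = {
    "Initial C version": 12756,
    "Optimized C version": 3625,
    "Final assembly version": 3247,
}


def speedup(before: int, after: int) -> float:
    """Return how many times fewer cycles `after` needs than `before`."""
    return before / after


baseline = CYCLES["Initial C version"]
for version, cycles in CYCLES.items():
    # Speedup of each version relative to the initial C version.
    print(f"{version:24s} {cycles:6d} cycles  x{speedup(baseline, cycles):.2f}")
```

On these figures, most of the gain comes from the C-level optimization (about 3.5 times fewer cycles), while the assembly version is only about 1.1 times faster than the optimized C.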

This chapter discusses different SC140 implementation strategies for complex DSP applications. It examines theoretical approaches that suggest a limit on the share of functions to be optimized in either C or assembly; this limit is established at either 80 or 94 percent of application execution time. It also discusses more practical approaches, such as optimizing only those functions that are strictly necessary to meet the project performance target, or implementing a larger set of functions to improve the performance parameters.

The discussions in this section are based only on the encoder portion of the vocoder. To meet the performance target, the code generated from the optimized C version of the encoder had to be improved with assembly implementation, primarily in the fixed codebook search module. The decoder met the performance target using only optimized C.

All implementation strategies should start with an analysis of the information provided by the profile data. This information indicates, among other things, which functions are the most time-consuming and which functions may be equivalent (that is, require the same number of cycles to execute). Optimizing C code is well worth the effort. Even if the optimizations are not always well reflected in the compiler output, the optimized C code serves as a reference point for starting the assembly implementation in the early stages of the project. The optimized C code maintains bit-exactness and serves as an implementation pattern for assembly programming.
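The profile analysis described above can be sketched as a small script. This is a minimal illustration, not the tooling used in the application note: the function names `func_a` to `func_d` and their cycle counts are hypothetical placeholders, and "equivalent" is taken literally to mean an identical cycle count.

```python
from collections import defaultdict


def rank_by_cycles(profile):
    """Sort functions by cycle count (descending) with cumulative time share."""
    total = sum(profile.values())
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for name, cycles in ranked:
        running += cycles
        rows.append((name, cycles, running / total))
    return rows


def equivalent_groups(profile):
    """Group functions that need the same number of cycles."""
    by_cycles = defaultdict(list)
    for name, cycles in profile.items():
        by_cycles[cycles].append(name)
    return [sorted(names) for names in by_cycles.values() if len(names) > 1]


# Hypothetical profile data: function name -> cycles per frame.
profile = {"func_a": 5000, "func_b": 3000, "func_c": 1000, "func_d": 1000}

for name, cycles, share in rank_by_cycles(profile):
    print(f"{name:8s} {cycles:6d}  cumulative {share:.0%}")
print("equivalent:", equivalent_groups(profile))
```

The cumulative column shows how much of the execution time is covered once the top-ranked functions have been optimized, which is the quantity the strategies in this chapter reason about.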

One way to determine when to begin the assembly implementation is to use a graph of performance versus effort. This graph can be viewed as an asymptotic curve, especially when functions are optimized in descending order of execution time. Given a deadline date, assembly implementation should begin at the point on the curve whose tangent reaches the project performance target only after the effort point corresponding to the project deadline.
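One possible reading of this tangent rule is sketched below. It assumes that performance is measured in cycles (lower is better), that effort is measured in some unit of elapsed work such as weeks, and that the tangent is approximated by the line through the two most recent measurements. All names and numbers are hypothetical.

```python
import math


def tangent_crossing(p0, p1, target):
    """Effort at which the line through p0 and p1 reaches the target cycles."""
    (e0, c0), (e1, c1) = p0, p1
    if c1 <= target:
        return e1  # target already met at this point
    slope = (c1 - c0) / (e1 - e0)  # cycles gained per unit of effort
    if slope >= 0:
        return math.inf  # no improvement, so the target is never reached
    return e1 + (target - c1) / slope


def assembly_start(points, target, deadline):
    """Return the first effort point whose tangent misses the deadline."""
    for p0, p1 in zip(points, points[1:]):
        if tangent_crossing(p0, p1, target) > deadline:
            return p1[0]
    return None  # C optimization alone stays on track


# Hypothetical measurements: (effort, cycles), flattening out over time.
points = [(0, 100), (1, 60), (2, 40), (3, 32)]
print(assembly_start(points, target=20, deadline=4))  # prints 3
```

In this example the first two tangents still reach the target by the deadline, but at effort point 3 the extrapolated crossing is at 4.5, after the deadline at 4, so that is where the sketch suggests switching to assembly.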

5.1 Theoretical Background

The basic formula that links the performance of the group of functions targeted for optimization (G1) to the overall application performance is shown in Equation 3.

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1

Table 12. Lag_max() Performance Summary

Version                   Cycles per Frame   Size
Initial C version         12756              324
Optimized C version       3625               574
Final assembly version    3247               362

Note: Includes three calls.

Equation 3:    1 - P(f - 1)        P(1 - f) + f
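The two expressions printed with Equation 3, 1 - P(f - 1) and P(1 - f) + f, have the shape of terms in an Amdahl-style speedup relation. The following is a sketch under that assumption, not a transcription of the original equation: P is taken to be the fraction of application execution time spent in G1, f the factor by which G1 is sped up, and F (a symbol introduced here) the resulting overall speedup.

```latex
F = \frac{1}{(1 - P) + \dfrac{P}{f}}
  = \frac{1}{1 - P\,\dfrac{f - 1}{f}}
  = \frac{f}{P(1 - f) + f},
\qquad
\lim_{f \to \infty} F = \frac{1}{1 - P}
```

Under this reading, P = 0.80 caps the overall speedup at 1/(1 - 0.80) = 5, and P = 0.94 caps it at 1/(1 - 0.94) ≈ 16.7, however fast G1 becomes.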
