AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 25

Manufacturer Part Number

AN2094

Description

ITU-T G.729 Implementation on StarCore SC140

Manufacturer

Freescale Semiconductor / Motorola


4.1.1 Function-Level C Optimizations

The C optimizations applied to Norm_Corr() to speed up the function include the following:

The initial code for computing energy and correlation was inefficient because the compiler did not take advantage of input vector alignment. The compiler also could not use the registers efficiently in large loops that contain several inner loops. To improve register use, the two major loops that perform the correlation and energy computations were moved to separate functions in separate files, which prevents the compiler from inlining them. This incurred a calling overhead of up to 6 cycles, but the performance gain was about 10 cycles. The gain is significant because the functions are called 14 times. After these function-level optimizations, the Norm_Corr() function ran 2.22 times faster than the initial version.

4.1.2 Algorithmic Changes

The major modifications to the algorithms in Norm_Corr() included the following:

Align the input and local vectors to allow parallel data moves.

Unroll the scaling excitation loop.

Apply split summation to the energy and correlation computation loops.

Use multisample processing to compute the excitation for the next iteration.

Replace the tests that use subtraction with direct comparisons. For example, replace if (sub(a,b)>0) with if (a>b).

Replace function calls with operators when integer values are used. For example, replace i = sub(i,1) with i--.

Replace the L_shl() function call with the << operator.

Replace unmodified variables with constants defined in the G.729 reference code.
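Two of the C-level rewrites listed above, the direct comparison and split summation, can be sketched in portable C. This is a minimal illustration, not the application note's code: sub16() is a hypothetical stand-in for the reference operator sub(), the function names are invented here, and the energy loops use plain 32-bit accumulation without the saturation of the reference L_mac() operator. Split summation yields the same sum only if no intermediate saturation would have occurred.

```c
#include <stdint.h>

/* Hypothetical stand-in for the reference basic operator sub():
   16-bit subtraction with saturation. */
static int16_t sub16(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;
    if (d > 32767)  d = 32767;
    if (d < -32768) d = -32768;
    return (int16_t)d;
}

/* Original style: the test goes through the basic operator. */
static int greater_ref(int16_t a, int16_t b)
{
    return sub16(a, b) > 0;
}

/* Rewritten style: direct comparison. Saturation never changes the
   sign of the difference, so both tests give the same answer. */
static int greater_opt(int16_t a, int16_t b)
{
    return a > b;
}

/* Energy with a single accumulator: each addition depends on the
   previous one. */
static int32_t energy_single(const int16_t *x, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * x[i];
    return acc;
}

/* Split summation: two independent accumulators over even and odd
   samples, combined at the end. Assumes n is even. */
static int32_t energy_split(const int16_t *x, int n)
{
    int32_t acc0 = 0, acc1 = 0;
    for (int i = 0; i < n; i += 2) {
        acc0 += (int32_t)x[i] * x[i];
        acc1 += (int32_t)x[i + 1] * x[i + 1];
    }
    return acc0 + acc1;
}
```

The two accumulators have no dependency on each other, which is what lets a core with several multiply-accumulate units process more than one sample per cycle.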

Compute the scaled filtered excitation vector only when overflow occurs. In the ITU-T speech.bit test vector, overflow occurs in only 205 out of 3750 frames.

Compute the factors that affect the new scaled filtered excitations in a separate loop and store the results in a separate vector. This modification enables multisample processing, which uses the registers more efficiently.

Exploit the 32-bit capabilities of the processor by using 32-bit variables instead of the DPF format defined in the G.729 standard, and replace the Mpy_32() function with a new function that works with native 32-bit data but preserves bit-exactness.
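The bit-exactness constraint can be illustrated with a sketch. The replacement function itself is not shown in this section, so the code below is an assumption-laden illustration only: mpy_32_dpf() follows the structure of the reference Mpy_32() as commonly described (a high-by-high product plus two truncated cross products, with the low-by-low product dropped), and mpy_32_native() is a hypothetical wrapper that accepts native 32-bit operands and reproduces the same arithmetic. The helper asr() and all function names are invented for this example.

```c
#include <stdint.h>

/* Floor (arithmetic) right shift that does not depend on
   implementation-defined behaviour for negative values. */
static int32_t asr(int32_t v, int n)
{
    return (v >= 0) ? (v >> n) : ~((~v) >> n);
}

/* Product of two DPF operands, each given as (hi, lo) with
   value = hi * 2^16 + lo * 2 and 0 <= lo <= 32767. */
static int32_t mpy_32_dpf(int16_t hi1, int16_t lo1,
                          int16_t hi2, int16_t lo2)
{
    int32_t acc;

    /* hi * hi term, doubled; saturate the single overflow case. */
    if (hi1 == INT16_MIN && hi2 == INT16_MIN)
        acc = INT32_MAX;
    else
        acc = 2 * ((int32_t)hi1 * hi2);

    /* Cross terms, each truncated to 16 bits before being added. */
    acc += 2 * asr((int32_t)hi1 * lo2, 15);
    acc += 2 * asr((int32_t)lo1 * hi2, 15);
    return acc;
}

/* Hypothetical native-operand version: splits each 32-bit value into
   its DPF halves internally so the result matches mpy_32_dpf(). */
static int32_t mpy_32_native(int32_t a, int32_t b)
{
    int16_t hi1 = (int16_t)asr(a, 16);
    int16_t lo1 = (int16_t)(asr(a, 1) & 0x7fff);
    int16_t hi2 = (int16_t)asr(b, 16);
    int16_t lo2 = (int16_t)(asr(b, 1) & 0x7fff);
    return mpy_32_dpf(hi1, lo1, hi2, lo2);
}
```

A plain 64-bit product shifted right by 31 bits is not a drop-in replacement: for the operands 0x00010000 and 0x7fffffff it yields 65535, whereas the DPF arithmetic yields 65534. Any native 32-bit version has to reproduce that truncation to stay bit-exact with the reference.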

Eliminate the else branch of the if()…else… statement to reduce the number of branch-like instructions.

Pipeline the main loop to avoid the unnecessary scaling of filtered excitation for the final iteration.

Compute the energy in the same loop that computes the new scaled filtered excitation values to avoid extra memory moves.

Scale the filtered excitation to avoid overflow and compute the energy of the scaled filtered excitation.

For every possible delay between minimum and maximum, compute the normalized correlation vector and modify the excitation for the next iteration.

This modification did not work as intended due to suboptimal register utilization, but it was implemented in assembly.
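The two scaling-related changes listed above (scaling the filtered excitation only when overflow occurs, and computing the energy in the same loop that produces the scaled values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application note's code: the function and parameter names are hypothetical, overflow risk is assumed to be detected by comparing the unscaled energy against a caller-supplied threshold, and the scaling is assumed to be a right shift by 2.

```c
#include <stdint.h>

/* Floor (arithmetic) right shift by 2 for 16-bit samples. */
static int16_t shr2(int16_t v)
{
    return (int16_t)((v >= 0) ? (v >> 2) : ~((~v) >> 2));
}

/* Returns the scaling shift applied (0 or 2) and stores the energy of
   the vector that the caller should use in *energy.
   excf      : filtered excitation (input)
   scaled    : output buffer, written only when scaling is needed
   threshold : energy above which scaling is applied (assumed value) */
static int scale_if_needed(const int16_t *excf, int16_t *scaled, int n,
                           int64_t threshold, int64_t *energy)
{
    int64_t e = 0;
    for (int i = 0; i < n; i++)
        e += (int32_t)excf[i] * excf[i];

    /* Common case: no overflow risk, so no scaled copy is produced
       and the caller keeps using excf directly. */
    if (e <= threshold) {
        *energy = e;
        return 0;
    }

    /* Rare case: scale and accumulate the energy in one pass, so the
       scaled values are not read back from memory. */
    e = 0;
    for (int i = 0; i < n; i++) {
        int16_t v = shr2(excf[i]);
        scaled[i] = v;
        e += (int32_t)v * v;
    }
    *energy = e;
    return 2;
}
```

With overflow occurring in only 205 of 3750 frames of the speech.bit test vector, the second loop runs in a small fraction of frames, which is where the saving comes from.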

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1

Details of Selected Functions
