AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 25



Manufacturer Part Number
AN2094
Description
ITU-T G.729 Implementation on StarCore SC140
Manufacturer
Freescale Semiconductor / Motorola
Datasheet
4.1.1 Function-Level C Optimizations
The C optimizations applied to Norm_Corr() to speed up the function include the following:
The initial code for computing energy and correlation was inefficient because the compiler did not take advantage
of input vector alignment. The compiler also could not allocate registers efficiently in large loops that
contain several inner loops. To improve register use, the two major loops that perform the correlation and energy
computations were moved into separate functions in separate files, which prevents the compiler from inlining them.
This incurred a calling overhead of up to 6 cycles, but the performance gain was about 10 cycles. This gain was
significant because the functions are called 14 times. After these function-level optimizations, the Norm_Corr()
function ran 2.22 times faster than the initial version.
4.1.2 Algorithmic Changes
The major modifications to the algorithms in Norm_Corr() included the following:
Align the input and local vectors to allow parallel data moves.
Unroll the scaling excitation loop.
Apply split summation to the energy and correlation computation loops.
Use multisample to compute the excitation for the next iteration.
Replace the tests that use subtraction with direct comparisons. For example, replace if (sub(a,b)>0) with if (a>b).
Replace function calls with operators when integer values are used. For example, replace i = sub(i,1) with i--.
Replace the L_shl() function call with the << operator.
Replace unmodified variables with constants defined in the G.729 reference code.
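The replacement of sub()-based tests with direct comparisons can be checked with a small sketch. It assumes that sub() behaves as a saturating 16-bit subtraction, as in the ITU-T basic operators; sub_sat(), greater_via_sub(), greater_direct(), and count_mismatches() are hypothetical names introduced here for illustration and are not taken from the application note or the reference code.

```c
#include <stdint.h>

/* Saturating 16-bit subtraction, modelled on the ITU-T basic operator sub().
 * Assumption: this model ignores the Overflow flag set by the reference code. */
static int16_t sub_sat(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;
    if (d > INT16_MAX) d = INT16_MAX;
    if (d < INT16_MIN) d = INT16_MIN;
    return (int16_t)d;
}

/* Original style: the test goes through the basic operator. */
static int greater_via_sub(int16_t a, int16_t b)
{
    return sub_sat(a, b) > 0;
}

/* Rewritten style: a direct comparison the compiler can map to one compare. */
static int greater_direct(int16_t a, int16_t b)
{
    return a > b;
}

/* Count disagreements between the two styles over a grid of inputs.
 * A step of 257 covers both INT16_MIN and INT16_MAX exactly. */
static long count_mismatches(void)
{
    long mismatches = 0;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a += 257) {
        for (int32_t b = INT16_MIN; b <= INT16_MAX; b += 257) {
            if (greater_via_sub((int16_t)a, (int16_t)b) !=
                greater_direct((int16_t)a, (int16_t)b)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Because saturation preserves the sign of the true difference, the two tests agree for every input pair in this model. The same reasoning does not carry over automatically to the other replacements: i-- matches sub(i,1) only while i stays above the 16-bit minimum, and << matches L_shl() only when the shift cannot overflow, so each such substitution should be checked against the value ranges that actually occur.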
Compute the scaled filtered excitation vector only when overflow occurs. In the ITU-T speech.bit test vector, overflow occurs in only 205 out of 3,750 frames.
Compute the factors that affect the new scaled filtered excitations in a separate loop and store the results in a separate vector. This modification allows multisample to be applied so that the registers are used more efficiently.
Exploit the 32-bit capabilities of the processor by using 32-bit variables instead of the double-precision format (DPF) defined in the G.729 standard, and replace the Mpy_32() function with a new function that works with native 32-bit data but preserves bit-exactness.
Eliminate the else branch of the if()…else… statement to reduce the number of branch-like instructions.
Pipeline the main loop to avoid the unnecessary scaling of the filtered excitation for the final iteration.
Compute the energy in the same loop that computes the new scaled filtered excitation values to avoid extra memory moves.
Scale the filtered excitation to avoid overflow and compute the energy of the scaled filtered excitation.
For every possible delay between minimum and maximum, compute the normalized correlation vector and modify the excitation for the next iteration.
Note: This modification did not work as intended due to suboptimal register utilization, but it was implemented in assembly.
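Two of the loop-level changes listed above, computing the energy in the same loop that scales the filtered excitation and applying split summation, can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code from the application note: the function names are hypothetical, the accumulators are 64-bit integers rather than the saturating L_mac() of the reference code, and the right shift of a negative value is assumed to be arithmetic (implementation-defined in C, but the usual behavior).

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one pass scales the vector, a second pass re-reads it
 * to accumulate the energy. */
static int64_t scale_then_energy(const int16_t *excf, int16_t *scaled,
                                 size_t n, int shift)
{
    int64_t energy = 0;
    for (size_t j = 0; j < n; j++) {
        scaled[j] = (int16_t)(excf[j] >> shift);   /* assumes arithmetic shift */
    }
    for (size_t j = 0; j < n; j++) {
        energy += (int32_t)scaled[j] * scaled[j];
    }
    return energy;
}

/* Fused loop with split summation: each scaled sample is used for the
 * energy while it is still in a register, and two independent partial
 * sums (even and odd indices) allow two multiply-accumulates per pass. */
static int64_t scale_and_energy_split(const int16_t *excf, int16_t *scaled,
                                      size_t n, int shift)
{
    int64_t even = 0;
    int64_t odd = 0;
    size_t j = 0;

    for (; j + 1 < n; j += 2) {
        int16_t s0 = (int16_t)(excf[j] >> shift);
        int16_t s1 = (int16_t)(excf[j + 1] >> shift);
        scaled[j] = s0;
        scaled[j + 1] = s1;
        even += (int32_t)s0 * s0;
        odd += (int32_t)s1 * s1;
    }
    if (j < n) {                                   /* odd-length tail */
        int16_t s = (int16_t)(excf[j] >> shift);
        scaled[j] = s;
        even += (int32_t)s * s;
    }
    return even + odd;                             /* combine the partial sums */
}
```

Split summation changes the order of the additions. With the wide accumulators used here the result is identical to the baseline; with saturating accumulation it is identical only if no partial sum saturates, which is one reason the excitation is scaled before the energy is computed.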
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
Details of Selected Functions