AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 24


Manufacturer Part Number: AN2094
Description: ITU-T G.729 Implementation on StarCore SC140
Manufacturer: Freescale Semiconductor / Motorola
Details of Selected Functions
One way to decrease the channel data size is to remove the storage area allocated to hold new samples. For
example, the speech data structure contains two main parts: old speech, which holds previous frames, and new
speech, which holds the current frame. Only the old speech portion must be kept to process the next frame.
However, eliminating the unneeded data requires modifying the vocoder either to use two vectors (one for old speech
and one for new speech) or to copy the old values into a local array. Thus, reducing channel data in this way significantly
increases the stack and code sizes.
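The second option (copying the old values into a local array) can be sketched as follows. This is only an illustration of the trade-off, not the code used in the application: the type channel_state_t and the helpers load_frame() and save_history() are hypothetical names, and the buffer sizes are assumed from the G.729 reference code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sizes assumed from the G.729 reference code. */
#define L_FRAME 80                     /* samples per 10 ms frame     */
#define L_TOTAL 240                    /* full speech working buffer  */
#define L_HIST  (L_TOTAL - L_FRAME)    /* samples kept across frames  */

/* Hypothetical reduced channel state: only the history is persistent. */
typedef struct {
    int16_t old_speech[L_HIST];
} channel_state_t;

/* Rebuild the full working buffer in a caller-provided (local) array:
   history first, then the new frame. */
static void load_frame(const channel_state_t *st, const int16_t *new_frame,
                       int16_t work[L_TOTAL])
{
    assert(st && new_frame && work);
    memcpy(work, st->old_speech, sizeof st->old_speech);
    memcpy(work + L_HIST, new_frame, L_FRAME * sizeof(int16_t));
}

/* After encoding, keep the most recent L_HIST samples for the next frame. */
static void save_history(channel_state_t *st, const int16_t work[L_TOTAL])
{
    assert(st && work);
    memcpy(st->old_speech, work + L_FRAME, sizeof st->old_speech);
}
```

In this sketch the per-channel data shrinks by L_FRAME words, but each call needs an L_TOTAL-word local array and two extra copies, which is the stack and code cost described above.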
3.6.3 Stack Size
The greatest demand on stack size in our application came from the fixed-codebook search. The stack reaches its
maximum depth in the D4i40_17() function, primarily because of the large data structures stored on the
ACELP_Codebook() stack frame. The algorithmic changes and reoptimizations that were performed on
D4i40_17() created supplementary data structures that increased the stack size.
The compiler time-optimized some functions by always storing variables in registers rather than the stack frame.
However, the unused stack frames were not removed until the assembly phase. Directly implementing
D4i40_17() in assembly accounts for most of the stack reduction after this phase.
One way to reduce stack size is to examine the lifetimes of variables (especially large arrays) and collapse arrays
whose lifetimes do not overlap into one data structure. Allocating variables in blocks may help improve the
management of local variables, even in large functions.
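A minimal sketch of collapsing two arrays with non-overlapping lifetimes is shown below. The arrays corr[] and sign[], the length N, and the function two_phase() are hypothetical and chosen only for illustration; they are not taken from the vocoder source.

```c
#include <assert.h>
#include <stdint.h>

#define N 40  /* illustrative array length */

/* Two hypothetical scratch arrays whose lifetimes never overlap:
   phase A is finished with corr[] before phase B starts using sign[]. */
typedef union {
    int32_t corr[N];   /* live only during phase A */
    int16_t sign[N];   /* live only during phase B */
} scratch_t;

static int32_t two_phase(const int16_t *x)
{
    scratch_t s;       /* one stack area instead of two separate arrays */
    int32_t peak = 0;
    int32_t count = 0;
    int i;

    assert(x);

    /* Phase A: fill and consume corr[]; only `peak` survives. */
    for (i = 0; i < N; i++) s.corr[i] = (int32_t)x[i] * x[i];
    for (i = 0; i < N; i++) if (s.corr[i] > peak) peak = s.corr[i];

    /* Phase B: corr[] is dead, so its storage is reused for sign[]. */
    for (i = 0; i < N; i++) s.sign[i] = (int16_t)(x[i] < 0 ? -1 : 1);
    for (i = 0; i < N; i++) if (s.sign[i] < 0) count++;

    return peak + count;
}
```

The union occupies the size of its largest member, so the frame holds N 32-bit words rather than N 32-bit words plus N 16-bit words. The approach is only safe when no value from the first array is read after the second array has been written.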
3.7 Testing
Our test procedures included both unit testing and integration tests. All tests were run on the SC140 simulator, and
several tests were also run on the reference C compiler. One complete test on the reference C compiler took about
10 minutes, while a worst-case analysis on the simulator (which also served as an integration test for bit-exactness)
took 6 to 10 hours, depending on the optimization stage. Tests on the SDP board were faster (less than 1 hour), but
did not provide cycle counts.
All builds were tested on the simulator to verify bit-exactness. The SDP board tests were especially useful after the
assembly phase.
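A bit-exactness check amounts to a word-by-word comparison of the build's output against the reference output. The helper below is a minimal sketch of such a comparison; the name first_mismatch() and the assumption that both outputs are already loaded into 16-bit arrays are ours, not part of the test setup described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first word that differs between the build's
   output and the reference output, or -1 if all n words are identical. */
static long first_mismatch(const int16_t *out, const int16_t *ref, size_t n)
{
    size_t i;
    assert(out && ref);
    for (i = 0; i < n; i++) {
        if (out[i] != ref[i])
            return (long)i;
    }
    return -1;
}
```

Reporting the first differing index, rather than a simple pass/fail flag, makes it easier to locate the frame in which an optimized build diverges.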
4 Details of Selected Functions
This section presents details of the optimization process for three functions, Norm_Corr(),
ACELP_Codebook(), and Lag_max(). The function-level, algorithmic, and assembly changes for each
function are presented, as well as the effect of these changes on the efficiency of code generated by the compiler.
4.1 Optimizations in Norm_Corr()
The Norm_Corr() function finds the normalized correlation (the correlation divided by the square root of the energy
of the filtered excitation) between the target vector and the filtered past excitation. The main steps of this function in
the reference C code include the following:
1. Compute the filtered excitation for the minimum delay.
2. Scale the excitation vector to avoid overflow.
3. Compute the energy of the filtered excitation to check overflow.
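The three steps listed so far can be sketched in portable C as follows. This is not the reference code: it uses a 64-bit accumulator instead of the saturating basic operators and overflow flag, the function names filter_excitation() and choose_scaling() are hypothetical, and the Q12 format of h[] and the 2-bit scaling amount are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define L_SUBFR 40   /* subframe length in samples (G.729 value) */

/* Step 1: filtered excitation, excf[n] = sum over i <= n of exc[i] * h[n-i].
   h is assumed to be in Q12; the result is saturated to 16 bits.
   Assumes >> on a negative value is an arithmetic shift. */
static void filter_excitation(const int16_t *exc, const int16_t *h,
                              int16_t *excf)
{
    int n, i;
    assert(exc && h && excf);
    for (n = 0; n < L_SUBFR; n++) {
        int64_t acc = 0;
        for (i = 0; i <= n; i++)
            acc += (int32_t)exc[i] * h[n - i];
        acc >>= 12;                          /* remove the Q12 scaling */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        excf[n] = (int16_t)acc;
    }
}

/* Steps 2 and 3: build a copy scaled down by 2 bits (an assumed amount),
   then compute the energy of the unscaled vector in a wide accumulator.
   Returns the shift the caller should use: 0 if the energy fits in
   32 bits, 2 if it does not and the scaled copy must be used instead. */
static int choose_scaling(const int16_t *excf, int16_t *scaled)
{
    int64_t energy = 0;
    int j;
    assert(excf && scaled);
    for (j = 0; j < L_SUBFR; j++)
        scaled[j] = (int16_t)(excf[j] >> 2);
    for (j = 0; j < L_SUBFR; j++)
        energy += 2 * (int64_t)excf[j] * excf[j];
    return (energy > INT32_MAX) ? 2 : 0;
}
```

Preparing the scaled copy before the energy check means that, if the check fails, the later correlation loop can switch to the scaled vector without recomputing the convolution.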
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
Freescale Semiconductor
