AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 31

Manufacturer Part Number: AN2094
Description: ITU-T G.729 Implementation on StarCore SC140
Manufacturer: Freescale Semiconductor / Motorola
Datasheet
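where S is the application performance improvement, P is the percentage of the application run time taken by the G1 functions, and f is the optimization factor. Given S, f is computed as

$$ f = \frac{SP}{1 + SP - S} \qquad \text{(Equation 4)} $$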
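The relation f = SP / (1 + SP – S) can be checked numerically. The sketch below assumes that the performance improvement S has the usual Amdahl form S = 1 / ((1 – P) + P/f), which is algebraically equivalent to the expression for f. The function names `speedup` and `required_factor` are illustrative only and do not come from the application note.

```python
def speedup(p: float, f: float) -> float:
    """Overall improvement S when a fraction p of run time is sped up by factor f.

    Assumes the Amdahl form S = 1 / ((1 - p) + p / f).
    """
    return 1.0 / ((1.0 - p) + p / f)


def required_factor(s: float, p: float) -> float:
    """Optimization factor f needed on fraction p to reach overall improvement s."""
    denom = 1.0 + s * p - s
    if denom <= 0.0:
        # The untouched (1 - p) portion alone already exceeds the time budget 1/s.
        raise ValueError("target improvement is unreachable for this p")
    return s * p / denom


if __name__ == "__main__":
    # A 2.5x overall improvement with 80 percent of run time in the optimized
    # functions requires a factor of about 4 on those functions.
    f = required_factor(2.5, 0.8)
    print(f"required f = {f:.2f}")
    print(f"check: S = {speedup(0.8, f):.2f}")
```

The denominator 1 + SP – S shrinks to zero as S approaches 1 / (1 – P), so a small change in the target S can demand a very large f when the target is close to that limit.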
A common rule of thumb for software performance is the so-called 80-20 rule, which states that 80 percent of run time is spent in 20 percent of the code. This suggests that the optimization effort should focus on the most time-consuming functions, which account for 80 percent of the total execution time. With a performance improvement factor of four (based on four ALUs), the run time of the optimized code should be (80 ÷ 4) + 20 = 40 percent of the original run time. This amount of improvement is not always achieved with C optimization alone. Another way to interpret the 80-20 rule is to apply it to the optimized software. In other words, if 20 percent of the optimized software accounts for 80 percent of the run time, what percent P of the original run time should be the focus for optimization?
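Let
P = percent of original ported code to be optimized
T = total original run time
t1 = run time of the optimized portion after optimization = PT/4 with an optimization factor of four
t2 = run time of the unoptimized portion = T(1 – P)
Then

$$ \frac{t_1}{t_2} = \frac{80}{20} = \frac{PT/4}{T(1 - P)} = \frac{P/4}{1 - P} \qquad \text{(Equation 5)} $$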
and P = 94 percent. The final application execution time is estimated as (P/4 + (1 – P)) = 1 – 3P/4, which is about
29.5 percent of the original time.
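Both readings of the 80-20 rule can be reproduced with exact fractions. The sketch below assumes the optimization factor of four used in the text. Solving (P/4) / (1 – P) = 80/20 for P gives P = 16/17. The variable names are illustrative.

```python
from fractions import Fraction

FACTOR = 4  # optimization factor assumed in the text (four ALUs)

# First reading: 80 percent of the ORIGINAL run time is sped up by FACTOR.
simple = Fraction(80, 100) / FACTOR + Fraction(20, 100)

# Second reading: after optimization, t1 / t2 = 80 / 20.
# (P / FACTOR) / (1 - P) = r  =>  P = r * FACTOR / (1 + r * FACTOR)
ratio = Fraction(80, 20)
p = ratio * FACTOR / (1 + ratio * FACTOR)

# Final run time as a fraction of the original: P / FACTOR + (1 - P).
remaining = p / FACTOR + (1 - p)

print(f"first reading : {float(simple):.1%} of original run time")
print(f"P             : {p} = {float(p):.1%}")
print(f"second reading: {remaining} = {float(remaining):.1%} of original run time")
```

With the unrounded value P = 16/17 the estimate is 5/17, or about 29.4 percent. The 29.5 percent quoted in the text follows from rounding P to 94 percent before evaluating 1 – 3P/4.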
To achieve target performance with the least amount of development time, the number of functions implemented in assembly should be minimized. Performance prediction is very important here: the analyst must decide which combination of functions reaches the target at the lowest development cost. The best candidates are time-consuming functions for which the compiler does not produce optimum code.
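One way to frame this selection is a small search over candidate sets of functions. The sketch below is only an illustration. The profile entries, expected assembly gains, and effort figures are invented placeholders, not measurements from the G.729 project, and `remaining_time` and `cheapest_plan` are hypothetical helper names.

```python
from itertools import combinations

# Hypothetical profile: (function, share of run time, expected gain from
# assembly over compiled C, estimated effort in days). All values are invented.
PROFILE = [
    ("func_a", 0.30, 3.0, 10),
    ("func_b", 0.25, 2.0, 6),
    ("func_c", 0.15, 4.0, 8),
    ("func_d", 0.10, 1.5, 3),
]


def remaining_time(chosen):
    """Fraction of the original run time left after rewriting the chosen functions."""
    chosen_names = {name for name, *_ in chosen}
    # Run time outside the profiled functions is left untouched.
    total = 1.0 - sum(share for _, share, _, _ in PROFILE)
    for name, share, gain, _ in PROFILE:
        total += share / gain if name in chosen_names else share
    return total


def cheapest_plan(target):
    """Return (effort, names) for the lowest-effort subset meeting the target, or None."""
    best = None
    for k in range(len(PROFILE) + 1):
        for subset in combinations(PROFILE, k):
            if remaining_time(subset) <= target:
                effort = sum(days for *_, days in subset)
                if best is None or effort < best[0]:
                    best = (effort, [name for name, *_ in subset])
    return best


if __name__ == "__main__":
    # Target: bring run time down to 70 percent of the original.
    print(cheapest_plan(0.70))
```

An exhaustive search is practical only for a handful of candidates. It is used here because it makes the trade-off between predicted gain and development effort explicit.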
5.2 Project Implementation
The G.729 vocoder optimization was the first StarCore project for our team. After running the first profile, the first
functions we selected for C optimization were general DSP functions such as Autocorr(), Convolve(),
Cor_h_X(), Syn_filt(), Residu(), and Lag_max(). We also spent some time on less important
functions, such as Inv_sqrt() and Get_lsp_pol(), mostly for training purposes. The experience gained
with the C compiler from these functions helped in addressing the more important functions D4i40_17(),
Cor_h(), Norm_Corr(), Az_lsp(), Chebps(), Levinson(), and search_del().
At the end of the C optimization phase, we decided to follow two directions in parallel. The first group
implemented algorithmic changes for the most time-consuming functions, after which they were reoptimized. This
provided a C source reference for assembly implementation. The second group rewrote in assembly certain functions
that were not the target of algorithmic changes, including Syn_filt(), Residu(),
Convolve(), and Autocorr(). The experience gained by the second group was successfully applied to the
assembly implementation of the optimized C code produced by the first group. The assembly optimization phase
ended when the gain versus effort curve saturated.
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
