AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 15


Description
ITU-T G.729 Implementation on StarCore SC140
Manufacturer
Freescale Semiconductor / Motorola
Datasheet
It is also useful to examine the relationship between a module's results and its internally computed values. Often
the number of values a module computes and stores is significantly larger than the number of values it returns. For
example, the output of a function might be the offset of a single value, while numerous values are computed and
stored internally to find it. In this case, it may be worthwhile to examine the relationship between the output and
the internal variables to determine whether the output can be obtained directly or with fewer internal variables.
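As a minimal sketch of this idea, assume a hypothetical routine that returns the offset (lag) of the largest
correlation value. The names best_lag_stored, best_lag_direct, and corr_at, and the sizes N_SAMPLES and N_LAGS, are
illustrative assumptions and are not taken from the G.729 reference code. The first version stores every correlation
before searching; the second returns the same offset while keeping only a running maximum.

```c
#include <stdint.h>

#define N_SAMPLES 8   /* assumed frame length, for illustration only */
#define N_LAGS    4   /* assumed number of candidate lags */

/* Hypothetical helper: correlation of x with y delayed by `lag`.
 * y must hold at least N_SAMPLES + N_LAGS - 1 samples. */
static int32_t corr_at(const int16_t *x, const int16_t *y, int lag)
{
    int32_t acc = 0;
    for (int i = 0; i < N_SAMPLES; i++)
        acc += (int32_t)x[i] * y[i + lag];
    return acc;
}

/* Version 1: compute and store every value, then search the array. */
int best_lag_stored(const int16_t *x, const int16_t *y)
{
    int32_t corr[N_LAGS];
    int best = 0;

    for (int lag = 0; lag < N_LAGS; lag++)
        corr[lag] = corr_at(x, y, lag);
    for (int lag = 1; lag < N_LAGS; lag++)
        if (corr[lag] > corr[best])
            best = lag;
    return best;
}

/* Version 2: same offset, but only the running maximum is kept. */
int best_lag_direct(const int16_t *x, const int16_t *y)
{
    int32_t max = corr_at(x, y, 0);
    int best = 0;

    for (int lag = 1; lag < N_LAGS; lag++) {
        int32_t c = corr_at(x, y, lag);
        if (c > max) {      /* strict '>' keeps the first maximum, as in version 1 */
            max = c;
            best = lag;
        }
    }
    return best;
}
```

Both versions use the same strict comparison, so they select the same offset even when two lags tie. The second
version removes the intermediate array and the second loop.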
Special care should be taken to maintain bit-exactness when necessary. This is particularly true for algorithms in
which the order of operations is changed. In these cases, it is wise to mathematically verify that bit-exactness is
preserved. Other operations, such as search algorithms which return the position of a value with a given property,
are more flexible and may not require such rigorous initial verification. In all cases, algorithm changes should be
verified by implementing them in C rather than another high-level language.
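The following sketch illustrates why a change in the order of operations can break bit-exactness in fixed-point
code. It assumes a 32-bit addition that saturates instead of wrapping, which is the usual behavior of fixed-point
vocoder arithmetic. The helper sat_add32 and the two summation routines are hypothetical names written for this
illustration. The source does not show the exact form of its split summation, so the even/odd split here is an
assumption.

```c
#include <stdint.h>

/* 32-bit addition that clamps to the int32_t range instead of wrapping. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Original order: accumulate strictly left to right. */
int32_t sum_in_order(const int32_t *v, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = sat_add32(acc, v[i]);
    return acc;
}

/* Reordered: separate even-index and odd-index partial sums, combined at the end. */
int32_t sum_split(const int32_t *v, int n)
{
    int32_t even = 0, odd = 0;
    int i;

    for (i = 0; i + 1 < n; i += 2) {
        even = sat_add32(even, v[i]);
        odd  = sat_add32(odd,  v[i + 1]);
    }
    if (n & 1)                      /* leftover element when n is odd */
        even = sat_add32(even, v[n - 1]);
    return sat_add32(even, odd);
}
```

For inputs that never saturate, the two routines agree. For the input {INT32_MAX, 1, -1, 0}, the in-order sum
saturates at the second element and ends at INT32_MAX - 1, while the split sum never saturates until the final add
and ends at INT32_MAX. A reordering of this kind is therefore bit-exact only if saturation can be shown not to occur
in either order.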
2.5.1 Identifying Algorithms to Change
Determining which algorithms to optimize is a process that is difficult to characterize precisely. It generally
involves a search of G1 functions tempered by experience. The following two guidelines reduce the scope of this
search to a manageable range:
1. Use profile data to select only the most time-consuming functions from the G1 set.
2. Add the functions that provide inputs to and receive outputs from the functions chosen in step 1.
Although the functions in step 2 tend to be small and not included in the G1 function set, they are usually easily
modified so that the data they provide to the functions selected in step 1 is accessed and manipulated more
efficiently. In our project we selected the three most time-consuming modules, which together accounted for more
than 60% of the vocoder’s total execution time after the function-level C optimization phase. In the ported version
these modules are also the most time-consuming, each taking more than 5% of the total execution time.
Ideally, total development time would be decreased by selecting the algorithms to modify before implementing the
function-level C optimizations. However, this is possible only if the improvement in execution time after
function-level C optimization or assembly implementation is predicted fairly accurately, which cannot be done
without a great deal of experience with the algorithms and optimization techniques.
After the algorithm changes are complete and the final C implementation passes all ITU-T test vectors, it is
recommended that another round of function-level C optimizations be performed on the changed functions. The
result should provide the fastest possible C implementation of the vocoder. This version also serves as excellent
reference code for the assembly implementation. The split summation is already tested, concerns about
bit-exactness have been addressed, and debugging of the assembly version is performed very easily by comparing it
to the C version.
2.5.2 Platform-Independent Changes
Platform-independent changes to algorithms are improvements in code flow which apply to all processor types and
C compilers. The following are general guidelines for implementing platform-independent changes.
• Replace the time-consuming operations div and log with multiplications.
• Remove repeated computations of the same value.
• Reorder computations to avoid repeated fetches of the same value.
• Reduce the number of tests.
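As a hedged illustration of the guideline on replacing division with multiplication, the sketch below selects the
candidate with the largest ratio num[i]/den[i]. The function names and the assumption that every denominator is
strictly positive are introduced here for illustration and do not come from the application note. The first version
divides once per candidate; the second compares two ratios by cross-multiplying, which needs no division.

```c
#include <stdint.h>

/* Baseline: one division per candidate. Assumes den[i] > 0 for all i. */
int best_ratio_div(const int16_t *num, const int16_t *den, int n)
{
    int best = 0;
    double best_ratio = (double)num[0] / den[0];

    for (int i = 1; i < n; i++) {
        double r = (double)num[i] / den[i];
        if (r > best_ratio) {
            best_ratio = r;
            best = i;
        }
    }
    return best;
}

/* Division-free version. For positive denominators,
 *   num[i]/den[i] > num[best]/den[best]
 * holds exactly when
 *   num[i]*den[best] > num[best]*den[i].
 * A product of two 16-bit values always fits in 32 bits. */
int best_ratio_mul(const int16_t *num, const int16_t *den, int n)
{
    int best = 0;

    for (int i = 1; i < n; i++) {
        int32_t lhs = (int32_t)num[i] * den[best];
        int32_t rhs = (int32_t)num[best] * den[i];
        if (lhs > rhs)
            best = i;
    }
    return best;
}
```

The cross-multiplied comparison is valid only while the denominators keep a known sign. If a denominator can be
zero or negative, the inequality must be handled separately before this substitution is applied.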
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1
