AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 16

ITU-T G.729 Implementation on StarCore SC140
Optimization Process
2.5.3 Platform-Dependent Changes
Platform-dependent changes to algorithms are those that reorder and restructure data, or reorder and regroup
computation blocks, to take advantage of the parallel architecture of a particular processor. Such changes, when
they are made, prove even more effective than platform-independent changes. The following are general guidelines
for performing platform-dependent changes to a system based on the SC140:
2.5.4 Procedural Notes
Because our project had an important investigative aspect, we performed algorithmic changes after implementing
the function-level C optimizations. Although this resulted in some duplication of effort, it let us
demonstrate the limits of the implementation without global knowledge of the application. In a real-life application,
we recommend performing algorithmic changes in the first stages of the project. We focused the algorithmic
changes on modules that grouped several functions, including:
2.5.5 Results
The algorithmic changes proved rewarding: after these changes and C reoptimization, the implementation runs at
12.8 MCPS, as shown in Table 7.
2.6 Function Implementation in Assembly
In general, rewriting C functions in assembly increases speed and reduces code size. However, because of
improvements in C compiler performance, the trend is to keep most code in C and optimized C and to implement
fewer functions in assembly. In our project, we implemented selected functions in assembly in parallel
with the final function-level C optimization. We selected the functions to implement in assembly based on
profiling data and experience.
ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1

Guidelines for platform-dependent changes on the SC140 (Section 2.5.3):
• Data structure addressing is sequential (linear), using indices rather than pointers.
• Internal pointers are initialized on multiples of four.
• Searches based on interval splitting use interval division by four, rather than two.
• Computations are adapted to pipelines with four computation units.
• DPF formats are translated to native 32-bit representation.
• Vector lengths are multiples of four. The time penalty (the difference between the time it takes to
  compute (N × 4 + 1) samples and (N + 1) × 4 samples) is negligible, and the resulting code is
  substantially smaller and more clear.
• Sequential, identical computations are grouped together.

Modules that grouped several functions (Section 2.5.4):
• D4i40_17() + Cor_h() + Cor_h_X()
• Az_lsp() + Chebps()
• Lag_max() + Pitch_ol()
• pst_ltp() + search_del()

Table 7. Performance Characteristics After Algorithmic Changes and C Reoptimization

Speed        Program Size   Tables     Channel Data   Stack Size
12.8 MCPS    42.7 KB        6.95 KB    3.18 KB        3.05 KB
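
The Section 2.5.3 guidelines on vector lengths, index-based sequential addressing, and four computation units can be illustrated with a minimal C sketch. This is not the application note's code. The names pad_to_four and dot_product_x4 are hypothetical, plain C integer arithmetic stands in for the saturating ITU-T basic operators, and the caller is assumed to have zero-filled both buffers up to the padded length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a vector length up to the next multiple of four. */
static size_t pad_to_four(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Hypothetical kernel: dot product over a length that is a multiple of four.
 * The four independent accumulators give a compiler targeting a core with
 * four computation units four multiply-accumulates it can issue together.
 * Addressing is sequential and index-based. No saturation is applied, so
 * this is only a sketch of the loop shape, not a bit-exact G.729 routine.
 */
static int32_t dot_product_x4(const int16_t *x, const int16_t *y,
                              size_t padded_len)
{
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i;

    assert(padded_len % 4u == 0u);  /* no remainder loop is needed */

    for (i = 0; i < padded_len; i += 4) {
        acc0 += (int32_t)x[i]     * y[i];
        acc1 += (int32_t)x[i + 1] * y[i + 1];
        acc2 += (int32_t)x[i + 2] * y[i + 2];
        acc3 += (int32_t)x[i + 3] * y[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

With a vector of N × 4 + 1 samples, pad_to_four yields (N + 1) × 4, so the kernel processes three extra zero samples instead of carrying a separate remainder loop. That is the trade the guideline describes: a small time penalty in exchange for smaller, simpler code.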
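
The guideline on dividing search intervals by four rather than two can be sketched on a generic problem. The example below is an assumption-laden illustration, not one of the G.729 search routines: lower_bound_x4 is a hypothetical function that finds the first element not less than a key in a sorted array. Each pass evaluates three probe points that do not depend on one another, so a core with several computation units can evaluate them together, and each pass narrows the interval by about as much as two bisection steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical four-way interval search.
 * Returns the index of the first element >= key in sorted[0..n),
 * or n if every element is smaller than key.
 */
static size_t lower_bound_x4(const int16_t *sorted, size_t n, int16_t key)
{
    size_t lo = 0;  /* every index below lo holds a value < key        */
    size_t hi = n;  /* every index at or above hi holds a value >= key */

    assert(sorted != NULL || n == 0);

    while (hi - lo >= 4) {
        size_t q  = (hi - lo) / 4;
        size_t p1 = lo + q;
        size_t p2 = lo + 2 * q;
        size_t p3 = lo + 3 * q;

        /* Three independent comparisons: no probe needs another's result. */
        int below1 = sorted[p1] < key;
        int below2 = sorted[p2] < key;
        int below3 = sorted[p3] < key;

        if (!below1) {
            hi = p1;                 /* answer is in [lo, p1]     */
        } else if (!below2) {
            lo = p1 + 1;             /* answer is in [p1 + 1, p2] */
            hi = p2;
        } else if (!below3) {
            lo = p2 + 1;             /* answer is in [p2 + 1, p3] */
            hi = p3;
        } else {
            lo = p3 + 1;             /* answer is in [p3 + 1, hi] */
        }
    }

    /* Fewer than four candidates remain: finish with a short linear scan. */
    while (lo < hi && sorted[lo] < key) {
        lo++;
    }
    return lo;
}
```

The same splitting pattern applies to a sign-change search over a function evaluated on a grid, where the three probes become three function evaluations per pass.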
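
The guideline on translating DPF formats to a native 32-bit representation can be sketched as follows, assuming DPF refers to the double precision format of the ITU-T reference code, in which a 32-bit value is carried as two 16-bit halves with L_32 = hi<<16 + lo<<1. The source does not spell out this definition, so treat it as an assumption. The type dpf_t and both function names are hypothetical, and the split relies on right shifts of negative values being arithmetic, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a DPF value: L_32 = (hi << 16) + (lo << 1). */
typedef struct {
    int16_t hi;  /* upper 16 bits of the 32-bit value            */
    int16_t lo;  /* bits 15..1 of the 32-bit value, range 0..32767 */
} dpf_t;

/* Split a native 32-bit value into DPF halves (bit 0 is discarded). */
static dpf_t dpf_from_int32(int32_t v)
{
    dpf_t d;
    int32_t lo;

    d.hi = (int16_t)(v >> 16);              /* assumes arithmetic shift */
    lo = (v >> 1) - (int32_t)d.hi * 32768;  /* remaining 15 bits        */
    assert(lo >= 0 && lo <= 32767);
    d.lo = (int16_t)lo;
    return d;
}

/* Recombine DPF halves into a native 32-bit value. */
static int32_t dpf_to_int32(dpf_t d)
{
    return (int32_t)d.hi * 65536 + (int32_t)d.lo * 2;
}
```

Once a value is held in native form, an addition or comparison is a single 32-bit operation instead of separate work on the two halves, which is the motivation for the translation. The round trip is exact for even values, because the DPF layout has no place for bit 0.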
