AN2094 Freescale Semiconductor / Motorola, AN2094 Datasheet - Page 16

Description

ITU-T G.729 Implementation on StarCore SC140

Manufacturer

Freescale Semiconductor / Motorola

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1

Optimization Process

2.5.3 Platform-Dependent Changes

Platform-dependent changes to algorithms are those that reorder and restructure data or reorder and regroup computation blocks to take advantage of the parallel architecture of a particular processor. Such changes, if they are done, prove to be even more effective than platform-independent changes. The following are general guidelines for performing platform-dependent changes to a system based on the SC140:

• Data structure addressing is sequential (linear), using indices rather than pointers.
• Internal pointers are initialized on multiples of four.
• Searches based on interval splitting use interval division by four, rather than two.
• Computations are adapted to pipelines with four computation units.
• DPF formats are translated to native 32-bit representation.
• Vector lengths are multiples of four. The time penalty (the difference between the time it takes to compute (N × 4 + 1) samples and (N + 1) × 4 samples) is negligible, and the resulting code is substantially smaller and more clear.
• Sequential, identical computations are grouped together.

2.5.4 Procedural Notes

Because our project had an important investigative aspect, we performed algorithmic changes after implementing the function-level C optimizations. Although this resulted in some duplication of effort, we were able to demonstrate the limits of the implementation without global knowledge of the application. In a real-life application we recommend performing algorithmic changes in the first stages of the project. We performed the algorithmic changes focusing on modules that grouped several functions, including:

• D4i40_17() + Cor_h() + Cor_h_X()
• Az_lsp() + Chebps()
• Lag_max() + Pitch_ol()
• pst_ltp() + search_del()

2.5.5 Results

The results of the algorithmic changes proved to be quite rewarding, as shown in Table 7.

Table 7. Performance Characteristics After Algorithmic Changes and C Reoptimization

Speed        Program Size    Tables     Channel Data    Stack Size
12.8 MCPS    42.7 KB         6.95 KB    3.18 KB         3.05 KB

2.6 Function Implementation in Assembly

In general, rewriting C functions in assembly increases speed and reduces code size. However, because of improvements in C compiler performance, the trend is to keep most code in C and optimized C and to implement fewer functions in assembly. In our project, implementing selected functions in assembly was performed in parallel with final function-level C optimization. The functions to be implemented in assembly were selected based on profiling data and experience.
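The Section 2.5.3 guidelines on vector lengths, four computation units, and index-based addressing can be illustrated with a minimal C sketch. The names pad4() and dot4() are hypothetical and do not come from the G.729 reference code. The sketch assumes 16-bit samples held in buffers that are zero-padded up to the next multiple of four, and it uses plain C integer arithmetic, omitting the saturation and the SC140 intrinsics a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a vector length up to the next multiple of four. */
static size_t pad4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical dot product over a length that is a multiple of four.
 * The caller zero-pads both buffers, so no remainder loop is needed.
 * Four independent accumulators give a compiler the chance to map one
 * multiply-accumulate to each of four computation units; addressing
 * uses indices rather than moving pointers. */
static int32_t dot4(const int16_t *x, const int16_t *y, size_t len4)
{
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i;

    assert(len4 % 4 == 0);  /* precondition: padded length */

    for (i = 0; i < len4; i += 4) {
        acc0 += (int32_t)x[i]     * y[i];
        acc1 += (int32_t)x[i + 1] * y[i + 1];
        acc2 += (int32_t)x[i + 2] * y[i + 2];
        acc3 += (int32_t)x[i + 3] * y[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

With five useful samples stored in eight-element, zero-padded buffers, a call such as dot4(x, y, pad4(5)) processes two full groups of four. The padded zeros contribute nothing to the sum, which is the trade described in the guideline: a few extra multiply-accumulates in exchange for a single loop with no special-case tail.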
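One possible reading of the Section 2.5.3 guideline on interval-splitting searches is sketched below in C: instead of evaluating one midpoint per step (bisection), each step evaluates three interior points and keeps the quarter of the interval that contains the sign change. This is an interpretation, not the application note's code. The names eval_fn, split4_step(), and sample_f() are hypothetical, and the sketch assumes an integer-valued function with exactly one sign change inside the interval and bounds small enough that hi - lo does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function type for the quantity whose sign change is sought. */
typedef int32_t (*eval_fn)(int32_t x);

/* Hypothetical test function with a sign change at x = 37. */
static int32_t sample_f(int32_t x)
{
    return x - 37;
}

/* Nonzero when the two values lie on opposite sides of zero
 * (zero is treated as non-negative). */
static int sign_differs(int32_t u, int32_t v)
{
    return (u < 0) != (v < 0);
}

/* One search step: narrow [*lo, *hi] to the quarter that brackets the
 * sign change. The three interior evaluations do not depend on each
 * other, so they can be scheduled in parallel. */
static void split4_step(eval_fn f, int32_t *lo, int32_t *hi)
{
    int32_t a = *lo, b = *hi;
    int32_t q, x1, x2, x3, fa, f1, f2, f3;

    assert(a <= b);  /* precondition: a valid interval */

    q  = (b - a) / 4;
    x1 = a + q;
    x2 = a + 2 * q;
    x3 = a + 3 * q;

    fa = f(a);
    f1 = f(x1);
    f2 = f(x2);
    f3 = f(x3);

    if (sign_differs(fa, f1)) {
        *hi = x1;
    } else if (sign_differs(f1, f2)) {
        *lo = x1; *hi = x2;
    } else if (sign_differs(f2, f3)) {
        *lo = x2; *hi = x3;
    } else {
        *lo = x3;  /* the change lies in the last quarter */
    }
}
```

Starting from the interval [0, 64], two calls with sample_f narrow the bracket to [32, 48] and then to [36, 40], a sixteen-fold reduction that bisection would need four sequential steps to match. Once the interval is shorter than four, q becomes zero and the step no longer makes progress, so a caller would stop or fall back to a finer method at that point.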
