AN2203 Freescale Semiconductor / Motorola, AN2203 Datasheet - Page 51


AN2203

Manufacturer Part Number
AN2203
Description
MPC7450 RISC Microprocessor Family Software Optimization Guide
Manufacturer
Freescale Semiconductor / Motorola
4.4.4 Vectorization
Transforming code to operate on vector data rather than scalar data can produce significant performance
benefits for certain types of code. The MPC7400 and MPC7450 support the AltiVec extension to the
PowerPC architecture, which enables vector SIMD computing.
The analysis required to vectorize scalar applications automatically is quite sophisticated and requires
significant infrastructure to incorporate into a compiler. An alternative is a preprocessor that takes a C
file, performs the autovectorization using the AltiVec programming interface, and outputs a vector version
of the C file. That file can then be compiled with any AltiVec-enabled compiler, and no modifications to
the compiler itself are required.
The AltiVec Programming Interface Manual, available on the Motorola website, describes the AltiVec
programming interface and should be consulted for details.
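As a rough illustration of the transformation such a preprocessor would perform, the sketch below shows a scalar summation loop next to a strip-mined four-lane version. It is written in portable C so that it runs on any host. The names sum_scalar, sum_vectorized, vec4_i32, and is_aligned_16 are hypothetical, and the four-lane struct is only a stand-in for a 128-bit AltiVec register, not the AltiVec programming interface itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: one load and one add per element. */
static int32_t sum_scalar(const int32_t *data, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical stand-in for a 128-bit vector register holding four words. */
typedef struct { int32_t w[4]; } vec4_i32;

/* Vector loads in this sketch assume 16-byte (128-bit) alignment. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Strip-mined version: four partial sums advance together, one per lane.
 * Assumes the sums stay within the int32_t range (no overflow handling). */
static int32_t sum_vectorized(const int32_t *data, size_t n)
{
    if (!is_aligned_16(data))
        return sum_scalar(data, n);      /* fall back when misaligned */

    vec4_i32 acc = {{0, 0, 0, 0}};
    size_t i = 0;

    /* Main loop: each trip models one vector load and one vector add. */
    for (; i + 4 <= n; i += 4)
        for (int lane = 0; lane < 4; lane++)
            acc.w[lane] += data[i + lane];

    /* Reduce the four lanes, then handle the scalar remainder. */
    int32_t sum = acc.w[0] + acc.w[1] + acc.w[2] + acc.w[3];
    for (; i < n; i++)
        sum += data[i];
    return sum;
}
```

The alignment check and the scalar remainder loop are the two pieces a real source-to-source tool would have to generate in addition to the vector loop body. With the AltiVec programming interface, the inner lane loop would collapse into a single vector load and a single vector add.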
To take the example in Section 4.4.3, “Loop Unrolling for Long Pipelines,” one step further, that code
sequence can also be vectorized. Table 4-2 shows a vectorized (and loop-unrolled) version of it. The code
assumes that the data is aligned on a 128-bit boundary. Because vector loads have no update form, a few
extra integer registers must be reserved for holding constants; because the primary computation now takes
place in the vector registers, this should not be a problem. A vector sum-across instruction (vsumsws) is
needed after the loop body to sum the four words within the vector into a single final result.
Table 4-2 (address and instruction columns; the per-cycle pipeline-stage columns for cycles 0 through 9
are not recoverable)

Address     Label   Instruction
xxxxxx00    loop:   lvx v10,r8,r9
xxxxxx04            vaddsws v11,v11,v10
xxxxxx08            lvx v10,r7,r9
xxxxxx0C            vaddsws v11,v11,v10
xxxxxx10            lvx v10,r6,r9
xxxxxx14            vaddsws v11,v11,v10
xxxxxx18            lvx v10,r5,r9
xxxxxx1C            vaddsws v11,v11,v10
xxxxxx20            addi r9,r9,0x10
xxxxxx24            bdnz loop
xxxxxx28            vsumsws v11,v11,v0

Table 4-2 shows that the code has been vastly accelerated from the original example. For this code, four
effective iterations (lwz/add) are completing per cycle. Vectorization quadruples performance over the loop
unrolled example and provides a full 12x performance increase from the original example in Table 1-1.

Table 4-1. MPC7450 Execution of One-Two Iterations of Code Loop Example (continued): remaining
instruction rows add (4), bdnz, lwzu (5), add (5).
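To make the arithmetic of the vectorized loop in Table 4-2 concrete, the following portable C sketch models the three vector instructions it uses: lvx (vector load), vaddsws (lane-wise signed saturating add), and vsumsws (saturating sum across, with the result in the last word). The type vec128 and the functions sat32, load_vec, add_sat, sum_across, and sum_table_4_2 are hypothetical names introduced here. The register setup for r5 through r9 is not shown in the source, so the addressing used below (four consecutive vectors per loop trip) is an assumption, as is treating v0 as all zeros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register (four 32-bit words). */
typedef struct { int32_t w[4]; } vec128;

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Models lvx: load four words from a (16-byte-aligned) address. */
static vec128 load_vec(const int32_t *p)
{
    vec128 v = {{ p[0], p[1], p[2], p[3] }};
    return v;
}

/* Models vaddsws: lane-wise signed add with saturation. */
static vec128 add_sat(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = sat32((int64_t)a.w[i] + b.w[i]);
    return r;
}

/* Models vsumsws: word 3 = sat(a0 + a1 + a2 + a3 + b3); words 0..2 = 0. */
static vec128 sum_across(vec128 a, vec128 b)
{
    vec128 r = {{0, 0, 0, 0}};
    int64_t s = (int64_t)a.w[0] + a.w[1] + a.w[2] + a.w[3] + b.w[3];
    r.w[3] = sat32(s);
    return r;
}

/* Models the loop: four load/add pairs per trip, then one sum across.
 * Only whole groups of 16 words are consumed (assumed addressing). */
static int32_t sum_table_4_2(const int32_t *data, size_t n)
{
    const vec128 zero = {{0, 0, 0, 0}};  /* plays the role of v0 (assumed 0) */
    vec128 acc = zero;                   /* plays the role of v11 */

    for (size_t i = 0; i + 16 <= n; i += 16) {
        acc = add_sat(acc, load_vec(&data[i]));
        acc = add_sat(acc, load_vec(&data[i + 4]));
        acc = add_sat(acc, load_vec(&data[i + 8]));
        acc = add_sat(acc, load_vec(&data[i + 12]));
    }
    return sum_across(acc, zero).w[3];   /* final result lives in word 3 */
}
```

One consequence worth noting is that the saturating instructions clamp at the 32-bit limits instead of wrapping, so for inputs that overflow, the vectorized result can differ from that of a scalar add loop.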
