AN2203 Freescale Semiconductor / Motorola, AN2203 Datasheet - Page 51

Description

MPC7450 RISC Microprocessor Family Software Optimization Guide

4.4.4 Vectorization

Transforming code to reference vector data as opposed to scalar data can produce significant performance benefits for certain types of code. The MPC7400 and MPC7450 support the AltiVec extension to the PowerPC architecture, which enables vector SIMD computing.

The analysis required to automatically vectorize scalar applications is quite sophisticated and requires significant infrastructure to incorporate into a compiler. It is possible, however, to create a preprocessor that takes a C file, performs the autovectorization using the AltiVec programming interface, and outputs a vector version of the C file. The resulting file can then be compiled with any AltiVec-enabled compiler, and no modifications to the compiler itself are required.

The AltiVec Programming Interface Manual, available on the Motorola website, contains information on the AltiVec programming interface and should be consulted for details.
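As a rough illustration of the source-to-source transformation described above, the sketch below places a scalar summation loop next to a hand-vectorized form of the kind such a preprocessor might emit. The function names sum_scalar and sum_vector are hypothetical. The vector path uses vec_ld, vec_adds, vec_sums, and vec_splat_s32 from the AltiVec programming interface and is compiled only when __ALTIVEC__ is defined; otherwise a plain C fallback stands in. The sketch assumes the array is 16-byte aligned and its length is a multiple of four, and the two functions agree only when no intermediate sum overflows, because vec_adds saturates.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Scalar form: the kind of loop a vectorizing preprocessor would take as input. */
int32_t sum_scalar(const int32_t *a, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Vector form (hypothetical preprocessor output).
   Assumes a is 16-byte aligned and n is a multiple of 4. */
int32_t sum_vector(const int32_t *a, size_t n)
{
#if defined(__ALTIVEC__)
    const vector signed int zero = vec_splat_s32(0);
    vector signed int acc = zero;
    union { vector signed int v; int32_t w[4]; } u;

    for (size_t i = 0; i < n; i += 4)
        acc = vec_adds(acc, vec_ld(0, (const int *)(a + i))); /* lvx + vaddsws */

    u.v = vec_sums(acc, zero);  /* vsumsws: total lands in the last word */
    return u.w[3];
#else
    /* Portable stand-in: four partial sums, one per vector lane (no saturation). */
    int32_t lane[4] = {0, 0, 0, 0};

    for (size_t i = 0; i < n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += a[i + k];

    return lane[0] + lane[1] + lane[2] + lane[3];
#endif
}
```

Keeping the scalar function alongside the vector one gives a simple reference for checking the transformed code on small inputs.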

To take the example in Section 4.4.3, “Loop Unrolling for Long Pipelines,” one step further, this code sequence could also be vectorized. Table 4-2 is a vectorized (and loop-unrolled) version of the following code sequence. This code assumes that the data is aligned on a 128-bit boundary. Note that the lack of an update form for vector loads (there is no vector counterpart to lwzu) means a few extra integer registers must be reserved for holding constants, but because the primary computation is now in the vector registers, this should not be a problem. A vector sum across (vsumsws) is needed after the loop body to sum the four words within the vector into a single final result.

xxxxxx00 loop: lvx v10,r8,r9
xxxxxx04       vaddsws v11,v11,v10
xxxxxx08       lvx v10,r7,r9
xxxxxx0C       vaddsws v11,v11,v10
xxxxxx10       lvx v10,r6,r9
xxxxxx14       vaddsws v11,v11,v10
xxxxxx18       lvx v10,r5,r9
xxxxxx1C       vaddsws v11,v11,v10
xxxxxx20       addi r9,r9,0x10
xxxxxx24       bdnz loop
xxxxxx28       vsumsws v11,v11,v0

Table 4-2 shows that the code has been vastly accelerated from the original example. For this code, four effective iterations (lwz/add) are completing per cycle. Vectorization quadruples performance over the loop-unrolled example and provides a full 12x performance increase from the original example in Table 1-1.

Table 4-1. MPC7450 Execution of One-Two Iterations of Code Loop Example (continued)
Instruction: add (4), bdnz, lwzu (5), add (5)
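The arithmetic performed by the vectorized loop in this section can be modeled in portable C, which may help when checking results against a scalar reference. The sketch below is a behavioral model, not AltiVec code: the type vec4 and the functions model_vaddsws, model_vsumsws, model_vector_sum, and is_aligned_128 are hypothetical names introduced for illustration. It assumes the element count is a multiple of four and that the second vsumsws operand (v0 in the listing) holds zero. The alignment helper reflects the 128-bit alignment assumption; lvx ignores the low-order four bits of the effective address, so unaligned data would be loaded from the enclosing aligned quadword.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register holding four words. */
typedef struct { int32_t w[4]; } vec4;

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Models vaddsws: lane-wise signed saturating add of four words. */
static vec4 model_vaddsws(vec4 a, vec4 b)
{
    vec4 r;
    for (int k = 0; k < 4; k++)
        r.w[k] = sat32((int64_t)a.w[k] + b.w[k]);
    return r;
}

/* Models vsumsws: saturating sum of the four words of a plus word 3 of b,
   written to word 3 of the result; words 0 to 2 are cleared. */
static vec4 model_vsumsws(vec4 a, vec4 b)
{
    vec4 r = {{0, 0, 0, 0}};
    int64_t total = b.w[3];
    for (int k = 0; k < 4; k++)
        total += a.w[k];
    r.w[3] = sat32(total);
    return r;
}

/* Returns 1 if p lies on a 128-bit (16-byte) boundary. */
int is_aligned_128(const void *p)
{
    return ((uintptr_t)p & 0xF) == 0;
}

/* Sums n words (n assumed to be a multiple of 4) the way the loop does:
   accumulate four lanes, then sum across the accumulator once at the end. */
int32_t model_vector_sum(const int32_t *a, size_t n)
{
    vec4 acc = {{0, 0, 0, 0}};
    const vec4 zero = {{0, 0, 0, 0}};

    for (size_t i = 0; i < n; i += 4) {
        vec4 v = {{a[i], a[i + 1], a[i + 2], a[i + 3]}};
        acc = model_vaddsws(acc, v);
    }
    return model_vsumsws(acc, zero).w[3];
}
```

Because both the lane-wise add and the final sum saturate, the model matches an ordinary wrapping scalar sum only when no partial sum leaves the signed 32-bit range.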
