PNX1301 PHILIPS [NXP Semiconductors], PNX1301 Datasheet - Page 76


PNX1300/01/02/11 Data Book, Preliminary Specification

Figure 4-3. On the left is a complete list of operations to perform the byte-matrix transposition of Figure 4-1 and Figure 4-2. On the right is an equivalent C-language fragment.

ed for further computations (the PNX1300 optimizing C compiler performs this analysis automatically). In this example, the transposed matrix is placed in registers R18, R19, R20, and R21. The final four store-word operations put the transposed matrix back into memory.

Thus, using the PNX1300 custom operations, the byte-matrix transposition requires four load-word operations and four store-word operations (the minimum possible) and eight register-to-register data-manipulation operations. The result is 16 operations, or byte-matrix transposition at the rate of one operation per byte.
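The data flow of the eight register-to-register operations can be checked on any host with a portable C sketch. The functions below are hypothetical stand-ins, not the PNX1300 custom operations themselves; their byte semantics are an assumption inferred from the intermediate values shown in Figure 4-2 (a e b f, c g d h, and so on), with byte a taken as the most significant byte of the first word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed semantics: interleave the two most significant bytes of a and b,
   giving a3 b3 a2 b2 (byte 3 is the most significant byte). */
uint32_t merge_msb(uint32_t a, uint32_t b)
{
    return (a & 0xFF000000u) | ((b & 0xFF000000u) >> 8) |
           ((a & 0x00FF0000u) >> 8) | ((b & 0x00FF0000u) >> 16);
}

/* Assumed semantics: interleave the two least significant bytes,
   giving a1 b1 a0 b0. */
uint32_t merge_lsb(uint32_t a, uint32_t b)
{
    return ((a & 0x0000FF00u) << 16) | ((b & 0x0000FF00u) << 8) |
           ((a & 0x000000FFu) << 8) | (b & 0x000000FFu);
}

/* Assumed semantics: upper halfword of a, then upper halfword of b. */
uint32_t pack16_msb(uint32_t a, uint32_t b)
{
    return (a & 0xFFFF0000u) | (b >> 16);
}

/* Assumed semantics: lower halfword of a, then lower halfword of b. */
uint32_t pack16_lsb(uint32_t a, uint32_t b)
{
    return (a << 16) | (b & 0x0000FFFFu);
}

/* Transpose a 4x4 byte matrix held as four row words, using four
   merges followed by four packs (eight register-to-register steps). */
void transpose4x4_words(uint32_t m[4])
{
    assert(m != NULL);

    uint32_t t0 = merge_msb(m[0], m[1]);   /* a e b f */
    uint32_t t1 = merge_msb(m[2], m[3]);   /* i m j n */
    uint32_t t2 = merge_lsb(m[0], m[1]);   /* c g d h */
    uint32_t t3 = merge_lsb(m[2], m[3]);   /* k o l p */

    m[0] = pack16_msb(t0, t1);             /* a e i m */
    m[1] = pack16_lsb(t0, t1);             /* b f j n */
    m[2] = pack16_msb(t2, t3);             /* c g k o */
    m[3] = pack16_lsb(t2, t3);             /* d h l p */
}
```

Starting from the row words 0x61626364, 0x65666768, 0x696A6B6C, and 0x6D6E6F70 (the ASCII bytes a through p), transpose4x4_words leaves 0x6165696D, 0x62666A6E, 0x63676B6F, and 0x64686C70 under these assumed semantics. The sketch works on word values, so how the bytes map to memory addresses still depends on the endianness of the target.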

While the advantage of the custom-operation-based algorithm over the brute-force code that uses 24 load-byte and store-byte instructions seems to be only eight operations (a 33% reduction), the advantage is actually much greater. First, using custom operations, the number of memory references is reduced from 24 to eight (a factor of three). Since memory references are slower than register-to-register operations (such as the custom operations in this example), the reduction in memory references is significant.
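The figure of 24 byte-wide memory references can be reproduced with a short sketch. It assumes the brute-force code swaps the six off-diagonal byte pairs in place (the four diagonal bytes do not move); the function names are hypothetical, and the limit of two memory references per cycle used with the second helper is the one quoted later in this section.

```c
#include <assert.h>
#include <stddef.h>

/* In-place byte-wise transposition of a 4x4 matrix.
   Returns the number of byte loads and byte stores performed. */
int transpose_bytewise(unsigned char m[4][4])
{
    int refs = 0;

    assert(m != NULL);
    for (int i = 0; i < 4; i++) {
        for (int j = i + 1; j < 4; j++) {
            unsigned char upper = m[i][j];  /* load byte  */
            unsigned char lower = m[j][i];  /* load byte  */
            m[i][j] = lower;                /* store byte */
            m[j][i] = upper;                /* store byte */
            refs += 4;
        }
    }
    return refs;
}

/* Lower bound on cycles imposed by the memory-reference limit alone;
   it ignores latencies and every other scheduling constraint. */
int min_cycles_for_refs(int refs, int refs_per_cycle)
{
    assert(refs >= 0 && refs_per_cycle > 0);
    return (refs + refs_per_cycle - 1) / refs_per_cycle;
}
```

Under these assumptions transpose_bytewise returns 24, and with two references per cycle the memory traffic alone needs at least 12 cycles for the byte-wise code against 4 cycles for the eight word accesses of the custom-operation version.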

Further, the ability of the PNX1300 VLIW compilation system to exploit the performance potential of the PNX1300 microprocessor hardware is enhanced by the custom-operation-based code. This is because it is easier for the compilation system to produce an optimal schedule (arrangement) of the code when the number of memory references is in balance with the number of register-to-register operations. The PNX1300 CPU (like all high-performance microprocessors) has a limit on the number of memory references that can be processed in a single cycle (two is the current limit). A long sequence of code that contains only memory references can result in empty operation slots in the long PNX1300 instructions. Empty operation slots waste the performance potential of the PNX1300 hardware.

As this example has shown, careful use of custom operations can not only reduce the absolute number of operations needed to perform a computation but also help the compilation system produce code that fully exploits the performance potential of the PNX1300 CPU.

Figure 4-2. Application of merge and pack instructions to the byte-matrix transposition of Figure 4-1.

(Diagram: the row-major words in r10 through r13 are combined by mergemsb and mergelsb into the intermediate words a e b f, c g d h, k o l p, and i m j n in r14 through r17; pack16msb and pack16lsb then produce the column-major words in r18 through r21.)

Figure 4-3, operation list:

    ld32d(0) r100 → r10
    ld32d(4) r100 → r11
    ld32d(8) r100 → r12
    ld32d(12) r100 → r13
    mergemsb r10 r11 → r14
    mergemsb r12 r13 → r15
    mergelsb r10 r11 → r16
    mergelsb r12 r13 → r17
    pack16msb r14 r15 → r18
    pack16lsb r14 r15 → r19
    pack16msb r16 r17 → r20
    pack16lsb r16 r17 → r21
    st32d(0) r101 r18
    st32d(4) r101 r19
    st32d(8) r101 r20
    st32d(12) r101 r21

Figure 4-3, equivalent C-language fragment:

    char matrix[4][4];
    int *m = (int *) matrix;

    /* interleave bytes from pairs of rows */
    temp0 = MERGEMSB(m[0], m[1]);
    temp1 = MERGEMSB(m[2], m[3]);
    temp2 = MERGELSB(m[0], m[1]);
    temp3 = MERGELSB(m[2], m[3]);

    /* pack halfwords into the transposed rows */
    m[0] = PACK16MSB(temp0, temp1);
    m[1] = PACK16LSB(temp0, temp1);
    m[2] = PACK16MSB(temp2, temp3);
    m[3] = PACK16LSB(temp2, temp3);

4.3 EXAMPLE 2: MPEG IMAGE RECONSTRUCTION

The complete MPEG video decoding algorithm is composed of many different phases, each with computationally intensive kernels. One important kernel deals with reconstructing a single image frame given that the forward- and backward-predicted frames and the inverse discrete cosine transform (IDCT) results have already been computed. This kernel provides an excellent opportunity to illustrate the power of the PNX1300’s specialized custom operators.

In the code fragments that follow, the backward-predicted block is assumed to have been computed into an array back[], the forward-predicted block is assumed to have been computed into forward[], and the IDCT results are assumed to have been computed into idct[].

Philips Semiconductors
