PNX1301 PHILIPS [NXP Semiconductors], PNX1301 Datasheet - Page 76

PNX1300/01/02/11 Data Book
Figure 4-3. A complete list of operations to perform the byte-matrix transposition of Figure 4-1 and Figure 4-2, with an equivalent C-language fragment.
[...]ed for further computations (the PNX1300 optimizing C compiler performs this analysis automatically). In this example, the transposed matrix is placed in registers R18, R19, R20, and R21. The final four store-word operations put the transposed matrix back into memory.
Thus, using the PNX1300 custom operations, the byte-matrix transposition requires four load-word operations and four store-word operations (the minimum possible) plus eight register-to-register data-manipulation operations. The result is 16 operations, or byte-matrix transposition at the rate of one operation per byte.
While the advantage of the custom-operation-based algorithm over the brute-force code that uses 24 load- and store-byte instructions seems to be only eight operations (a 33% reduction), the advantage is actually much greater. First, using custom operations reduces the number of memory references from 24 to eight (a factor of three). Since memory references are slower than register-to-register operations (such as the custom operations in this example), the reduction in memory references is significant.
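This page does not list the brute-force code. One plausible reading of the 24 load- and store-byte instructions, assumed here, is an in-place transpose that swaps each of the six off-diagonal byte pairs: 6 × (2 loads + 2 stores) = 24 memory references. Against the 16 operations (8 of them memory references) of the custom-operation version, that gives the 8/24 ≈ 33% reduction in operations and the factor-of-three reduction in memory references. The load_byte and store_byte helpers are hypothetical counting wrappers, not PNX1300 operations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters standing in for load-byte and store-byte operations. */
typedef struct {
    int loads;
    int stores;
} mem_count;

static uint8_t load_byte(const uint8_t *p, mem_count *mc) {
    mc->loads++;
    return *p;
}

static void store_byte(uint8_t *p, uint8_t v, mem_count *mc) {
    mc->stores++;
    *p = v;
}

/* Brute-force in-place transpose: swap each off-diagonal pair (r, c), c > r.
   Diagonal bytes never move, so only 6 pairs are touched. */
static mem_count transpose_bytewise(uint8_t m[4][4]) {
    mem_count mc = {0, 0};
    for (int r = 0; r < 4; r++) {
        for (int c = r + 1; c < 4; c++) {
            uint8_t x = load_byte(&m[r][c], &mc);
            uint8_t y = load_byte(&m[c][r], &mc);
            store_byte(&m[r][c], y, &mc);
            store_byte(&m[c][r], x, &mc);
        }
    }
    /* 6 pairs x (2 loads + 2 stores) = 24 memory references. */
    assert(mc.loads + mc.stores == 24);
    return mc;
}
```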
Further, the custom-operation-based code makes it easier for the PNX1300 VLIW compilation system to exploit the performance potential of the PNX1300 microprocessor hardware, because the compilation system can more readily produce an optimal schedule (arrangement) of the code when the number of memory references is in balance with the number of register-to-register operations. The PNX1300 CPU (like all high-performance microprocessors) has a limit on the number of memory references that can be processed in a single cycle (two is the current limit). A long sequence of code that contains only memory references can therefore leave empty operation slots in the long PNX1300 instructions, and empty operation slots waste the performance potential of the PNX1300 hardware.

Figure 4-2. Application of merge and pack instructions to the byte-matrix transposition of Figure 4-1.
[Diagram: the row-major matrix (rows a b c d, e f g h, i j k l, m n o p) in r10-r13; mergemsb and mergelsb give a e b f, i m j n, c g d h, k o l p in r14-r17; pack16msb and pack16lsb give the column-major result a e i m, b f j n, c g k o, d h l p in r18-r21.]

Figure 4-3, operation list:

    ld32d(0) r100 -> r10
    ld32d(4) r100 -> r11
    ld32d(8) r100 -> r12
    ld32d(12) r100 -> r13
    mergemsb r10 r11 -> r14
    mergemsb r12 r13 -> r15
    mergelsb r10 r11 -> r16
    mergelsb r12 r13 -> r17
    pack16msb r14 r15 -> r18
    pack16lsb r14 r15 -> r19
    pack16msb r16 r17 -> r20
    pack16lsb r16 r17 -> r21
    st32d(0) r101 r18
    st32d(4) r101 r19
    st32d(8) r101 r20
    st32d(12) r101 r21

PRELIMINARY SPECIFICATION
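The scheduling argument about the two-memory-reference limit can be made concrete with simple resource arithmetic. The sketch below computes a lower bound on the number of VLIW instructions for straight-line code, ignoring dependencies and latencies. The memory limit of two comes from the text; the count of five operation slots per instruction used in the example numbers is an assumption passed in as a parameter, since this page does not state it. With those inputs, the 24 byte loads and stores need at least 12 instructions (filling only 24 of 60 slots), whereas the 8 memory references plus 8 custom operations need at least 4.

```c
#include <assert.h>

/* Ceiling division for non-negative a and positive b. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Resource-only lower bound on instruction count for straight-line code.
   mem_ports: memory references per instruction (two, per the text).
   issue_slots: operations per instruction (an assumed parameter).
   Dependencies and operation latencies are deliberately ignored. */
static int min_instructions(int mem_ops, int other_ops, int mem_ports, int issue_slots) {
    assert(mem_ops >= 0 && other_ops >= 0);
    assert(mem_ports > 0 && issue_slots >= mem_ports);
    int by_memory = ceil_div(mem_ops, mem_ports);
    int by_slots = ceil_div(mem_ops + other_ops, issue_slots);
    return by_memory > by_slots ? by_memory : by_slots;
}

/* Operation slots left empty when the bound is met exactly. */
static int empty_slots(int mem_ops, int other_ops, int mem_ports, int issue_slots) {
    int n = min_instructions(mem_ops, other_ops, mem_ports, issue_slots);
    return n * issue_slots - (mem_ops + other_ops);
}
```

In the memory-only case the memory limit dominates and most slots stay empty; in the balanced case both limits coincide, which is the balance the text describes.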
As this example has shown, careful use of custom operations can not only reduce the absolute number of operations needed to perform a computation but also help the compilation system produce code that fully exploits the performance potential of the PNX1300 CPU.
Figure 4-3, equivalent C-language fragment:

    /* View the 4 x 4 byte matrix as four 32-bit row words. */
    char matrix[4][4];
    int *m = (int *) matrix;
    int temp0, temp1, temp2, temp3;

    /* Interleave bytes of row pairs (four merge operations). */
    temp0 = MERGEMSB(m[0], m[1]);
    temp1 = MERGEMSB(m[2], m[3]);
    temp2 = MERGELSB(m[0], m[1]);
    temp3 = MERGELSB(m[2], m[3]);

    /* Combine halfwords into the transposed rows (four pack operations). */
    m[0] = PACK16MSB(temp0, temp1);
    m[1] = PACK16LSB(temp0, temp1);
    m[2] = PACK16MSB(temp2, temp3);
    m[3] = PACK16LSB(temp2, temp3);

4.3 EXAMPLE 2: MPEG IMAGE RECONSTRUCTION
The complete MPEG video decoding algorithm is composed of many different phases, each with computationally intensive kernels. One important kernel reconstructs a single image frame, given that the forward- and backward-predicted frames and the inverse discrete cosine transform (IDCT) results have already been computed. This kernel provides an excellent opportunity to illustrate the power of the PNX1300's specialized custom operators.
In the code fragments that follow, the backward-predicted block is assumed to have been computed into an array back[], the forward-predicted block into forward[], and the IDCT results into idct[].
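As a point of reference for the reconstruction kernel described above, here is a plain scalar C sketch of one common formulation, assumed rather than taken from this data book: average the backward and forward predictions with rounding (as in MPEG bidirectional prediction), add the IDCT residual, and saturate to the 8-bit pixel range. The element types (unsigned 8-bit predictions, signed 16-bit IDCT values), the function names reconstruct and clamp_u8, and the block length n are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate an int to the 0..255 pixel range. */
static uint8_t clamp_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar kernel: dest[i] = sat((back[i] + forward[i] + 1) / 2 + idct[i]).
   The +1 rounds the average of the two predictions half up. */
static void reconstruct(uint8_t *dest, const uint8_t *back, const uint8_t *forward,
                        const int16_t *idct, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int pred = (back[i] + forward[i] + 1) >> 1;
        dest[i] = clamp_u8(pred + idct[i]);
    }
}
```

Each pixel needs an add, a shift, an add, and a clamp; this per-byte work on packed data is the kind of computation the text says PNX1300 custom operators are suited to.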
