PNX1301EH NXP Semiconductors, PNX1301EH Datasheet - Page 75


Specifications of PNX1301EH

Lead Free Status / RoHS Status
Not Compliant

Philips Semiconductors
Table 4-2. Key Multimedia Custom Operations Listed by Operand Size

Op. Size   Custom Op       Description
8-bit      quadumax        Unsigned bytewise quad max
           quadumin        Unsigned bytewise quad min
           dspuquadaddui   Quad clipped add of unsigned/signed bytes
           ifir8ii         Signed sum of products of signed bytes
           ifir8iu         Signed sum of products of signed/unsigned bytes
           ufir8uu         Unsigned sum of products of unsigned bytes
           mergelsb        Merge least-significant bytes
           mergemsb        Merge most-significant bytes
           packbytes       Pack least-significant bytes
           quadavg         Unsigned byte-wise quad average
           quadumulmsb     Unsigned quad 8-bit multiply most significant
           ume8ii          Unsigned sum of absolute values of signed 8-bit differences
           ume8uu          Unsigned sum of absolute values of unsigned 8-bit differences

4.1.3 Example Uses of Custom Ops

The next three sections illustrate the advantages of using custom operations. Also, the more complex examples illustrate how custom operations can be integrated into application code by providing listings of C-language program fragments. The examples progress in complexity from simple to intricate; the most interesting examples are taken from actual multimedia codes, such as MPEG decompression.

4.2 EXAMPLE 1: BYTE-MATRIX TRANSPOSITION

The goal of this example is to provide a simple, introductory illustration of how custom operations can significantly increase processing speed in small kernels of applications. As in most uses of custom operations, the speedup in this case comes from their ability to operate on multiple data items in parallel.

Imagine that our task is to transpose a packed, 4-by-4 matrix of bytes in memory; the matrix might, for example, contain 8-bit pixel values. Figure 4-1 illustrates both the organization of the matrix in memory and the task to be performed in standard mathematical notation.

Figure 4-1. Byte-matrix transposition. Top shows byte matrices packed into memory words; bottom shows mathematical matrix representation.

Performing this operation with traditional microprocessor instructions is straightforward but time-consuming. One way to perform the manipulation is to perform 12 load-byte instructions (since only 12 of the 16 bytes need to be repositioned) and 12 store-byte instructions that place the bytes back in memory in their new positions. Another way would be to perform four load-word instructions, reposition the bytes in registers, and then perform four store-word instructions. Unfortunately, repositioning the bytes in registers would require a large number of instructions to properly shift and mask the bytes. Performing the 24 loads and stores makes implicit use of the shifting and masking hardware in the load/store units and thus yields a shorter instruction sequence.
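
The byte-wise approach described above can be sketched in plain C. This is a minimal illustration, not code from the data book: the function name transpose4x4_bytes and the flat 16-byte layout (row r, column c at index 4*r + c) are assumptions made for the example.

```c
/* Transpose a packed 4-by-4 byte matrix in place using byte accesses only.
 * The four diagonal bytes stay where they are, so the six off-diagonal
 * pairs account for 12 byte loads and 12 byte stores.
 * Name and layout are illustrative assumptions. */
static void transpose4x4_bytes(unsigned char m[16])
{
    for (int r = 0; r < 4; r++) {
        for (int c = r + 1; c < 4; c++) {
            unsigned char upper = m[4 * r + c];  /* byte load  */
            unsigned char lower = m[4 * c + r];  /* byte load  */
            m[4 * r + c] = lower;                /* byte store */
            m[4 * c + r] = upper;                /* byte store */
        }
    }
}
```

Applied to the 16 bytes abcdefghijklmnop, the routine leaves aeimbfjncgkodhlp in memory. Every one of the 24 accesses goes through the load/store unit, which is the cost the following paragraph discusses.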
The problem with performing 24 loads and stores is that loads and stores are inherently slow operations because they must access at least the cache and possibly slower layers in the memory hierarchy. Further, performing byte loads and stores when 32-bit word-wide accesses run just as fast wastes the power of the cache/memory interface. We would prefer a fast algorithm that takes full advantage of cache/memory bandwidth while not requiring an inordinate number of byte-manipulation instructions.
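
To make the trade-off concrete, here is a hedged sketch of the other conventional alternative: four word-wide loads and stores with the bytes repositioned by shifts and masks. The function name transpose4x4_shift_mask is hypothetical, and the sketch assumes each row word holds its first byte in bits 31..24.

```c
#include <stdint.h>

/* Transpose four packed row words into four packed column words using
 * only shifts, masks, and ORs. Assumes the first byte of each row sits
 * in bits 31..24 of its word. Illustrative sketch, not data-book code. */
static void transpose4x4_shift_mask(const uint32_t in[4], uint32_t out[4])
{
    for (int col = 0; col < 4; col++) {
        int shift = 24 - 8 * col;  /* bit position of this column in a row word */
        out[col] = (((in[0] >> shift) & 0xFFu) << 24)
                 | (((in[1] >> shift) & 0xFFu) << 16)
                 | (((in[2] >> shift) & 0xFFu) << 8)
                 |  ((in[3] >> shift) & 0xFFu);
    }
}
```

Memory traffic drops to word accesses, but each of the 16 bytes now costs its own shift and mask plus the work of merging it into an output word, which is the kind of byte-manipulation overhead the text wants to avoid.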
PNX1300 has instructions that merge and pack bytes
and 16-bit halfwords directly and in parallel. Four of
these instructions can be applied in this case to speed up
the manipulation of bytes that are packed into words.
Figure 4-2 shows the application of these instructions to the byte-matrix transposition problem, and the left side of Figure 4-3 shows a list of the operations needed to implement the matrix transpose. When assembled into actual PNX1300 instructions, these custom operations would be packed as tightly as dependencies allow, up to five operations per instruction.
Note that a programmer would not need to program at this level (PNX1300 assembler). The matrix transpose would be expressed just as efficiently in C-language source code, as shown on the right side of Figure 4-3. The low-level code is shown here for illustration purposes only.
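
The merge-then-pack idea can be sketched in portable C. The helper functions below are software stand-ins, not the PNX1300 custom operations themselves, and their bit-level behavior is an assumption: mergemsb and mergelsb are taken to interleave the upper or lower two bytes of their operands, and pack16msb and pack16lsb are assumed names for operations that pack the upper or lower 16-bit halfwords. The first byte of each row is assumed to sit in bits 31..24.

```c
#include <stdint.h>

/* Assumed semantics: interleave the two most-significant bytes,
 * giving a3 b3 a2 b2 (bytes numbered 3..0 from the top). */
static uint32_t mergemsb(uint32_t a, uint32_t b)
{
    return (a & 0xFF000000u) | ((b >> 8) & 0x00FF0000u)
         | ((a >> 8) & 0x0000FF00u) | ((b >> 16) & 0x000000FFu);
}

/* Assumed semantics: interleave the two least-significant bytes,
 * giving a1 b1 a0 b0. */
static uint32_t mergelsb(uint32_t a, uint32_t b)
{
    return ((a << 16) & 0xFF000000u) | ((b << 8) & 0x00FF0000u)
         | ((a << 8) & 0x0000FF00u) | (b & 0x000000FFu);
}

/* Assumed names and semantics: pack the upper (or lower) halfwords. */
static uint32_t pack16msb(uint32_t a, uint32_t b)
{
    return (a & 0xFFFF0000u) | (b >> 16);
}

static uint32_t pack16lsb(uint32_t a, uint32_t b)
{
    return (a << 16) | (b & 0x0000FFFFu);
}

/* Four merges followed by four packs transpose the matrix held in
 * m[0..3] (rows a b c d, e f g h, i j k l, m n o p). */
static void transpose4x4_custom_ops(uint32_t m[4])
{
    uint32_t t0 = mergemsb(m[0], m[1]);  /* a e b f */
    uint32_t t1 = mergemsb(m[2], m[3]);  /* i m j n */
    uint32_t t2 = mergelsb(m[0], m[1]);  /* c g d h */
    uint32_t t3 = mergelsb(m[2], m[3]);  /* k o l p */

    m[0] = pack16msb(t0, t1);            /* a e i m */
    m[1] = pack16lsb(t0, t1);            /* b f j n */
    m[2] = pack16msb(t2, t3);            /* c g k o */
    m[3] = pack16lsb(t2, t3);            /* d h l p */
}
```

On the target, each helper would correspond to a single custom operation, so the transpose would need four word loads, eight custom operations, and four word stores, with no per-byte shifting or masking in the source.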
The first sequence of four load-word operations in Figure 4-3 brings the packed words of the input matrix into registers R10, R11, R12, and R13. The next sequence of four merge operations places intermediate results in registers R14, R15, R16, and R17. The next sequence of four pack operations could then replace the original operands or place the transposed matrix in separate registers if the original matrix operands were need-
PRELIMINARY SPECIFICATION
Custom Operations for Multimedia

[Figure 4-1 diagram. Top: memory words at n+0, n+4, n+8, and n+12, bits 31..0, holding the row-major bytes a b c d, e f g h, i j k l, and m n o p, and after the transpose the column-major bytes a e i m, b f j n, c g k o, and d h l p. Bottom: the same 4-by-4 matrix and its transpose in matrix notation.]
