PNX1301EH NXP Semiconductors, PNX1301EH Datasheet - Page 75

Manufacturer Part Number: PNX1301EH
Manufacturer: NXP Semiconductors
Datasheet: 1.PNX1301EH.pdf (548 pages)

Specifications of PNX1301EH

Lead Free Status / RoHS Status: Not Compliant

Current page: 75 of 548

Philips Semiconductors
Custom Operations for Multimedia

Table 4-2. Key Multimedia Custom Operations Listed by Operand Size

Op. Size  Custom Op      Description
8-bit     quadumax       Unsigned bytewise quad max
          quadumin       Unsigned bytewise quad min
          dspuquadaddui  Quad clipped add of unsigned/signed bytes
          ifir8ii        Signed sum of products of signed bytes
          ifir8iu        Signed sum of products of signed/unsigned bytes
          ufir8uu        Unsigned sum of products of unsigned bytes
          mergelsb       Merge least-significant bytes
          mergemsb       Merge most-significant bytes
          packbytes      Pack least-significant bytes
          quadavg        Unsigned byte-wise quad average
          quadumulmsb    Unsigned quad 8-bit multiply most significant
          ume8ii         Unsigned sum of absolute values of signed 8-bit differences
          ume8uu         Unsigned sum of absolute values of unsigned 8-bit differences

4.1.3 Example Uses of Custom Ops

The next three sections illustrate the advantages of using custom operations. The more complex examples also show, through listings of C-language program fragments, how custom operations can be integrated into application code. The examples progress in complexity from simple to intricate; the most interesting examples are taken from actual multimedia codes, such as MPEG decompression.

4.2 EXAMPLE 1: BYTE-MATRIX TRANSPOSITION

The goal of this example is to provide a simple, introductory illustration of how custom operations can significantly increase processing speed in small kernels of applications. As in most uses of custom operations, their power in this case comes from the ability to operate on multiple data items in parallel.

Imagine that our task is to transpose a packed, 4-by-4 matrix of bytes in memory; the matrix might, for example, contain 8-bit pixel values. Figure 4-1 illustrates both the organization of the matrix in memory and the task to be performed in standard mathematical notation.

Figure 4-1. Byte-matrix transposition. Top shows byte matrices packed into memory words; bottom shows mathematical matrix representation.
[Figure labels: Memory Location n+0, n+4, n+8, n+12; Row Major; Transpose; Column Major]

Performing this operation with traditional microprocessor instructions is straightforward but time-consuming. One way to perform the manipulation is to execute 12 load-byte instructions (since only 12 of the 16 bytes need to be repositioned) and 12 store-byte instructions that place the bytes back in memory in their new positions. Another way would be to perform four load-word instructions, reposition the bytes in registers, and then perform four store-word instructions. Unfortunately, repositioning the bytes in registers would require a large number of instructions to properly shift and mask the bytes. Performing the 24 loads and stores makes implicit use of the shifting and masking hardware in the load/store units and thus yields a shorter instruction sequence.

The problem with performing 24 loads and stores is that loads and stores are inherently slow operations because they must access at least the cache and possibly slower layers in the memory hierarchy. Further, performing byte loads and stores when 32-bit word-wide accesses run just as fast wastes the power of the cache/memory interface. We would prefer a fast algorithm that takes full advantage of cache/memory bandwidth while not requiring an inordinate number of byte-manipulation instructions.

PNX1300 has instructions that merge and pack bytes and 16-bit halfwords directly and in parallel. Four of these instructions can be applied in this case to speed up the manipulation of bytes that are packed into words. Figure 4-2 shows the application of these instructions to the byte-matrix transposition problem, and the left side of Figure 4-3 shows a list of the operations needed to implement the matrix transpose. When assembled into actual PNX1300 instructions, these custom operations would be packed as tightly as dependencies allow, up to five operations per instruction.

Note that a programmer would not need to program at this level (PNX1300 assembler). The matrix transpose would be expressed just as efficiently in C-language source code, as shown on the right side of Figure 4-3. The low-level code is shown here for illustration purposes only.

The first sequence of four load-word operations in Figure 4-3 brings the packed words of the input matrix into registers R10, R11, R12, and R13. The next sequence of four merge operations places intermediate results in registers R14, R15, R16, and R17. The next sequence of four pack operations could then replace the original operands or place the transposed matrix in separate registers if the original matrix operands were need-

PRELIMINARY SPECIFICATION
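The merge-and-pack transpose described in section 4.2 can be sketched in portable C as follows. This is an illustrative sketch only, not the listing from Figure 4-3. The helpers merge_msb, merge_lsb, pack16_msb, and pack16_lsb are hypothetical stand-ins written for this sketch rather than the PNX1300 custom operations themselves, and their operand order and byte placement are assumptions. The sketch also assumes that each matrix row is packed with its first byte in the most-significant position of the word. transpose_bytewise shows the 12-load, 12-store alternative for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "merge most-significant bytes" operation.
   With a = [a3 a2 a1 a0] and b = [b3 b2 b1 b0] (a3 is the top byte),
   the result is assumed to be [a3 b3 a2 b2]. */
static uint32_t merge_msb(uint32_t a, uint32_t b)
{
    return (a & 0xFF000000u) | ((b >> 8) & 0x00FF0000u) |
           ((a >> 8) & 0x0000FF00u) | ((b >> 16) & 0x000000FFu);
}

/* Hypothetical stand-in for "merge least-significant bytes":
   the result is assumed to be [a1 b1 a0 b0]. */
static uint32_t merge_lsb(uint32_t a, uint32_t b)
{
    return ((a << 16) & 0xFF000000u) | ((b << 8) & 0x00FF0000u) |
           ((a << 8) & 0x0000FF00u) | (b & 0x000000FFu);
}

/* Hypothetical stand-ins for packing 16-bit halfwords:
   upper halves of a and b, or lower halves of a and b. */
static uint32_t pack16_msb(uint32_t a, uint32_t b)
{
    return (a & 0xFFFF0000u) | (b >> 16);
}

static uint32_t pack16_lsb(uint32_t a, uint32_t b)
{
    return (a << 16) | (b & 0x0000FFFFu);
}

/* Four word loads, four merges, four packs: in[] holds the rows
   [a b c d], [e f g h], [i j k l], [m n o p]. */
static void transpose_merge_pack(const uint32_t in[4], uint32_t out[4])
{
    uint32_t t0 = merge_msb(in[0], in[1]);  /* a e b f */
    uint32_t t1 = merge_msb(in[2], in[3]);  /* i m j n */
    uint32_t t2 = merge_lsb(in[0], in[1]);  /* c g d h */
    uint32_t t3 = merge_lsb(in[2], in[3]);  /* k o l p */

    out[0] = pack16_msb(t0, t1);            /* a e i m */
    out[1] = pack16_lsb(t0, t1);            /* b f j n */
    out[2] = pack16_msb(t2, t3);            /* c g k o */
    out[3] = pack16_lsb(t2, t3);            /* d h l p */
}

/* Byte-wise alternative: 6 swaps, that is 12 byte loads and 12 byte
   stores; the 4 diagonal bytes stay where they are. */
static void transpose_bytewise(uint8_t m[16])
{
    for (int r = 0; r < 4; r++) {
        for (int c = r + 1; c < 4; c++) {
            uint8_t upper = m[r * 4 + c];
            uint8_t lower = m[c * 4 + r];
            m[r * 4 + c] = lower;
            m[c * 4 + r] = upper;
        }
    }
}

/* Runs both versions on the matrix 'a'..'p' and checks that they agree. */
static void check_transposes_agree(void)
{
    uint8_t bytes[16];
    uint32_t in[4], out[4];

    for (int k = 0; k < 16; k++)
        bytes[k] = (uint8_t)('a' + k);
    for (int r = 0; r < 4; r++)
        in[r] = ((uint32_t)bytes[4 * r] << 24) |
                ((uint32_t)bytes[4 * r + 1] << 16) |
                ((uint32_t)bytes[4 * r + 2] << 8) |
                (uint32_t)bytes[4 * r + 3];

    transpose_merge_pack(in, out);
    transpose_bytewise(bytes);

    for (int r = 0; r < 4; r++) {
        uint32_t expected = ((uint32_t)bytes[4 * r] << 24) |
                            ((uint32_t)bytes[4 * r + 1] << 16) |
                            ((uint32_t)bytes[4 * r + 2] << 8) |
                            (uint32_t)bytes[4 * r + 3];
        assert(out[r] == expected);
    }
}
```

Calling check_transposes_agree() exercises both paths on the same 4-by-4 matrix. In the word-wise version, the only memory traffic would be four word loads and four word stores, with all byte repositioning done in registers by the eight merge and pack steps.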
