PNX1302EH NXP Semiconductors, PNX1302EH Datasheet - Page 82

Manufacturer Part Number: PNX1302EH
Manufacturer: NXP Semiconductors
Datasheet: 1.PNX1302EH.pdf (548 pages)

Specifications of PNX1302EH

Lead Free Status / RoHS Status: Not Compliant


PNX1300/01/02/11 Data Book
Philips Semiconductors
PRELIMINARY SPECIFICATION

overhead of the inner loop has been eliminated, further increasing the performance advantage.

4.4.2 More Unrolling

The code transformations of the previous section achieved impressive performance improvements, but given the VLIW nature of the PNX1300 CPU, more can be done to exploit PNX1300’s parallelism.

The code in Figure 4-12 has a loop containing only 4 operations (excluding loop overhead). Since PNX1300’s branches have a 3-instruction delay and each instruction can contain up to 5 operations, a fully utilized minimum-sized loop can contain 16 operations (20 minus loop overhead).

The PNX1300 compilation system performs a wide variety of powerful code transformation and scheduling optimizations to ensure that the VLIW capabilities of the CPU are exploited. It is still wise, however, to make program parallelism explicit in source code when possible. Explicit parallelism can only help the compiler produce a fast running program.

To this end, we can unroll the loop of Figure 4-12 some number of times to create explicit parallelism and help the compiler create a fast running loop. In this case, where the number of iterations is a power-of-two, it makes sense to unroll by a factor that is a power-of-two to create clean code.

Figure 4-15 shows the loop unrolled by a factor of eight. The compiler can apply common sub-expression elimination and other optimizations to eliminate extraneous operations in the array indexing, but, again, improvements in the source code can only help the compiler produce the best possible code and fastest-running program.

Figure 4-16 shows one way to modify the code for simpler array indexing.

    unsigned int *IA = (unsigned int *) A;
    unsigned int *IB = (unsigned int *) B;

    for (row = 0; row < 16; row += 1)
    {
        int rowoffset = row * 4;

        for (col4 = 0; col4 < 4; col4 += 1)
            cost += UME8UU(IA[rowoffset + col4], IB[rowoffset + col4]);
    }

Figure 4-14. The loop of Figure 4-13 recoded with 32-bit array accesses and the ume8uu custom operation.

    unsigned int *IA = (unsigned int *) A;
    unsigned int *IB = (unsigned int *) B;

    for (i = 0; i < 64; i += 8)
    {
        cost0 = UME8UU(IA[i+0], IB[i+0]);
        cost1 = UME8UU(IA[i+1], IB[i+1]);
        cost2 = UME8UU(IA[i+2], IB[i+2]);
        cost3 = UME8UU(IA[i+3], IB[i+3]);
        cost4 = UME8UU(IA[i+4], IB[i+4]);
        cost5 = UME8UU(IA[i+5], IB[i+5]);
        cost6 = UME8UU(IA[i+6], IB[i+6]);
        cost7 = UME8UU(IA[i+7], IB[i+7]);

        cost += cost0 + cost1 + cost2 +
                cost3 + cost4 + cost5 +
                cost6 + cost7;
    }

Figure 4-15. Unrolled version of Figure 4-12. This code makes good use of PNX1300’s VLIW capabilities.

    unsigned char A[16][16];
    unsigned char B[16][16];

    unsigned int *IA = (unsigned int *) A;
    unsigned int *IB = (unsigned int *) B;

    for (i = 0; i < 64; i += 8, IA += 8, IB += 8)
    {
        cost0 = UME8UU(IA[0], IB[0]);
        cost1 = UME8UU(IA[1], IB[1]);
        cost2 = UME8UU(IA[2], IB[2]);
        cost3 = UME8UU(IA[3], IB[3]);
        cost4 = UME8UU(IA[4], IB[4]);
        cost5 = UME8UU(IA[5], IB[5]);
        cost6 = UME8UU(IA[6], IB[6]);
        cost7 = UME8UU(IA[7], IB[7]);

        cost += cost0 + cost1 + cost2 +
                cost3 + cost4 + cost5 +
                cost6 + cost7;
    }

Figure 4-16. Code from Figure 4-15 with simplified array index calculations.
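The loops in Figures 4-14 through 4-16 depend on the ume8uu custom operation, so they cannot be run as-is on a host machine. The following is a minimal sketch, not part of the data book, for checking on a host that the rolled loop and the eight-way unrolled loop with pointer increments produce the same match cost. It assumes that ume8uu returns the sum of absolute differences of the four unsigned bytes packed in each 32-bit operand. The names ume8uu_emul, cost_bytes, cost_rolled, cost_unrolled8 and check_match_cost are hypothetical, and the test data is arbitrary. The sketch also copies the byte arrays into word arrays with memcpy rather than casting pointers as the figures do, to avoid relying on alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-side stand-in for the ume8uu custom operation
   (assumed semantics: sum of |a_byte - b_byte| over the four bytes). */
static unsigned ume8uu_emul(uint32_t a, uint32_t b)
{
    unsigned sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        int x = (int)((a >> (8 * lane)) & 0xFF);
        int y = (int)((b >> (8 * lane)) & 0xFF);
        sum += (unsigned)abs(x - y);
    }
    return sum;
}

/* Byte-at-a-time reference over all 256 elements of a 16x16 block. */
static unsigned cost_bytes(const unsigned char *a, const unsigned char *b)
{
    unsigned cost = 0;
    for (int k = 0; k < 256; k++)
        cost += (unsigned)abs((int)a[k] - (int)b[k]);
    return cost;
}

/* Rolled word loop in the style of Figure 4-14: 16 rows of 4 words. */
static unsigned cost_rolled(const uint32_t *IA, const uint32_t *IB)
{
    unsigned cost = 0;
    for (int row = 0; row < 16; row += 1) {
        int rowoffset = row * 4;
        for (int col4 = 0; col4 < 4; col4 += 1)
            cost += ume8uu_emul(IA[rowoffset + col4], IB[rowoffset + col4]);
    }
    return cost;
}

/* Unrolled by eight with pointer increments, in the style of Figure 4-16. */
static unsigned cost_unrolled8(const uint32_t *IA, const uint32_t *IB)
{
    unsigned cost = 0;
    for (int i = 0; i < 64; i += 8, IA += 8, IB += 8) {
        unsigned cost0 = ume8uu_emul(IA[0], IB[0]);
        unsigned cost1 = ume8uu_emul(IA[1], IB[1]);
        unsigned cost2 = ume8uu_emul(IA[2], IB[2]);
        unsigned cost3 = ume8uu_emul(IA[3], IB[3]);
        unsigned cost4 = ume8uu_emul(IA[4], IB[4]);
        unsigned cost5 = ume8uu_emul(IA[5], IB[5]);
        unsigned cost6 = ume8uu_emul(IA[6], IB[6]);
        unsigned cost7 = ume8uu_emul(IA[7], IB[7]);
        cost += cost0 + cost1 + cost2 + cost3 +
                cost4 + cost5 + cost6 + cost7;
    }
    return cost;
}

/* Fills two blocks with arbitrary data, asserts that all three variants
   agree, and returns the common match cost. */
static unsigned check_match_cost(void)
{
    unsigned char A[256], B[256];
    uint32_t IA[64], IB[64];

    for (int k = 0; k < 256; k++) {
        A[k] = (unsigned char)((k * 7 + 3) & 0xFF);
        B[k] = (unsigned char)((k * 13 + 5) & 0xFF);
    }
    memcpy(IA, A, sizeof IA);   /* 64 words == 256 bytes */
    memcpy(IB, B, sizeof IB);

    unsigned ref = cost_bytes(A, B);
    assert(cost_rolled(IA, IB) == ref);
    assert(cost_unrolled8(IA, IB) == ref);
    return ref;
}
```

Calling check_match_cost() from a test harness exercises all three variants on the same data. Because the emulation treats every byte lane identically, the result does not depend on the host's byte order. On the PNX1300 itself, each ume8uu_emul call would be replaced by the UME8UU custom operation shown in the figures.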
