Motion Video Instructions

Beginning with the PCA56 processor, DEC added the Motion Video Instructions (MVI) to accelerate algorithms used by motion video formats such as MPEG-1 and MPEG-2^[1]. Compared with other SIMD instruction sets of the time, MVI is very simple: to avoid complicating the instruction decode logic, the extension contains only 13 SIMD instructions.

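Unlike Intel's MMX and SSE SIMD extensions, MVI operates on the Alpha's general-purpose integer registers. Each 64-bit register is treated as eight packed bytes, four packed 16-bit words, or two packed 32-bit longwords.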

Added Instructions

Mnemonic	Description
minub8	minimum of packed unsigned bytes
maxub8	maximum of packed unsigned bytes
minsb8	minimum of packed signed bytes
maxsb8	maximum of packed signed bytes
minuw4	minimum of packed unsigned words
maxuw4	maximum of packed unsigned words
minsw4	minimum of packed signed words
maxsw4	maximum of packed signed words
pkwb	pack words into bytes
unpkbw	unpack bytes into words
pklb	pack longs into bytes
unpkbl	unpack bytes into longs
perr	sum the absolute differences of each byte (pixel error)^[2]
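
The per-byte behaviour described in the table can be modelled in portable C, which is handy for checking results on a non-Alpha host. The sketch below models perr only; perr_ref is a hypothetical helper name chosen for this example, not part of any toolchain, and it makes no attempt to match the speed of the hardware instruction.

```c
#include <stdint.h>

/* Portable reference model of perr (illustrative helper, not a real API):
 * treat each 64-bit operand as eight unsigned bytes and return the sum of
 * the absolute differences of corresponding bytes. */
static uint64_t perr_ref(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;

    for (int i = 0; i < 8; i++) {
        unsigned x = (unsigned)(a >> (8 * i)) & 0xff;  /* byte i of a */
        unsigned y = (unsigned)(b >> (8 * i)) & 0xff;  /* byte i of b */
        sum += (x > y) ? x - y : y - x;                /* |x - y| */
    }
    return sum;
}
```

When building for Alpha with GCC, the same value should come from __builtin_alpha_perr, so a model like this can serve as a cross-check in unit tests.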

Determining Presence

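To determine whether a processor implements MVI, use the amask instruction. amask returns its operand with the bits that correspond to implemented extensions cleared, so a cleared MVI bit in the result means MVI is available.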
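
A minimal sketch of such a check in C follows. It assumes that MVI is reported in bit 8 of the amask feature mask and that the compiler is GCC, whose Alpha port provides __builtin_alpha_amask; AMASK_MVI, mvi_present_from_amask and cpu_has_mvi are names invented for this example. On non-Alpha targets the function simply reports that MVI is absent.

```c
#include <stdint.h>

/* Assumption: MVI corresponds to bit 8 of the amask feature mask. */
#define AMASK_MVI (UINT64_C(1) << 8)

/* amask clears the bits of features that ARE implemented, so the MVI bit
 * being zero in the result means MVI is present. */
static int mvi_present_from_amask(uint64_t amask_result)
{
    return (amask_result & AMASK_MVI) == 0;
}

/* Returns 1 if the running CPU reports MVI, 0 otherwise. */
static int cpu_has_mvi(void)
{
#if defined(__alpha__) && defined(__GNUC__)
    return mvi_present_from_amask((uint64_t)__builtin_alpha_amask(AMASK_MVI));
#else
    return 0;  /* not an Alpha target, so no MVI */
#endif
}
```

Keeping the bit test in its own function means the decoding logic can be exercised on any host, while only cpu_has_mvi depends on the Alpha built-in.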

Latency and Slotting

On the in-order PCA56, all MVIs have a latency of 2 cycles. To avoid a stall, at least one independent instruction must therefore be scheduled between an MVI and any instruction that consumes its result. On the out-of-order EV6 and newer, MVIs have a latency of 3 cycles and can issue only to the U0 pipeline^[3].

Usage

Unsigned Saturated Arithmetic

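Packed unsigned saturating addition and subtraction can be built from the packed minimum instructions. The idea is to clamp one operand first so that no byte (or word) can overflow or underflow, after which an ordinary 64-bit addq or subq gives the saturated result without any carry or borrow crossing an element boundary.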

For instance, to add the packed unsigned bytes stored in $16 with those in $17 with saturation and store the result in $0:

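ornot  $31,$16,$1     # $1  = ~$16, i.e. 255 - a in every byte
minub8 $17,$1,$17     # $17 = min(b, 255 - a) per byte
addq   $16,$17,$0     # $0  = a + min(b, 255 - a); no byte can carry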

To subtract the packed unsigned bytes stored in $17 from those in $16 with saturation and store the result in $0:

minub8 $17,$16,$17    # $17 = min(b, a) per byte
subq   $16,$17,$0     # $0  = a - min(b, a); no byte can borrow

Note that these sequences overwrite $17 (the addition also uses $1 as a scratch register) and are not optimized for register usage or latency.

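In C, the same operations can be written with GCC's Alpha built-in functions, which are available when compiling with -mmax or with a -mcpu setting that includes MVI: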

/* Assumption: __m64 is a 64-bit integer holding the packed elements; it is
 * not defined in the original listing. */
typedef unsigned long __m64;

#define __minub8        __builtin_alpha_minub8
#define __minuw4        __builtin_alpha_minuw4

/* Add the 8-bit values in M2 to the 8-bit values in M1 using unsigned
 * saturating arithmetic (MMX equivalent: paddusb) */
static inline __m64
addusb8(__m64 m1, __m64 m2) {
        return m1 + __minub8(m2, ~m1);
}

/* Add the 16-bit values in M2 to the 16-bit values in M1 using unsigned
 * saturating arithmetic (MMX equivalent: paddusw) */
static inline __m64
addusw4(__m64 m1, __m64 m2) {
        return m1 + __minuw4(m2, ~m1);
}

/* Subtract the 8-bit values in M2 from the 8-bit values in M1 using unsigned
 * saturating arithmetic (MMX equivalent: psubusb) */
static inline __m64
subusb8(__m64 m1, __m64 m2) {
        return m1 - __minub8(m2, m1);
}

/* Subtract the 16-bit values in M2 from the 16-bit values in M1 using unsigned
 * saturating arithmetic (MMX equivalent: psubusw) */
static inline __m64
subusw4(__m64 m1, __m64 m2) {
        return m1 - __minuw4(m2, m1);
}
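
To see why the clamping trick works without Alpha hardware, the byte variants can be tried against a software stand-in for minub8. The following is only a sketch for experimentation: minub8_ref, addusb8_ref and subusb8_ref are hypothetical names, and uint64_t is used in place of the 64-bit Alpha register type.

```c
#include <stdint.h>

/* Software stand-in for minub8: unsigned minimum of each byte pair. */
static uint64_t minub8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        uint64_t x = (a >> (8 * i)) & 0xff;
        uint64_t y = (b >> (8 * i)) & 0xff;
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Same expression as addusb8, built on the stand-in. Each byte of ~m1 is
 * 255 - a, so min(b, 255 - a) can never push a byte past 255 and the
 * 64-bit add cannot carry into the neighbouring byte. */
static uint64_t addusb8_ref(uint64_t m1, uint64_t m2)
{
    return m1 + minub8_ref(m2, ~m1);
}

/* Same expression as subusb8. min(b, a) never exceeds a in any byte, so
 * the 64-bit subtract cannot borrow from the neighbouring byte. */
static uint64_t subusb8_ref(uint64_t m1, uint64_t m2)
{
    return m1 - minub8_ref(m2, m1);
}
```

For example, addusb8_ref(0x01FF, 0x0101) yields 0x02FF: the low byte saturates at 0xFF while the next byte adds normally. Likewise, subusb8_ref(0x1005, 0x0110) yields 0x0F00, with the low byte clamped to zero.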

References

[1] Template:Cite web

[2] Template:Cite web

[3] Template:Cite web
