Motion Video Instructions
Beginning with the PCA56 processor, DEC added the Motion Video Instructions (MVI) to accelerate algorithms used by motion video formats such as MPEG-1 and MPEG-2^{[1]}. Compared with other SIMD instruction sets of the time, MVI is very simple: to avoid complicating the instruction decode logic, the extension contains only 13 SIMD instructions.
Unlike Intel's MMX and SSE SIMD extensions, MVIs use the Alpha's general purpose registers.
Added Instructions
Mnemonic | Description |
---|---|
minub8 | minimum of packed unsigned bytes |
maxub8 | maximum of packed unsigned bytes |
minsb8 | minimum of packed signed bytes |
maxsb8 | maximum of packed signed bytes |
minuw4 | minimum of packed unsigned words |
maxuw4 | maximum of packed unsigned words |
minsw4 | minimum of packed signed words |
maxsw4 | maximum of packed signed words |
pkwb | pack words into bytes |
unpkbw | unpack bytes into words |
pklb | pack longs into bytes |
unpkbl | unpack bytes into longs |
perr | sum of the absolute differences of the eight byte pairs (pixel error)^{[2]} |
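The table's one-line descriptions can be made concrete with a portable C model. The sketch below is a reference model written from those descriptions, not code from DEC; the `ref_` functions and the `sad8x8` helper are hypothetical names, and only minub8, maxsb8, minuw4 and perr are covered. Lane 0 is taken to be the least significant byte (or word) of the 64-bit register.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable model of a few MVI instructions. Byte lane i is
 * bits 8*i+7..8*i of the register; word lane i is bits 16*i+15..16*i. */

/* minub8: minimum of each of the eight unsigned byte lanes. */
static uint64_t ref_minub8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(x < y ? x : y) << (8 * i);
    }
    return r;
}

/* maxsb8: maximum of each of the eight byte lanes, read as signed. */
static uint64_t ref_maxsb8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x > y ? x : y) << (8 * i);
    }
    return r;
}

/* minuw4: minimum of each of the four unsigned 16-bit word lanes. */
static uint64_t ref_minuw4(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(x < y ? x : y) << (16 * i);
    }
    return r;
}

/* perr: sum of |a_i - b_i| over the eight unsigned byte lanes. */
static uint64_t ref_perr(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (uint8_t)(a >> (8 * i));
        int y = (uint8_t)(b >> (8 * i));
        sum += (uint64_t)abs(x - y);
    }
    return sum;
}

/* Hypothetical helper: sum of absolute differences between two 8x8
 * pixel blocks, one perr per row of eight pixels. */
static uint64_t sad8x8(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    uint64_t total = 0;
    for (size_t row = 0; row < 8; row++) {
        uint64_t c, r;
        memcpy(&c, cur + row * stride, 8); /* load 8 pixels as one word */
        memcpy(&r, ref + row * stride, 8);
        total += ref_perr(c, r); /* lane order does not affect the sum */
    }
    return total;
}
```

`sad8x8` illustrates why perr exists: motion estimation compares a block of the current frame against many candidate blocks of a reference frame, and with perr each row of eight pixels reduces to a single instruction instead of eight subtractions, absolute values and additions.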
Determining Presence
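To detect MVI at run time, use the amask instruction. It clears each bit of its operand that corresponds to an implemented architectural extension; MVI is bit 8.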
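A minimal sketch of run-time detection in C, assuming GCC's `__builtin_alpha_amask` built-in and the architecture's assignment of AMASK bit 8 to MVI. The `mvi_from_amask` and `have_mvi` names are hypothetical; on non-Alpha hosts the check simply reports that MVI is absent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: bit 8 of the AMASK operand/result corresponds to MVI. */
#define AMASK_MVI ((uint64_t)1 << 8)

/* AMASK clears each operand bit whose extension is implemented,
 * so MVI is present when bit 8 comes back cleared. */
static bool mvi_from_amask(uint64_t amask_result)
{
    return (amask_result & AMASK_MVI) == 0;
}

/* Query the running CPU; off Alpha, MVI cannot be present. */
static bool have_mvi(void)
{
#if defined(__alpha__) && defined(__GNUC__)
    return mvi_from_amask((uint64_t)__builtin_alpha_amask((long)AMASK_MVI));
#else
    return false;
#endif
}
```

Splitting the bit test out of `have_mvi` keeps the interpretation of the AMASK result checkable on any host.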
Latency and Slotting
On the in-order PCA56, all MVIs have a latency of 2 cycles. This means that at least one instruction must separate MVIs to avoid stalling. On the out-of-order EV6 and newer, MVIs have a latency of 3 cycles and are slotted to the U0 pipe^{[3]}.
Usage
Unsigned Saturated Arithmetic
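MVI has no saturating add or subtract instructions, but packed unsigned saturated addition and subtraction are easy to build from the packed minimum. For addition, each byte b of the second operand is first clamped to min(b, ~a), where ~a = 255 - a, so the lane sum can never exceed 255 and no carry reaches the next byte; an ordinary 64-bit addq then adds all eight lanes at once. For subtraction, b is clamped to min(b, a), so the lane difference can never fall below 0 and no borrow reaches the next byte.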
For instance, to add the packed unsigned bytes in $16 to those in $17 with saturation, storing the result in $0:
ornot $31,$16,$1
minub8 $17,$1,$17
addq $16,$17,$0
To subtract the packed unsigned bytes in $17 from those in $16 with saturation, storing the result in $0:
minub8 $17,$16,$17
subq $16,$17,$0
Note that these sequences are not optimized for register usage or latency; both overwrite $17.
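To see why the two sequences above never let a carry or borrow leak into a neighboring byte, they can be modeled in portable C with a software stand-in for minub8. This is an illustrative sketch (the `sw_minub8`, `*_model` and `models_match_scalar` names are hypothetical), not code that executes MVI itself. The exhaustive check compares every pair of byte values against ordinary per-byte saturating arithmetic while the other lanes are set to 0xFF, where an unclamped addq would carry.

```c
#include <stdint.h>

/* Software stand-in for minub8 so the sequences can run off Alpha. */
static uint64_t sw_minub8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(x < y ? x : y) << (8 * i);
    }
    return r;
}

/* ornot $31,$16,$1 ; minub8 $17,$1,$17 ; addq $16,$17,$0
 * Each lane adds min(b, 255 - a), so no lane exceeds 255. */
static uint64_t addusb8_model(uint64_t a, uint64_t b)
{
    return a + sw_minub8(b, ~a);
}

/* minub8 $17,$16,$17 ; subq $16,$17,$0
 * Each lane subtracts min(b, a), so no lane drops below 0. */
static uint64_t subusb8_model(uint64_t a, uint64_t b)
{
    return a - sw_minub8(b, a);
}

/* Compare both models with scalar saturating arithmetic for every pair
 * of values in byte 3, with every other lane of both operands at 0xFF. */
static int models_match_scalar(void)
{
    const uint64_t hole = ~(0xFFULL << 24); /* all lanes except byte 3 */
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            uint64_t a = hole | ((uint64_t)x << 24);
            uint64_t b = hole | ((uint64_t)y << 24);
            unsigned sum = x + y > 255 ? 255 : x + y;
            unsigned diff = x > y ? x - y : 0;
            /* Other lanes: 0xFF saturates to 0xFF; 0xFF - 0xFF is 0. */
            if (addusb8_model(a, b) != (hole | ((uint64_t)sum << 24)))
                return 0;
            if (subusb8_model(a, b) != ((uint64_t)diff << 24))
                return 0;
        }
    }
    return 1;
}
```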
In C, the same technique can be written with GCC's Alpha built-in functions:
#include <stdint.h>

/* GCC exposes the MVI instructions as built-ins that take and return
 * long; they are only available when compiling for Alpha with -mmax
 * (or an -mcpu, such as pca56, that enables it). */
#define __minub8 __builtin_alpha_minub8
#define __minuw4 __builtin_alpha_minuw4

/* 64-bit packed type, named after MMX's __m64 to ease porting. */
typedef uint64_t __m64;

/* Add the 8-bit values in M1 to the 8-bit values in M2 using unsigned
 * saturating arithmetic (MMX equivalent: paddusb) */
static inline __m64 addusb8(__m64 m1, __m64 m2)
{
    return m1 + __minub8(m2, ~m1);
}

/* Add the 16-bit values in M1 to the 16-bit values in M2 using unsigned
 * saturating arithmetic (MMX equivalent: paddusw) */
static inline __m64 addusw4(__m64 m1, __m64 m2)
{
    return m1 + __minuw4(m2, ~m1);
}

/* Subtract the 8-bit values in M2 from the 8-bit values in M1 using
 * unsigned saturating arithmetic (MMX equivalent: psubusb) */
static inline __m64 subusb8(__m64 m1, __m64 m2)
{
    return m1 - __minub8(m2, m1);
}

/* Subtract the 16-bit values in M2 from the 16-bit values in M1 using
 * unsigned saturating arithmetic (MMX equivalent: psubusw) */
static inline __m64 subusw4(__m64 m1, __m64 m2)
{
    return m1 - __minuw4(m2, m1);
}
External Links
- MVI Code Examples
- Digital's Motion Video Instruction Extensions for Alpha Whitepaper
- Digital, MIPS Add Multimedia Extensions
- Real Time MPEG-I and MPEG-II Compression with the Alpha Microprocessors