Motion Video Instructions
(Imported from http://web.archive.org/web/20100713090023/http://www.alphalinux.org/wiki/index.php?title=Motion_Video_Instructions&action=edit) 

Latest revision as of 18:17, 29 August 2019
Beginning with the PCA56 processor, DEC added the Motion Video Instructions (MVI) to accelerate algorithms used by motion video formats such as MPEG-1 and MPEG-2 [1]. Compared to other SIMD instruction sets of the time, MVI is very simple: to avoid complicating the instruction decode logic, the extension contains only 13 SIMD instructions.
Unlike Intel's MMX and SSE SIMD extensions, MVI operates on the Alpha's general-purpose registers.
Added Instructions
Mnemonic  Description 

minub8  minimum of packed unsigned bytes 
maxub8  maximum of packed unsigned bytes 
minsb8  minimum of packed signed bytes 
maxsb8  maximum of packed signed bytes 
minuw4  minimum of packed unsigned words 
maxuw4  maximum of packed unsigned words 
minsw4  minimum of packed signed words 
maxsw4  maximum of packed signed words 
pkwb  pack words into bytes 
unpkbw  unpack bytes into words 
pklb  pack longs into bytes 
unpkbl  unpack bytes into longs 
perr  sum of the absolute differences of the packed unsigned bytes (pixel error) [2] 
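To make the semantics of perr concrete, here is a minimal portable sketch of what the instruction computes on two 64-bit operands. The function name perr_ref is hypothetical and exists only for illustration; on an MVI-capable Alpha the single perr instruction does this work.

```c
#include <stdint.h>

/* Portable reference model of perr (hypothetical helper, not a real API):
 * treat a and b as eight packed unsigned bytes and return the sum of the
 * absolute differences of corresponding bytes. */
static uint64_t perr_ref(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (unsigned)((a >> (8 * i)) & 0xff);
        unsigned y = (unsigned)((b >> (8 * i)) & 0xff);
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}
```

The largest possible result is 8 * 255 = 2040, so the sum always fits comfortably in the destination register.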
Determining Presence
To determine the presence of MVI, use the amask instruction.
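As a rough sketch of such a check in C: amask takes a mask of feature bits and clears each bit whose feature the processor implements, and MVI is bit 8 (0x100) of that mask. The helper names below are hypothetical; __builtin_alpha_amask is GCC's Alpha built-in, and the non-Alpha fallback is only there so the code compiles elsewhere.

```c
/* AMASK feature bit for MVI (bit 8). */
#define AMASK_MVI 0x100UL

/* amask clears each requested bit whose feature is implemented, so a
 * cleared MVI bit in the result means MVI is present.
 * Hypothetical helper name. */
static int amask_result_has_mvi(unsigned long amask_result)
{
    return (amask_result & AMASK_MVI) == 0;
}

/* Hypothetical wrapper: returns 1 if the running CPU reports MVI. */
static int cpu_has_mvi(void)
{
#if defined(__alpha__) && defined(__GNUC__)
    return amask_result_has_mvi(
        (unsigned long)__builtin_alpha_amask((long)AMASK_MVI));
#else
    return 0;   /* not an Alpha target, so MVI cannot be present */
#endif
}
```

A program would typically call cpu_has_mvi() once at startup and select between an MVI code path and a plain integer fallback.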
Latency and Slotting
On the in-order PCA56, all MVIs have a latency of 2 cycles, so at least one other instruction must be placed between an MVI and the instruction that uses its result to avoid a stall. On the out-of-order EV6 and newer, MVIs have a latency of 3 cycles and are slotted to U0 [3].
Usage
Unsigned Saturated Arithmetic
Packed unsigned saturated addition and subtraction can be built easily from the packed minimum instructions and ordinary 64-bit arithmetic.
For instance, to add the packed unsigned bytes stored in $16 with those in $17 with saturation and store the result in $0:
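    ornot  $31,$16,$1     # $1 = ~$16 (per byte: 255 - byte)
    minub8 $17,$1,$17     # $17 = min($17, ~$16) per byte (overwrites $17)
    addq   $16,$17,$0     # $0 = $16 + $17; no byte can overflow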
To subtract the packed unsigned bytes in $17 from those in $16 with saturation and store the result in $0:
    minub8 $17,$16,$17    # $17 = min($17, $16) per byte (overwrites $17)
    subq   $16,$17,$0     # $0 = $16 - $17; no byte can borrow
Note that these sequences are not optimized for register usage or latency.
To use this technique in C, the following functions may be used.
    #define __minub8 __builtin_alpha_minub8
    #define __minuw4 __builtin_alpha_minuw4

    /* Assumption: __m64 is a plain 64-bit unsigned integer type
     * (unsigned long is 64 bits on Alpha). */
    typedef unsigned long __m64;

    /* Add the 8-bit values in M1 to the 8-bit values in M2 using unsigned
     * saturated arithmetic (MMX equivalent: paddusb) */
    static inline __m64 addusb8(__m64 m1, __m64 m2)
    {
        return m1 + __minub8(m2, ~m1);
    }

    /* Add the 16-bit values in M1 to the 16-bit values in M2 using unsigned
     * saturated arithmetic (MMX equivalent: paddusw) */
    static inline __m64 addusw4(__m64 m1, __m64 m2)
    {
        return m1 + __minuw4(m2, ~m1);
    }

    /* Subtract the 8-bit values in M2 from the 8-bit values in M1 using
     * unsigned saturated arithmetic (MMX equivalent: psubusb) */
    static inline __m64 subusb8(__m64 m1, __m64 m2)
    {
        return m1 - __minub8(m2, m1);
    }

    /* Subtract the 16-bit values in M2 from the 16-bit values in M1 using
     * unsigned saturated arithmetic (MMX equivalent: psubusw) */
    static inline __m64 subusw4(__m64 m1, __m64 m2)
    {
        return m1 - __minuw4(m2, m1);
    }
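The reason a full-width add is safe here is that, per byte, ~m1 equals 255 - m1, so m1 + min(m2, ~m1) can never exceed 255 and no carry crosses a byte boundary; likewise m1 - min(m2, m1) can never borrow. The following sketch checks that identity on any machine by replacing the MVI built-in with a portable loop. The names minub8_ref, addusb8_ref and subusb8_ref are hypothetical stand-ins, not part of any real API.

```c
#include <stdint.h>

/* Portable stand-in for minub8: per-byte unsigned minimum of a and b. */
static uint64_t minub8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t x = (a >> (8 * i)) & 0xff;
        uint64_t y = (b >> (8 * i)) & 0xff;
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Saturated per-byte add: each byte of m2 is clamped to the headroom
 * left in the matching byte of m1 before one 64-bit add. */
static uint64_t addusb8_ref(uint64_t m1, uint64_t m2)
{
    return m1 + minub8_ref(m2, ~m1);
}

/* Saturated per-byte subtract: each byte of m2 is clamped to the
 * matching byte of m1, so the result bottoms out at 0. */
static uint64_t subusb8_ref(uint64_t m1, uint64_t m2)
{
    return m1 - minub8_ref(m2, m1);
}
```

For example, adding 0x20 to each of the bytes 0xF0 and 0x10 gives 0xFF (saturated) and 0x30, and subtracting 0x20 from the byte 0x10 gives 0x00 rather than wrapping around.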
External Links
 MVI Code Examples
 Digital's Motion Video Instruction Extensions for Alpha Whitepaper
 Digital, MIPS Add Multimedia Extensions
 Real Time MPEG-I and MPEG-II Compression with the Alpha Microprocessors