Difference between revisions of "Floating-Point Instruction Extension"
(Imported from http://web.archive.org/web/20100713091534/http://www.alphalinux.org/wiki/index.php?title=Floating-Point_Instruction_Extension&action=edit) |
(No difference)
|
Latest revision as of 18:22, 29 August 2019
Initially implemented in the EV6 processor, the Floating-Point Instruction Extension adds nine instructions to calculate the square root and to transfer data directly between floating-point and integer registers.
On previous generations, such as the EV5, in order to copy data from a floating-point register to an integer register or vice versa, a temporary memory location must be used. Since memory is orders of magnitude slower than registers, this made transferring data between the register classes very slow and inefficient. Square root calculations must be done (slowly) in software without this extension, making programming more difficult.
Contents
Added Instructions
Mnemonic | Description | Floating-Point Format |
---|---|---|
itofs | Copy Low 32 bits from Integer Register to Floating-Point Register using S_floating Format | IEEE |
itoft | Copy 64 bits from Integer Register to Floating-Point Register | IEEE |
itoff | Copy Low 32 bits from Integer Register to Floating-Point Register using F_floating Format | VAX |
ftois | Copy Floating-Point Register using S_floating Format to Integer Register | IEEE |
ftoit | Copy 64 bits from Floating-Point Register to Integer Register | IEEE |
sqrts | Calculate square root of S_floating Formatted Value | IEEE |
sqrtt | Calculate square root of T_floating Formatted Value | IEEE |
sqrtf | Calculate square root of F_floating Formatted Value | VAX |
sqrtg | Calculate square root of G_floating Formatted Value | VAX |
Determining Presence
To determine the presence of FIX, use the amask instruction.
Latency and Slotting
Instruction | Precision | Latency | Slotting |
---|---|---|---|
itof | 4 | L0, L1 | |
ftoi | 3 | FST0, FST1, L0, L1 | |
fsqrt | Single | 18 | FA |
Double | 33 | FA[1]. |
The latency of fsqrt instructions also depends on the slotting of the instruction consuming the result. If the consuming instruction is also slotted FA, then the latency is reduced by 3. That is, the latency of fsqrtt in the following example is 30 since faddt is slotted FA.
fsqrtt $f0,$f1 ; $f1 = sqrt($f0) faddt $f1,$f2,$f3 ; $f3 = $f1 + $f2
Note that fsqrt instructions are not pipelined.
Usage
Data Movement
In order to copy data from an integer register to a floating-point register on Alphas prior to the EV6, a sequence similar to the following must be used.
stq Rs,temp ldt Fd,temp
where Rs is the integer register source, temp is the temporary memory location, and Fd is the floating-point register source.
With FIX, this code is replaced by
itoft Rs,Fd
Square Root
To calculate the square root on Alphas prior to the EV6, a routine must be written in software.
Using FIX, the square root can be calculated with
sqrtt Fs,Fd
where Fs is the floating-point register source and Fd is the floating-point register destination.