Difference between revisions of "Floating-Point Instruction Extension"

From AlphaLinux
Jump to: navigation, search
(Imported from http://web.archive.org/web/20100713091534/http://www.alphalinux.org/wiki/index.php?title=Floating-Point_Instruction_Extension&action=edit)
 
(No difference)

Latest revision as of 18:22, 29 August 2019

Initially implemented in the EV6 processor, the Floating-Point Instruction Extension adds nine instructions to calculate the square root and to transfer data directly between floating-point and integer registers.

On previous generations, such as the EV5, in order to copy data from a floating-point register to an integer register or vice versa, a temporary memory location must be used. Since memory is orders of magnitude slower than registers, this made transferring data between the register classes very slow and inefficient. Square root calculations must be done (slowly) in software without this extension, making programming more difficult.

Added Instructions

Mnemonic Description Floating-Point Format
itofs Copy Low 32 bits from Integer Register to Floating-Point Register using S_floating Format IEEE
itoft Copy 64 bits from Integer Register to Floating-Point Register IEEE
itoff Copy Low 32 bits from Integer Register to Floating-Point Register using F_floating Format VAX
ftois Copy Floating-Point Register using S_floating Format to Integer Register IEEE
ftoit Copy 64 bits from Floating-Point Register to Integer Register IEEE
sqrts Calculate square root of S_floating Formatted Value IEEE
sqrtt Calculate square root of T_floating Formatted Value IEEE
sqrtf Calculate square root of F_floating Formatted Value VAX
sqrtg Calculate square root of G_floating Formatted Value VAX

Determining Presence

To determine the presence of FIX, use the amask instruction.

Latency and Slotting

Instruction Precision Latency Slotting
itof   4 L0, L1
ftoi   3 FST0, FST1, L0, L1
fsqrt Single 18 FA
  Double 33 FA[1].

The latency of fsqrt instructions also depends on the slotting of the instruction consuming the result. If the consuming instruction is also slotted FA, then the latency is reduced by 3. That is, the latency of fsqrtt in the following example is 30 since faddt is slotted FA.

fsqrtt $f0,$f1     ; $f1 = sqrt($f0)
faddt  $f1,$f2,$f3 ; $f3 = $f1 + $f2

Note that fsqrt instructions are not pipelined.

Usage

Data Movement

In order to copy data from an integer register to a floating-point register on Alphas prior to the EV6, a sequence similar to the following must be used.

stq Rs,temp
ldt Fd,temp

where Rs is the integer register source, temp is the temporary memory location, and Fd is the floating-point register source.

With FIX, this code is replaced by

itoft Rs,Fd

Square Root

To calculate the square root on Alphas prior to the EV6, a routine must be written in software.

Using FIX, the square root can be calculated with

sqrtt Fs,Fd

where Fs is the floating-point register source and Fd is the floating-point register destination.

References

  1. Template:Cite web