Superscalar gemm-based BLAS


Johannes G.D. Hausmann (joe@neuro.uni-oldenburg.de)
Sun, 29 Nov 1998 12:56:43 +0000


Hi,

There's a package available at netlib called ATLAS which does some
automatic tuning of GEMM routines. It's pretty darn good, even tho not
quite as fast as the assembler one we've gotten from K. Goto (around
90% or so). Maybe try it on your non-alpha boxes :-)

Anyway, Clint Whaley, the guy who coded it, pointed me to the
superscalar gemm-based BLAS by Bo Kagstrom and Per Ling, which is
available at netlib. On my box, the double precision routines
outperform those of M. Dayde and I. Duff by around 10%. The problem is
that you have to do some tuning of block sizes (by hand), and the best
parameters depend on cache size. Haven't tried the single precision
routines yet (which are included in ATLAS but not in
ssgemmbased.tar.gz).
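
To make the block-size tuning concrete, here is a minimal sketch of a
cache-blocked matrix multiply with a single tunable tile edge. This is
not the Kagstrom/Ling code: the names naive_gemm, blocked_gemm, nb and
max_block_for_cache are made up for illustration, the real package may
expose different parameters, and the "three square tiles fit in cache"
rule is only a common rule of thumb for picking a starting value.

```python
import math


def naive_gemm(a, b):
    """Reference triple loop: returns the product a*b (lists of lists)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def blocked_gemm(a, b, nb):
    """Same product, computed tile by tile; nb is the tile edge to tune."""
    if nb < 1:
        raise ValueError("block size must be at least 1")
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer three loops walk over tiles, inner three work inside one tile.
    for ii in range(0, m, nb):
        for pp in range(0, k, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, m)):
                    row_c, row_a = c[i], a[i]
                    for p in range(pp, min(pp + nb, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(jj, min(jj + nb, n)):
                            row_c[j] += a_ip * row_b[j]
    return c


def max_block_for_cache(cache_bytes, elem_bytes=8):
    """Assumed rule of thumb: largest nb with three nb*nb tiles in cache."""
    return math.isqrt(cache_bytes // (3 * elem_bytes))
```

In this sketch the result does not depend on nb, only the memory access
pattern does, which is why the best value has to be found by timing runs
on the target machine rather than worked out on paper.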

Now, my box is kind of small and oldish (PC164@500MHz, only 1MB 3rd
level cache, FPM-SIMMs...) so I can't tune for Ruffians and the like.

Anyone willing to do some tuning?

Jo

-- 
-----------------------------------------
Johannes Hausmann
Carl von Ossietzky Universitaet Oldenburg
FB Physik -- AG Komplexe Systeme

EMail : joe@neuro.uni-oldenburg.de
-----------------------------------------


