Re: request for perf improvement suggestions


Adam C. Powell, IV (adam.powell@nist.gov)
Wed, 25 Nov 1998 16:44:07 -0500


Jonathan L Dubois wrote:

> Hello everyone,
>
> I have some code I would like to optomize and I could use some suggestions
> and pointers for where to get started.
>
> The function which is called most often and the most likely candidate
> for improvement (I think) calculates the distance between two vectors.

Under some conditions you can go a long way by using K. Goto's optimized BLAS
routines (gemm, copy, dot, axpy). If you have two separate sets of n vectors
and want to calculate the n distances between corresponding vectors of each
set, you can store them as two arrays of coordinates, copy the first array,
use axpy to subtract the second array, and then use dot for each vector (dot
itself) to get the sum of squares. You might get some speedup too by saving
all of the square roots for last, and then doing them all at once, so
whatever tables are used by sqrt sit in cache the whole time.

On the other hand, if you have a collection of n vectors and want to get all
n(n+1)/2 distances between them, you'll have to be more creative. For
example, to get all of the distances from the first vector to the others
(assuming they're all stored as one big array of coordinates), you could
create an array of n-1 ones, use gemm to "multiply" the 3x1 vector by this
1x(n-1) array of ones to give a 3x(n-1) matrix with each column being the
first vector, then use axpy to subtract all of the other vectors from this,
and dot each column with itself to get sums of squares. Now repeat for the
second vector, and all the others. Finally, square root everything.

On the other hand, if you have n vectors and want some arbitrary subset of
the n(n+1)/2 distanes, well, this may not help much.

This approach would vectorize the subtractions very nicely, but the dot
products might be slowed down by the overhead of the function calls to ddot_
as much as they are sped up by the assembly acceleration- maybe there's a way
to use another BLAS function call to do this... (If gemm could be convinced
to only calculate the diagonals of the product, you could multiply the
transpose of the difference matrix by its non-transpose to give a vector of
the dot products; but gemm doesn't do this.) Oh- you'll also have to
substantially change the program architecture. (Fun! :-)

Hope this helps (and isn't too obscure).

Zeen,

-Adam `Cold Fusion' Powell, IV http://www.ctcms.nist.gov/~powell/ ____
USDoC, National Institute of Standards & Technology (NIST) |\ ||< |
Center for Theoretical and Computational Materials Science | \||_> |



This archive was generated by hypermail 2.0b3 on Wed Nov 25 1998 - 06:26:29 EST