Re: Questions on performance methods


Kazushige Goto (goto@statabo.rim.or.jp)
Sun, 29 Nov 1998 12:40:47 +0900


From: Jonathan L Dubois <dubois@bec.physics.udel.edu>
Date: Sat, 28 Nov 1998 14:28:39 -0500 (EST)

dubois> The only problem I foresee with doing all the sqrts in r12 at once is that
dubois> I occasionally get some computational savings when (rij <= b) The
dubois> probability of this occurring is usually fairly low but it increases with
dubois> N and is not easily predicted ahead of time. It will also probably
dubois> require some restructuring since q[i] is an array of structures which
dubois> contain information other than the 3vectors of interest here.

I modified your routine. One version is lightly optimized, and another
uses a vectorized sqrt.

1. Light optimization

The overhead of r12 is pretty heavy, so this version computes the
distance inline.

int find_jastrow(double *result,int j,struct coord *q,double r[3],double b){
  int i;
  double rij;
  double tmp;
  double tmp1, tmp2, tmp3;
  double r1, r2, r3;

  r1 = r[0];
  r2 = r[1];
  r3 = r[2];
  
  tmp = 1.0;
  if(b>0){
    for(i=0;i < N;i++){
      if(i!=j){

        tmp1 = q[i].x[0]-r1;
        tmp1 = tmp1*tmp1;
        
        tmp2 = q[i].x[1]-r2;
        tmp2 = tmp2*tmp2;

        tmp3 = q[i].x[2]-r3;
        tmp3 = tmp3*tmp3;

        rij = sqrt(tmp1 + tmp2 + tmp3);

        if(rij > b){
          tmp *= (1.0 - b/rij);
        }
        else{
          *result = 0;
          return 0;
        }
      }
    }
  }
  *result = tmp;
  return 1;
}

2. Using the sqrti routine

          tmp *= (1.0 - b/rij);

is pretty slow. You can use the sqrti routine instead of sqrt, which
turns the division into a multiplication. This "sqrti" routine (it
calculates 1.0/sqrt() quickly) will be included in the next libffm
version.

double sqrti(double);

int find_jastrow(double *result,int j,struct coord *q,double r[3],double b){
  int i;
  double rij;
  double tmp;
  double tmp1, tmp2, tmp3;
  double r1, r2, r3;
  double bi = 1./b;

  r1 = r[0];
  r2 = r[1];
  r3 = r[2];
  
  tmp = 1.0;
  if(b>0){
    for(i=0;i < N;i++){
      if(i!=j){

        tmp1 = q[i].x[0]-r1;
        tmp1 = tmp1*tmp1;
        
        tmp2 = q[i].x[1]-r2;
        tmp2 = tmp2*tmp2;

        tmp3 = q[i].x[2]-r3;
        tmp3 = tmp3*tmp3;

        rij = sqrti(tmp1 + tmp2 + tmp3);

        if(rij < bi){
          tmp *= (1.0 - b * rij);
        }
        else{
          *result = 0;
          return 0;
        }
      }
    }
  }
  *result = tmp;
  return 1;
}

3. Using the vectorized dsqrtiv routine

The sqrti routine is still the bottleneck. We can modify the code to
use a vectorized sqrti routine. But it will be slower if the
probability of "rij < bi" is low, because all N values are computed
before any early return can happen. If it gives a big improvement (this
depends on your source parameters), this routine can be made faster
still.

I attached libffm.a, but it is a beta version.

void dsqrtiv(double *, double *, int);

int find_jastrow(double *result,int j,struct coord *q,double r[3],double b){

  int i;
  double tmp;
  double tmp1, tmp2, tmp3;
  double r1, r2, r3;
  double bi;
  double vtemp[N];

  r1 = r[0];
  r2 = r[1];
  r3 = r[2];
  
  tmp = 1.0;

  if(b>0){
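    /* First pass: squared distances for all i. Entry j is computed
       too, but it is skipped in the second pass. */
    for(i=0;i < N;i++){

      tmp1 = q[i].x[0]-r1;
      tmp1 = tmp1*tmp1;

      tmp2 = q[i].x[1]-r2;
      tmp2 = tmp2*tmp2;

      tmp3 = q[i].x[2]-r3;
      tmp3 = tmp3*tmp3;

      vtemp[i] = tmp1 + tmp2 + tmp3;
    }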

    dsqrtiv(vtemp, vtemp, N);

    bi = 1.0/b;

    for(i=0;i < N;i++){

      if(i!=j){
        if(vtemp[i] < bi){
          tmp *= (1.0 - b * vtemp[i]);
        }
        else{
          *result = 0;
          return 0;
        }
      }
    }
  }
  *result = tmp;
  return 1;
}

---
goto@statabo.rim.or.jp



