Page 2 of 8 FirstFirst 1234 ... LastLast
Results 31 to 60 of 226

Thread: More precise orbital simulation program

  1. #31
    Looking at your code, this is already implemented in an unreleased beta version Gravity Simulator. But in the released version, I take it a step farther. There's a plot / don't plot button. When not plotting, it only crunches numbers and is 10-20 times faster when simulating a few objects. But when the number of objects grows, the computation time rises quadratically, while the plotting time rises linearly. So the speed improvement asymptotically approaches 0% as the number of objects rise. I believe at about 50 objects, the improvement is negligable.

  2. #32
    If your program keeps track of the total time that has elapsed since the simulation started so that it can be displayed and so forth, that will have to be multiplied by 'qw' as well, so you actually might want to put the line " qw=10; " in the very beginning of your program.

  3. #33
    Quote Originally Posted by tony873004 View Post
    Looking at your code, this is already implemented in an unreleased beta version Gravity Simulator. But in the released version, I take it a step farther. There's a plot / don't plot button. When not plotting, it only crunches numbers and is 10-20 times faster when simulating a few objects. But when the number of objects grows, the computation time rises quadratically, while the plotting time rises linearly. So the speed improvement asymptotically approaches 0% as the number of objects rise. I believe at about 50 objects, the improvement is negligable.
    Oh, well then, back to the drawing boards.

  4. #34
    Quote Originally Posted by grav View Post
    Oh, well then, back to the drawing boards.
    It was still a great idea. I've tweaked the code many times trying to squeeze as much speed out of it as possible. That's why its nice to still find some room for improvement like your d/d/d idea.

  5. #35
    Join Date
    Sep 2005
    Posts
    8,792
    Tony,

    Just some musings: what I'd like to do is get the whole thing off into C code (or Fortran -- I've got that too) and let Intel's compiler have at the whole thing.

    Off the top of my head, what I would do is just a launch a thread to crank the integrations, letting it run wide open. It would fill a little array (of a certain step size) with the position info, and that array would be like a circular buffer. The plotting program would be in a separate thread and would just peek at the results and plot them at its own pace.

    If the compiler could see the whole picture, especially for a large number of objects, it could vectorize better (SIMD instructions). Now the problem is, for SSE only, SIMD is single precision only. It requires SSE2 to do vector double precision operations.

    Get me the whole loop off in C, and I'll bet Intel's compiler could go to down using SSE2 instructions.

    -Richard

  6. #36
    Join Date
    Sep 2005
    Posts
    8,792
    Quote Originally Posted by Intel Compiler

    CodeOptimizer.cpp
    icl: Command line warning: overriding '/Qvc8' with '/Qvc7.1'
    .\CodeOptimizer.cpp(29) : (col. 2) remark: loop was not vectorized: not inner loop.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: loop was not vectorized: existence of vector dependence.
    .\CodeOptimizer.cpp(64) : (col. 2) remark: loop was not vectorized: vectorization possible but seems inefficient.
    I'll take it's word for the latter, but it sees a data dependency that is preventing it from vectoring the inner loop. Can that be fixed? I'll knock around.......

    -Richard

  7. #37
    Join Date
    Sep 2005
    Posts
    8,792
    And what data dependencies did Mr. Compiler think:

    Code:
    .\CodeOptimizer.cpp(29) : (col. 2) remark: loop was not vectorized: not inner loop.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objvy line 57.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objvx line 56.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and ObjMass line 50.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objvy line 47.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objvx line 46.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objsize line 36.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objsize line 36.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objz line 34.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed FLOW dependence between objvz line 58 and objz line 34.
    ..............
    40KB of similiar
    ............
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed OUTPUT dependence between CollisionObjectA line 38 and objvz line 48.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed OUTPUT dependence between CollisionObjectA line 38 and objvy line 47.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed OUTPUT dependence between CollisionObjectA line 38 and objvx line 46.
    .\CodeOptimizer.cpp(31) : (col. 3) remark: vector dependence: assumed ANTI dependence between Collisions line 38 and Collisions line 37.
    Yikes! But not these are *assumed*, not *proven* -- may be just the pointers as the compiler does not know from where the data is coming......

    I'm but a amatuer vectorizer/parallelizer.........

    ETA: I'm afraid the thing will need a rewrite to vectorize properly -- the data dependence is tricky. I'll mess around a bit after I think about it. Trouble here is one needs to think in parallel, not sequentially; ie doing several things at once.

    -Richard
    Last edited by publius; 2007-Apr-19 at 04:12 AM.

  8. #38
    Just to test the speed difference between the 2 compilers, you could always just hard-code the starting variables, creating a stand-alone program:

    numobjects=2;
    CollisionObjectA=0;
    CollisionObjectB=0;
    objx[1]=1;
    objy[1]=1;
    objz[1]=1;
    objvx[1]=1;
    objvy[1]=1;
    objvz[1]=1;
    objmass[1]=332000;
    objx[2]=150000000000;
    objy[2]=0;
    objz[2]=0;
    objvx[2]=0;
    objvy[2]=30000;
    objvz[2]=0;
    objmass[2]=1;

    This approximately gives you the Earth around the Sun.

    Then just add a counter and a way for you to break out of the loop.

  9. #39
    Join Date
    Sep 2005
    Posts
    8,792
    Tony,

    Thanks. First though, I like to attack trying to re-write to get it to vectorize -- ie be able to use the SIMD, single instruction multiple data instructions to try to do some of the work in parallel.

    Thinking about it, the larger time loop can NOT parallel, because the results of each iteration depend on the previous one -- you just can't split that up. However, for a large number of objects, we ought to be able to get some of that vectorized.

    What we want to do is arrange it so each iteration doesn't depend on the results of the previous one. The way it's written now is what is killing that.

    -Richard

  10. #40
    Join Date
    Sep 2005
    Posts
    8,792
    Tony,

    Oh, BTW, I got that .dll compiled with Intel's compiler (C/C++ v9.1) and working. I compiled several versions using different instruction set possibilities. Anyway, you might like to try out the one with SSE2 instructions. PM me and I'll e-mail you a copy. It's only 48KB in size.

    It requires a CPU with SSE2 capability and will NOT gracefully fail if the CPU doesn't have it, but will fault out.

    Note this is not vectorized, just doing what it could with the code as is using the SSE2 instructions.

    I haven't had much time to really play with it, but it will take some modifying to allow vectorization, and, not being any expert on this, it's gonna take some thought. How can we modify that loop to allow parallel execution of different iterations.......... If succesful, the results will blow you away.

    -Richard

  11. #41
    Go for parallel computation of different bodies within the same iteration, that way you won't have data dependencies across the parallel processes.
    Granted, you may actually see degradation of speed for a two body simulation, but for more it should shine.
    __________________________________________________
    Reductionist and proud of it.

    Being ignorant is not so much a shame, as being unwilling to learn. Benjamin Franklin
    Chase after the truth like all hell and you'll free yourself, even though you never touch its coat tails. Clarence Darrow
    A person who won't read has no advantage over one who can't read. Mark Twain

  12. #42
    Join Date
    Mar 2007
    Posts
    393
    Hi,
    Sorry for getting into the thread so late in discussions and asking dumb questions but I was wondering about why the approach wasn't to do the main loop moving the objects along their ellipse and adjust the ellipse periodically for changes in the orbital parameters. Obviously a much more complex calculation can be done periodically than one occurring every iteration when it comes to time consumption.

  13. #43
    Quote Originally Posted by HenrikOlsen View Post
    Go for parallel computation of different bodies within the same iteration, that way you won't have data dependencies across the parallel processes.
    Granted, you may actually see degradation of speed for a two body simulation, but for more it should shine.
    I don't know how to do that. Does this require a multiple processors?

  14. #44
    If you can formulate it in an easily parallelizable way, the compiler can probably do a lot more vectorization.
    __________________________________________________
    Reductionist and proud of it.

    Being ignorant is not so much a shame, as being unwilling to learn. Benjamin Franklin
    Chase after the truth like all hell and you'll free yourself, even though you never touch its coat tails. Clarence Darrow
    A person who won't read has no advantage over one who can't read. Mark Twain

  15. #45
    Quote Originally Posted by cbacba View Post
    Hi,
    Sorry for getting into the thread so late in discussions and asking dumb questions but I was wondering about why the approach wasn't to do the main loop moving the objects along their ellipse and adjust the ellipse periodically for changes in the orbital parameters. Obviously a much more complex calculation can be done periodically than one occurring every iteration when it comes to time consumption.
    That's the way most planetarium software operates. In that case, they're trying to show you the solar system at some date in the future or the past. So they compute the positions every time they want to plot a point using an analytic method like you describe. I belive most use VSOP87.

    But this limits you to simulating that which you already know.

    The difference between that and an n-body program is that in n-body you do not have to tell it how orbital elements change with time. This leaves you free to discover new properties about an orbit. For example, when the asteroid Cruithne was discovered, simply propogating its orbit forward along its predicted ellipse would not have revealed its horseshoe behavior. You would need to know of this behavior in advance, and work that into your code. In n-body I do not need to tell it of this behavior. It is just a natural consequence of following Newton's laws of motion and gravity.

    You're right that the coding is much more complex. But that's not the reason I avoid it. My simulator is designed to test new stuff.

    But for planetarium programs, the method you describe is ideal. Someone wants to know where Saturn will be tonight. They would need a lot of patience to use my program and propogate it forward from a date of known positions, to the present date.

  16. #46
    Quote Originally Posted by HenrikOlsen View Post
    If you can formulate it in an easily parallelizable way, the compiler can probably do a lot more vectorization.
    I'm not familiar with the term. Do you have a link that describes what easily parrallelizable code is?

  17. #47
    Join Date
    Sep 2005
    Posts
    8,792
    Tony,

    That is well beyond the scope of what I can tell you, but search on vectorization/OpenMP and all that, and you can go to town. But here's what it is in a nutshell. The following is a parallelizable loop:

    for(i = 0; i < N; i++)
    {

    a[i] = b[i]*c[i]
    };

    The compiler can, using the vector instructions (single instruction, mulitple data, basically a "vector" arithmetic operation operating on more than one set of operands at the same time), do different iterations at the same time.

    This, however, is not vectorizable:

    for(i = 1; i < N; i++
    {
    a[i] = a[i -1] * b[i]
    };

    Each iteration depends on results of previous iterations. Trying to do that in parallel would result in the wrong values being used. The a[2] calculation depends on the previous a[1] iteration.

    That is what is killing the vectorization in your loop here -- there is a *LOT* of this dependency. And there doesn't have to be. The accelerations of all N bodies is know from the position data at any time step. The ith body's acceleartion does not depend on the current result of the jth body. We should be able to split up the acceleration calculations. However, the way it is written (which is very nice and tight for sequential operations), it just can't be done. We need to change the way we do it, to get right of the cross-iteration dependency.

    -Richard

  18. #48
    Code:
     
    for (int k=1;k<=NumObjects;k++) {   
      tM = ObjMass[k] * 398600440000000;   
      for (int j=k+1;j<=NumObjects;j++) {   
        dx = objx[j] - objx[k];   
        dy = objy[j] - objy[k];   
        dz = objz[j] - objz[k];   
        D = sqrt(dx*dx+dy*dy+dz*dz);   
        if  (((objsize[k] + objsize[j])/2) > D) {   
    	Collisions = Collisions + 1;   
    	CollisionObjectA[Collisions] = j;   
    	CollisionObjectB[Collisions] = k;   
        }   
        if (tM > 0) { 
           f = tM/D/D/D;  
           objvx[j] = objvx[j] - dx*f;   
           objvy[j] = objvy[j] - dy*f;   
           objvz[j] = objvz[j] - dz*f;   
        }   
        tM2 = ObjMass[j] * 398600440000000;   
        if (tM2 > 0) {   
           f = tM2/D/D/D;
           objvx[k] = objvx[k] + dx*f;   
           objvy[k] = objvy[k] + dy*f;   
           objvz[k] = objvz[k] + dz*f;   
        }   
    	
      }   
     }
    So if I understand it correctly, if the code is parallelizable I should be able to go through the loop in any order I want ? In the above block of code, I believe this is possible, as all it is doing is updating velocities. The positions are not altered in this embedded loop.

    So for example, consider a simulation with 3 objects, [1,2,3]. The compiler is free to do them in any order, as long as it does all the pairings. Doing it in this sequential order:
    1,2
    2,1
    1,3
    2,3
    3,2

    should yield the same results as this random order:
    2,1
    3,2
    1,2
    1,3
    2,3

    To compute the new velocities, it is only summing up all the accelerations, and when you sum, the order is unimportant. Position is unaltered in this embedded loop, so it seems to me that it should be parallelizable.

    But the next loop must be done seperately. Again, the order it does this loop is not important as all it is doing is adding the new velocities to the old positions.
    Code:
     
    for (int h=0;h<=NumObjects;h++) {   
      objx[h] = objx[h] + objvx[h];   
      objy[h] = objy[h] + objvy[h];   
      objz[h] = objz[h] + objvz[h];   
     }
    So would this make the code 2 parallelizable loops, the first one being the embedded kj loop and the second being the single h loop?

    Are you suggesting (either of you) that if I can code this without the position-updating loop being external to the velocity-updating loop that the entire thing would be parallelizable, and thus faster?

  19. #49
    Join Date
    Mar 2007
    Posts
    393
    Quote Originally Posted by tony873004 View Post
    That's the way most planetarium software operates. In that case, they're trying to show you the solar system at some date in the future or the past. So they compute the positions every time they want to plot a point using an analytic method like you describe. I belive most use VSOP87.

    But this limits you to simulating that which you already know.

    The difference between that and an n-body program is that in n-body you do not have to tell it how orbital elements change with time. This leaves you free to discover new properties about an orbit. For example, when the asteroid Cruithne was discovered, simply propogating its orbit forward along its predicted ellipse would not have revealed its horseshoe behavior. You would need to know of this behavior in advance, and work that into your code. In n-body I do not need to tell it of this behavior. It is just a natural consequence of following Newton's laws of motion and gravity.

    You're right that the coding is much more complex. But that's not the reason I avoid it. My simulator is designed to test new stuff.

    But for planetarium programs, the method you describe is ideal. Someone wants to know where Saturn will be tonight. They would need a lot of patience to use my program and propogate it forward from a date of known positions, to the present date.
    I tried an orbit program once - but that was a long time ago - done in microsoft 8k basic (or imsai 8k basic) on an imsai computer. It kept blowing up on me due to errors accumulating - not that double precision was even available. The purpose of the ellipse in my suggestion was a stablizing component which could then be modified based on other influences. While the center of gravity shouldn't change, the non symmetrical distribution of mass means one cannot simply place all mass at the center and go - like is possible with a spherical uniform body. However, by periodically modifying the ellipse parms based upon the effects of the external bodies, it would seem one could reduce the amount of processing and the apparently critical nature of the time interval.

    Anyway, it was a thought.

  20. #50
    Okay, I'm starting to get really frustrated here. I must have looked through at least 50 websites trying to determine the process for the RK4 method. I've found only a couple with examples, like this one, and have been able to work through it enough that I can even tell that K3 for their third example is incorrect (should be f(1.5, 1.6875)=-1.53125 for y(1.5), although their final result is right), but can't seem to figure it out for something as simple as y(t)+a(t)*t. I've even found sample programs, like here, but I'll be danged if I can tell what they are doing within each loop. Somebody please help.

  21. #51
    Join Date
    Sep 2005
    Posts
    8,792
    Tony,

    The main problem we have with parallelizing that loop (based on the compiler's detailed complaints) is the following thing:

    for( stuff)
    {
    x = x + a[i]

    }

    Note that can't parallel because the previous result of x has to be used.

    As I'm reading it (and I could be wrong, but that is what the compiler said), each element of our objv[] arrays gets several pieces added to it on different iterations of the loop.

    A given objv element plays the role of 'x' above, although it happens over both inner and outer loop.

    And the last loop of updating the position -- the compiler said that was vectorizable but "seems inefficient". IOW, it thought any speed gains would not be worth the effort to set up the vector operation. It could be wrong, but I assume it knows what it's talking about.

    -Richard

    -Richard

  22. #52
    Join Date
    Sep 2005
    Posts
    8,792
    Tony,

    Vector/paralleling stuff can be a bear to understand, especially if it's your first exposure to it. Me trying to explain it is sort of the blind leading the blind.

    The position loop at the last doesn't have anything to do with the above loop. That's fine as is, and the compiler didn't even think it was worth vectorizing.

    It's the data dependence of the second loop -- the way pieces in different iterations are added to each velocity element. As Henrik was noting, we might have to code in a way that would be inefficient for sequential (one iteration at the time) execution, but would turn out to be much faster for a large N.

    Which reminds me of our e-mail exchange. The SSE2 version of that .dll was nearly twice as fast on my Athlon64 box with your "moonbuilder" example. It may well be slower itself for a small N. I'll look at that tonight after supper. But it may be the overhead for setting up the SSE2 operations was more than it was worth compared to the reqular IEEE x87 FPU instructions for small N.

    But it absolutely blew away the old .dll for the moonbuilder.

    -Richard

  23. #53
    Quote Originally Posted by tony873004
    So if I understand it correctly, if the code is parallelizable I should be able to go through the loop in any order I want ?
    I'm not sure if it applies directly to what you are discussing or not, but I've noticed that the orbit can be found by using only the original distances of the bodies from each other and original relative velocities, and then both the velocities and coordinates can be figured through a separate loop after the main one, like the coordinates are being done now. It amounts to the same thing in the long run, and takes the same amount of time to perform to the same accuracy, slightly longer even, but I thought it was pretty interesting that one could do this systematically using only the original distances, kind of like some sort of fundamental element is at work here or something.

    It works like this. The way the program is now, after one loop, the velocity (for x) is found to be vx'=vx-dx/d^3*G*M*ti and the new coordinate is x'=x+vx'*ti=x+vx*ti-dx/d^3*G*M*ti^2. Continuing this, the loops progress like this.

    vx''=vx'-dx'/d'^3*G*M*ti
    =vx-dx/d^3*G*M*ti-dx'/d'^3*G*M*ti
    =vx-[dx/d^3+dx'/d'^3]*G*M*ti
    x''=x'+vx''*ti
    =x+2*vx*ti-2*dx/d^3*G*M*ti-dx'/d'^3*G*M*ti^2
    =x+2*vx*ti-[2*dx/d^3+dx'/d'^3]*G*M*ti^2

    the next resulting in
    vx'''=vx-[dx/d^3+dx'/d'^3+dx''/d''^3]*G*M*ti
    x'''=x+[3*dx/d^3+2*dx'/d'^3+dx/d^3]*G*M*ti^2

    and so forth. So we could actually find the resulting vx, vy, x, and y just by running through any number of loops for the relative distances and adding the terms for dx/d^3 together to apply toward vx and vy and adding those to themselves again within each loop as well to apply toward x and y. So all we have to do now is to figure out how to get dx', dy', and d' successively while running through each loop without any reference to vx', vy', x', and y'. As it turns out, this is rather simple, and follows a similar pattern. With u=vx(1)-vx(2) and v=vy(1)-vy(2) with the original relative velocities and MM=M(1)+M(2) for the sum of the masses of the bodies, for the first loop, we have

    dx=x(1)-x(2)
    dy=y(1)-y(2)
    d=sqrt(dx*dx+dy*dy)

    For the second (without going through the details here), one finds that

    dx'=dx+u*ti-G*MM*dx/d^3*ti^2
    dy'=dy+v*ti-G*MM*dy/d^3*ti^2
    d'=sqrt(dx'*dx'+dy'*dy')

    The third run gives

    dx''=dx+2*u*ti-2*G*MM*dx/d^3*ti^2-G*MM*dx'/d'^3*ti^2
    dy''=dy+2*v*ti-2*G*MM*dy/d^3*ti^2-G*MM*dy'/d'^3*ti^2
    d''=sqrt(dx''*dx''+dy''*dy'')

    After this, a regular pattern forms, from which we can form into the main loop, making dx(1)=dx, dy(1)=dy, d(1)=d, dx(2)=dx', etc., and the loop would then be, for say 20 quick runs

    for z=4 to 20
    dx(z)=dx(z-1)+dx(z-2)-dx(z-3)-G*MM*dx(z-1)/[d(z-1)]^3*ti^2-G*MM*dx(z-2)/[d(x-2)]^3*ti^2
    dy(z)=dy(z-1)+dy(z-2)-dy(z-3)-G*MM*dy(z-1)/[d(z-1)]^3*ti^2-G*MM*dy(z-2)/[d(z-2)]^3*ti^2
    d(z)=sqrt(dx(z)*dx(z)+dy(z)*dy(z))
    next z

    so this is actually all that is necessary within the main loop to find the distances between any number of bodies from each other, but we would still have to add in a few lines that add these distances together as dx/d^3 and dy/d^3 and then add those to themselves so that we can find the velocities and coordinates with a separate loop. We could run this main loop as far as we want around the orbit and then if it is wished, at some point we could stop it and plot the coordinates with a separate loop, and we really never even need to know the currect velocities at all, but we can find them with a separate loop if we wanted to as well. Pretty neat, huh? All we really need is the short loop above to run this, and a couple of extra lines to add them so that we can find the coordinates at some point. This is fascinating to me, anyway, that it can even be done in this way. Even if we don't add any more lines to the loop, we can still find the distances bodies are from each other as they orbit over any length of time to exactly the same degree of accuracy without ever referring to the current velocities or coordinates again.

  24. #54
    Join Date
    Sep 2005
    Posts
    8,792
    Okay. Here's the crazy way I think about this. We've got N objects, and we want to calculate the acceleration on each. That depends on the positions of the other N-1 objects. So think of an NxN matrix, F, where F_ij is the force on the ith object due to jth object, but we have no self-force, so the diagonal of that matrix is not involved, no F_ii.

    Now, because of symmetry, F_ij = -F_ji, so that means we only have to calculate (less than) half of that matrix.

    That's how the code does it now, it only works on half of the F_ij, but does in that tight, neat loop that doesn't parallel too well as is (at least as I can see it).

    Anyway, the acceleation of a given object, F_k is found by summing the rows (or colums) of the F_ij matrix (except the diagonal terms). IOW, we sort of contract a matrix down to a vector by summing the rows. That operation will vectorize nicely.

    So, at first blush, I propose to calculate the F_ij matrix (or one half of it) in one loop, then do the summing in a second loop, the second loop vectorizing cleanly. Oh, and note each F_ij is exactly a 3 component vector itself.

    Now, can we calculate the F_ij in a vectorizable way. Well, it only depends on the position vector arrays of the first, ie F_ij ~ 1/(d_ij)^2.

    What are the number of elements in the F_ij matrix that we have to calculate? That is one half of the matrix less the diagonal, ie (N^2 - N)/2, or
    N/2 *(N - 1). Let's see for a 3 x 3, that would say 3/2 *2 = 3, which is correct. That is, if we had 3 objects, we just need to know
    F_12, F_13, and F_23.

    So I propose we try to calculate F_ij with a single loop, not a nested one of the form,

    for(i = 1; i <= N/2*(N - 1); i++)
    { whatever};

    That should be able to vectorize cleanly as well. All we have to do now is figure out the {whatever} in terms of i, to to get the 2D F_ij.

    Once we have that, we then sum the rows as mentioned above to get the total F_i on each, and then do the velocity and position changes, which will vectorize cleanly.

    -Richard

  25. #55
    Join Date
    Sep 2005
    Posts
    8,792
    I've been speaking in a very rough way. To make things more clear, what we're really after is an acceleration matrix, A_ij, not really a F_ij. There's no sense in mulitplying by a mass only to divide by it. And that will mean the A_ij is different from -A_ji by the mass factor.

    IOW, A_ij ~ m_j/d_ij^2 while A_ji ~ -m_i/d_ij^2

    So what I'm proposing to do (to see how it vectorizes) is build the A_ij matrix via a single index loop. We'll calculate the d_ij term, then fill in A_ij and A_ji accordingly, using m_i and m_j and the minus sign.

    Once we have the A_ij, we do the "contraction" operation, the row (or column) summing to get the A_k, which then apply to update the velocity and position arrays.

    Note what this approach is doing. We are going to need an array for our matrix. The original code does the A_ij and A_ji each iteration but does not save them. It just adds them right the velocity arrays, one at the time.

    What the above will do is save them to an array, then sum them later. We are going to use more memory and hope that allows the vectorizer to get more speed.

    As was noted above, all that overhead will make it slower for low N, but I think it will pay off well for large N.

    -Richard

  26. #56
    Quote Originally Posted by publius View Post
    I've been speaking in a very rough way. To make things more clear, what we're really after is an acceleration matrix, A_ij, not really a F_ij. There's no sense in mulitplying by a mass only to divide by it. And that will mean the A_ij is different from -A_ji by the mass factor.

    IOW, A_ij ~ m_j/d_ij^2 while A_ji ~ -m_i/d_ij^2

    So what I'm proposing to do (to see how it vectorizes) is build the A_ij matrix via a single index loop. We'll calculate the d_ij term, then fill in A_ij and A_ji accordingly, using m_i and m_j and the minus sign.

    Once we have the A_ij, we do the "contraction" operation, the row (or column) summing to get the A_k, which then apply to update the velocity and position arrays.

    Note what this approach is doing. We are going to need an array for our matrix. The original code does the A_ij and A_ji each iteration but does not save them. It just adds them right the velocity arrays, one at the time.

    What the above will do is save them to an array, then sum them later. We are going to use more memory and hope that allows the vectorizer to get more speed.

    As was noted above, all that overhead will make it slower for low N, but I think it will pay off well for large N.

    -Richard
    That sounds similar to what I wrote in my last post, which shows how to find only for dx/d^3 and dy/d^3 in the main loop, then find vx, vy, x, and y later in a separate one. The problem is, though, that in order to add them in later, one would still have to add

    fj=-G*M(j)*ti
    ax(i)=ax(i)+dx/d^3*fj
    ay(i)=ay(i)+dy/d^3*fj
    bx(i)=bx(i)+ax(i)
    by(i)=by(i)+ay(i)

    to the main loop, which are basically just placeholders, then, and the same for (j), which takes just as many lines as the original, maybe more. What this amounts to is ax(i)=fj*[dx/d^3 + dx'/d'^3 + dx''/d''^3 + ...] and bx(i)=fj*ti*[z*dx/d^3 + (z-1)*dx'/d'^3 + (z-2)*dx''/d''^3 + ...] , where z is the number of times the loop is run, which can then be added in a separate loop to find the velocities and coordinates with

    for i=1 to n
    x(i)=x(i)+bx(i)*ti+z*vx(i)*ti
    y(i)=y(i)+by(i)*ti+z*vy(i)*ti
    vx(i)=vx(i)+ax(i)
    vy(i)=vy(i)+ay(i)
    next n

    The interesting thing is that we can actually keep running through the main loop as far as we want and then break from it at some point to find the velocities and coordinates with the second loop. I don't think it would save any time, though, because it still runs through the same number of elements overall. Running the loop from i=1 to n/2*(n-1) instead of two smaller ones won't help much either, since it would require the same number of runs through the main loop with the same lines, but we would then need additional lines to find i and j. Now, if it could be found in the main loop for just i=1 to n, that would be something indeed, but I don't think there's any way around it.
    Last edited by grav; 2007-Apr-21 at 06:24 AM.

  27. #57
    Quote Originally Posted by publius
    To make things more clear, what we're really after is an acceleration matrix, A_ij, not really a F_ij. There's no sense in mulitplying by a mass only to divide by it. And that will mean the A_ij is different from -A_ji by the mass factor.
    Actually, that's not a bad idea, but in reverse. We're using the A_ij now, but if we used F_ij, then we could eliminate a few more lines in the main loop and divide out the masses in the second loop, which only runs through n times instead of n/2*(n-1), saving a lot of time, especially for multiple body systems. So here's the program for that.

    Code:
     
    for (int k=1;k<=NumObjects;k++) {   
      for (int j=k+1;j<=NumObjects;j++) {   
        tM = ObjMass[j] * ObjMass[k] * 398600440000000;
        dx = objx[j] - objx[k];   
        dy = objy[j] - objy[k];   
        dz = objz[j] - objz[k];   
        D = sqrt(dx*dx+dy*dy+dz*dz);   
        if  (((objsize[k] + objsize[j])/2) > D) {   
    	Collisions = Collisions + 1;   
    	CollisionObjectA[Collisions] = j;   
    	CollisionObjectB[Collisions] = k;   
        }   
        if (tM > 0) { 
           f = tM/D/D/D;  
           objvx[j] = objvx[j] - dx*f;   
           objvy[j] = objvy[j] - dy*f;   
           objvz[j] = objvz[j] - dz*f;   
           objvx[k] = objvx[k] + dx*f;   
           objvy[k] = objvy[k] + dy*f;   
           objvz[k] = objvz[k] + dz*f;   
        }   
    	
      }   
     }
    The initial values for the velocities of the bodies at the start of the program would have to be multiplied by the body's mass, though, and the mass of the body must be divided back out in the second loop when finding the coordinates. The second loop, then, would look like this. (And it won't matter if the object mass includes other factors in with it, such as G and ti, since those same factors will be divided back out in the second loop.)

    Code:
    for (int h=0;h<=NumObjects;h++) {   
      objx[h] = objx[h] + objvx[h]/objmass[h];   
      objy[h] = objy[h] + objvy[h]/objmass[h];   
      objz[h] = objz[h] + objvz[h]/objmass[h];   
     }
    Last edited by grav; 2007-Apr-21 at 07:42 AM.

  28. #58
    Better yet, the first loop can be shortened even further to just

    Code:
    for (int k=1;k<=NumObjects;k++) {   
      for (int j=k+1;j<=NumObjects;j++) {   
        dx = objx[j] - objx[k];   
        dy = objy[j] - objy[k];   
        dz = objz[j] - objz[k];   
        D = sqrt(dx*dx+dy*dy+dz*dz);   
        if  (((objsize[k] + objsize[j])/2) > D) {   
    	Collisions = Collisions + 1;   
    	CollisionObjectA[Collisions] = j;   
    	CollisionObjectB[Collisions] = k;   
        }   
        f=ObjMass[j] * ObjMass[k] * 398600440000000/D/D/D;
        if (f > 0) { 
           objvx[j] = objvx[j] - dx*f;   
           objvy[j] = objvy[j] - dy*f;   
           objvz[j] = objvz[j] - dz*f;   
           objvx[k] = objvx[k] + dx*f;   
           objvy[k] = objvy[k] + dy*f;   
           objvz[k] = objvz[k] + dz*f;   
        }   
    	
      }   
     }

  29. #59
    Of course, (Tony), if you don't want to mess with changing the values for the velocities since they might figure into other parts of the program as well, you can just use this for the first loop and leave the second loop and the rest of the program as they are. As a matter of fact, this would definitely be the best and simplest one to try first anyway and probably the best one to use.

    Code:
    for (int k=1;k<=NumObjects;k++) {   
      for (int j=k+1;j<=NumObjects;j++) {   
        dx = objx[j] - objx[k];   
        dy = objy[j] - objy[k];   
        dz = objz[j] - objz[k];   
        D = sqrt(dx*dx+dy*dy+dz*dz);   
        if  (((objsize[k] + objsize[j])/2) > D) {   
    	Collisions = Collisions + 1;   
    	CollisionObjectA[Collisions] = j;   
    	CollisionObjectB[Collisions] = k;   
        }   
        f=ObjMass[j] * ObjMass[k] * 398600440000000/D/D/D;
        if (f > 0) { 
           objvx[j] = objvx[j] - dx*f/ObjMass[j];   
           objvy[j] = objvy[j] - dy*f/ObjMass[j];   
           objvz[j] = objvz[j] - dz*f/ObjMass[j];   
           objvx[k] = objvx[k] + dx*f/ObjMass[k];   
           objvy[k] = objvy[k] + dy*f/ObjMass[k];   
           objvz[k] = objvz[k] + dz*f/ObjMass[k];   
        }   
    	
      }   
     }

  30. #60
    Join Date
    Sep 2005
    Posts
    8,792
    Grav,

    I'll look at what you've done after a while, don't think I'm ignoring you (well, I am not paying attention as I'm looking at what I've got here for now. ), which leads me to this:

    Tony,

    I've been comparing times with my SSE2 version of the .dll compiled with Intel's compiler vs the original version. Note this is not vectorized, this is just the compiler using the SSE2 instructions and doing its best to optimize otherwise.

    Running that on my Athlon64 box, the SSE2 version seems to run at least twice as fast or better. I don't know what's up with your Pentium4 (Tony reported the SSE2 was slower -- about 60% of the original).

    I ran the "barycenter" dataset, which focused on the earth-moon, system. After 30 secs of run time, the SSE2 version had advanced to 2012. In the same time, the old version only made it to 2004.

    And I had the same result with the "moonbuilder" dataset that had a large number of small objects in earth orbit.

    So at least on my Athlon64 (2GHz), the SSE2 instructions are indeed faster.

    -Richard

Similar Threads

  1. Does anybody have an Ansys (or similar) magnetic field simulation software program?
    By SRH in forum Space/Astronomy Questions and Answers
    Replies: 0
    Last Post: 2012-Jul-07, 12:58 PM
  2. Precise Distances of Solar System Objects Relative to Earth
    By cjackson in forum Space/Astronomy Questions and Answers
    Replies: 11
    Last Post: 2011-Mar-28, 12:25 PM
  3. Is there any planet evolution planetary simulation online program?
    By m1omg in forum Space/Astronomy Questions and Answers
    Replies: 6
    Last Post: 2007-Oct-29, 03:12 PM
  4. Precise Astronomical New Year
    By publius in forum Space/Astronomy Questions and Answers
    Replies: 28
    Last Post: 2007-Jan-04, 04:41 AM
  5. New Microsoft MSN Search Engine -- More Precise
    By 01101001 in forum Off-Topic Babbling
    Replies: 13
    Last Post: 2005-Feb-04, 10:31 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •