I was into pushing the parameters to the edge of chaos. This system got there and then went beyond:
I was into pushing the parameters to the edge of chaos. This system got there and then went beyond:
Everyone is entitled to his own opinion, but not his own facts.
Well, all you would have with what I was doing was a bunch of graphics of the inner solar system going around and around in near circles. The other one I used a lot, "Moon builder" wouldn't lend itself to any pictures anyway, as it was a bunch of moonlets colliding.
-Richard
Pretty picture-- but why is it translating to the right? Perhaps the same numerical problems causing the precession are violating momentum conservation, or did you start the system out with some momentum (if so, it might be better to stay in the center-of-mass frame, as a check)? Also some simulations seem to go from unbound, to bound for a few complete revolutions, and back to unbound again-- but that isn't physically possible without internal degrees of freedom. This brings up a very important consideration about computer programs-- they are great at making pretty pictures, and there's a tendency to think that something that is wrong cannot be pretty, but of course that isn't the case!
I just got a Fortran version of that .dll working. It was interesting. Things were going awry -- odd things were happening, such as the bodies jumping all around, even dissappearing. Collisions made thing go really wild. For instance, running the "moonbuilder" data set, one of the moon chunks begin orbiting (very fast) *inside* the earth, the graphic showing it whizzing around inside the blue dot representing earth.
Yep, something was wrong -- I looked at the type of arguments, calling conventions and all that trying to make sure the FORTRAN routine was thinking the data form was what it was supposed to be. It seemed to be fine.
I finally figured out what it was, array lower bounds. In Fortran, it starts at 1. That is, X(1) is the first memory element. In C (and VB now), the default lower bound is 0. That is X[1] points to the *second* element in line. That was the trouble.
Now, I thought the FORTRAN routine might be faster. The .asm output looked a bit tighter than the C generated code. However, the FORTRAN is slightly slower, at least for the PIII code. I haven't tried any PIV SSE2 versions yet.
-Richard
Originally Posted by Celestial MechanicOriginally Posted by Celestial MechanicI copied these from another thread.Originally Posted by Celestial Mechanic
Okay, so, as far as I can tell, the original program is all about conservation of momentum, so we just need conservation of energy. But to do this, we must use the ellipse thing as well anyway, so everything will be included here. The very first formula in the first quote above gives this. v, r, G, M, and m are all known, so now we can find 'a' with
a=1/[2/r-((vx(j)-vx(k))^2+(vy(j)-vy(k))^2)/G/(M+m)]
From this, we can find 'e*cos(E)' with r=a*(1-e*cos(E)). Now we can find 'p' with r=p*(1+e*cos(E)). Then we find 'e' from p=a*(1-e^2), 'b' from a*p=b^2, and 'E' from e*cos(E), since 'e' is now known.
From the second quote, we can find x, y, vx, and vy for each body on an "upright" ellipse. From the third quote, we can find the current M with just M=E-e*sin(E). Then we add the time interval to M with M'=M+n*ti, and find the new E iteratively. Now, a couple of tricks to find the angle of the ellipse relative to the upright one, and the frame of reference of the entire system in respect to the velocities and we can find the new positions and velocities. This is what I've got so far, but before I start working it all into the program, please, please, if anyone sees anything wrong with any of that, let me know.
Yeah, I was hoping it would be as well. I'm looking at this more in detail now.
Just in case any are interested, here is the FORTRAN version of that function. This is "free form" source format, not the old fixed form where columns were important. This will not compile under a compiler that doesn't do free-form. And I changed the variable names to something a little more in keeping with FORTRAN style (and easier to type)
And note at the last, I tried using FORTRAN's direct array operation features (again, FORTRAN is language made for numerical work and has loads of features like this). The vectorizer really likes those direct array expressions, as it can do things exactly like it wants, for one thing. FORTRAN's array handling syntax is quite extensive and powerful.
For example, among many other tricks, you can use so-called vector subscripts. Ie, make a reference to X(A), where A itself is an array. That expression means take the elements of X pointed to by the elements of A. For example, if A was 3 elements long and containe {1, 3, 5}, X(A) would refer to X(1), X(3), and X(5) as a unit. So one could write something like this:
Z(A) = X(A) * Y(A), and that would replace do that operation on each of the elements pointed to by A. You can see how that and other similar features can be very useful for doing "index gymnastics" of the sort that arise with complex tensor and matrix operations.
-RichardCode:! FortOptimizer.f90 ! ! FUNCTIONS/SUBROUTINES exported from FortOptimizer.dll: ! FortOptimizer - subroutine ! Integer*4 Function RunLoop(N, Mass, X, Y, Z, Vx, Vy, Vz, OSize, ColA, ColB ) !DEC$ ATTRIBUTES STDCALL, ALIAS:'RunLoop', DLLEXPORT::RunLoop ! Parameters Integer*4 N, ColA(0:N), ColB(0:N) Real*8 Mass(0:N), X(0:N), Y(0:N), Z(0:N), Vx(0:N), Vy(0:N), Vz(0:N), OSize(0:N) ! Local variables Integer*4 j, k, h, NCol Real*8 dx, dy, dz, f1, f2, qq, tM, tM2, D, D2, D3 Real*8, Parameter :: MuCon = 398600440000000.D0 ! Body NCol = 0 RunLoop = 0 If ( N .LT. 1) Return Do k = 1, N tM = Mass(k) * MuCon Do j = k + 1, N dx = X(j) - X(k) dy = Y(j) - Y(k) dz = Z(j) - Z(k) D2 = dx*dx + dy*dy + dz*dz D = DSQRT(D2) D3 = D * D2 If( (Osize(k) + OSize(j))/2D0 .GT. D ) Then NCol = NCol + 1 ColA(Ncol) = j ColB(Ncol) = k End If tM2 = Mass(j) * MuCon f1 = tM/D3 f2 = tM2/D3 f3 = f1 + f2 qq = 1D0 + f3*(1D0/6D0 + f3*(1D0/15D0 + f3/28D0)) If (tM .GT. 0D0) Then Vx(j) = Vx(j) - dx*f1*qq Vy(j) = Vy(j) - dy*f1*qq Vz(j) = Vz(j) - dz*f1*qq End If If (tM2 .GT. 0D0) Then Vx(k) = Vx(k) + dx*f2*qq Vy(k) = Vy(k) + dy*f2*qq Vz(k) = Vz(k) + dz*f2*qq End If End Do End Do ! Do h = 1, N ! ! X(h) = X(h) + Vx(h) ! Y(h) = Y(h) + Vy(h) ! Z(h) = Z(h) + Vz(h) ! End Do ! Try array copy rather than explicit loop X = X + Vx Y = Y + Vy Z = Z + Vz ! End of loop computations ------ RunLoop = 1 Return End Function RunLoop
Just a quick question: how many iterations have you been running your programs for? Thousands? Hundreds of thousands? Tens of millions?
Especially Tony's runs for estimating the precession of Mercury's perihelion.
The reason I ask is that I've taken a stab at this, and I run into trouble as soon as the step size gets up to 10^{5} s, which is just more than one day. My little test programs test the two-body solution, and they periodically print the angular momentum, energy, and eccentricity.
In post 108 I show a graph of Mercury's precession in arcseconds per century. I ran the simulation for 10,000 years, so with a time step of 2048 it performed about: 10000*365*86400/2048
or about 150 million iterations, and for time step of 128, about 2.5 billion iterations.
All,
I'm guess I'm like the politician who had never seen a supermarket checkout scanner before and thought it was amazing.
I ran the FORTRAN code through the PGO phases (profile guided optimization, using run-time data from an "instrumented build" to tell it better how to optimize), and was comparing the .asm output.
The PGO feedback caused the compiler to utterly rearrange the code and do things differently. One stark thing I couldn't understand at first was it moved whole chunks of code to the end of the code stream, inserting jumps in the front to jump down to the end, then back up.
I couldn't understand that at first, then got to looking at the compiler's "comments" (very terse numbers). It is very concerned with *probabilities* of jumps being taken. She's also very interested in what percentage of the time various code executes and spewed out those numbers as well (this is what the PGO data tells it). Why is that, I wondered....
Then it hit me what the game was. The compiler was arranging things using branch percentages to make the most likely code path be right there at the front and just fall straight through most of the time. Clever, indeed! And by doing that, it is also ensuring that most used code path can maybe stay right in the *L1* cache hopefully. For example, note the collision detection loop in the high level source code. It moved all code to do that to the end.
Clever. Intel's claim to fame is "we know our chips, use our compilers". Well, their compiler does indeed appear to know what it's doing.......
ETA: Something else I just noticed was the compiler would actually repeat code rather than jumping. I noticed repeated code in the low probability sections at the end. Apparently, with "pipelining" and speculative execution, jumps are really costly, and it will just repeat blocks of code to save jumping.
-Richard
Last edited by publius; 2007-May-09 at 03:46 AM.
Forgive me for rambling on about this optimization stuff, but it fascinates me to no end. I was playing around with the PGO optimization, and discovered I get slight speed differences depending on which type of run I do for the instrumented build.
For example, if the runt-time is all "moonbuilder", moonbuilder will run very fast. However, if I use run-time data from the other sets, I found it slows it down a bit (just every so slightly, mind you) for moon builder.
As I was rambling about above, the compiler weighs how many times the various blocks of code (as it defines a block) run relative to each other. If N is large, it does it one way, but if N is small, it does another.
I am trying to get the PGO data for both Fortan and C to be about the same and compare it. It looks like the C compiler/code are always just the slightest bit faster than the Fortran. That sort of deflates my little balloon, but that's the way this ball is bouncing here.
-Richard
The number of objects in Moonbuilder is continuously decreasing. Does this have an effect?
When I would run a profile on moobuilder, I would generally stop it before it got it all the way down to just 2-4 objects. The sun and planets are in there, so N never got below 12 or objects. The beginning of the run would bias it towards larger N, I'm sure.
What the compiler is looking for is how much each conditional sections actually execute, and many times a loop runs.
For example, if you have an "If (condition)" statement, it wants to know how many times condition is true. And for a "For I = 1 to N" loop, it wants to know on average what N is. Based on that, it decides the best way to do things, and if those numbers are different between test runs, it will do things differently.
What I'm doing here is just barely getting our feet wet with one small function. If the compiler could get it hands on the whole program, profiling the whole darn thing, letting it do all code and data arrangements, it could really go to town.
-Richard
Well, I've done the ellipse thing (that r=p*(1+e*cos(E)) above should have been r=p/(1+e*cos(E)), by the way). The program works very well, for a two body system, anyway. It didn't as I worked in the new velocities with each iteration for some reason, though, even though they seemed to match what I got with the original program over very small time steps reasonably well, but when I used only the original values for a, b, e, and so on with each one and only found the new positions accordingly, it formed the perfect ellipse every time, regardless of the time step used.
Now, this is all fine and good for a two body system, but all it's really doing is to produce the perfect ellipse for elliptical orbits in the first place and then finding where the bodies would lie upon these ellipses after a given amount of time has elapsed, which sounds like exactly what we are trying to do in the first place, but it's really not, since this gives no leeway for any additional factors of an orbit that might be worked in since the calculations are so rigid, only that for the original ellipse, that's all, and a multiple body system would have to found over small time steps anyway, since for example, if one body completes one orbit around another while we factor in for a third body at the same time, it would be as if the first orbit hadn't figured in at all, because it would be right back where it started from, and would end up being even less precise than the original program unless small enough time steps are used anyway. So all those extra calculations and iterations would really only serve to slow down and complicate the original program tremendously. Needless to say, then, I'm basically right back to square one, just trying to squeeze out that extra little bit of discrepency from the original program, by attempting to figure in something simple that amounts to some form of a conservation of energy on its own or a second order approximation or something of that nature, according to the way the new positions and velocities are currently found.
Tony,
I've been playing around with using those numerical differential equation solver routines I mentioned before. If you're interested, here's how the main program will have to be modifed.
I'll need all the variables in one chunk of contiguous memory as I mentioned before. That can be done in VB, and use VB's pointers or equivalent to continue to refer to the chunks as separate arrays. Do it this way:
BigArray = vx, vy, vz, x, y, x, where each one is the N sized array for each of those coordinate quanties. BigArray will therefore be a size 6N element linear array.
Second, I'll need the time step directly. The routines themselves require an initialization call, followed by the step calls, and finally a termination call. It will be best to follow that structure ourselves. So probably I'll have an initialization routine where you pass me the above array, the initial time (which I assume is just t0 = 0), and the initial step size.
You will then call another routine that evolves the system for the desired time step. This will be basically the plot interval. The routines decide their own internal time stepping, so it's best to let them go and stop only when you want to plot the data.
And then there will be a "cleanup" call.
When I get some code halfway working, I'll let you know so you can smoke over the interface I've come up with. We can certainly tweak that, of course.
-Richard
School's out in 2 weeks, so soon I'll have some time to make the changes.