Peter:

As per your request, here are some first-cut timings of a 1-day run of tiegcm (480 3-minute steps) on ouray, ute, and blackforest. Obviously, ute wins the race, using OpenMP on DSM architecture.

MPI overhead is large for only a 1-d decomposition: it's a master/slave thing right now, distributed over geographic latitude. Each task calculates nlat/ntask latitudes, and all tasks must exchange 4 boundary latitudes at every time step for 4th-order differencing. On the shared-memory machines, it's all either in global memory or made available through direct memory transfers. Also, the dynamo is still serial, so the master must collect geographic latitudes from the slaves before running the dynamo (which is short in cpu time), then distribute the output afterwards, at every step.

FYI, the source is in /home/tgcm/tgcm14, with the main driver routine being advnce.F. All tasks read redundantly from the same history file and other data files at start-up, but only the master outputs histories. However, only 1 history was written at the end in these tests, so i/o was not a big deal.

I'm not sure about dedicated vs non-dedicated processors on the IBM, and I'm not sure I can distinguish between code efficiency and machine load effects. There were about 4-6 people using ute at the time of these runs, vs 20+ jobs on ouray. Also, I'm not totally sure that timex and timef are reporting strictly the same thing, but from a practical standpoint, the clock on the wall does not lie.

It compiles on the ES40, but I am working out some overflows, so I don't have numbers for it yet. Maybe hybrid MPI+OpenMP will work on the Compaq machine. In short, I'm still in love w/ SGI Origin. Dataproc is very fast for serial post-proc as well.

I will fill in 6-proc runs for j90 and o2k, and make some 1-proc runs also. Am going to Hammond's meeting in 40 mins.

--Ben

tgcm14: 1 day simulations at 3-min time-step (480 time iterations).
(1-d decomposition over latitude with nlat=36)

Host     Model  OpSys   Parall  nproc/  Elapsed  User cpu
                        method  ntask   (hrs)    (hrs)
================================================================
ouray    j90se  Unicos  !MIC$    6
                                 9      0.63     1.55
                                12      0.56     1.6
----------------------------------------------------------------
ute      o2k    IRIX    OpenMP   6
                                 9      0.30     2.56
                                12      0.24     2.7
----------------------------------------------------------------
black    SP     AIX     MPI      6      2.5
forest                           9      1.7
                                12      1.46

(ntasks using 2pe's per node, i.e., 6 tasks on 3 nodes, 9 tasks on
5 nodes (1 idle proc), and 12 tasks on 6 nodes)
----------------------------------------------------------------
Elapsed times are from timef on SP, timex on j90se and o2k.
Queues were reg on j90, ded_16 on o2k, and com_pr on the SP.
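
For reference, a minimal sketch of the 1-d latitude decomposition described in the message, written in Python rather than the model's Fortran. It is not taken from advnce.F: the names decompose and halo_latitudes are hypothetical, and it assumes the 4 boundary latitudes are 2 on each side of a task's block, which the message does not state explicitly. It only illustrates how, with nlat=36, the block each task owns shrinks relative to the latitudes it must receive as ntask grows.

```python
HALO_PER_SIDE = 2  # assumption: "4 boundary latitudes" = 2 on each side


def decompose(nlat, ntask):
    """Split nlat latitudes into ntask contiguous [start, stop) blocks.

    Any remainder is spread over the first tasks, one extra latitude each.
    """
    base, extra = divmod(nlat, ntask)
    blocks = []
    start = 0
    for task in range(ntask):
        count = base + (1 if task < extra else 0)
        blocks.append((start, start + count))
        start += count
    return blocks


def halo_latitudes(block, nlat, halo=HALO_PER_SIDE):
    """Return the neighbour latitudes a block needs on its two sides.

    Blocks at the ends of the latitude range have no neighbour on the
    outer side, so that list comes back empty.
    """
    start, stop = block
    lower = list(range(max(0, start - halo), start))
    upper = list(range(stop, min(nlat, stop + halo)))
    return lower, upper


if __name__ == "__main__":
    nlat = 36
    for ntask in (6, 9, 12):
        blocks = decompose(nlat, ntask)
        interior = blocks[1]  # any task that has neighbours on both sides
        owned = interior[1] - interior[0]
        lower, upper = halo_latitudes(interior, nlat)
        received = len(lower) + len(upper)
        print(f"ntask={ntask:2d}: owns {owned} latitudes, "
              f"receives {received} per step")
```

Under that assumption, an interior task at ntask=12 owns 3 latitudes and receives 4 from its neighbours every step, which is consistent with the exchange cost being large relative to the work per task.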