Ray: Wed 7/21

I have written a timesgcm script that runs TGCM15 with options to run various timing and analysis tools. The script is ~roble/times/timesgcm.ref. I set it up with lexical reads from timesgcm.ccm.sol and made copies of the two mod decks (timesgcm.ref.mods is the same as timesgcm.ccm.sol.mods, and timesgcm.ref.mods1 is the same as timesgcm.ccm.sol.mods1). I have run it for a single 6 min time step (DISPOSE=0) and got results from all 5 analysis tools. The results are in /ouray/d/times/HS*.

I have put comments in the timesgcm.ref script showing how to use the various analysis tools. There are 2 that do Fortran analysis on the source code (ftref and flint), and 3 that do timing and performance analysis during execution (perftrace, flowtrace, and a profiler). I saved hard copies of the script, the reports from the single time step TGCM15 runs, and the man pages for the tools.

We can go over these when you get back next week, but the most obvious thing is that the subroutine QRJ used 25.3% of the total CPU time! At least part of the reason is the use of EXP, the floating point exponential system routine. The profiler reported that EXP used 19.5% of execution time when sampling memory "buckets". The consultants suggested using a 1/2 precision EXP routine, but I haven't tried it yet.

The performance monitor said that the program as a whole "appears to be an efficient vectorized code", and that overall it achieved 161.4 MFLOPS during execution, which is pretty good I guess.

Another suggestion made by the performance monitor (and flowtrace) was to "in-line" the QVxxxx routines, i.e., put the code from the QVxxxx routines into the calling modules. This would be more efficient because these routines are called many times but have very short CPU times (QVSUB0 was called 56664 times and its average execution time was 1.92e-6 seconds). I think Ciceley has already done this with DTUV, but that version is not in TGCM15 yet (maybe that's the one with the strange bug).

See you on Monday...
--Ben
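
A rough check on the QVSUB0 figures quoted above, as a minimal sketch only: the two numbers are the ones reported for the single time step run, and the variable names are made up for illustration.

```python
# Figures quoted above for QVSUB0 from the single time step run.
calls = 56664            # number of calls to QVSUB0
avg_seconds = 1.92e-6    # average execution time per call, in seconds

# Total time attributed to QVSUB0 (hypothetical variable name).
total_seconds = calls * avg_seconds

print(f"QVSUB0 total: {total_seconds:.4f} s over {calls} calls")
```

This only multiplies out the time measured for the routine. The reports quoted here do not say whether the average includes call overhead, so the saving from in-lining cannot be read off from this number alone.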