Ray: Below are my thoughts on running tgcm23 on the IBM. If you get a chance, please read this over before we meet on Monday. I am making a 5-day test run on blackforest from your account now. Tomorrow we can look at the output and go over some details about blackforest. Hopefully you can make some runs while I'm at AGU -- I will have my laptop in SF, so I will be available via email. --Ben

The transfer of output listings from blackforest to hao following batch jobs is unreliable, so I will describe a procedure you can use for submitting tgcm23 jobs to blackforest, monitoring the job, and getting the output back.

You can submit IBM jobs from ouray using the "submit" command, just like Cray jobs, except you use a different run script. You can monitor the job with some aliases I have put in your .cshrc file that do rsh commands to blackforest, but the best way is to log in to blackforest during and after a run and monitor the job from there.

I have made a new directory on your account on ouray: ~roble/timegcm/tgcm23. Submit tgcm23 jobs to the IBM from there, e.g., "submit tgcm23_aix.job". (As I mentioned earlier, I put an alias in your .cshrc file so your submit command will execute /home/tgcm/bld/submit, which has the blackforest option.)

Once you have submitted a job to blackforest, job scripts and output files are stored in the ~/submit directory on blackforest (your home is /home/blackforest/roble):

  /home/blackforest/roble/submit

This directory contains the run scripts and output files from a run. When you submit from ouray, this directory is cleared out, and scripts for the new job are copied into and submitted from here. As the job runs, output files appear for the compilation, the execution, and finally the attempted rcp back to hao (which isn't working).

For example, during a run you can telnet to blackforest from ouray (the password is the same as on the Crays), then cd to submit and do an lf, e.g.:

  bf0915en% pwd
  /home/blackforest/roble/submit
  bf0915en% lf
  total 3992
  -rwxr--r--  1 roble  ncar  1645262 Dec 10 13:36 build_step.csh*
  -rwxr--r--  1 roble  ncar     2543 Dec 10 13:36 exec_step.csh*
  -rw-r--r--  1 roble  ncar     1522 Dec 10 13:36 loadlev.job
  -rwxr--r--  1 roble  ncar     2210 Dec 10 13:36 rcp_step.csh*
  -rw-r--r--  1 roble  ncar    81255 Dec 10 13:41 tgcm23_build.out
  -rw-r--r--  1 roble  ncar   300319 Dec 10 14:05 tgcm23_exec.out

  tgcm23_build.out: output listing from the compilation
  tgcm23_exec.out:  output listing from the execution

The tgcm23_exec.out file grows during the execution, and you can look at the end of the file to see how far along the job is, e.g.:

  bf0915en% tail tgcm23_exec.out
   6:Step 140 of 1800 mtime 80, 9,20, 0
   5:Step 140 of 1800 mtime 80, 9,20, 0
   2:Step 140 of 1800 mtime 80, 9,20, 0
  11:Step 140 of 1800 mtime 80, 9,20, 0
   4:Step 140 of 1800 mtime 80, 9,20, 0
   8:Step 140 of 1800 mtime 80, 9,20, 0
   9:Step 140 of 1800 mtime 80, 9,20, 0
   1:Step 140 of 1800 mtime 80, 9,20, 0
  10:Step 140 of 1800 mtime 80, 9,20, 0
   7:Step 140 of 1800 mtime 80, 9,20, 0
  bf0915en%

The number at the beginning of each line identifies which of the 12 processors (or tasks) wrote it. When the job completes successfully, the tgcm23_exec.out file is sorted and split into separate files for each task, e.g., tgcm23_task0.out, tgcm23_task1.out, etc. The one you probably want to save is the task0 (master) output.
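If a run dies before that sort-and-split step happens, you can pull a single task's lines out of tgcm23_exec.out yourself using the task-number prefixes shown above. This is just a sketch of one way to do it by hand with grep (the job script's own split step may work differently):

  bf0915en% grep '^ *0:' tgcm23_exec.out > tgcm23_task0.out

This collects the lines prefixed "0:" (the master task) into their own file.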
For example, the following command, executed from blackforest in the ~/submit directory, copies the master output to your ntwk directory on ouray:

  blackforest> rcp tgcm23_task0.out ouray.hao:ntwk/tgcm23.out

You can do all of these things with rsh and rcp commands to blackforest from ouray, but it's really easier to telnet to blackforest and do it from there.

During a run, you can use aliases from ouray to monitor a job. Here are the aliases I have put in your .cshrc file on ouray:

  #
  # Blackforest:
  alias bfstat   'date ; rsh blackforest.ucar.edu ps -f -u $user'
  alias bstat    'date ; rsh blackforest.ucar.edu llq'
  alias bfqstat  'date ; rsh blackforest.ucar.edu llq -u $user'
  alias bfqstatl 'date ; rsh blackforest.ucar.edu llq -l -x -u $user'

For example, executing "bfqstat" from ouray:

  (ouray) tgcm23 : bfqstat
  Sun Dec 10 14:17:39 MST 2000
  Id                       Owner      Submitted   ST PRI Class        Running On
  ------------------------ ---------- ----------- -- --- ------------ -----------
  bf0915en.152009.1        roble      12/10 13:36 R  50  com_pr       bf0811en
  bf0915en.152009.2        roble      12/10 13:36 NQ 50  interactive
  bf0915en.152009.0        roble      12/10 13:36 C  50  share
  2 job steps in queue, 0 waiting, 0 pending, 1 running, 1 held
  (ouray) tgcm23 :

This shows the 3 job steps: the compilation is in the "share" queue, the execution is in "com_pr", and the rcp that doesn't work is in "interactive". Among the status flags, "R" means running, "NQ" means not queued, and "C" means completed.

The blackforest batch system is called "loadleveler". To get a list of all current jobs on the machine, you can use the "bstat" alias from ouray, or the "llq" command on blackforest. The loadleveler will send you email when a job step completes, or if there was an error.

There is a 6-hour wallclock limit on blackforest. I have found that a 5-day job at step=240, saving daily primary histories and hourly secondary histories in the last day, takes about 5 wallclock hours.

There are two community queues: com_pr and com_reg. You can select either one in the tgcm23_aix.job run script, e.g.:

  # @ class = com_reg

Other important directories on blackforest are listed below (/ptmp is the parallel file system on the IBM, and is subject to the scrubber):

  /home/blackforest/roble/tgcm23  Contains source and object files from the most recent compilation.
  /ptmp/roble                     Directory from which the model is executed.
  /ptmp/roble/tgcm23              Directory for storing history and data files.

Since /ptmp is scrubbed, copy any history files you want to keep off blackforest when a run finishes (see the rcp sketch at the end of this note).

At this point, run only one job at a time on blackforest, especially of the same model version. You could probably make tgcm14 and tgcm23 runs at the same time, but we can arrange for simultaneous jobs later.
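Here is the rcp sketch for saving histories off /ptmp. It uses the same rcp form as the tgcm23_task0.out example above; the history file name is hypothetical, so substitute whatever names your run actually wrote to /ptmp/roble/tgcm23:

  blackforest> cd /ptmp/roble/tgcm23
  # "tgcm23.p001" below is a made-up history file name for illustration:
  blackforest> rcp tgcm23.p001 ouray.hao:ntwk/tgcm23.p001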