This guide shows you how to run serial and parallel Gaussian 98 (G98) programs under the batch job management system LoadLeveler on the High Performance Computing (HPC) cluster Orbit. After reading it, you should know the basic procedure for running serial and parallel G98 programs on Orbit. Sample G98 programs and LoadLeveler job scripts are provided and explained in detail. A simple performance analysis for G98 programs and some usage tips are also included.
1. Access to the HPC Cluster "Orbit"
1.1) Access on Campus
1.2) Access off Campus
1.3) File Transfer from your PC to Orbit
2. Sample G98 Programs and Batch Job Scripts
2.1) Single Processor
2.1.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files
2.1.2) Preparing Batch Job Script for G98 Program
2.1.3) Job Submission
2.2) 4 Processors
2.2.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files
2.2.2) Preparing Batch Job Script for G98 Program
2.2.3) Job Submission
2.3) 8 Processors (2 nodes)
2.3.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files
2.3.2) Preparing Batch Job Script for G98 Program
2.3.3) Job Submission
2.4) 16 Processors (4 nodes)
2.4.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files
2.4.2) Preparing Batch Job Script for G98 Program
2.4.3) Job Submission
3. G98 Program Results
4. Performance Analysis
5. Tips
6. Enquiries
1. Access to the HPC Cluster "Orbit"
Before connecting from your PC to Orbit across the Internet, you need to install a free licensed Secure Shell (SSH) client program (http://www.cuhk.edu.hk/itsc/ssh/) on your PC.
1.1) Access on Campus
You can connect your PC directly to Orbit by running the SSH program.
On the File menu of the SSH program, select the Quick Connect option. Then enter the Host Name of the cluster, i.e. orbit.itsc.cuhk.edu.hk, and your User Name (e.g. s012345). Once you press the OK button, the program prompts you for the Password of that User Name, i.e. the password printed on the Computer Accounts Application Reply Slip of the HPC cluster. If the password is correct, a terminal session is created and the system prompt [orbit201]% appears on the screen.
1.2) Access off Campus
You can access the campus network from your PC or laptop at home or while traveling via a Virtual Private Network (VPN) connection. VPN technology uses encryption and other security mechanisms to ensure that only authorized CUHK users can access the network. For this connection, you need to have your Campus-wide E-mail System (CWEM) password ready for authentication.
To establish a VPN connection for your PC, follow the procedures posted at http://www.cuhk.edu.hk/itsc/network/vpn/index.html.
Once the connection is established, you can run the SSH program and log in to the cluster as described in section 1.1.

The following tasks may be performed on the login workstation:
- Editing and compiling programs
- Setting up input data files
- Editing jobs and LoadLeveler batch job scripts
- Job submission and monitoring
- Checking output files from batch runs
- Running a short interactive job
1.3) File Transfer from your PC to Orbit
To upload files to or download files from your cluster account, use the SSH Secure File Transfer function of the SSH client program.
When your PC is connected to Orbit, you can open a new SSH File Transfer window by clicking the yellow button on the second row of the SSH window, or via Window Tool Bar -> New File Transfer. A new file transfer window then opens.
The left-hand column lists the directories and files stored on your PC, while the right-hand column lists your files and directories residing on Orbit.
Choose the file you want to upload to or download from your Orbit account. For example, right-click on the file and choose the upload or download option. The dialog at the bottom shows the status of the file transfer.
2. Sample G98 Programs and Batch Job Scripts
2.1) Single Processor
2.1.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files
A sample G98 program "program1.g98" is given as follows:
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0
To run the above program successfully on Orbit, you need to specify the locations of the G98 checkpoint file, scratch files, etc. on Orbit.
%rwf specifies the location of the read-write file.
%int specifies the location of the two-electron integral file.
%d2e specifies the location of the two-electron integral derivative file.
%scr specifies the location of the scratch file.
%chk specifies the location of the checkpoint file.
Create a working directory to hold these five files, then open the program in the pico editor and add the following paths for them.
e.g. orbit[1]% mkdir /scratch2/s123456
e.g. orbit[2]% pico program1.g98
%rwf=/scratch2/s123456/myjob
%int=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
Change the locations of the working files according to your account (s123456 in this example) and your assigned disk storage quota (scratch or /scratch2).
A complete G98 program is shown as follows:
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%int=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0
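The five file-location lines differ only in the account name and job name, so they can be generated rather than typed. The following is a minimal sketch in POSIX sh (not csh); the function name link0_header and its arguments are hypothetical and are not part of Gaussian 98 or of the Orbit setup.

```shell
# Print the five file-location lines for one job.
# $1 = account name, $2 = job name, $3 = scratch file system (assumed default: /scratch2)
link0_header() {
    dir="${3:-/scratch2}/$1"
    for key in rwf int d2e scr chk; do
        # %% prints a literal percent sign
        printf '%%%s=%s/%s\n' "$key" "$dir" "$2"
    done
}

# Reproduces the paths used in the example above.
link0_header s123456 myjob
```

The output can be pasted at the top of the G98 program, above the "#P" route line.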
2.1.2) Preparing Batch Job Script for Your G98 Program
The cluster Orbit uses the LoadLeveler batch job management system to dispatch user jobs. LoadLeveler is a job scheduling and management system that provides job submission, cancellation, backfilling, etc.
After you have prepared your G98 program, you also need to prepare a LoadLeveler job script for it ("program1.g98" above). The job script is written in a job control language and tells LoadLeveler how to queue and run your job.
The following listing shows the status of the LoadLeveler job queue.
orbit[3]% llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
orbit201.6540.0          b113746    11/15 15:03 R  50  q1n4g        orchid1
orbit201.6694.0          b012724    11/29 14:55 R  100 q8n32g       orchid4
orbit201.6687.0          s030774    11/29 10:59 R  50  q1c750m      orbit202
orbit201.6685.0          s010278    11/29 10:56 R  50  q4n16g       orchid2
orbit201.6705.0          s010149    11/30 22:55 I  50  q1c750m
orbit201.6710.0          s030776    12/1  01:29 I  50  q4n16g
orbit201.6704.0          s800000    11/30 22:48 I  50  q4n16g
orbit201.6602.0          s030776    11/21 09:28 I  50  q1n4g
orbit201.6709.0          s020403    12/1  01:09 I  50  q4n16g
orbit201.6708.0          b057779    11/30 23:05 I  50  q1n4g
orbit201.6707.0          s010149    11/30 23:00 I  50  q1c750m
orbit201.6706.0          s010149    11/30 22:57 I  50  q1c750m
orbit201.6672.0          s010149    11/26 21:40 I  70  q1c750m
orbit201.6688.0          s030774    11/29 11:00 I  50  q1c750m
orbit201.6573.0          b113746    11/19 15:24 I  50  q1n4g
orbit201.6572.0          b113746    11/19 15:24 I  50  q1n4g
16 job steps in queue, 12 waiting, 0 pending, 4 running, 0 held
Id: The job ID allocated by LoadLeveler
Owner: Username of the owner of the submitted job
Submitted: The submission time of the job
ST: Status of the job (R = Running, I = Idle)
PRI: Priority of the job; each user has an assigned priority
Class: The class of the job, i.e. the queue in which the job is queued
Running On: The primary node on which the job is running
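When the queue is long, it can help to count only your own jobs by status. The sketch below is written in POSIX sh with awk and assumes the column layout of the llq listing above, where Owner is the second field and ST the fifth because the Submitted column contains a space. The function name count_my_jobs is hypothetical.

```shell
# Count one owner's jobs by status in llq-style output read from stdin.
# $1 = owner (user name). Field numbers follow the listing above;
# a different llq layout would need different field numbers.
count_my_jobs() {
    awk -v owner="$1" '
        $2 == owner { n[$5]++ }
        END { printf "running=%d idle=%d\n", n["R"], n["I"] }
    '
}

# Three rows copied from the listing above stand in for real llq output.
count_my_jobs b113746 <<'EOF'
orbit201.6540.0 b113746 11/15 15:03 R 50 q1n4g orchid1
orbit201.6694.0 b012724 11/29 14:55 R 100 q8n32g orchid4
orbit201.6573.0 b113746 11/19 15:24 I 50 q1n4g
EOF
```

On the cluster, the same function would presumably be fed from the real command, e.g. llq | count_my_jobs s123456.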
To run the program under the control of LoadLeveler, you need to use LoadLeveler batch job statements. A sample job script "program1.job" is shown as follows:
orbit[4]% pico program1.job
#!/bin/csh
#@ initialdir = /users/student/s123456/
#@ class = q1c1g
#@ notify_user = s012345@mailserv.cuhk.edu.hk
#@ queue
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
/usr/local/g98/bsd/clearipc
timex g98 < program1.g98 >& program1.out
/usr/local/g98/bsd/clearipc
Meaning of the above job script statements:
#!/bin/csh : Runs the script with the C shell, /bin/csh.
Lines starting with "#@" are LoadLeveler statements.
initialdir : Specifies the path of the working directory. Change it to suit your needs.
class : Specifies the class (i.e. queue name) of your job. Available classes include q8n32g, q4n16g, q2n8g, q1n4g, q1c750m, q1c500m, q6n24g and q1c1g. Choose a class that suits your program. In a class name, 'q' stands for "queue", 'n' for "node" and 'g' for "Gbytes of memory".
To see an up-to-date class list, invoke the command llclass.
orbit[6]% llclass
Name         MaxJobCPU   MaxProcCPU   Free   Max    Description
             d+hh:mm:ss  d+hh:mm:ss   Slots  Slots
q4n16g       -1          -1           0      60
q5n20g       -1          -1           7      44
q6n24g       -1          -1           7      60
q7n28g       -1          -1           7      60
q8n32g       -1          -1           7      60
q12n48g      -1          -1           7      60
q16n64g      -1          -1           10     49
q2n8gf       -1          -1           0      16
q3n12g       -1          -1           7      44
q4n16gf      -1          -1           0      16
q1c500m      -1          -1           0      1
inter_class  -1          -1           0      16
q1n4g        -1          -1           3      20
q2n8g        -1          -1           3      20
qtest1       -1          -1           6      12
q2c2g        -1          -1           2      4
q1c1g        -1          -1           1      3
q1c750m      -1          -1           0      4
q1c600m      -1          -1           1      1
--------------------------------------------------------------------------------
"Free Slots" values of the classes "q3n12g", "q4n16g", "q5n20g", "q6n24g", "q7n28g", "q8n32g", "q12n48g", "q16n64g", "q2n8gf", "q4n16gf", "inter_class", "q1n4g", "q2n8g", "qtest1", "q2c2g" are constrained by the MAX_STARTERS limit(s).
"Free Slots" values of the classes "q4n16g", "q1c750m", "q1c500m" are constrained by the MAXJOBS limit(s).
queue : Queues up the job.
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
These lines set up the working environment for the G98 program.
/usr/local/g98/bsd/clearipc
timex g98 < program1.g98 >& program1.out
/usr/local/g98/bsd/clearipc
"program1.g98" is the input file of the Gaussian 98 program and "program1.out" is its output file. The statement /usr/local/g98/bsd/clearipc cleans up computing resources left unreleased by earlier runs, such as inter-process memory.
Backfilling
Sometimes you want to test or run your G98 job "program1.job" for a short period of time (say 3 minutes) and get a faster turnaround. Backfilling serves this purpose. Technically, backfilling is the scheduler's ability to run a job that is short in duration and requires only a small number of processors or nodes ahead of other jobs. This is very useful when you only want to check that your LoadLeveler job script "program1.job" and the G98 program settings in "program1.g98" are correct.
To use backfilling in LoadLeveler, add the following line to your job script "program1.job", before the "#@ queue" statement:
#@ wall_clock_limit = 00:03:00
"00:03:00" specifies the maximum execution time of your job (hours:minutes:seconds). The job is terminated when this limit is reached, whether or not it has finished.
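If you switch between test runs and full runs often, the limit line can be added mechanically. The following POSIX sh sketch copies a job script and inserts a wall_clock_limit line just before the first queue statement. The function name add_wall_clock_limit is hypothetical, and placing the line before "#@ queue" rests on the assumption that LoadLeveler only applies keywords that precede the queue statement.

```shell
# Copy a job script from stdin to stdout, adding a wall_clock_limit line
# just before the first "#@ queue" (or "# @ queue") statement.
# $1 = limit in hh:mm:ss
add_wall_clock_limit() {
    awk -v limit="$1" '
        !done && /^#[ \t]*@[ \t]*queue[ \t]*$/ {
            print "#@ wall_clock_limit = " limit
            done = 1
        }
        { print }
    '
}

# A shortened version of program1.job is used as sample input.
add_wall_clock_limit 00:03:00 <<'EOF'
#!/bin/csh
#@ class = q1c1g
#@ queue
timex g98 < program1.g98 >& program1.out
EOF
```

Writing the result to a separate file (for example add_wall_clock_limit 00:03:00 < program1.job > program1_test.job, where program1_test.job is a made-up name) keeps the original script unchanged.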
2.1.3) Job Submission
After you have prepared the job script file, you can submit it by:
[orbit]% llsubmit program1.job
The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.
If you want to cancel the job, you can type
[orbit]% llcancel <job ID>
where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in the previous section.
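The job ID can also be picked out of the confirmation message automatically. This is a minimal POSIX sh sketch that relies only on the message format quoted above; the function name job_id_from_llsubmit and the job number 6711 are made up for illustration.

```shell
# Extract the numeric job ID from an llsubmit confirmation message on stdin.
job_id_from_llsubmit() {
    sed -n 's/.*The job "[^"]*\.\([0-9][0-9]*\)" has been submitted.*/\1/p'
}

# Sample message in the format shown above (6711 is a hypothetical job number).
echo 'llsubmit: The job "orbit201.itsc.cuhk.edu.hk.6711" has been submitted.' | job_id_from_llsubmit
```

Assuming llsubmit writes its message to standard output, llsubmit program1.job | job_id_from_llsubmit would print the ID to pass to llcancel.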
2.2) 4 Processors
2.2.1) Preparing G98 Program and Specifying Locations for G98 Working and Checkpoint Files
To run the G98 program "program1.g98" in a 4-processor environment, insert the additional statement "%nproc=4" into the G98 program. This makes full use of the processors of a single SP thin node (http://www.cuhk.edu.hk/itsc/compenv/research-computing/cluster/hardware.html) when the program runs. A sample 4-CPU G98 program is given as follows:
orbit[7]% pico program4.g98
%nproc=4
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0
2.2.2) Preparing Batch Job Script for Your G98 Program
A sample 4-CPU job script is shown as follows:
e.g. orbit[9]% pico program4.job
#!/bin/csh
#@ initialdir = /users/student/s123456/
#@ node = 1
#@ tasks_per_node = 4
#@ class = q1n4g
#@ job_type = parallel
#@ notify_user = s012345@mailserv.cuhk.edu.hk
#@ queue
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
/usr/local/g98/bsd/clearipc
timex g98 < program4.g98 >& program4.out
/usr/local/g98/bsd/clearipc
node : Specifies the number of nodes you are going to use. It should be 1 in this example.
tasks_per_node : Specifies the number of processors per node you are going to use. It should be 4 in this example.
"program4.g98" is the input file of the Gaussian 98 program and "program4.out" is its output file.
2.2.3) Job Submission
After you have prepared the job script file, you can submit it by:
[orbit]% llsubmit program4.job
The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.
If you want to cancel the job, you can type
[orbit]% llcancel <job ID>
where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in section 2.1.2.
2.3) 8 Processors (2 nodes)
2.3.1) Preparing G98 Program and Specifying Locations for G98 Working and Checkpoint Files
A sample 8-CPU program is shown as follows:
e.g. orbit[9]% pico program8.g98
%nproclinda=8
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0
Note that %nproclinda= is used instead of %nproc=. The other settings remain unchanged.
2.3.2) Preparing Batch Job Script for G98 Program
A sample 8-CPU job script is shown as follows:
e.g. orbit[9]% pico program8.job
#!/bin/csh
#@ initialdir = /users/student/s123456
#@ node = 2
#@ tasks_per_node = 4
#@ node_usage = not_shared
#@ class = q2n8g
#@ network.mpi = css0,not_shared,us
#@ job_type = parallel
#@ notify_user = s012345@mailserv.cuhk.edu.hk
#@ queue
setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub(/orbit/,"orchid"); print $1,$5 }' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '+kaon -n 7:8 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1
/usr/local/g98/bsd/clearipc
timex g98l < program8.g98 >& program8.out
/usr/local/g98/bsd/clearipc
Note that g98l (the Gaussian 98 Linda version) is used instead of g98 in this job script.
node : Specifies the number of nodes you are going to use (each node consists of 4 processors).
tasks_per_node : Specifies the number of processors per node you are going to use (maximum 4).
node_usage : Specifies whether you want to share the assigned nodes with other jobs.
network.mpi : Specifies the mode for inter-node communication (us or ip).
job_type : Specifies that the job is a parallel job.
queue : Queues up the job.
setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
These lines set up the working environment for the Gaussian 98 program.
setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub(/orbit/,"orchid"); print $1,$5 }' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '+kaon -n 7:8 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1
These lines create the host list (i.e. the hostnames of the SP thin nodes) for Gaussian 98.
/usr/local/g98/bsd/clearipc
timex g98l < program8.g98 >& program8.out
/usr/local/g98/bsd/clearipc
"program8.g98" is the input file of the Gaussian 98 program and "program8.out" is its output file.
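To see what the host list step produces, the awk and tr pipeline from the job script can be tried on a sample value. The sketch below is POSIX sh rather than csh. The processor list shown is hypothetical: it assumes LoadLeveler lists one hostname per task, so a job with 2 nodes and 4 tasks per node yields 8 entries, of which the 1st and 5th name the two distinct nodes.

```shell
# Hypothetical value for a 2-node, 4-tasks-per-node job (real hostnames may differ).
LOADL_PROCESSOR_LIST="orbit3 orbit3 orbit3 orbit3 orbit7 orbit7 orbit7 orbit7"

# Same awk and tr pipeline as in the job script, reading from a pipe
# instead of the file templist.
make_hostlist() {
    echo "$1" | awk '{ gsub(/orbit/, "orchid"); print $1, $5 }' | tr " " "\n"
}

# Prints one node name per line, with "orbit" rewritten to "orchid".
make_hostlist "$LOADL_PROCESSOR_LIST"
```

For a 4-node job the same idea applies with fields 1, 5, 9 and 13, as in the 16-CPU job script.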
2.3.3) Job Submission
After you have prepared this script file, you can submit it by:
[orbit]% llsubmit program8.job
The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.
If you want to cancel the job, you can type
[orbit]% llcancel <job ID>
where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in section 2.1.2.
2.4) 16 Processors (4 nodes)
2.4.1) Preparing G98 Program and Specifying Locations for G98 Working and Checkpoint Files
A sample 16-CPU program is shown as follows:
e.g. orbit[9]% pico program16.g98
%nproclinda=16
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0
The Gaussian 98 Linda version is used for programs running on 8 or more processors. Note that %nproclinda= should be specified instead of %nproc=.
2.4.2) Preparing Batch Job Script for Your G98 Program
A sample 16-CPU job script is shown as follows:
e.g. orbit[9]% pico program16.job
#!/bin/csh
#@ initialdir = /users/student/s123456
#@ node = 4
#@ tasks_per_node = 4
#@ node_usage = not_shared
#@ class = q4n16g
#@ network.mpi = css0,not_shared,us
#@ job_type = parallel
#@ queue
setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub(/orbit/,"orchid"); print $1,$5,$9,$13 }' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '-vv +kaon -n 15:16 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1
/usr/local/g98/bsd/clearipc
timex g98l < program16.g98 >& program16.out
/usr/local/g98/bsd/clearipc
Note that this script uses g98l (the Gaussian 98 Linda version) instead of g98, which is used in the 1-CPU and 4-CPU job scripts.
#!/bin/csh : Runs the script with the C shell, /bin/csh.
Lines starting with "#@" are LoadLeveler statements.
initialdir : Specifies the path of the working directory. Change it to suit your needs.
node : Specifies the number of nodes (each node consists of 4 processors).
tasks_per_node : Specifies the number of processors per node (should be 4 in this example).
node_usage : Specifies whether you want to share your nodes with other jobs (not_shared in this example).
network.mpi : Specifies the mode for inter-node communication.
job_type : Specifies the job type, parallel or serial.
queue : Queues up the job.
setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
These lines set up the working environment for the Gaussian 98 program.
setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub(/orbit/,"orchid"); print $1,$5,$9,$13 }' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '-vv +kaon -n 15:16 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1
These lines create the node list for the Gaussian 98 program.
/usr/local/g98/bsd/clearipc
timex g98l < program16.g98 >& program16.out
/usr/local/g98/bsd/clearipc
These lines clean up previously unreleased resources and run your G98 program "program16.g98". "program16.out" is the output file of the program.
2.4.3) Job Submission
After you have prepared this script file, you can submit it by:
[orbit]% llsubmit program16.job
The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.
If you want to cancel the job, you can type
[orbit]% llcancel <job ID>
where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in section 2.1.2.
3. G98 Program Results
When a G98 program has run successfully, the end of its output file (e.g. program1.out, program4.out, program8.out or program16.out) should contain lines similar to those listed below, including the statement "Normal termination of Gaussian 98." and the execution timing figures for the job.
......
......
7335,3,-179.992017,0\H,8,1.099967,7,109.465529,6,60.033096,0\H,8,1.099
782,7,109.498585,6,-59.999782,0\O,8,1.430841,7,109.473813,6,179.998563
,0\H,11,0.949502,8,109.442674,7,179.982067,0\\Version=IBM-RS6000-G98Re
vA.11.2\HF=-304.5246238\MP2=-305.6229662\RMSD=9.830e-09\PG=C01 [X(C4H6
O2)]\\@
A SUCCESSFUL PURSUIT OF SCIENCE MAKES A MAN THE
BENEFACTOR OF ALL MANKIND OF EVERY AGE.
-- JOSEPH PRIESTLEY, "EXPERIMENTS AND
OBSERVATIONS ON DIFFERENT KINDS OF AIR", 1774
Job cpu time: 0 days 1 hours 0 minutes 42.7 seconds.
File lengths (MBytes): RWF= 607 Int= 0 D2E= 0 Chk= 9 Scr= 3897
Normal termination of Gaussian 98.
real 67731.20
user 67382.83
sys 73.39
4. Performance Analysis
There are two simple methods to check the performance of your G98 programs.
1. TIMEX
timex is a utility that measures the running time of a program. It reports, in seconds, the elapsed (real) time, user time and system time of a command. You can find figures like the following at the end of your program's output file.
real 67731.20
user 67382.83
sys 73.39
Compare the elapsed time with the user time. In the above example, the user time is almost the same as the elapsed time, which indicates that the job ran efficiently.
For a detailed explanation of timex, invoke "man timex" at the system prompt.
e.g. orbit[10]% man timex
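The comparison of user time with elapsed time can be expressed as a single percentage. The following POSIX sh sketch reads the three timex lines and prints user time divided by real time; the function name cpu_efficiency is hypothetical, and the ratio is only an illustrative measure for a single-processor job, not something timex reports itself.

```shell
# Read timex-style "real" and "user" lines from stdin and print
# user / real as a percentage with one decimal place.
cpu_efficiency() {
    awk '
        $1 == "real" { real = $2 }
        $1 == "user" { user = $2 }
        END { if (real > 0) printf "%.1f\n", 100 * user / real }
    '
}

# The timing figures from the example above.
cpu_efficiency <<'EOF'
real 67731.20
user 67382.83
sys 73.39
EOF
```

For the sample figures this prints 99.5, in line with the observation that user time and elapsed time are almost the same.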
2. SAR
sar is a tool that collects and reports system activity information.
% sar
AIX orbit201 3 4 000B3FAF4C00 12/09/03
00:00:01    %usr    %sys    %wio   %idle
00:20:01      96       1       0       3
00:40:01     100       0       0       0
01:00:01      95       1       0       4
01:20:01      97       1       0       2
01:40:01      98       1       0       2
02:00:00      99       0       0       1
02:20:00      98       1       0       1
02:40:00      98       0       0       2
Average       98       1       0       2
%usr : Reports the percentage of time the CPU or CPUs spent executing at the user (application) level.
%sys : Reports the percentage of time the CPU or CPUs spent executing at the system (kernel) level.
%idle : Reports the percentage of time the CPU or CPUs were idle with no outstanding disk I/O requests.
In the above example, the average %usr is 98%. This indicates that the processors spent almost all of their time executing user code, so the job ran efficiently.
Parallel jobs are dispatched dynamically to the SP thin nodes (http://www.cuhk.edu.hk/itsc/compenv/research-computing/cluster/hardware.html). After a job has started, you can find the names of the nodes assigned to it (orchid[1-16]) by invoking the command "llq -l Job_ID | grep orchid".
orbit201-[2]% llq -l 2637 | grep orchid
Allocated Hosts : orchid9.itsc.cuhk.edu.hk::css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M)
+ orchid11.itsc.cuhk.edu.hk::css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M)
+ orchid4.itsc.cuhk.edu.hk::css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M)
In the above example, three SP thin nodes (orchid9, orchid11 and orchid4) are assigned to the job. Since each SP thin node is a computer system in its own right, you can simply log in to one of the nodes and invoke the sar command there to check its CPU utilization.
orbit[3]% rlogin orchid9
orbit[4]% sar
orbit[5]% exit
The quality of a parallel program is judged by its processor utilization, which is ideally 100%. Generally speaking, a parallel program with CPU utilization of around 70% is considered average, and one with utilization over 80% is considered good.
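The rule of thumb above can be applied to the "Average" line of the sar output. The sketch below is POSIX sh with awk; the function name rate_sar_average is hypothetical, and both the cut-off of 70 for "average" and the label "below average" are assumptions made for illustration, since the guide only gives rough figures.

```shell
# Read sar CPU output from stdin and rate the %usr value on the "Average" line.
# Assumed thresholds: over 80 is "good", 70 to 80 is "average".
rate_sar_average() {
    awk '
        $1 == "Average" {
            usr = $2
            if (usr > 80)       label = "good"
            else if (usr >= 70) label = "average"
            else                label = "below average"
            printf "%d%% usr: %s\n", usr, label
        }
    '
}

# The Average line from the sar example above.
echo "Average 98 1 0 2" | rate_sar_average
```

On a node, the function would presumably be fed from the real command, e.g. sar | rate_sar_average.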
5. Tips
6. Enquiries
Please contact the ITSC Electronic HelpDesk at http://helpdesk.itsc.cuhk.edu.hk/