The Chinese University of Hong Kong Information Technology Services Centre 資訊科技服務處 香港中文大學

This guide shows you how to run serial and parallel Gaussian 98 programs under the batch job management system LoadLeveler on the High-Performance Computing (HPC) cluster Orbit.

After reading this guide, you should know the basic procedures for running serial and parallel Gaussian 98 (G98) programs on Orbit. Sample Gaussian 98 programs and LoadLeveler job scripts are provided and explained in detail. Simple performance analysis for Gaussian 98 programs and usage tips are also included.

1. Access to the HPC Cluster "Orbit"

1.1) Access on Campus

1.2) Access off Campus

1.3) File Transfer from your PC to Orbit

2. Sample G98 Programs and Batch Job Scripts 

2.1) Single processor

   2.1.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files

   2.1.2) Preparing Batch Job Script for G98 Program

   2.1.3) Job Submission

2.2) 4 Processors

   2.2.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files

   2.2.2) Preparing Batch Job Script for G98 Program

   2.2.3) Job Submission 

2.3) 8 Processors (2 nodes)

   2.3.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files

   2.3.2) Preparing Batch Job Script for G98 Program

   2.3.3) Job Submission 

2.4) 16 Processors (4 nodes)

   2.4.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files

   2.4.2) Preparing Batch Job Script for G98 Program

   2.4.3) Job Submission 

3. G98 Program Results

4. Performance Analysis

5. Tips

6. Enquiries


1. Access to the HPC Cluster "Orbit"  

Before establishing a connection across the Internet from your PC to Orbit, you need to install a freely licensed Secure Shell (SSH) client program (http://www.cuhk.edu.hk/itsc/ssh/) on your PC.

1.1) Access on Campus 

You can connect your PC directly to Orbit by running the SSH program.

On the File menu of the SSH program, select the Quick Connect option. Then enter the Host Name of the cluster, i.e. orbit.itsc.cuhk.edu.hk, and your User Name (e.g. s012345). Once you press the OK button, the program prompts you for the password of that User Name, i.e. the password printed on the Computer Accounts Application Reply Slip for the HPC cluster. If the password is correct, a terminal session is created and a system prompt [orbit201]% appears on the screen.

1.2) Access off Campus 

You can access the campus network from your PC/laptop at home or while traveling via a Virtual Private Network connection. Virtual Private Network (VPN) technology uses encryption and other security mechanisms to ensure that only authorized CUHK users can access the network. For this connection, you need to have your Campus-wide E-mail System (CWEM) password ready for authentication.

To establish a VPN connection for your PC, follow the procedures posted at http://www.cuhk.edu.hk/itsc/network/vpn/index.html.

Once the connection is established, you can run the SSH program and log in to the cluster as described in section 1.1 above.

   

The following tasks are allowed on the login workstation:

  • Editing and compiling of programs
  • Setting up input data files
  • Editing of jobs and LoadLeveler batch job scripts
  • Job submission and monitoring
  • Checking output files from batch runs
  • Running a short interactive job

1.3) File Transfer from your PC to Orbit

When you need to upload files to or download files from your cluster account, you can use the SSH Secure File Transfer function of the SSH client program.

While your PC is connected to Orbit, you can open a new SSH File Transfer window by clicking the yellow button on the second row of the SSH window, or by choosing Window -> New File Transfer from the menu bar. A new file transfer window then appears.

The left-hand column lists the directories and files stored on your PC, while the right-hand column lists the files and directories in your account on Orbit.

Choose the file you want to upload to or download from your Orbit account. For example, you can right-click on the file and choose the upload/download option. The dialog at the bottom shows the status of the file transfer.

 

2. Sample G98 Programs and Batch Job Scripts

2.1) Single processor

 

2.1.1) Preparing G98 Program, Specifying Locations for G98 Working and Checkpoint Files

A sample G98 program "program1.g98" is given as follows:

#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0

To make the above program run successfully on Orbit, you need to specify the locations of the G98 checkpoint file, scratch files, etc. on Orbit.

 rwf  specifies the location of the read-write file.
 int  specifies the location of the two-electron integral file.
 d2e  specifies the location of the two-electron integral derivative file.
 scr  specifies the location of the scratch file.
 chk  specifies the location of the checkpoint file.

Create a working directory to hold these five files, then open the program in the pico editor and add the following paths for them.

           e.g. orbit[1]% mkdir /scratch2/s123456

           e.g. orbit[2]% pico program1.g98

%rwf=/scratch2/s123456/myjob
%int=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob

Change the locations of the working files according to your account (s123456 in this example) and your assigned disk storage quota (scratch or /scratch2).
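As a rough illustration of how these five lines depend on your account, the following Python sketch builds them from a user name. The function name working_file_lines and its parameters are hypothetical helpers for this guide, not part of Gaussian 98 or Orbit; the defaults simply mirror the example above.

```python
def working_file_lines(user, scratch_root="/scratch2", job="myjob"):
    """Return the five working-file lines for a G98 input file.

    Assumption: all five files share one base path, as in the guide's example.
    """
    base = f"{scratch_root}/{user}/{job}"
    # Same order as in the guide: rwf, int, d2e, scr, chk.
    return [f"%{keyword}={base}" for keyword in ("rwf", "int", "d2e", "scr", "chk")]


if __name__ == "__main__":
    # Prints the lines to paste at the top of the G98 program.
    print("\n".join(working_file_lines("s123456")))
```

Replace "s123456" with your own account name, and pass a different scratch_root if your quota is on another scratch area.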

A complete G98 program is shown as follows:  

%mem=400Mb
%rwf=/scratch2/s123456/myjob
%int=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0

2.1.2) Preparing Batch Job Script for Your G98 Program

The cluster Orbit uses the LoadLeveler batch job management system to dispatch user jobs. LoadLeveler is an efficient job scheduling and management system that provides job submission, cancellation, backfilling, etc.

After you have prepared your G98 program, you also need to prepare a LoadLeveler job script for the program "program1.g98". The job script is written in LoadLeveler's job control language and is what batch users submit to queue up and run jobs under LoadLeveler.

The following is a listing of the LoadLeveler job queue status.

orbit[3] % llq

Id               Owner    Submitted    ST  PRI  Class    Running On
---------------  -------  -----------  --  ---  -------  ----------
orbit201.6540.0  b113746  11/15 15:03  R   50   q1n4g    orchid1
orbit201.6694.0  b012724  11/29 14:55  R   100  q8n32g   orchid4
orbit201.6687.0  s030774  11/29 10:59  R   50   q1c750m  orbit202
orbit201.6685.0  s010278  11/29 10:56  R   50   q4n16g   orchid2
orbit201.6705.0  s010149  11/30 22:55  I   50   q1c750m
orbit201.6710.0  s030776  12/1 01:29   I   50   q4n16g
orbit201.6704.0  s800000  11/30 22:48  I   50   q4n16g
orbit201.6602.0  s030776  11/21 09:28  I   50   q1n4g
orbit201.6709.0  s020403  12/1 01:09   I   50   q4n16g
orbit201.6708.0  b057779  11/30 23:05  I   50   q1n4g
orbit201.6707.0  s010149  11/30 23:00  I   50   q1c750m
orbit201.6706.0  s010149  11/30 22:57  I   50   q1c750m
orbit201.6672.0  s010149  11/26 21:40  I   70   q1c750m
orbit201.6688.0  s030774  11/29 11:00  I   50   q1c750m
orbit201.6573.0  b113746  11/19 15:24  I   50   q1n4g
orbit201.6572.0  b113746  11/19 15:24  I   50   q1n4g

16 job steps in queue, 12 waiting, 0 pending, 4 running, 0 held

Id: The job ID allocated by LoadLeveler

Owner: The username of the owner of the submitted job

Submitted: The time at which the job was submitted

ST: The status of the job, R = Running, I = Idle

PRI: The priority of the job; each user is assigned a priority

Class: The class of the job, i.e. the queue in which the job is waiting

Running On: The primary node on which the job is running
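If you save the llq listing to a file, the columns described above can be tallied with a few lines of Python. This is only a sketch: the function name count_job_states is hypothetical, and it assumes that job lines begin with an ID of the form host.number.step and that the Submitted column is a date followed by a time, as in the listing above.

```python
from collections import Counter


def count_job_states(llq_text):
    """Count jobs per status code (the ST column) in llq-style output."""
    counts = Counter()
    for line in llq_text.splitlines():
        fields = line.split()
        # Job lines start with an ID such as orbit201.6540.0;
        # the header, separator and summary lines do not.
        if len(fields) < 6 or fields[0].count(".") < 2:
            continue
        # Fields: Id, Owner, date, time, ST, PRI, Class, [Running On]
        counts[fields[4]] += 1
    return counts


if __name__ == "__main__":
    # A few rows copied from the listing in this guide.
    sample = "\n".join([
        "orbit201.6540.0  b113746  11/15 15:03  R  50   q1n4g    orchid1",
        "orbit201.6694.0  b012724  11/29 14:55  R  100  q8n32g   orchid4",
        "orbit201.6705.0  s010149  11/30 22:55  I  50   q1c750m",
        "16 job steps in queue, 12 waiting, 0 pending, 4 running, 0 held",
    ])
    print(count_job_states(sample))
```

Counting the "I" entries gives a quick idea of how many jobs are waiting ahead of yours.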

To run the program under the control of LoadLeveler, you need to use LoadLeveler batch job statements. A sample job script "program1.job" is shown as follows:

orbit[4] % pico program1.job

#!/bin/csh
#@ initialdir = /users/student/s123456/
#@ class = q1c1g
#@ notify_user = s012345@mailserv.cuhk.edu.hk
#@ queue
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
/usr/local/g98/bsd/clearipc
timex g98 < program1.g98 >& program1.out
/usr/local/g98/bsd/clearipc

Meaning of the above job script statements:

#!/bin/csh : Runs the script with the C shell, /bin/csh.

Lines starting with "#@" (also written "# @") are LoadLeveler statements.

initialdir : Specifies the path of the working directory. Change it according to your needs.

class : Specifies the class (i.e. queue name) of your job. Available classes include q8n32g, q4n16g, q2n8g, q1n4g, q1c750m, q1c500m, q6n24g and q1c1g. Choose a class suitable for your program.

          'q' stands for "queue", 'n' stands for "node", 'g' stands for "Gbytes memory".
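The naming scheme can be read mechanically. The sketch below assumes that the number before 'n' is the node count and the number before 'g' is the memory in Gbytes; this reading is inferred from the sample job scripts in this guide (2 nodes with q2n8g, 4 nodes with q4n16g) rather than stated officially. The function parse_class is a hypothetical helper, and class names containing letters that this guide does not explain (such as q1c750m or q2n8gf) are deliberately left unparsed.

```python
import re

# Matches names of the form q<nodes>n<gbytes>g, e.g. q4n16g.
_CLASS_PATTERN = re.compile(r"q(\d+)n(\d+)g")


def parse_class(name):
    """Return (nodes, gbytes) for a q<N>n<M>g class name, else None."""
    match = _CLASS_PATTERN.fullmatch(name)
    if match is None:
        # Name uses letters the guide does not explain; do not guess.
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in ("q1n4g", "q4n16g", "q1c750m"):
        print(name, parse_class(name))
```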

To see an up-to-date class list, run the command llclass.

[orbit] 6% llclass

Name         MaxJobCPU   MaxProcCPU  Free   Max    Description
             d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
q4n16g       -1          -1          0      60
q5n20g       -1          -1          7      44
q6n24g       -1          -1          7      60
q7n28g       -1          -1          7      60
q8n32g       -1          -1          7      60
q12n48g      -1          -1          7      60
q16n64g      -1          -1          10     49
q2n8gf       -1          -1          0      16
q3n12g       -1          -1          7      44
q4n16gf      -1          -1          0      16
q1c500m      -1          -1          0      1
inter_class  -1          -1          0      16
q1n4g        -1          -1          3      20
q2n8g        -1          -1          3      20
qtest1       -1          -1          6      12
q2c2g        -1          -1          2      4
q1c1g        -1          -1          1      3
q1c750m      -1          -1          0      4
q1c600m      -1          -1          1      1
--------------------------------------------------------------------------------
"Free Slots" values of the classes "q3n12g", "q4n16g", "q5n20g", "q6n24g", "q7n28g", "q8n32g", "q12n48g", "q16n64g", "q2n8gf", "q4n16gf", "inter_class", "q1n4g", "q2n8g", "qtest1", "q2c2g" are constrained by the MAX_STARTERS limit(s).

"Free Slots" values of the classes "q4n16g", "q1c750m", "q1c500m" are constrained by the MAXJOBS limit(s).

queue : Queues the job.

setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98

These statements set up the working environment for the G98 program.

/usr/local/g98/bsd/clearipc
timex g98 < program1.g98 >& program1.out
/usr/local/g98/bsd/clearipc

"program1.g98" is the input file of the Gaussian 98 program and "program1.out" is the output file of the program.

The statement /usr/local/g98/bsd/clearipc cleans up computing resources that were not released by previous runs, such as memory used for inter-process communication.

Backfilling

Sometimes you want to test or run your G98 job "program1.job" for a short period of time (say 3 minutes) and get a shorter turnaround. Backfilling serves this purpose. Technically, backfilling is the ability to schedule a job that is short in duration and requires only a small number of processors/nodes ahead of other jobs. This is very useful when you only want to check that your LoadLeveler job script "program1.job" and your G98 program settings in "program1.g98" are correct.

To use backfilling in LoadLeveler, add the following line to your job script "program1.job":

# @ wall_clock_limit = 00:3:00

"00:3:00" specifies the maximum execution time of your job, 3 minutes in this example. When the limit is reached, the job is terminated whether or not it has finished.

2.1.3) Job Submission 

After you have prepared the job script file, you can submit it by:

[orbit]% llsubmit program1.job

The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.

If you want to cancel the job, type

[orbit]% llcancel <job ID>

where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in the previous section.

2.2) 4 Processors

2.2.1) Preparing G98 Program and Specifying Locations for G98 Working and Checkpoint Files

To run the G98 program "program1.g98" in a 4-processor environment, you need to insert an additional statement "%nproc=4" into the G98 program. This makes full use of the processors of a single SP thin node (http://www.cuhk.edu.hk/itsc/compenv/research-computing/cluster/hardware.html) when the program runs. A sample 4-CPU G98 program is given as follows:

         orbit[7]% pico program4.g98

%nproc=4
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0

2.2.2) Preparing Batch Job Script for Your G98 Program

A sample 4-CPU job script is shown as follows:

       e.g. orbit[9]% pico program4.job

#!/bin/csh
# @ initialdir = /users/student/s123456/
# @ node = 1
# @ tasks_per_node = 4
# @ class = q1n4g
# @ job_type = parallel
# @ notify_user = s012345@mailserv.cuhk.edu.hk
# @ queue
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
/usr/local/g98/bsd/clearipc
timex g98 < program4.g98 >& program4.out
/usr/local/g98/bsd/clearipc

node : Specifies the number of nodes you are going to use. It is 1 in this example.

tasks_per_node : Specifies the number of processors per node you are going to use. It is 4 in this example.

"program4.g98" is the input file of the Gaussian 98 program and "program4.out" is the output file of the program.

2.2.3) Job Submission

After you have prepared the job script file, you can submit it by:

        [orbit]% llsubmit program4.job

The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.

If you want to cancel the job, type

       [orbit]% llcancel <job ID>

where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in the previous section.

2.3) 8 Processors (2 nodes)

2.3.1) Preparing G98 Program and Specifying Locations for G98 Working and Checkpoint Files

 A sample 8-CPU program is shown as follows:

       e.g. orbit[9]% pico program8.g98

%nproclinda=8
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0

Note that %nproclinda= is used instead of %nproc=. The other statements remain unchanged.

2.3.2) Preparing Batch Job Script for G98 Program

 A sample 8-CPU job script is shown as follows:

       e.g. orbit[9]% pico program8.job

#!/bin/csh
#@ initialdir = /users/student/s123456
#@ node = 2
#@ tasks_per_node = 4
#@ node_usage = not_shared
#@ class = q2n8g
#@ network.mpi = css0,not_shared,us
#@ job_type = parallel
#@ notify_user = s012345@mailserv.cuhk.edu.hk
#@ queue
setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub (/orbit/,"orchid");print $1,$5}' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '+kaon -n 7:8 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1
/usr/local/g98/bsd/clearipc
timex g98l < program8.g98 >& program8.out
/usr/local/g98/bsd/clearipc

Note that g98l (the Gaussian 98 Linda version) is used instead of g98 in this job script.

node : Specifies the number of nodes you are going to use (each node consists of 4 processors).

tasks_per_node : Specifies the number of processors per node you are going to use (maximum 4).

node_usage : Specifies whether you want to share the assigned nodes with other jobs.

network.mpi : Specifies the mode for inter-node communication (us or ip).

job_type : Specifies that the job is a parallel job.

queue : Queues the job.

setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98

These statements set up the working environment for the Gaussian 98 program.

setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub (/orbit/,"orchid");print $1,$5}' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '+kaon -n 7:8 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1

These statements create the host list (i.e. the hostnames of the assigned SP thin nodes) for Gaussian 98.
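To make the awk and tr pipeline easier to follow, here is a Python sketch of the same transformation. It assumes that LOADL_PROCESSOR_LIST holds one hostname per task, with the tasks of each node listed together, so that fields 1 and 5 are the first entries of the two nodes. The function name build_hostlist and the sample hostnames are hypothetical and only for illustration.

```python
def build_hostlist(processor_list, tasks_per_node=4):
    """Return one hostname per node from a LOADL_PROCESSOR_LIST-style string.

    Mirrors the job script: replace "orbit" with "orchid" everywhere,
    then keep fields 1, 5, 9, ... (every tasks_per_node-th entry).
    """
    hosts = processor_list.replace("orbit", "orchid").split()
    return hosts[::tasks_per_node]


if __name__ == "__main__":
    # Hypothetical value for an 8-task job on 2 nodes.
    sample = "orbit9 orbit9 orbit9 orbit9 orbit11 orbit11 orbit11 orbit11"
    # One hostname per line, as written to the host list file.
    print("\n".join(build_hostlist(sample)))
```

With 16 tasks on 4 nodes, the same slicing picks entries 1, 5, 9 and 13, which is what the 16-processor job script does with $1,$5,$9,$13.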

/usr/local/g98/bsd/clearipc
timex g98l < program8.g98 >& program8.out
/usr/local/g98/bsd/clearipc

"program8.g98" is the input file of the Gaussian 98 program and "program8.out" is the output file of the program.

2.3.3) Job Submission

After you have prepared this script file, you can submit it by:

        [orbit]% llsubmit program8.job

The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.

If you want to cancel the job, type

       [orbit]% llcancel <job ID>

where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in the previous section.

 

2.4) 16 Processors (4 nodes)

2.4.1) Preparing G98 Program and Specifying Locations for G98 Working and Checkpoint Files

 A sample 16-CPU program is shown as follows:

       e.g. orbit[9]% pico program16.g98

%nproclinda=16
%mem=400Mb
%rwf=/scratch2/s123456/myjob
%d2e=/scratch2/s123456/myjob
%scr=/scratch2/s123456/myjob
%chk=/scratch2/s123456/myjob
#P TEST UHF/6-31G* FORCE

Gaussian Test Job 30
Methylene uhf forces

0 3
C
H 1 R
H 1 R 2 A

R 1.08
A 135.0

The Gaussian 98 Linda version is used for programs running on 8 or more processors. Note that %nproclinda= must be specified instead of %nproc=.

2.4.2) Preparing Batch Job Script for Your G98 Program

 A sample 16-CPU job script is shown as follows:

      e.g. orbit[9]% pico  program16.job

#!/bin/csh
#@ initialdir = /users/student/s123456
#@ node = 4
#@ tasks_per_node = 4
#@ node_usage = not_shared
#@ class = q4n16g
#@ network.mpi = css0,not_shared,us
#@ job_type = parallel
#@ queue
setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98
setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub (/orbit/,"orchid");print $1,$5,$9,$13}' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '-vv +kaon -n 15:16 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1
/usr/local/g98/bsd/clearipc
timex g98l < program16.g98 >& program16.out
/usr/local/g98/bsd/clearipc

Note that g98l (the Gaussian 98 Linda version) is used here, instead of the g98 used in the 1-CPU and 4-CPU job scripts.

#!/bin/csh : Runs the script with the C shell, /bin/csh.

Lines starting with "#@" (also written "# @") are LoadLeveler statements.

initialdir : Specifies the path of the working directory. Change it according to your needs.

node : Specifies the number of nodes (each node consists of 4 processors).

tasks_per_node : Specifies the number of processors per node (4 in this example).

node_usage : Specifies whether you want to share your nodes with other jobs (not_shared in this example).

network.mpi : Specifies the mode for inter-node communication.

job_type : Specifies the job type, parallel or serial.

queue : Queues the job.

setenv PATH $PATH":/usr/local/g98/linda/ibm-aix4.3-I8/bin"
setenv g98root /usr/local
source $g98root/g98/bsd/g98.login
setenv GAUSS_EXEDIR $g98root/g98

These statements set up the working environment for the Gaussian 98 program.

setenv hostlist hostlistfile
rm -f $hostlist
echo $LOADL_PROCESSOR_LIST > templist
awk '{ gsub (/orbit/,"orchid");print $1,$5,$9,$13}' templist | tr " " "\n" > $hostlist
setenv GAUSS_LFLAGS '-vv +kaon -n 15:16 -wait 3600 -workerwait 3600 -nodefile $hostlist -mp 4'
setenv K5MUTE 1

These statements create the node list for the Gaussian 98 program.

/usr/local/g98/bsd/clearipc
timex g98l < program16.g98 >& program16.out
/usr/local/g98/bsd/clearipc

These statements clean up previously unreleased resources and run your G98 program "program16.g98". "program16.out" is the output file of the program.

2.4.3) Job Submission 

 After you have prepared this script file, you can submit it by:

        [orbit]% llsubmit program16.job

The system echoes the message "llsubmit: The job "orbit201.itsc.cuhk.edu.hk.XXXX" has been submitted.", where XXXX is your job ID.

If you want to cancel the job, type

       [orbit]% llcancel <job ID>

where <job ID> is the job ID XXXX above. You can look it up again with llq, as illustrated in the previous section.

 

3.  G98 Program Results

When the G98 programs run successfully, the end of the output files (e.g. program1.out, program4.out, program8.out or program16.out) should contain lines similar to those listed below, including the statement "Normal termination of Gaussian 98." and the execution timing figures for the job.

......

......

 7335,3,-179.992017,0\H,8,1.099967,7,109.465529,6,60.033096,0\H,8,1.099
782,7,109.498585,6,-59.999782,0\O,8,1.430841,7,109.473813,6,179.998563
,0\H,11,0.949502,8,109.442674,7,179.982067,0\\Version=IBM-RS6000-G98Re
vA.11.2\HF=-304.5246238\MP2=-305.6229662\RMSD=9.830e-09\PG=C01 [X(C4H6
O2)]\\@


A SUCCESSFUL PURSUIT OF SCIENCE MAKES A MAN THE
BENEFACTOR OF ALL MANKIND OF EVERY AGE.
-- JOSEPH PRIESTLEY, "EXPERIMENTS AND
OBSERVATIONS ON DIFFERENT KINDS OF AIR", 1774
Job cpu time: 0 days 1 hours 0 minutes 42.7 seconds.
File lengths (MBytes): RWF= 607 Int= 0 D2E= 0 Chk= 9 Scr= 3897
Normal termination of Gaussian 98.

real 67731.20
user 67382.83
sys 73.39
 

4. Performance Analysis

There are two simple methods to check the performance of your G98 programs.  

1. TIMEX

timex is a utility that reports the running time of a program. The timex command reports, in seconds, the elapsed time, user time and system execution time of a command. You can find figures like the following in the output file of your program.

real 67731.20
user 67382.83
sys 73.39

Consider the elapsed (real) time and the user time. In the above example, the user time is almost the same as the elapsed time, which indicates that the job ran efficiently.
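The comparison can be expressed as a simple ratio of user time to elapsed time. The Python sketch below reads the three timex lines and computes that ratio; the function names parse_timex and user_to_real_ratio are hypothetical helpers. Note that for a multi-processor job the reported user time may add up CPU time from several processors, so treat the ratio as a rough indicator only.

```python
def parse_timex(text):
    """Extract the real, user and sys times (in seconds) from timex output."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("real", "user", "sys"):
            times[parts[0]] = float(parts[1])
    return times


def user_to_real_ratio(times):
    """Return user time divided by elapsed (real) time."""
    if times["real"] <= 0:
        raise ValueError("elapsed time must be positive")
    return times["user"] / times["real"]


if __name__ == "__main__":
    # Figures taken from the example output in this guide.
    sample = "real 67731.20\nuser 67382.83\nsys 73.39\n"
    print(f"{user_to_real_ratio(parse_timex(sample)):.1%}")  # prints 99.5%
```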

A detailed explanation of timex can be obtained by invoking "man timex" at the system prompt.

    e.g. orbit[10]% man timex

2. SAR

sar is a tool used to collect or report computer system activity information.

%sar

AIX orbit201 3 4 000B3FAF4C00 12/09/03

00:00:01    %usr    %sys    %wio   %idle
00:20:01      96       1       0       3
00:40:01     100       0       0       0
01:00:01      95       1       0       4
01:20:01      97       1       0       2
01:40:01      98       1       0       2
02:00:00      99       0       0       1
02:20:00      98       1       0       1
02:40:00      98       0       0       2

Average       98       1       0       2

%usr  Reports the percentage of time the CPU or CPUs spent executing at the user (or application) level.

%sys  Reports the percentage of time the CPU or CPUs spent executing at the system (or kernel) level.

%idle  Reports the percentage of time the CPU or CPUs were idle with no outstanding disk I/O requests.

In the above example, the average "%usr" figure is 98%. It indicates that the processor(s) of the system spent almost all of their time executing user code, so the job ran efficiently.
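If you capture the sar report in a file, the average can be recomputed over just the interval in which your job ran. The following Python sketch does this for the %usr column; the function name average_usr is hypothetical, and it assumes data rows consist of a time stamp followed by four integer percentages, as in the report above.

```python
def average_usr(sar_text):
    """Average the %usr column over the time-stamped rows of a sar report."""
    values = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Data rows look like "00:20:01 96 1 0 3"; skip headers and "Average".
        if (len(fields) == 5 and fields[0].count(":") == 2
                and all(f.isdigit() for f in fields[1:])):
            values.append(int(fields[1]))
    if not values:
        raise ValueError("no sar data rows found")
    return sum(values) / len(values)


if __name__ == "__main__":
    # The first three rows of the report in this guide.
    sample = "00:20:01 96 1 0 3\n00:40:01 100 0 0 0\n01:00:01 95 1 0 4\n"
    print(average_usr(sample))
```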
 

Parallel jobs are dispatched dynamically to run on the SP thin nodes (http://www.cuhk.edu.hk/itsc/compenv/research-computing/cluster/hardware.html). After your job has started, you can find out the names of the assigned nodes, i.e. orchid[1-16], by invoking the command "llq -l Job_ID | grep orchid".

    orbit201-[2]% llq -l 2637 | grep orchid

 Allocated Hosts : orchid9.itsc.cuhk.edu.hk::css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M)
+ orchid11.itsc.cuhk.edu.hk::css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M)
+ orchid4.itsc.cuhk.edu.hk::css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M),css0(-1,MPI,IP,0M)
 

In the above example, 3 SP thin nodes, orchid9, orchid11 and orchid4, are assigned to the job. Since each SP thin node is a computer system in its own right, you can simply log in to one of the nodes and invoke the sar command to check the CPU utilization on that node.

   orbit[3]% rlogin orchid9

    orbit[4]% sar

    orbit[5]% exit

The quality of a parallel program is judged by its processor utilization. Ideally, it should be 100%. Generally speaking, a parallel program with CPU utilization of around 70% is considered average, and one with utilization over 80% is considered good.
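As a small illustration, the rule of thumb can be written as a function. The exact cut-offs are an assumption made for this sketch: over 80% is rated "good", 70% to 80% "average", and anything lower "below average" (a label this guide does not itself use). The function name rate_utilization is hypothetical.

```python
def rate_utilization(percent):
    """Rate a CPU utilization percentage using the guide's rule of thumb."""
    if not 0 <= percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    if percent > 80:
        return "good"
    if percent >= 70:
        # "Around 70%" is taken here to mean the 70-80 band (assumption).
        return "average"
    return "below average"


if __name__ == "__main__":
    # 98 is the average %usr figure from the sar example in this guide.
    print(rate_utilization(98))  # prints good
```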

5. Tips

6. Enquiries

Please contact ITSC Electronic HelpDesk at http://helpdesk.itsc.cuhk.edu.hk/

 
