Swarm
Swarm is a program designed to simplify submitting a group of commands to the cluster. Some programs do not scale well and thus are not suited to true parallelization. Other programs run each individual job very quickly, but many such jobs need to be run. Such programs are well suited to running 'swarms of single-threaded jobs'. The Swarm program simplifies this process.

Swarm reads a list of commands from cmdfile, then automatically submits those commands to the PBS batch system for execution. Swarm runs one command for each processor on a node, making optimum use of the node (thus a node with 2 processors will execute two commands simultaneously). When there are hundreds or thousands of commands, use the -b option to bundle groups of commands to be run sequentially per processor.

Commands in the command file should appear just as they would be entered on a command line. STDOUT (or STDERR) output that isn't explicitly directed elsewhere will be sent to a file named swarmPIDn#.o (or .e) in your current working directory. A line where the first non-whitespace character is "#" is considered a comment and is ignored.

Swarm creates a .swarm directory in your current working directory, and creates an executable script for every 2 commands in your command file. These scripts are automatically deleted as the final step when they are executed. (If the -d (debug) option is specified, the scripts are not submitted to batch, so they will not be deleted.)
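Before submitting a large command file, it can help to see which lines would actually count as commands under the comment rule described above. The following is a minimal sketch; the helper names list_commands and count_commands are hypothetical and not part of swarm, and skipping blank lines is an assumption made here, since the text above only describes comment lines.

```shell
# Hypothetical helpers, not part of swarm.

# Print the lines of a command file that are not comments, i.e. whose
# first non-whitespace character is not "#". Blank lines are skipped too
# (an assumption of this sketch).
list_commands() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Print how many command lines the file contains.
count_commands() {
    list_commands "$1" | wc -l | tr -d ' '
}

# Usage: count_commands cmdfile
```

The count is useful when choosing a value for the -b option.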
SYNOPSIS
swarm -f cmdfile [ -n # ] [ -b # ] [ -d ] [ -h ] [ qsub-options ]
OPTIONS
The -f cmdfile option is mandatory; all others are optional.
  -f cmdfile  specify the file containing a list of commands, one
              command per line. You may use ";" to separate several
              commands on a line, and these will be executed
              sequentially.
  -d          debug mode. The command file is read, command scripts
              are generated and saved in the .swarm directory, and
              debugging information is printed, but the scripts are
              not submitted to the batch system. The qsub
              command(s) that would have been issued are printed
              as part of the debug output and can be used to
              submit the script(s) manually if desired.
  -b #        bundle mode. swarm runs one command per processor
              by default. Use the bundle option to run "#" commands
              per processor, one after the other. The advantages of
              bundling include fewer swarm jobs and output/error
              files, lower overhead due to scheduling and job
              startup, and disk file cache benefits under certain
              circumstances.
  -n #        number of processes to run per node; swarm sets
              this number to 2 by default (the NIH Biowulf comprises
              2-processor nodes).
  -h          prints help message
OUTPUT
STDOUT and STDERR output from processes executed under swarm will be directed to a file named swarmPIDn#.o (or .e), for instance swarm2587n1.o (or swarm2587n1.e). Since this can be confusing (with multiple processes writing to the same file), it is a good idea to explicitly redirect output on the command line using ">".

Beware of programs that write directly to a file using a fixed filename. If you run multiple instances of such programs, then for each instance you will need to a) change the name of the file or b) alter the path to the file. See the EXAMPLES section for some ideas.
EXAMPLES
To see how swarm works, first create a file containing a few simple commands, then use swarm to submit them to the batch queue:
$ cat > cmdfile
date
hostname
ls -l
^D
$ swarm -f cmdfile
Use qstat -u your-user-id to monitor the status of your
request; an "R" in the "S"tatus column indicates your job
is running (see qstat(1) for more details). This particular
example will probably run to completion before you can
give the qstat command. To see the output from the commands, see the files named "swarmPIDn#.o".
Here the names of the input and output files vary for each invocation of the program:
$ cat > runbix
./bix < testin1 > testout1
./bix < testin2 > testout2
./bix < testin3 > testout3
./bix < testin4 > testout4
^D
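When there are many runs, typing such a file by hand becomes tedious. As a rough sketch, a small shell loop can write the same lines; the function name make_runbix is hypothetical, and the testinN/testoutN naming simply follows the runbix example.

```shell
# Hypothetical helper, not part of swarm.
# Print one "./bix" command line per run number from 1 to N,
# using the testinN/testoutN naming from the runbix example.
make_runbix() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        echo "./bix < testin$i > testout$i"
        i=$((i + 1))
    done
}

# Usage: make_runbix 4 > runbix
```

The resulting file can then be passed to swarm with the -f option as usual.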
If a program writes to a fixed filename, then you may need to run the program in different directories. First create the necessary directories (for instance run1, run2), then in the swarm command file cd to the unique output directory before running the program. The cd may use either an absolute path beginning with "/" or a relative path from your home directory. Lines with leading "#" are considered comments and ignored.
$ cat > batchcmds
# Run ped program using different directory
# for each run
cd pedsystem/run1; ../ped
cd pedsystem/run2; ../ped
cd pedsystem/run3; ../ped
cd pedsystem/run4; ../ped
...
$ swarm -f batchcmds
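The directory setup and the command file can be produced together. The sketch below assumes the run1, run2, ... naming from the example; the function name make_ped_cmds is hypothetical. It creates the directories relative to where it is run, so run it from your home directory if you pass a relative path.

```shell
# Hypothetical helper, not part of swarm.
# make_ped_cmds BASE N
# Create BASE/run1 .. BASE/runN and print one command line per run.
# BASE is used verbatim in the "cd", so pass either an absolute path or
# a path relative to your home directory.
make_ped_cmds() {
    base=$1
    n=$2
    i=1
    while [ "$i" -le "$n" ]; do
        mkdir -p "$base/run$i" || return 1
        echo "cd $base/run$i; ../ped"
        i=$((i + 1))
    done
}

# Usage (from your home directory): make_ped_cmds pedsystem 4 > batchcmds
```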
For large numbers of commands, especially if the jobs are small, it is advantageous to 'bundle' the jobs with the -b flag. If the command file contains 2500 commands, the following swarm command will group them into bundles of 40 commands each, producing 63 bundles (2500/40, rounded up; the last bundle holds the remaining 20 commands). Swarm will then submit two bundles as a single swarm job, so there will be 32 swarm jobs (63/2, rounded up).
swarm -f cmdfile -b 40
Note that commands in a bundle will run sequentially on the assigned node. Ideally, the
bundling number should be chosen so that there are as many jobs as the system will allow
for a single user. For example, if the current jobs/user limit is 32, design your bundle
size so that you get at least 32 swarm jobs.
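The job count for a given bundle size can be estimated with shell arithmetic. This is only a sketch: the function names are hypothetical, and it assumes that a partial bundle and a partial job are each rounded up, which is not taken from swarm itself. The processes-per-node value defaults to 2, matching the -n default.

```shell
# Hypothetical helpers, not part of swarm. Rounding up partial bundles
# and partial jobs is an assumption of this sketch.

# swarm_bundles NCOMMANDS BUNDLE
swarm_bundles() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# swarm_jobs NCOMMANDS BUNDLE [PROCS_PER_NODE]
swarm_jobs() {
    procs=${3:-2}
    bundles=$(swarm_bundles "$1" "$2")
    echo $(( (bundles + procs - 1) / procs ))
}

# max_bundle_for_jobs NCOMMANDS MIN_JOBS [PROCS_PER_NODE]
# Largest bundle size that still yields at least MIN_JOBS jobs.
# Prints 0 if no bundle size does.
max_bundle_for_jobs() {
    n=$1
    min_jobs=$2
    procs=${3:-2}
    b=$n
    while [ "$b" -ge 1 ]; do
        bundles=$(( (n + b - 1) / b ))
        jobs=$(( (bundles + procs - 1) / procs ))
        if [ "$jobs" -ge "$min_jobs" ]; then
            echo "$b"
            return 0
        fi
        b=$((b - 1))
    done
    echo 0
}

# swarm_jobs 2500 40           prints 32
# max_bundle_for_jobs 2500 32  prints 40
```

Under these assumptions, a bundle size of 40 is the largest that still gives 32 jobs for 2500 commands; a bundle size of 41 would give 31.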
Swarm submits clusters of processes using PBS (Portable Batch System) via the qsub command; any valid qsub command-line option is also valid for swarm. In this example the "-l" option is given to specify a resource list, typically requesting
$ swarm -f testfile -l nodes=1:p866:m1000
Note that because swarm is designed to run 2 processes on a single
node, it will override "-l nodes=n" with "-l nodes=1".
This document is available as http://biowulf.nih.gov/apps/swarm.html Jun 05, 2002 (sb)