

    Swarm

    Swarm is a program designed to simplify submitting a group of commands to the cluster. Some programs do not scale well and so are not suited to true parallelization. Others run only briefly per job, but many such jobs need to be run. Such programs are well suited to running as 'swarms of single-threaded jobs', and swarm simplifies this process.

    Swarm reads a list of commands from cmdfile then automatically submits those commands to the PBS batch system to execute. Swarm runs one command for each processor on a node, making optimum use of a node (thus a node with 2 processors will execute two commands simultaneously). When there are hundreds or thousands of commmands, use the -b option to bundle groups of commands to be run sequentially per processor.

    Commands in the command file should appear just as they would be entered on a command line. STDOUT (or STDERR) output that isn't explicitly directed elsewhere will be sent to a file named swarmPIDn#.o (or .e) in your current working directory. A line where the first non-whitespace character is "#" is considered a comment and is ignored.
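    To check which lines of a command file count as commands before submitting it, the comment rule above can be reproduced with grep. This is a minimal sketch: list_commands is a hypothetical helper, not part of swarm, and blank lines are passed through because this page does not say how swarm treats them.

```shell
# Hypothetical helper (not part of swarm): print every line of a command
# file except comments, i.e. lines whose first non-whitespace character
# is "#". A "#" later in a line (e.g. inside an echo) does not make the
# line a comment. Blank lines are passed through unchanged.
list_commands() {
    grep -v '^[[:space:]]*#' "$1"
}
```

    For example, list_commands cmdfile prints exactly the lines that would be treated as commands.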

    Swarm creates a .swarm directory in your current working directory and writes an executable script there for every 2 commands in your command file. Each script deletes itself as the final step of its execution. (If the -d (debug) option is specified, the scripts are not submitted to batch, so they are not deleted.)
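    The exact contents of the scripts swarm writes are not documented on this page. As a purely hypothetical illustration of the idea (two commands started in parallel on one 2-processor node, with the script removing itself when done), the function make_node_script below writes such a script; its name and layout are assumptions, not swarm's real output.

```shell
# Hypothetical illustration (not swarm's actual script format): write a
# script that starts each given command in the background so they run
# simultaneously on one node, waits for all of them, and then deletes
# itself as its final step.
# Usage: make_node_script <script-path> <command> [<command> ...]
make_node_script() {
    script=$1
    shift
    {
        echo '#!/bin/sh'
        for cmd in "$@"; do
            # Each command runs in its own subshell, in parallel.
            printf '( %s ) &\n' "$cmd"
        done
        echo 'wait'          # block until every command has finished
        echo 'rm -f "$0"'    # the script removes itself last
    } > "$script"
    chmod +x "$script"
}
```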

    SYNOPSIS

           swarm -f cmdfile [ -n # ] [ -b # ] [ -d ] [ -h ] [ qsub-options ]
    
    

    OPTIONS

           The -f cmdfile option is mandatory; all others are optional.
    
    
           -f cmdfile     specify the file containing a list of commands, one
                          command per line. You may use ";" to separate several
                          commands on a line, and these will be executed
                          sequentially.
    
           -d             debug mode. The command file is read, command scripts
                          are generated and saved in the .swarm directory, and
                          debugging information is printed, but the scripts are
                          not submitted to the batch system. The qsub
                          command(s) that would have been issued are printed
                          as part of the debug output and can be used to
                          submit the script(s) manually if desired.
    
           -b #           bundle mode. swarm runs one command per processor
                          by default. Use the bundle option to run "#" commands
                          per processor, one after the other. The advantages of
                          bundling include fewer swarm jobs and output/error
                          files, lower overhead due to scheduling and job
                          startup, and disk file cache benefits under certain
                          circumstances.
    
    
           -n #           number of processes to run per node; swarm sets
                          this number to 2 by default (the NIH Biowulf comprises
                          2-processor nodes).
    
           -h             print a help message
    

    OUTPUT

    STDOUT and STDERR output from processes executed under swarm will be directed to a file named swarmPIDn#.o (or .e), for instance swarm2587n1.o (or swarm2587n1.e). Because multiple processes write to the same file, their output can be interleaved and hard to follow, so it is a good idea to redirect output explicitly on the command line using ">".
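    If a command file was written without redirection, it can be rewritten so that each command gets its own output files. A minimal sketch with awk; add_redirects and the cmdN.out/cmdN.err names are hypothetical. The file names are relative, so they land in whatever directory the job starts in; use absolute paths if that is not where you want them.

```shell
# Hypothetical helper (not part of swarm): rewrite a command file so that
# every command line sends STDOUT and STDERR to its own numbered files
# (cmd1.out/cmd1.err, cmd2.out/cmd2.err, ...). Each line is wrapped in
# parentheses so that a line with several ";"-separated commands is
# redirected as a whole. Comment lines and blank lines are copied as-is.
add_redirects() {
    awk '/^[ \t]*#/ || NF == 0 { print; next }
         { n++; printf "( %s ) > cmd%d.out 2> cmd%d.err\n", $0, n, n }' "$1"
}
```

    For example: add_redirects cmdfile > cmdfile.redir, then swarm -f cmdfile.redir.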

    Be aware of programs that write directly to a file with a fixed filename. If you run multiple instances of such programs, then for each instance you will need to a) change the name of the file or b) alter the path to the file. See the EXAMPLES section for some ideas.

    EXAMPLES

    To see how swarm works, first create a file containing a few simple commands, then use swarm to submit them to the batch queue:
                 $ cat > cmdfile
                 date
                 hostname
                 ls -l
                 ^D
    
                 $ swarm -f cmdfile
    
    Use qstat -u your-user-id to monitor the status of your request; an "R" in the "S" (status) column indicates that your job is running (see qstat(1) for more details). This particular example will probably run to completion before you can issue the qstat command. To see the output from the commands, look at the files named "swarmPIDn#.o".


    Example 1: A program that reads from STDIN and writes to STDOUT

    For each invocation of the program, the input and output file names vary:

                 $ cat > runbix
                 ./bix < testin1 > testout1
                 ./bix < testin2 > testout2
                 ./bix < testin3 > testout3
                 ./bix < testin4 > testout4
                 ^D
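    When there are many inputs, the runbix command file can be generated with a shell loop instead of being typed in. A minimal sketch, assuming the inputs are numbered consecutively as testin1, testin2, ...; n_inputs is an illustrative variable.

```shell
# Generate the Example 1 command file for n_inputs numbered inputs,
# one "./bix < testinN > testoutN" line per input.
n_inputs=4
i=1
: > runbix                     # create (or empty) the command file
while [ "$i" -le "$n_inputs" ]; do
    echo "./bix < testin$i > testout$i" >> runbix
    i=$((i + 1))
done
```

    The resulting file is then submitted with swarm -f runbix.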
    


    Example 2: A program that writes to a fixed filename

    If a program writes to a fixed filename, you may need to run each instance in a different directory. First create the necessary directories (for instance run1, run2), and in the swarm command file cd to the unique output directory before running the program. For the cd, use either an absolute path beginning with "/" or a path relative to your home directory. Lines with a leading "#" are considered comments and are ignored.

                 $ cat > batchcmds
                 # Run ped program using different directory
                 # for each run
                 cd pedsystem/run1; ../ped
                 cd pedsystem/run2; ../ped
                 cd pedsystem/run3; ../ped
                 cd pedsystem/run4; ../ped
                  ...
    
                 $ swarm -f batchcmds
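    The run directories and the batchcmds file can be set up together in one loop. A minimal sketch, assuming it is run from your home directory (so the relative cd paths match) and that four runs are wanted; n_runs is an illustrative variable.

```shell
# Create pedsystem/run1 ... pedsystem/runN and write one
# "cd pedsystem/runN; ../ped" line per directory into batchcmds.
n_runs=4
{
    echo "# Run ped program using different directory"
    echo "# for each run"
} > batchcmds
i=1
while [ "$i" -le "$n_runs" ]; do
    mkdir -p "pedsystem/run$i"
    echo "cd pedsystem/run$i; ../ped" >> batchcmds
    i=$((i + 1))
done
```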
    


    Example 3: Bundling large numbers of commands

    For large numbers of commands, especially if the jobs are small, it is advantageous to 'bundle' the jobs with the -b flag. If the command file contains 2500 commands, the following swarm command will group them into bundles of 40 commands each, producing 63 bundles (2500/40 = 62.5, rounded up). Swarm will then submit two bundles as a single swarm job, so there will be 32 swarm jobs (63/2, rounded up).

         swarm -f cmdfile -b 40
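    The bundle and job counts come from two ceiling divisions: 2500 commands in bundles of 40 give 63 bundles, and two bundles per job give 32 jobs. A minimal sketch, assuming swarm rounds up at both steps; swarm_counts is a hypothetical helper, not a swarm option.

```shell
# Estimate how many bundles and swarm jobs a command file produces:
# commands are grouped into bundles of size b, and one bundle runs per
# processor, so per_node bundles share one swarm job. Both divisions
# round up.
# Usage: swarm_counts <number-of-commands> <bundle-size> [processes-per-node]
swarm_counts() {
    ncmds=$1
    bundle=$2
    per_node=${3:-2}           # swarm's default -n is 2
    nbundles=$(( (ncmds + bundle - 1) / bundle ))
    njobs=$(( (nbundles + per_node - 1) / per_node ))
    echo "$nbundles $njobs"
}
```

    For example, swarm_counts 2500 40 prints "63 32".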
    
    Note that commands in a bundle run sequentially on the assigned node. Ideally, choose the bundle size so that there are as many jobs as the system allows for a single user. For example, if the current jobs-per-user limit is 32, choose your bundle size so that you get at least 32 swarm jobs.
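    The advice above can be turned into a small search: try bundle sizes from 1 upward and keep the largest one that still yields at least the target number of jobs. A sketch under the assumption that swarm rounds up when forming bundles and jobs; pick_bundle is hypothetical, and 32 jobs per user is only the example figure from the text.

```shell
# Find the largest bundle size that still yields at least target swarm
# jobs, so every allowed job slot is used with as few bundles as possible.
# Returns 1 if even unbundled commands cannot reach the target.
# Usage: pick_bundle <number-of-commands> <target-jobs> [processes-per-node]
pick_bundle() {
    ncmds=$1
    target=$2
    per_node=${3:-2}           # swarm's default -n is 2
    best=1
    b=1
    while [ "$b" -le "$ncmds" ]; do
        nbundles=$(( (ncmds + b - 1) / b ))
        njobs=$(( (nbundles + per_node - 1) / per_node ))
        # The job count never increases as b grows, so stop at the
        # first bundle size that falls below the target.
        [ "$njobs" -ge "$target" ] || break
        best=$b
        b=$((b + 1))
    done
    echo "$best"
}
```

    For the 2500-command file, pick_bundle 2500 32 prints 40, matching the -b 40 used in Example 3.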


    Example 4: Using qsub flags

    Swarm submits clusters of processes using PBS (Portable Batch System) via the qsub command; any valid qsub command-line option is also valid for swarm. In this example the "-l" option is given to specify a resource list, here requesting one node that has the node properties p866 and m1000:

                 $ swarm -f testfile -l nodes=1:p866:m1000
    
    Note that because swarm is designed to run 2 processes on a single node, it will override "-l nodes=n" with "-l nodes=1".
 

This document is available as http://biowulf.nih.gov/apps/swarm.html

Jun 05, 2002 (sb)