Parallel bash scripts – waiting for completion


Suppose we want to run several independent tasks in parallel and wait until all of them have completed. For example, we want to copy a set of files to several (remote) hosts and get a signal once copying to ALL hosts has finished, so we can start the next step in the process.

To explain this, we will create a time-consuming task (the copying task) and a controlling task that starts all the separate time-consuming tasks. You will find a line-by-line explanation below each script.

Time-consuming task (task.sh)

Create a (dummy) time-consuming task like this:

#!/bin/bash

TASK_NUMBER=$1;
LOCK_DIR=$2;

LOCKFILE="${LOCK_DIR}/TASK-${TASK_NUMBER}";

# Set lock
touch $LOCKFILE;

echo "Task $TASK_NUMBER - Start";

# Simulate a time consuming task
wait_time=$RANDOM;
let "wait_time%=30";

echo "Task $TASK_NUMBER - Sleeptime: $wait_time";
sleep $wait_time;

echo "Task $TASK_NUMBER - Ready";

# Remove lock
rm $LOCKFILE;

Line by line explanation (task.sh)

1. Parse arguments from command line


TASK_NUMBER=$1;
LOCK_DIR=$2;

… reads the first two arguments from the command line. The first argument is the unique number of the task. The second argument is the directory for storing lock-files.

2. Set lock


LOCKFILE="${LOCK_DIR}/TASK-${TASK_NUMBER}";
touch $LOCKFILE;

… the first line builds the name of the lock-file from the lock-file directory and the unique number of the task. The second line actually creates the lock-file.

3. Simulate a time consuming task


wait_time=$RANDOM;
let "wait_time%=30";
sleep $wait_time;

… the first line stores a pseudorandom integer in the range from 0 to 32767 in wait_time. It is called pseudorandom because computers can only simulate randomness. The second line replaces that value with its modulo 30 (the remainder of an integer division by 30), which gives a value from 0 to 29 and keeps the running time of the simulated task acceptable. The last line sleeps for that number of seconds, simulating work in progress. Replace these three lines with a task that really takes a long time to finish, such as copying files to a remote host.
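The modulo step can also be written with arithmetic expansion instead of ‘let‘. This is a minimal sketch, assuming bash; the helper name random_wait and its parameter are invented for this illustration and are not part of task.sh:

```shell
#!/bin/bash

# Hypothetical helper: print a pseudorandom integer in the range 0..(max-1).
random_wait() {
    local max=$1
    # $RANDOM yields 0..32767; the remainder after dividing by max is 0..(max-1).
    echo $(( RANDOM % max ))
}

wait_time=$(random_wait 30)
echo "Sleeptime: $wait_time"
```

The value printed differs on every run, but it always lies between 0 and 29.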

4. Release the lock


rm $LOCKFILE;

… will remove the lock-file which was created in step 2.

Controlling task (main.sh)

Besides the time-consuming tasks, create a controlling task which will start them:

#!/bin/bash

TASKS=5;
LOCK_DIR="/tmp/.lock_dir";

wait4tasks(){
sleep 2;
tasks=`ls $LOCK_DIR | tr '\n' ' '`;
counter=`ls $LOCK_DIR | wc -l`;

while [ $counter -gt 0 ] ; do
echo "MAIN - Running tasks: $counter - $tasks";
sleep 1;
tasks=`ls $LOCK_DIR | tr '\n' ' '`;
counter=`ls $LOCK_DIR | wc -l`;
done
}

echo "MAIN - Starting $TASKS tasks.";

if mkdir $LOCK_DIR; then
for i in $(seq 1 1 ${TASKS})
do
./task.sh $i $LOCK_DIR &
done

wait4tasks;
fi

rmdir $LOCK_DIR;

echo "MAIN - Finished running tasks.";

Line by line explanation (main.sh)

First we will explain the function ‘wait4tasks()‘.

1. Let tasks settle


sleep 2;

… start by waiting 2 seconds. This gives the tasks time to settle, so that each task has at least created its lock-file.

2. Get locks of tasks


tasks=`ls $LOCK_DIR | tr '\n' ' '`;

… first get a directory listing (ls) containing the lock-files of all tasks. Pipe this listing to the tr command (translate), which replaces every newline with a space. The result is one line with all lock-file names, separated by spaces. Store this result in the variable ‘tasks‘.

3. Count the number of tasks


counter=`ls $LOCK_DIR | wc -l`;

… get the same directory listing as in step 2, but now count the lines (wc -l). This is the number of running tasks. Store this result in the variable ‘counter‘.

Steps 2 and 3 fill the variables ‘tasks‘ and ‘counter‘ with their initial values.
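The same count can also be obtained without ls and wc by looping over a glob. This is a minimal sketch, not the method used in main.sh; the names count_locks and DEMO_DIR are invented for this illustration, and a throwaway directory stands in for the lock-directory:

```shell
#!/bin/bash

# Hypothetical helper: print the number of entries in the given directory.
count_locks() {
    dir=$1
    count=0
    for entry in "$dir"/*; do
        # An unmatched glob is left as a literal pattern, so check existence.
        if [ -e "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Demo in a throwaway directory (an assumption; main.sh uses /tmp/.lock_dir).
DEMO_DIR=$(mktemp -d)
touch "$DEMO_DIR/TASK-1" "$DEMO_DIR/TASK-2"
with_locks=$(count_locks "$DEMO_DIR")

rm "$DEMO_DIR"/TASK-*
without_locks=$(count_locks "$DEMO_DIR")

echo "with locks: $with_locks, without locks: $without_locks"
rmdir "$DEMO_DIR"
```

The existence check matters for the empty case: without it, the unmatched pattern itself would be counted as one entry.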

4. Main loop of the wait4tasks() function


while [ $counter -gt 0 ] ; do
echo "MAIN - Running tasks: $counter - $tasks";
sleep 1;
tasks=`ls $LOCK_DIR | tr '\n' ' '`;
counter=`ls $LOCK_DIR | wc -l`;
done

… repeat the statements in the loop while counter is greater than 0. Inside the loop, print the number of running tasks and their names, sleep for one second (or more), and then refill the variables ‘tasks‘ and ‘counter‘ with the currently running tasks, as explained in step 2 and step 3.

When counter reaches zero (that is, not greater than 0), we exit the while-loop. So wait4tasks() only returns when there are no running tasks left.
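For comparison, bash also has a builtin ‘wait‘ that blocks until the background child processes of the current shell have exited. This is a different technique from the lock-file polling explained above, and it only covers tasks started by the same shell. A minimal sketch; dummy_task and RESULTS are invented stand-ins for ./task.sh and its output:

```shell
#!/bin/bash

# Hypothetical stand-in for ./task.sh: sleeps briefly, then records completion.
RESULTS=$(mktemp)
dummy_task() {
    sleep 1
    echo "Task $1 - Ready" >> "$RESULTS"
}

# Start three tasks in the background.
for i in 1 2 3; do
    dummy_task "$i" &
done

# 'wait' without arguments blocks until all background children have exited.
wait

finished=$(grep -c 'Ready' "$RESULTS")
echo "MAIN - Finished running $finished tasks."
rm -f "$RESULTS"
```

Unlike the polling loop, this variant prints no progress while the tasks are running.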

Next we will explain the core of the controlling task.

1. Creating / checking lock-directory


if mkdir $LOCK_DIR; then ... fi

… will create the lock-directory. The if-condition is true when creating the directory succeeds. The ‘mkdir‘ command is an atomic check-and-create operation: when two (or more) processes call mkdir for the same directory at the same time, at most one of them can succeed. The operating system kernel guarantees this atomicity.
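The check-and-create behaviour can be observed by calling mkdir twice for the same path. A minimal sketch with sequential calls (it does not demonstrate a real race); try_lock and the temporary path are invented for this illustration:

```shell
#!/bin/bash

# Hypothetical path for the demo; main.sh uses /tmp/.lock_dir instead.
LOCK_DIR="$(mktemp -d)/lock"

try_lock() {
    # mkdir fails if the directory already exists, so only the first caller wins.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "acquired"
    else
        echo "busy"
    fi
}

first=$(try_lock)
second=$(try_lock)
echo "first: $first, second: $second"

rmdir "$LOCK_DIR"
```

The first call creates the directory and reports "acquired"; the second call finds it already present and reports "busy".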

2. Create some time-consuming tasks


for i in $(seq 1 1 ${TASKS})
do
./task.sh $i $LOCK_DIR &
done

TASKS must be set to an integer value (here 5). On each iteration the for-loop assigns the next value to the variable ‘i‘, starting at 1 and going up to TASKS with a step size of 1. Here this is: 1, 2, 3, 4 and 5. Each iteration starts a time-consuming task (./task.sh) and passes the variable ‘i‘ as identifier for the task (first argument) and the lock-directory as second argument. The task is put in the background by appending ‘&‘ to the command.

3. Wait till tasks are ready.


wait4tasks;

… now wait until all started tasks have finished by calling the function ‘wait4tasks‘.

Testing the example

Be sure you have saved both files ‘task.sh‘ and ‘main.sh‘ in the same directory. You must also make them both executable by setting the execute-bit: ‘chmod +x task.sh‘ and ‘chmod +x main.sh‘. Start the example by typing: ‘./main.sh‘.

Example output

MAIN - Starting 5 tasks.
Task 2 - Start
Task 2 - Sleeptime: 24
Task 3 - Start
Task 3 - Sleeptime: 18
Task 5 - Start
Task 5 - Sleeptime: 14
Task 1 - Start
Task 1 - Sleeptime: 14
Task 4 - Start
Task 4 - Sleeptime: 26
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
MAIN - Running tasks: 5 - TASK-1 TASK-2 TASK-3 TASK-4 TASK-5
Task 5 - Ready
Task 1 - Ready
MAIN - Running tasks: 3 - TASK-2 TASK-3 TASK-4
MAIN - Running tasks: 3 - TASK-2 TASK-3 TASK-4
MAIN - Running tasks: 3 - TASK-2 TASK-3 TASK-4
MAIN - Running tasks: 3 - TASK-2 TASK-3 TASK-4
Task 3 - Ready
MAIN - Running tasks: 2 - TASK-2 TASK-4
MAIN - Running tasks: 2 - TASK-2 TASK-4
MAIN - Running tasks: 2 - TASK-2 TASK-4
MAIN - Running tasks: 2 - TASK-2 TASK-4
MAIN - Running tasks: 2 - TASK-2 TASK-4
MAIN - Running tasks: 2 - TASK-2 TASK-4
Task 2 - Ready
MAIN - Running tasks: 1 - TASK-4
MAIN - Running tasks: 1 - TASK-4
Task 4 - Ready
MAIN - Finished running tasks.

Cleaning dangling lock-files

When you force the script to stop (CTRL+C), the lock-directory is not cleaned up. You have to remove it yourself by running: ‘rm -rf /tmp/.lock_dir‘. Otherwise you will see the errors below when you try to run the main.sh script again:

mkdir: cannot create directory `/tmp/.lock_dir': File exists
rmdir: failed to remove `/tmp/.lock_dir': Directory not empty
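One way to avoid the manual clean-up is to let the controlling task remove the lock-directory itself when it is interrupted. The following is a minimal sketch of that idea using ‘trap‘, not part of the original main.sh; run_main, cleanup and task_pids are invented names, a throwaway directory stands in for /tmp/.lock_dir, and the interruption is simulated with an explicit exit:

```shell
#!/bin/bash

# Sketch only: a throwaway directory stands in for /tmp/.lock_dir.
LOCK_DIR="$(mktemp -d)/lock_dir"

run_main() (
    mkdir "$LOCK_DIR" || exit 1
    task_pids=""

    # On any exit, stop the remaining tasks and remove the lock-directory.
    cleanup() {
        kill $task_pids 2>/dev/null
        rm -rf "$LOCK_DIR"
    }
    trap cleanup EXIT
    # Turn CTRL+C (SIGINT) into a normal exit so the EXIT trap runs.
    trap 'exit 130' INT

    # Stand-in for a task that still holds its lock.
    touch "$LOCK_DIR/TASK-1"
    sleep 30 &
    task_pids="$task_pids $!"

    # Simulate an interruption of the controlling task.
    exit 130
)

run_main || echo "MAIN - Stopped with status $?"
if [ -d "$LOCK_DIR" ]; then echo "lock-directory left behind"; else echo "lock-directory removed"; fi
```

Stopping the tasks before removing the directory is a design choice in this sketch: otherwise a task that is still running would later try to remove a lock-file that no longer exists.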
