TephraProb on a cluster
08 Jun 2018 · tephraprob · parallel · codes
There is a relatively easy way to parallelise TephraProb on a computer cluster without having to struggle with Matlab’s Parallel Computing Toolbox. Matlab is not even needed on the cluster: it is just a matter of sending individual Tephra2 runs to different nodes. Here is how to do so:
- Generate your input files and eruption scenarios locally
- Send the required files to the cluster
- Run the scenario on the cluster
- Retrieve the output files from the cluster to the local computer
- Post-process the output files (e.g. probability calculations) locally
Setting up remote files
Following the procedure in the TephraProb manual, all tasks should be completed up to section 5.3, meaning you should be ready to hit the Run Tephra2 function. The main file used for the parallelisation is T2_stor.txt, which contains the Tephra2 command lines and sits at the root of the TephraProb folder on the cluster (see the file tree below).
Transfer your run, grid and wind files to the cluster. Unless you decide to customize the paths, the file structure on the cluster should look like this:
ROOT
├── MODEL/
├── RUNS/
│   └── runName/
│       └── runNumber/
│           ├── CONF/*.*
│           ├── GS*.*
│           └── OUT/*.*
├── WIND/
│   └── windName/
│       └── pathToAscii/*.*
├── GRID/
│   └── gridName/
│       └── *.utm
├── T2_stor.txt
└── runTephraProb.sh
Note that not all local files are needed on the cluster. The run folder can be transferred with rsync, excluding the unnecessary files:
rsync -arvz --exclude '*.mat' --exclude 'FIG' --exclude 'KML' --exclude 'LOG' --exclude 'SUM' RUNS/run_name host@server:~/TephraProb/RUNS/
Compiling Tephra2
Tephra2 needs to be recompiled for the cluster’s architecture. On the cluster, from the root of TephraProb, navigate to the MODEL/ folder and type:
make
Now, navigate back to the root of TephraProb and type:
chmod 755 MODEL/tephra2-2012
That should get Tephra2 running on the cluster. Getting a Permission denied error when launching runs usually means the executable flag was not set with the chmod command above.
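A quick way to check for that usual culprit before submitting anything is a small sketch like this (the message strings are illustrative):

```shell
# Check that the recompiled binary exists and carries the executable flag
if [ -x MODEL/tephra2-2012 ]; then
    echo "tephra2-2012 is ready"
else
    echo "tephra2-2012 is missing or not executable; run: chmod 755 MODEL/tephra2-2012"
fi
```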
Running in parallel
The parallelization is achieved using job arrays. Conceptually:
- T2_stor.txt is cut into smaller files named T2_stor.txt00, T2_stor.txt01 … T2_stor.txtXX using the split Unix command
- The subset of Tephra2 commands in each sub-file is sent to a different node using a job array
On the cluster, split T2_stor.txt:
split -l 1000 -a 2 -d T2_stor.txt T2_stor.txt
where -l 1000 sets the number of lines per sub-file, -a 2 the length of the suffix, and -d requests numeric rather than alphabetic suffixes.
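As a quick illustration of how split names its output, here is the same command applied to a throwaway 25-line file instead of the real T2_stor.txt:

```shell
# Create a 25-line dummy file and split it into 10-line chunks
seq 25 > demo.txt
split -l 10 -a 2 -d demo.txt demo.txt
ls demo.txt??        # demo.txt00  demo.txt01  demo.txt02
wc -l < demo.txt02   # 5 -- the last chunk holds the remaining lines
rm demo.txt demo.txt??
```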
Let’s say that this created 10 files named T2_stor.txt00 to T2_stor.txt09.
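The number of sub-files, and hence the job-array range to submit, can be derived automatically rather than counted by hand. A minimal sketch, assuming the two-digit numeric suffixes produced above:

```shell
# Count the chunk files and print the matching job-array bounds
nchunks=$(ls T2_stor.txt[0-9][0-9] | wc -l)
echo "array range: 0-$((nchunks - 1))"
```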
SLURM
On a SLURM cluster, the bash script runTephraProb.sh contains:
module load GCC/4.9.3-2.25
module load OpenMPI/1.10.2
module load parallel
chunk=`printf "%02d" $SLURM_ARRAY_TASK_ID`
srun parallel -j 16 -a T2_stor.txt$chunk
The job can then be submitted using:
sbatch --array=0-9 runTephraProb.sh
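The script above only shows the commands; depending on the cluster’s configuration, a few #SBATCH directives may need to precede them. The values below are purely illustrative and must be adapted to the cluster at hand:

```shell
#!/bin/bash
#SBATCH --job-name=tephra2      # illustrative values, adjust to your cluster
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16    # matches the -j 16 passed to parallel
#SBATCH --time=02:00:00
```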
OpenPBS
On an OpenPBS cluster, the bash script runTephraProb.sh contains:
module load openmpi/1.4.5-gnu
module load parallel
cd $PBS_O_WORKDIR
chunk=`printf "%02d" $PBS_ARRAYID`
mpirun -np 12 -machinefile $PBS_NODEFILE parallel -j 12 -a T2_stor.txt$chunk
The job can then be submitted using:
qsub -t 0-9 runTephraProb.sh
Post-processing
Once the modelling is finished, copy the remote version of the run folder back to the local computer:
rsync -arvz --ignore-existing host@server:~/TephraProb/RUNS/run_name/ RUNS/run_name/
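Before post-processing, it can be worth checking that every Tephra2 command actually produced an output file. A minimal sketch, assuming T2_stor.txt holds one command per run and using the illustrative path RUNS/run_name:

```shell
# Compare the number of Tephra2 commands with the number of retrieved outputs
expected=$(wc -l < T2_stor.txt)
produced=$(find RUNS/run_name -path '*/OUT/*' -type f | wc -l)
if [ "$expected" -eq "$produced" ]; then
    echo "all $expected outputs retrieved"
else
    echo "missing $((expected - produced)) output file(s)"
fi
```

Post-processing (e.g. the probability calculations) can then proceed locally in TephraProb as usual.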