\chapter{Running through a Scheduler}\label{cha:Running-Scheduler}
The code is usually run on large parallel machines, often PC clusters,
most of which use schedulers, i.e., queuing or batch management systems
to manage the running of jobs from a large number of users. The following
considerations need to be taken into account when running on a system
that uses a scheduler:\newline
\begin{itemize}
\item The processors/nodes to be used for each run are assigned dynamically
by the scheduler, based on availability. Therefore, in order for the
mesher and the solver (or between successive runs of the solver) to
have access to the same database files (if they are stored on hard
drives local to the nodes on which the code is run), they must be
launched in sequence as a single job.
\item On some systems, the nodes to which running jobs are assigned are
not configured for compilation. It may therefore be necessary to pre-compile
both the mesher and the solver. A small program provided in the distribution
called \texttt{\small create\_header\_file.f90} can be used to directly
create \texttt{\small OUTPUT\_FILES/values\_from\_mesher.h} using
the information in the \texttt{\small DATA/Par\_file} without having
to run the mesher (type `\texttt{\small make create\_header\_file}'
to compile it and `\texttt{\small ./bin/xcreate\_header\_file}'
to run it; refer to the sample scripts below). The solver can then
be compiled as explained above.
\item One feature of schedulers/queuing systems is that they allow submission
of multiple jobs in a {}``launch and forget'' mode. In order to
take advantage of this property, care needs to be taken that output
and intermediate files from separate jobs do not overwrite each other,
or otherwise interfere with other running jobs.
\end{itemize}
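As an illustration of the last point, the following minimal sketch derives a
separate output directory for each job from its job-ID. It assumes that LSF
sets \texttt{LSB\_JOBID} inside a running job; the function name
\texttt{job\_output\_dir} and the \texttt{job\_} directory prefix are
hypothetical and are not part of the distribution.

```shell
#!/bin/bash
# Sketch only: one output directory per job, so that concurrent jobs
# do not overwrite each other's files.
# Assumption: LSB_JOBID is set by LSF inside a job. Outside the scheduler
# we fall back to the process ID of the current shell.
job_output_dir() {
    local base="$1"
    local id="${LSB_JOBID:-$$}"
    echo "${base}/job_${id}"
}

# Demonstration in a temporary directory (hypothetical layout).
base=$(mktemp -d)
outdir=$(job_output_dir "$base")
mkdir -p "$outdir"
echo "Output for this job goes to $outdir"
```

The same idea is used for the scratch directories in the job script shown
later in this chapter, where the job-ID is appended to the database path.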
We describe here in some detail a job submission procedure for the
Caltech 1024-node cluster, CITerra, under the LSF scheduling system.
We consider the submission of a regular forward simulation. The two
main scripts are \texttt{\small run\_lsf.bash}, which compiles the
Fortran code and submits the job to the scheduler, and
\texttt{\small go\_mesher\_solver\_lsf\_globe.bash}, which contains the
instructions that make up the job itself. These scripts can be found in the
\texttt{\small utils/Cluster/lsf} directory and can straightforwardly be
modified and adapted to meet more specific running needs.
\section{\texttt{run\_lsf.bash}}
This script first sets the job queue to be `normal'. It then compiles
the mesher and solver together, figures out the number of processors
required for this simulation from \texttt{DATA/Par\_file}, and submits
the LSF job.
{\small
\begin{verbatim}
#!/bin/bash
# use the normal queue unless otherwise directed
queue="-q normal"
if [ $# -eq 1 ]; then
    echo "Setting the queue to $1"
    queue="-q $1"
fi
# compile the mesher and the solver
d=`date`
echo "Starting compilation $d"
make clean
make meshfem3D
make create_header_file
./bin/xcreate_header_file
make specfem3D
d=`date`
echo "Finished compilation $d"
# compute total number of nodes needed
NPROC_XI=`grep ^NPROC_XI DATA/Par_file | cut -c 34- `
NPROC_ETA=`grep ^NPROC_ETA DATA/Par_file | cut -c 34- `
NCHUNKS=`grep ^NCHUNKS DATA/Par_file | cut -c 34- `
# total number of nodes is the product of the values read
numnodes=$(( $NCHUNKS * $NPROC_XI * $NPROC_ETA ))
echo "Submitting job"
bsub $queue -n $numnodes -W 60 -K <go_mesher_solver_lsf_globe.bash
\end{verbatim}
}
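The script above extracts values from \texttt{DATA/Par\_file} by fixed column
position (\texttt{cut -c 34-}), which relies on the alignment of the
\texttt{=} sign in that file. As a hedged alternative, the sketch below reads
a parameter by name regardless of alignment and ignores a trailing comment.
The helper \texttt{read\_par} and the sample parameter values are
hypothetical and are used for illustration only.

```shell
#!/bin/bash
# Sketch only: read "NAME = value" from a parameter file without
# depending on the column at which the value starts.
read_par() {
    # $1: parameter name, $2: parameter file
    # Match NAME followed by optional blanks and '=', keep the text up to
    # any '#', take the first match, and strip trailing whitespace.
    sed -n "s/^$1[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$2" \
        | head -n 1 | sed 's/[[:space:]]*$//'
}

# Demonstration on a small sample file (values are made up).
par_file=$(mktemp)
cat > "$par_file" <<'EOF'
NCHUNKS                         = 6
NPROC_XI                        = 4
NPROC_ETA = 4   # different alignment, trailing comment
EOF

NCHUNKS=$(read_par NCHUNKS "$par_file")
NPROC_XI=$(read_par NPROC_XI "$par_file")
NPROC_ETA=$(read_par NPROC_ETA "$par_file")

# total number of processes is the product of the values read
numnodes=$(( NCHUNKS * NPROC_XI * NPROC_ETA ))
echo "numnodes=$numnodes"
rm -f "$par_file"
```

Because the pattern requires \texttt{=} directly after the name (apart from
blanks), a parameter whose name merely begins with the requested name is not
matched.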
\section{\texttt{go\_mesher\_solver\_lsf\_globe.bash}}
This script describes the job itself, including setup steps that can
only be done once the scheduler has assigned a job-ID and a set of
compute nodes to the job, the \texttt{mpirun.lsf} commands used
to run the mesher and the solver, and calls to scripts that collect
the output seismograms from the compute nodes and perform clean-up
operations.
\begin{enumerate}
\item First the script directs the scheduler to save its own output and
output from \texttt{stdout} into \texttt{\small OUTPUT\_FILES/\%J.o},
where \texttt{\%J} is short-hand for the job-ID; it also tells the
scheduler what version of \texttt{mpich} to use (\texttt{mpich\_gm})
and how to name this job (\texttt{go\_mesher\_solver\_lsf}).
\item The script then creates a list of the nodes allocated to this job
by echoing the value of a dynamically set environment variable \texttt{LSB\_MCPU\_HOSTS}
and parsing the output into a one-column list using the Perl script
\texttt{utils/Cluster/lsf/remap\_lsf\_machines.pl}. It then creates a set of scratch
directories on these nodes (\texttt{\small /scratch/\$USER/DATABASES\_MPI}) to be used as the \texttt{LOCAL\_PATH}
for temporary storage of the database files. The scratch directories
are created using \texttt{shmux}, a shell multiplexor that can execute
the same commands on many hosts in parallel. \texttt{shmux} is available
from \href{http://web.taranis.org/shmux/}{Shmux}. Make sure that the \texttt{LOCAL\_PATH}
parameter in \texttt{DATA/Par\_file} is also set properly.
\item The next portion of the script launches the mesher and then the solver
using \texttt{mpirun.lsf}.
\item The final portion of the script performs clean-up on the nodes using
the Perl script \texttt{cleanbase\_jobid.pl}.
\end{enumerate}
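To make the second step more concrete, the sketch below shows one way to turn
the host string into a one-column list in plain shell. It assumes the usual
LSF format of \texttt{LSB\_MCPU\_HOSTS}, namely alternating host names and
slot counts (for example \texttt{nodeA 2 nodeB 1}). It is not the Perl script
from the distribution; the function name \texttt{expand\_hosts} is
hypothetical, and listing each host once per slot is an assumption.

```shell
#!/bin/bash
# Sketch only: expand "hostA 2 hostB 1" into one host name per line,
# repeating each host once per assigned slot.
expand_hosts() {
    local host count i
    # Intentional word splitting of the host string into positional parameters.
    set -- $1
    while [ $# -ge 2 ]; do
        host=$1
        count=$2
        shift 2
        i=0
        while [ "$i" -lt "$count" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done
}

# Use the scheduler's value if present, otherwise a made-up example.
expand_hosts "${LSB_MCPU_HOSTS:-nodeA 2 nodeB 1}"
```

For tasks that should run once per node, such as creating the scratch
directories, the output can be piped through \texttt{sort -u} to list each
host only once.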
{\small
\begin{verbatim}
#!/bin/bash -v
#BSUB -o OUTPUT_FILES/%J.o
#BSUB -a mpich_gm
#BSUB -J go_mesher_solver_lsf
BASEMPIDIR=/scratch/$USER/DATABASES_MPI
echo "$LSB_MCPU_HOSTS" > OUTPUT_FILES/lsf_machines
echo "$LSB_JOBID" > OUTPUT_FILES/jobid
./remap_lsf_machines.pl OUTPUT_FILES/lsf_machines > OUTPUT_FILES/machines
# Modif : create a directory for this job
shmux -M50 -Sall \
-c "mkdir -p /scratch/$USER;mkdir -p $BASEMPIDIR.$LSB_JOBID" - < OUTPUT_FILES/machines >/dev/null
# Set the local path in Par_file
sed -e "s:^LOCAL_PATH .*:LOCAL_PATH = $BASEMPIDIR.$LSB_JOBID:" < DATA/Par_file > DATA/Par_file.tmp
mv DATA/Par_file.tmp DATA/Par_file
current_pwd=$PWD
mpirun.lsf --gm-no-shmem --gm-copy-env $current_pwd/bin/xmeshfem3D
mpirun.lsf --gm-no-shmem --gm-copy-env $current_pwd/bin/xspecfem3D
# clean up
cleanbase_jobid.pl OUTPUT_FILES/machines DATA/Par_file
\end{verbatim}
}
\section{\texttt{run\_lsf.kernel} and \texttt{go\_mesher\_solver\_globe.kernel}}
For kernel simulations, you can use the sample run scripts \texttt{run\_lsf.kernel}
and \texttt{go\_mesher\_solver\_globe.kernel} provided in the
\texttt{utils/Cluster} directory, and modify
the command-line arguments of \texttt{xcreate\_adjsrc\_traveltime} in
\texttt{go\_mesher\_solver\_globe.kernel} according to the start and end times
of the specific portion of the forward seismograms you are interested
in.