Timestamp:
Nov 7, 2024, 10:27:18 AM
Author:
jbclement
Message:

PEM:
Modifications related to the launching script:

  • There is actually no difference between the 1D and 3D models as far as launching is concerned; what matters is how and where the simulation is executed. So now, the user can choose between two launching modes with the parameter "mode" (0 = "processing scripts"; any other value = "submitting jobs"). The former option is usually used to run the scripts on a local machine, while the latter is used to submit jobs on a supercomputer;
  • The parts of the execution command line in the job scripts that the user should adapt to the set-up are now defined as variables at the beginning of the scripts ("exe_cmd" in PCMrun.job, "arg_pem" in PEMrun.job), so that they are easier to identify and adapt;
  • The job scripts are now more robust at detecting a successful end: the success message is searched for in the last 100 lines of the output instead of only the last one.
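The "mode" switch can be illustrated with a minimal sketch. The helper name launch_script is hypothetical. Its two branches mirror what cyclelaunch and submitPEM in lib_launchPEM.sh do: when mode is 0, the job script runs directly; otherwise it is passed to the command stored in submit_job, which is presumably set by job_scheduler.

```bash
#!/bin/bash
# Minimal sketch of the launching-mode switch (hypothetical helper name;
# the actual logic lives in cyclelaunch/submitPEM in lib_launchPEM.sh).
#   mode = 0        -> "processing scripts": run the job script in place
#   any other value -> "submitting jobs": hand the script to the scheduler
launch_script() {
    local mode=$1 script=$2
    if [ "$mode" -eq 0 ]; then
        # Local run: the script's own exit status reports success or failure
        "./$script"
    else
        # $submit_job is assumed to be set beforehand, e.g.
        # "sbatch --parsable" (SLURM) or "qsub" (PBS/TORQUE)
        eval "$submit_job $script"
    fi
}
```

With mode=0 the call blocks until the run finishes. With any other value, it prints whatever the submission command prints (for "sbatch --parsable", the job ID), so it can be captured with jobID=$(launch_script 1 PEMrun.job).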

JBC

Location:
trunk/LMDZ.COMMON/libf/evolution/deftank
Files:
5 edited

  • trunk/LMDZ.COMMON/libf/evolution/deftank/PCMrun.job

    --- PCMrun.job (r3416)
    +++ PCMrun.job (r3495)
    @@ -23,4 +23,7 @@
     # Name of executable for the PCM:
     exePCM="gcm_64x48x32_phymars_para.e"
    +
    +# Execution command:
    +exe_cmd="srun --cpu-bind=threads --label -c${OMP_NUM_THREADS:=1}"
     ########################################################################
     
    @@ -32,6 +35,6 @@
     read i_myear n_myear convert_years iPCM iPEM nPCM nPCM_ini < info_PEM.txt
     cp run_PCM.def run.def
    -srun --cpu-bind=threads --label -c${OMP_NUM_THREADS:=1} $exePCM > out_runPCM${iPCM} 2>&1
    -if [ ! -f "restartfi.nc" ] || ! (tail -n 1 out_runPCM${iPCM} | grep -iq "everything is cool!"); then # Check if it ended abnormally
    +eval "$exe_cmd $exePCM > out_runPCM${iPCM} 2>&1"
    +if [ ! -f "restartfi.nc" ] || ! (tail -n 100 out_runPCM${iPCM} | grep -iq "everything is cool!"); then # Check if it ended abnormally
         echo "Error: the run PCM $iPCM crashed!"
         echo "Be careful: there may be dependent jobs remaining in the SLURM queue with status 'DependencyNeverSatisfied'! You can cancel them by executing the script \"kill_launchPEM.sh\"."
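The effect of moving the launcher prefix into exe_cmd and running the whole line through eval can be checked locally. In the sketch below, exe_cmd, exePCM and the fake executable are hypothetical stand-ins: env replaces srun so that the example runs without SLURM. The eval line has the same shape as the new execution line in PCMrun.job.

```bash
#!/bin/bash
# Sketch of the exe_cmd + eval pattern (local stand-ins, hypothetical values).
# On ADASTRA the job script uses:
#   exe_cmd="srun --cpu-bind=threads --label -c${OMP_NUM_THREADS:=1}"
exe_cmd="env OMP_NUM_THREADS=4"
exePCM="./fake_pcm.sh"   # hypothetical stand-in for the PCM executable
iPCM=1

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Fake "model" that reports its thread count and the usual success message
cat > fake_pcm.sh << 'EOF'
#!/bin/bash
echo "threads=$OMP_NUM_THREADS"
echo "everything is cool!"
EOF
chmod +x fake_pcm.sh

# eval re-parses the string, so the prefix, the executable and the
# redirection are applied as one command line
eval "$exe_cmd $exePCM > out_runPCM${iPCM} 2>&1"
cat "out_runPCM${iPCM}"
```

With this layout, switching to another launcher (for example mpirun on a different cluster) should only require editing exe_cmd. Because eval parses the string a second time, any quotes or spaces inside exe_cmd have to be written with that second parse in mind.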
  • trunk/LMDZ.COMMON/libf/evolution/deftank/PEMrun.job

    --- PEMrun.job (r3417)
    +++ PEMrun.job (r3495)
    @@ -22,4 +22,7 @@
     # Name of executable for reshaping PCM data with XIOS:
     exeReshape="reshape_XIOS_output_64x48x32_phymars_seq.e"
    +
    +# Argument for the PEM execution (for SLURM: "$SLURM_JOB_ID" | for PBD/TORQUE: "$PBS_JOBID" | "" when the script is not run as a job):
    +arg_pem="$SLURM_JOB_ID"
     ########################################################################
     
    @@ -39,6 +42,6 @@
     echo "Run PEM $iPEM is starting."
     cp run_PEM.def run.def
    -./$exePEM $SLURM_JOB_ID > out_runPEM${iPEM} 2>&1
    -if [ ! -f "restartfi.nc" ] || ! (tail -n 1 out_runPEM${iPEM} | grep -iq "so far, so good!"); then # Check if it ended abnormally
    +eval "./$exePEM $arg_pem > out_runPEM${iPEM} 2>&1"
    +if [ ! -f "restartfi.nc" ] || ! (tail -n 100 out_runPEM${iPEM} | grep -iq "so far, so good!"); then # Check if it ended abnormally
         echo "Error: the run PEM $iPEM crashed!"
         exit 1
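The stricter end-of-run check can be sketched as a small function. The name run_succeeded and its optional line-count argument are hypothetical. The test itself is the one used in both job scripts: the restart file must exist and grep -iq must find the message in tail -n 100. The log below, where the message is followed by a few extra lines, is an assumed example of output that the former tail -n 1 check would have misreported as a crash.

```bash
#!/bin/bash
# Sketch of the end-of-run check: success requires the restart file AND the
# success message somewhere in the last $nlines lines of the log
# (100 after r3495; the previous revisions effectively used 1).
run_succeeded() {
    local restart=$1 log=$2 msg=$3 nlines=${4:-100}
    # Without pipefail, the pipeline status is grep's status
    [ -f "$restart" ] && tail -n "$nlines" "$log" | grep -iq "$msg"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch restartfi.nc
# Hypothetical log: the success message is followed by extra output lines
printf '%s\n' "so far, so good!" "extra line 1" "extra line 2" > out_runPEM1

if run_succeeded restartfi.nc out_runPEM1 "so far, so good!" 1; then
    echo "last-line check: success"
else
    echo "last-line check: crash reported"
fi
if run_succeeded restartfi.nc out_runPEM1 "so far, so good!"; then
    echo "100-line check: success"
else
    echo "100-line check: crash reported"
fi
```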
  • trunk/LMDZ.COMMON/libf/evolution/deftank/README

    --- README (r3460)
    +++ README (r3495)
    @@ -7,9 +7,9 @@
           (ii)  nPCM_ini -> the number of initial PCM runs (at least 2);
           (iii) nPCM -> the number of PCM runs between each PEM run (usually 2);
    -      (iv)  dim -> the dimension of the model (1 for 1D, any other values stand for 3D).
    +      (iv)  mode -> the launching mode (0 = "processing scripts"; any other values = "submitting jobs"). The former option is usually used to process the script on a local machine while the latter is used to submit jobs on supercomputer.
       The script can take an argument:
           - If there is no argument, then the script initiates a PEM simulation from scratch.
           - If the argument is 're', then the script relaunches an existing PEM simulation. It will ask for parameters to know the starting point that you want to.
    -  The script works only with the job scheduler SLURM to submit chained jobs.
    +  To submit chained jobs, the script works with the job schedulers SLURM and PBS/TORQUE.
     
     # liblaunchPEM.sh:
    @@ -17,13 +17,14 @@
     
     # PCMrun.job:
    -  Bash script file to submit a PCM job (with SLURM or PBS/TORQUE). The name of the PCM executable file should be adapted. The headers correspond to the ADASTRA supercomputer and should be changed for other machines and job schedulers. In case of 1D, the headers are naturally omitted.
    +  Bash script file to submit a PCM job (with SLURM or PBS/TORQUE). The headers correspond to the ADASTRA supercomputer and should be changed for other machines and job schedulers. In case of "processiong scripts" launching mode, the headers are naturally omitted.
       The path to source the arch file should be adapted to the machine.
    -  The execution line should also be adapted according to the set-up.
    +  The name of the PCM executable file should be adapted.
    +  The execution command should also be adapted according to the set-up.
     
     # PEMrun.job:
    -  Bash script file to submit PEM job (with SLURM or PBS/TORQUE). The name of the PEM executable file and Reshaping executable file should be adapted. The headers correspond to the ADASTRA supercomputer and should be changed for other machines and job schedulers. In case of 1D, the headers are naturally omitted.
    +  Bash script file to submit PEM job (with SLURM or PBS/TORQUE).The headers correspond to the ADASTRA supercomputer and should be changed for other machines and job schedulers. In case of "processiong scripts" launching mode, the headers are naturally omitted.
       The path to source the arch file should be adapted to the machine.
    -  The execution line should also be adapted according to the set-up.
    -  The PEM executable can have an optional argument to specify the SLURM job ID in order to detect the job time limit and deal with it.
    +  The name of the PEM executable file and Reshaping executable file should be adapted.
    +  The PEM executable can have an optional argument which should be specified according to the set-up. This the job ID to make the PEM detect the job time limit.
     
     # run_PEM.def
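Since r3495, the job ID passed to the PEM is whatever the user writes in arg_pem. A hypothetical alternative, not part of this changeset, would pick the value from the variables that the schedulers export inside a job: SLURM_JOB_ID for SLURM and PBS_JOBID for PBS/TORQUE. It would fall back to an empty string when neither is set, as in "processing scripts" mode. The function name pick_arg_pem is illustrative only.

```bash
#!/bin/bash
# Hypothetical helper: choose the PEM argument from the scheduler's job ID
# variable, or "" when the script is not running as a job.
pick_arg_pem() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "$SLURM_JOB_ID"     # exported by SLURM inside a job
    elif [ -n "${PBS_JOBID:-}" ]; then
        echo "$PBS_JOBID"        # exported by PBS/TORQUE inside a job
    else
        echo ""                  # local run: no time limit to detect
    fi
}

arg_pem=$(pick_arg_pem)
echo "PEM argument: '${arg_pem}'"
```

One caveat argues for keeping the manual setting. Inside an interactive SLURM allocation, SLURM_JOB_ID is defined even when the scripts are processed locally, so automatic detection could pass a job ID that the user did not intend.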
  • trunk/LMDZ.COMMON/libf/evolution/deftank/launchPEM.sh

    --- launchPEM.sh (r3419)
    +++ launchPEM.sh (r3495)
    @@ -17,12 +17,12 @@
     #n_earth_years=300
     
    -# Set the number of initial PCM runs:
    +# Set the number of initial PCM runs (>= 2):
     nPCM_ini=3
     
    -# Set the number of PCM runs between each PEM run:
    +# Set the number of PCM runs between each PEM run (>= 2):
     nPCM=2
     
    -# Set the dimension of the model (1 = "1D"; other values = "3D"):
    -dim=3
    +# Set the launching mode (0 = "processing scripts"; any other values = "submitting jobs"). The former option is usually used to process the script on a local machine while the latter is used to submit jobs on supercomputer:
    +mode=1
     ########################################################################
     
    @@ -48,5 +48,5 @@
         checklaunch
         initlaunch
    -    cyclelaunch $dim $nPCM_ini
    +    cyclelaunch $mode $nPCM_ini
     
     else
    @@ -57,9 +57,9 @@
             echo "This is a new cycle for the PEM simulation."
             date
    -        if [ $dim -ne 1 ]; then
    +        if [ $mode -ne 0 ]; then
                 job_scheduler
             fi
             read i_myear n_myear convert_years iPCM iPEM nPCM nPCM_ini < info_PEM.txt
    -        cyclelaunch $dim $nPCM
    +        cyclelaunch $mode $nPCM
     
         # Starting a relaunch
    @@ -122,7 +122,7 @@
             fi
             if [ $relaunch = "PCM" ]; then
    -            relaunchPCM $dim
    +            relaunchPCM $mode
             else
    -            relaunchPEM $dim
    +            relaunchPEM $mode
             fi
     
    @@ -133,5 +133,5 @@
             echo "This is a continuation of the previous PEM run."
             date
    -        submitPEM $dim
    +        submitPEM $mode
     
         # Default case: error
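The r3495 comments state that nPCM_ini and nPCM must be at least 2. The scripts also compare mode with [ $mode -ne 0 ], which fails with "integer expression expected" if mode is not an integer. A hypothetical check_params function enforcing both constraints could look like the sketch below. Whether checklaunch already validates these values is not visible in this changeset.

```bash
#!/bin/bash
# Hypothetical sanity check for the user parameters of launchPEM.sh.
check_params() {
    local name value
    for name in nPCM_ini nPCM; do
        value=${!name}   # indirect expansion: value of the variable named $name
        if ! [[ $value =~ ^[0-9]+$ ]] || [ "$value" -lt 2 ]; then
            echo "Error: $name must be an integer >= 2 (got '$value')!" >&2
            return 1
        fi
    done
    # mode is compared with -ne, so it must be an integer
    if ! [[ $mode =~ ^-?[0-9]+$ ]]; then
        echo "Error: mode must be an integer (got '$mode')!" >&2
        return 1
    fi
    return 0
}

nPCM_ini=3
nPCM=2
mode=1
check_params && echo "Parameters OK"
```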
  • trunk/LMDZ.COMMON/libf/evolution/deftank/lib_launchPEM.sh

    --- lib_launchPEM.sh (r3446)
    +++ lib_launchPEM.sh (r3495)
    @@ -32,5 +32,4 @@
             submit_job="sbatch --parsable"
             submit_dependjob="sbatch --parsable --dependency"
    -        sed -i 's/\$PBS_JOBID/\$SLURM_JOB_ID/g' PEMrun.job
         elif command -v qstat &> /dev/null; then
             echo "PBS/TORQUE is installed on $machine."
    @@ -39,5 +38,4 @@
             submit_job="qsub"
             submit_dependjob="qsub -W depend"
    -        sed -i 's/\$SLURM_JOB_ID/\$PBS_JOBID/g' PEMrun.job
         else
             echo "Error: neither SLURM nor TORQUE/PBS is installed on $machine!"
    @@ -119,5 +117,5 @@
             mkdir diags
         fi
    -    if [ $dim -ne 1 ]; then
    +    if [ $mode -ne 0 ]; then
             job_scheduler
         fi
    @@ -161,5 +159,5 @@
     
     # To submit the PCM runs
    -# arg1: model dimension
    +# arg1: launching mode
     # arg2: number of PCM runs to launch
     # arg3: local number of the PCM run from which to start (optional)
    @@ -172,5 +170,5 @@
         if [ $i_myear -lt $n_myear ]; then
             echo "Run PCM $iPCM: call $ii/$2..."
    -        if [ $1 -eq 1 ]; then # 1D model
    +        if [ $1 -eq 0 ]; then # Mode: processing scripts
                 sed -i "s/^k=[0-9]\+$/k=$(echo "3 - $nPCM_ini" | bc)/" PCMrun.job
                 ./PCMrun.job
    @@ -178,5 +176,5 @@
                     errlaunch
                 fi
    -        else # 3D model
    +        else # Mode: launching jobs
                 cp PCMrun.job PCMrun${iPCM}.job
                 sed -i -E "s/($name_job[^0-9]*[0-9]*[^0-9]*)[0-9]+$/\1${iPCM}/" PCMrun${iPCM}.job
    @@ -197,5 +195,5 @@
             if [ $i_myear -lt $n_myear ]; then
                 echo "Run PCM $iPCM: call $i/$2..."
    -            if [ $1 -eq 1 ]; then # 1D model
    +            if [ $1 -eq 0 ]; then # Mode: processing scripts
                     sed -i "s/^k=[0-9]\+$/k=$(echo "$i + 2 - $nPCM_ini" | bc)/" PCMrun.job
                     ./PCMrun.job
    @@ -203,5 +201,5 @@
                         errlaunch
                     fi
    -            else # 3D model
    +            else # Mode: launching jobs
                     cp PCMrun.job PCMrun${iPCM}.job
                     sed -i -E "s/($name_job[^0-9]*[0-9]*[^0-9]*)[0-9]+$/\1${iPCM}/" PCMrun${iPCM}.job
    @@ -219,14 +217,14 @@
     
     # To submit the PEM run
    -# arg1: model dimension
    +# arg1: launching mode
     submitPEM() {
         if [ $i_myear -lt $n_myear ]; then
             echo "Run PEM $iPEM"
    -        if [ $1 -eq 1 ]; then # 1D model
    +        if [ $1 -eq 0 ]; then # Mode: processing scripts
                 ./PEMrun.job
                 if [ $? -ne 0 ]; then
                     errlaunch
                 fi
    -        else # 3D model
    +        else # Mode: launching jobs
                 sed -i -E "s/($name_job[^0-9]*[0-9]*[^0-9]*)[0-9]+$/\1${iPEM}/" PEMrun.job
                 jobID=$(eval "$submit_job PEMrun.job")
    @@ -242,5 +240,5 @@
     
     # To make one cycle of PCM and PEM runs
    -# arg1: model dimension
    +# arg1: launching mode
     # arg2: number of PCM runs to launch
     # arg3: local number of the PCM run from which to start (optional)
    @@ -252,10 +250,10 @@
         if [ $i_myear -lt $n_myear ]; then
             echo "Run PEM $iPEM"
    -        if [ $1 -eq 1 ]; then # 1D model
    +        if [ $1 -eq 0 ]; then # Mode: processing scripts
                 ./PEMrun.job
                 if [ $? -ne 0 ]; then
                     errlaunch
                 fi
    -        else # 3D model
    +        else # Mode: launching jobs
                 sed -i -E "s/($name_job[^0-9]*[0-9]*[^0-9]*)[0-9]+$/\1${iPEM}/" PEMrun.job
                 jobID=$(eval "$submit_dependjob=afterok:${jobID} PEMrun.job")
    @@ -293,5 +291,5 @@
     
     # To relaunch from PCM run
    -# arg1: model dimension
    +# arg1: launching mode
     relaunchPCM() {
         iPCM=$(($irelaunch + 1))
    @@ -361,5 +359,5 @@
     
     # To relaunch from PEM run
    -# arg1: model dimension
    +# arg1: launching mode
     relaunchPEM() {
         iPEM=$(echo "$irelaunch + 1" | bc)
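In "submitting jobs" mode, runs are chained with afterok dependencies. submit_dependjob is set to "sbatch --parsable --dependency" or "qsub -W depend" and completed as "$submit_dependjob=afterok:${jobID} PEMrun.job". The dry-run sketch below only prints the resulting command, so it runs without a scheduler; the name build_dependent_submit is hypothetical. This revision also drops the sed lines that rewrote $SLURM_JOB_ID/$PBS_JOBID in PEMrun.job, presumably because the job ID is now supplied through arg_pem.

```bash
#!/bin/bash
# Dry-run sketch: print the command that would submit $script so that it
# starts only after job $prev_id has ended successfully.
build_dependent_submit() {
    local scheduler=$1 prev_id=$2 script=$3 submit_dependjob
    case $scheduler in
        slurm) submit_dependjob="sbatch --parsable --dependency" ;;
        pbs)   submit_dependjob="qsub -W depend" ;;
        *)     echo "Error: unknown scheduler '$scheduler'!" >&2; return 1 ;;
    esac
    # Same concatenation as in lib_launchPEM.sh: "$submit_dependjob=afterok:${jobID} PEMrun.job"
    echo "$submit_dependjob=afterok:${prev_id} ${script}"
}

build_dependent_submit slurm 12345 PEMrun.job
build_dependent_submit pbs 678.server PEMrun.job
```

With afterok, a dependent job starts only if the previous one ends with exit status 0. This is why a crashed run leaves the following jobs pending, which SLURM reports as 'DependencyNeverSatisfied', as the error message in PCMrun.job warns.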