Changeset 4989


Timestamp:
Jun 18, 2024, 5:42:40 PM
Author:
asima
Message:

Final (really, this time) fix for the crash with the message
"srun: error: Unable to create step for job ... : More processors requested than permitted".

The crash actually occurred only with "init=1" in main.sh.
In that case, the "tmp" initialisation job must unset (at least) its "--cpus-per-task" option before submitting tmp_$SIM;
otherwise tmp_$SIM inherits it, regardless of the value in its own header.

As a bonus, this change allows using "--cpus-per-task=1" in the initialisation job (as required by ce0l) instead of the previous value of 16.
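
The inheritance described above can be reproduced without a cluster. The following is a minimal sketch, assuming bash and assuming that the settings travel through environment variables prefixed with SLURM_, SBATCH_ or SRUN_ (the prefixes the fix itself targets). The helper name run_without_slurm_env is hypothetical and does not exist in setup.sh; it uses bash's ${!PREFIX@} expansion instead of the env/egrep/cut pipeline of the changeset.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of setup.sh): run a command with every
# SLURM_*, SBATCH_* and SRUN_* variable removed from its environment.
run_without_slurm_env() {
    (
        # ${!PREFIX@} expands to the names of all variables starting with PREFIX.
        # Left unquoted on purpose: variable names never contain whitespace.
        for name in ${!SLURM_@} ${!SBATCH_@} ${!SRUN_@}; do
            unset "$name"
        done
        "$@"
    )
}

# Simulate what the "tmp" job sees when it was started with --cpus-per-task=16.
export SLURM_CPUS_PER_TASK=16

# The child command no longer sees the variable...
child_view=$(run_without_slurm_env bash -c 'echo "${SLURM_CPUS_PER_TASK:-unset}"')
echo "child sees: $child_view"

# ...while the calling shell keeps it, because the unset ran in a subshell.
echo "parent still has: $SLURM_CPUS_PER_TASK"
```

Running the unset inside a subshell mirrors the "bash -c" wrapper of the fix: only the submitted command loses the variables, and the rest of the calling job is unaffected.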

File:
1 edited

  • BOL/LMDZ_Setup/setup.sh

    --- BOL/LMDZ_Setup/setup.sh (r4988)
    +++ BOL/LMDZ_Setup/setup.sh (r4989)
    @@ -816,5 +816,5 @@
     #SBATCH -A "$groupe"@cpu
     #SBATCH --ntasks=1             # Number of MPI processes
    -#SBATCH --cpus-per-task=16     # number of OpenMP threads
    +#SBATCH --cpus-per-task=1     # number of OpenMP threads
     # /!\ Warning: the next line is misleading, but in Slurm's vocabulary
     # "multithread" does refer to hyperthreading.
    @@ -880,5 +880,7 @@
        if [ $ok_guide != y ] ; then # Running first simulation automatically except for nudging
           cat <<...eod>> tmp
    -         $submit tmp_$SIM
    +        # unset "tmp" job options before submitting tmp_$SIM ;
    +        #  otherwise, "--cpus-per-task" is "inherited" by tmp_$SIM regardless of the value in its own header
    +        bash -c 'unset \$(env | egrep "SLURM_|SBATCH_|SRUN_"| cut -d= -f1) ; $submit tmp_$SIM'
     ...eod
        fi
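
In the second hunk, the added line is written into tmp through a here-document with an unquoted delimiter, so $submit and $SIM are expanded when setup.sh generates tmp, whereas the escaped \$(...) is stored literally and only runs when tmp itself executes. The sketch below illustrates that timing under stated assumptions: the values of submit and SIM are hypothetical stand-ins ("echo sbatch" replaces the real submit command so nothing is submitted), the output goes to a temporary file rather than tmp, and the pattern uses grep -E anchored at the start of the line rather than the changeset's egrep.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the values setup.sh holds at generation time.
submit="echo sbatch"
SIM="demo"

script=$(mktemp)

# Unquoted delimiter: $submit and $SIM expand now, while \$(...) is written
# to the file as a literal $(...) and is evaluated only when the file runs.
cat <<eod > "$script"
bash -c 'unset \$(env | grep -E "^(SLURM|SBATCH|SRUN)_" | cut -d= -f1) ; $submit tmp_$SIM'
eod

# Show the generated line: the command substitution is still unevaluated.
generated=$(cat "$script")
echo "$generated"

# Run the generated script with a Slurm-style variable in its environment.
output=$(SLURM_CPUS_PER_TASK=16 bash "$script")
echo "$output"

rm -f "$script"
```

Anchoring the pattern with ^ restricts the match to variable names; an unanchored pattern would also match a variable whose value merely contains one of the prefixes.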