= Debug Mode

Running several tests in debug mode is a necessary step before committing your updates to the master branch.

== COMPILE

To compile the GCM or the 1-D model, enter this command line:
{{{
./makelmdz_fcm -arch your_arch -parallel mpi -d 64x48x32 -p mars gcm_or_testphys1d -j 8 -debug
}}}
You can add or remove debug flags manually in the file trunk/LMDZ.COMMON/arch.fcm, on the %DEBUG_FFLAGS line. Make sure you are using the correct flags for your compiler, and that the debug mode performs no optimization (e.g. -O0 with mpif90).

== METHODOLOGY: 1 + 1 = 2

This methodology is very useful when:
 - the simulation crashes for no obvious reason
 - the crash is not reproducible across different simulation lengths (e.g. 1 simulation of 60 days = OK, 3*20 days = crashed!)

The procedure is quite simple: modify the file called 'run_month1' and perform the two tests below.

 - First test: two successive runs of 1 day each
{{{
num_now = 1
num_end = 2
....
case $true_num in
1) sed/9999/1/
....
2) sed/9999/1/
....
}}}
At the end of the runs, you have done 2 days (tip: mv startfi2.nc startfi2_1j.nc)

 - Second test: a single run of 2 days
{{{
num_now = 1
num_end = 1
....
case $true_num in
1) sed/9999/2/
....
}}}
At the end of the run, you have done 2 days in a single row (tip: mv startfi1.nc startfi1_2j.nc)

Then, compute the difference between startfi1_2j.nc and startfi2_1j.nc. The command ncdiff (from the nco package) is very useful for handling netcdf files:
{{{
ncdiff startfi1_2j.nc startfi2_1j.nc diff_startfi.nc
}}}
Then, look at which variables are not equal to zero:
{{{
ncdump diff_startfi.nc > diff_startfi.txt
}}}
{{{
vi diff_startfi.txt
}}}

=== Tips

You can do this check quickly by using the GCM variable 'ndynstep' in the run.def file; this variable sets the number of dynamical steps you want to perform during the simulation. Be sure to set this variable equal to iphysiq so that at least one call to the physics is performed.
For example, with iphysiq = 10, you can do: 10 + 10 = 20 dynamical steps, corresponding to 1 call to physiq_mod + 1 call to physiq_mod = 2 calls to physiq_mod.

== METHODOLOGY: 1 = 1

Same as 1 + 1 = 2, except that you compare a 1-day run on 1 CPU with a 1-day run on 24 CPUs! Very useful when you have memory issues.

----

== USING GDB FOR DEBUG

=== Intro

If you have a really vicious bug in your code, you may want to use the **gdb** tool. Here is some advice on using it; it is not exhaustive, so please complete this information if you can. You can also find more details in the [https://sourceware.org/gdb/current/onlinedocs/gdb/GNU-Free-Documentation-License.html#GNU-Free-Documentation-License GDB documentation].

To use gdb, you have to compile your model with the {{{-debug}}} option. In particular, this option includes two necessary compilation options: {{{-O0}}} and {{{-g3}}}.

Once this is done, go to your simulation directory (where you have all your .def and start files, and your gcm executable ''gcm_exec_debug.e''), and source the arch.env from your trunk/LMDZ.COMMON to load the libraries required to run the model.

Then, run the command
{{{
gdb gcm_exec_debug.e
}}}
This opens a gdb session. You can now run the model and interact with it while it runs. For instance, you can create ''break points'', which pause the run at a given point. You can then look at some variables via the ''print'' command, then continue the run. Here is an example:
{{{
>>> gdb gcm_exec_debug.e
(gdb) break aeropacity_mod.f:183    ###<--- define a break point before running the program
(gdb) run    ###<--- start the program
Starting program:...
.....
.....
Breakpoint 1, aeropacity_mod::aeropacity (ngrid=3010, nlayer=54, nq=11, zday=0.0041666666666666666, pplay=..., pplev=..., ls=3.723693225921032e-05, pq=..., pt=...,    ###<--- reach break point 1
    tauscaling=..., dust_rad_adjust=..., irtoviscoef=..., tau_pref_scenario=..., tau_pref_gcm=..., tau=..., taucloudtes=..., aerosol=..., dsodust=..., reffrad=..., qrefvis3d=..., qrefir3d=...,
    omegarefir3d=..., totstormfract=..., clearatm=4294967295, dsords=..., dsotop=..., alpha_hmons=..., nohmons=4294967295, clearsky=.FALSE., totcloudfrac=...)
    at /scratch/cnt0027/lmd1167/abierjon/simurefs_topflows_GCM6/trunk_r2577/LMDZ.COMMON/libo/X64_OCCIGEN_64x48x54_phymars_para.e/.config/ppsrc/phys/aeropacity_mod.f:183
183        tau(1:ngrid,1:naerkind)=0
(gdb) break aeropacity_mod.f:266    ###<--- define a new break point
Breakpoint 2 at 0x13e792a: file /scratch/cnt0027/lmd1167/abierjon/simurefs_topflows_GCM6/trunk_r2577/LMDZ.COMMON/libo/X64_OCCIGEN_64x48x54_phymars_para.e/.config/ppsrc/phys/aeropacity_mod.f, line 266.
(gdb) continue    ###<--- resume the program from break point 1
Continuing.
.....
.....
Breakpoint 2, aeropacity_mod::aeropacity (ngrid=3010, nlayer=54, nq=11, zday=0.0041666666666666666, pplay=..., pplev=..., ls=3.723693225921032e-05, pq=..., pt=...,    ###<--- reach break point 2
    tauscaling=..., dust_rad_adjust=..., irtoviscoef=..., tau_pref_scenario=..., tau_pref_gcm=..., tau=..., taucloudtes=..., aerosol=..., dsodust=..., reffrad=..., qrefvis3d=..., qrefir3d=...,
    omegarefir3d=..., totstormfract=..., clearatm=4294967295, dsords=..., dsotop=..., alpha_hmons=..., nohmons=4294967295, clearsky=.FALSE., totcloudfrac=...)
    at /scratch/cnt0027/lmd1167/abierjon/simurefs_topflows_GCM6/trunk_r2577/LMDZ.COMMON/libo/X64_OCCIGEN_64x48x54_phymars_para.e/.config/ppsrc/phys/aeropacity_mod.f:271
271        IF(iaervar.eq.1) THEN
(gdb) print iaervar    ###<--- print the value of a local variable of aeropacity at break point 2
$1 = 4
(gdb) quit    ###<--- quit gdb
}}}
Below, we give some more details on specific **gdb** commands.

=== Break points

You can define a break point by specifying a line where the program will pause: {{{(gdb) break PROGRAM_NAME:LINE_NB}}}

**Careful**: if you specify a break point at a given line of the GCM code, it has to be the line number from the **pre-processed** version of the program! (.f or .f90, in LMDZ.COMMON/tmp_src/).

Break points can also be set on a function/subroutine: {{{(gdb) break SUBROUTINE_NAME}}} will pause the run each time it encounters a {{{call SUBROUTINE_NAME}}}. The program still enters the subroutine, which becomes the current context. If the subroutine is defined in a module, try {{{(gdb) break MODULE_NAME::SUBROUTINE_NAME}}}

Break points can be defined before and during the {{{(gdb) run}}} (see the example above).

=== Watch points

''Watch points'' enable you to follow the state of a variable: the program pauses whenever the value of this variable changes, and gdb displays the old and new values. You can set up a watch point via the command {{{(gdb) watch VAR_NAME}}}.

**Careful**: you can only set a watch point on variables that are already defined in your current context! It can thus only be used after the {{{(gdb) run}}}, and usually requires first setting a break point in the subroutine (if you want to watch a local variable of this subroutine).

=== Print

After you have stopped at some point in the code (at a break or watch point, for instance), you can print the value of variables.
The syntax for printing variables that are defined in your current context (''e.g.'' local variables or arguments of your subroutine) is simply {{{(gdb) print VAR_NAME}}}

Try {{{(gdb) print MODULE_NAME::VAR_NAME}}} if you want to print a variable (''VAR_NAME'') from another module (''MODULE_NAME'') that the subroutine is using.

If you want to look at a variable saved in a Fortran ''COMMON'' block (''e.g.'' variables from the infamous ''callkeys.h''), {{{(gdb) info common}}} will print the value of all the variables from the ''COMMON'' blocks.

''NB:'' if you have stopped in a subroutine, which is now your current context, you can move back to the calling routine (where you have the instruction {{{call SUBROUTINE_NAME}}}), and thus change the context, with the command {{{(gdb) up}}}. To go back into your subroutine, use {{{(gdb) down}}}.

=== List

The command {{{(gdb) list}}} is pretty useful: it displays the source code around the line where you just stopped. Remember that when you pause at a given line, you haven't executed it yet! (you are stopped at the beginning of the line)

=== Continue / Step

After you have paused the run, you can resume it with the command {{{(gdb) continue}}}

You can also run just the next line and pause again with the command {{{(gdb) step}}}. Note that {{{step}}} enters any subroutine called on that line; use {{{(gdb) next}}} to step over the call instead.

=== Debug in parallel

If you want to debug while running in parallel, it should be possible to do a {{{mpirun gdb gcm_exec_debug.e}}}, which runs multiple instances of {{{gdb gcm_exec_debug.e}}} (one per MPI process) that still communicate and wait for each other when needed (as when you run the model directly in parallel). Note that this is a bit complicated and not well tested among us. It may be better to use other tools, like **ddd** or **ddt** (though they are not installed on all machines).
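
As a complement to the interactive approach, here is a minimal sketch of a non-interactive variant: each MPI process runs under gdb in batch mode and prints a backtrace if it stops on an error. The {{{mpirun -np}}} launcher syntax and the helper name {{{gdb_mpi_cmd}}} are assumptions made for illustration (they are not part of the procedure above); {{{-batch}}} and {{{-ex}}} are standard gdb options. The script only prints the command line so that you can check it before running it, and it has not been tested with the GCM.

```shell
#!/bin/sh
# Sketch: build the command that runs every MPI process under gdb in batch mode.
# Assumptions (not from this page): the launcher accepts 'mpirun -np N', and
# 'gdb_mpi_cmd' is a hypothetical helper name chosen for this example.

gdb_mpi_cmd() {
    nprocs=$1   # number of MPI processes
    exe=$2      # debug executable, e.g. gcm_exec_debug.e
    # -batch        : gdb exits once its commands are done (no interactive prompt)
    # -ex run       : start the program
    # -ex backtrace : print the call stack if the run stopped on an error
    echo "mpirun -np $nprocs gdb -batch -ex run -ex backtrace ./$exe"
}

# Print the command for 4 processes; nothing is executed here.
gdb_mpi_cmd 4 gcm_exec_debug.e
```

Copy the printed line into your terminal (or job script) once you are happy with it. Since the output of all processes is interleaved, redirecting it to a file and searching for the first backtrace is usually easier than reading it live.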