I'm running create_newcase underneath SLURM #98
Comments
This is almost certainly because in your ccs_config/machines/ file, you have CONFIG_BATCH set to slurm. Try setting it to none. To be sure: what's your full 'create_newcase' command? What 'machine' are you running on (e.g., ./xmlquery MACH)? And what do the contents of that ccs_config/machines//config_machines.xml look like?
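For reference, the batch back end is selected per machine in config_machines.xml. A minimal sketch of the relevant entry, assuming a hypothetical machine name "mymachine":

```xml
<machine MACH="mymachine">
  <!-- Other machine settings omitted. -->
  <!-- "none" tells CIME not to generate batch (e.g., SLURM) directives. -->
  <BATCH_SYSTEM>none</BATCH_SYSTEM>
</machine>
```

With BATCH_SYSTEM set to none, the case is run directly rather than submitted to a queue.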
@briandobbins thanks for that suggestion! For your other questions, see #96 where I asked many of the same questions.
Ok that got me past the failure.
There's this file here, but I can't tell how it's supposed to be run; I don't see any SLURM details in there.
I think that's to be expected with CONFIG_BATCH set to none; you're asking CIME (CESM) not to use batch details, so the initial setup for the case doesn't provide them. (Which would be reasonable if you can't see the queues from inside your container.) I think you'd get around this by managing the SLURM plus container details yourself. One other idea might be to not set CONFIG_BATCH to none and run
If I leave the SLURM setting in place but build with the non-local option, I would expect the non-local setting to suppress the local detection.
Huh, I would expect that too. Looking at it a bit, I think I can see why, but really it might be best to just leave BATCH_SYSTEM as none and handle the batch configuration on your own for now.
I'm unsure.
Yeah, just to add briefly to this: when debugging, I often just write a very short script that changes into the run directory for a case and calls 'mpiexec ../bld/cesm.exe'. You can easily write your own to do that if the model is building. Alternatively, as Dylan alludes to, you can also add a queue inside your config_machines.xml, and it'll generate the .case.run script for Slurm. All that ./case.submit does is run some checks, read some queue settings, and submit (via 'sbatch' for Slurm) the .case.run file. The only thing you need to watch out for when bypassing the full method is changing namelists: if you do, you'll need to run ./preview_namelists manually, or the changes won't propagate.
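The short debug script described above could be sketched like this; CASE_RUN_DIR, the task count, and the ../bld/ layout are assumptions to adjust for your own case:

```shell
#!/bin/sh
# Minimal debug-run wrapper: change into the case run directory and launch
# the built executable directly, bypassing ./case.submit.
# CASE_RUN_DIR and NTASKS are hypothetical defaults -- adjust to your case.
run_case() {
    CASE_RUN_DIR=${CASE_RUN_DIR:-$PWD}
    cd "$CASE_RUN_DIR" || return 1
    if [ -x ../bld/cesm.exe ] && command -v mpiexec >/dev/null 2>&1; then
        mpiexec -n "${NTASKS:-4}" ../bld/cesm.exe
    else
        # No built executable found here: just show what would be launched.
        echo "dry run: mpiexec -n ${NTASKS:-4} ../bld/cesm.exe"
    fi
}

run_case
```

Remember that this bypasses the namelist step, so run ./preview_namelists first whenever you change namelists.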
I'm inclined to run these components individually so I can control the invocations more directly. I don't much care if any of the steps are redundant after the first use; I'm more concerned about bulletproofing the sequence of steps that I'd be running.
I think I would go more for:

./create_newcase
./case.setup
./case.build
./preview_namelists && ./check_input_data
srun ... cesm.exe

Basically that.
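Submitted under SLURM, the run end of the sequence above could be wrapped in a batch script along these lines (a sketch only: the job name, sizing, and case path are placeholders, not from this thread):

```shell
#!/bin/sh
#SBATCH --job-name=cesm-case   # hypothetical job name
#SBATCH --nodes=1              # sizing is a placeholder; match your case
set -e
# Hypothetical case root -- adjust to where ./create_newcase put the case.
cd "${CASEROOT:-$HOME/cases/mycase}"
./preview_namelists
./check_input_data
srun ./bld/cesm.exe
```

Since the build was done ahead of time, the batch job itself only refreshes the namelists, checks the input data, and launches the executable.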
Carl, can you share the container image or the Docker/Singularity file to make it? These are almost certainly all simple omissions in the configuration file, but it's hard for us to debug without being able to see exactly how things are configured and to try changes out. In this specific case, are you defining an 'openmpi' option for modules with the NVHPC compiler? The case.setup shouldn't need to know anything outside the container, so I don't think it should matter that it doesn't find your actual srun. (I'm assuming you're using the container to run too, and thus that the MPI in the container is compatible with the launch mechanism for the container on the cluster.)
OK, I uploaded the Dockerfile here:
I'm trying to generate a test case that will be run underneath SLURM. But, also, I'm running create_newcase inside a container that doesn't have SLURM installed. This error at the end here
What is it trying to use the queue for?