Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run make check and make check-short targests (Dev-build) #56

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

iamashwin99
Copy link
Collaborator

Add a new target, spack-dev-build which builds the docker file Docker-dev-build.
This target aims to use the spack dev-build command to install octopus in a specific directory and then run make checks from it.

@iamashwin99
Copy link
Collaborator Author

In the latest build, the make check failed:

Failed:  143 / 144

@iamashwin99
Copy link
Collaborator Author

Ah, figured out the reason for all the tests failing with Martin Leuders, we are using the variable OCT_VERSION to specify the octopus version to spack, but when the test suite executes, this variable interferes with the input parser, which thinks VERSION is an input parameter and thus all the tests skip due to incorrect inp file.

@iamashwin99 iamashwin99 marked this pull request as ready for review November 24, 2022 19:48
@iamashwin99
Copy link
Collaborator Author

I feel that the changes required for this PR are now complete. @fangohr what are your thoughts on this PR?

Copy link
Owner

@fangohr fangohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice as it demonstrates (i) how to use the spack dev-build command, and (ii) that octopus compiles within the dependencies installed by spack, and that (iii) the octopus tests pass (as least most of them).

More questions will follow: are some tests skipped because we are missing libraries, and are those that fail 'okay' to fail? But those questions are outside this PR.

Regarding implementation: there is some amount of duplication (for example of the various flags). Some of this (such as the flags) could be stream lined by defining them once. (See comment in changes.)

Some other duplication may be harder to get rid of; I think that is okay for now. We can do it later if we think it is important.

Docker-dev-build Show resolved Hide resolved
Docker-dev-build Outdated

RUN . $SPACK_ROOT/share/spack/setup-env.sh && \
# create a new environment for the serial version and activate it:
spack env create octopus-serial && \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all the commands in lines 68 to 99 in one line? Doesn't that make it hard to read the output? See for example the beginning of a lot of output at this step 21: https://github.com/fangohr/octopus-in-spack/pull/56/checks#step:3:940

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made them in one line to minimise the size of the docker container thus formed. Do you think I should split them into multiple RUN commands?

Docker-dev-build Outdated Show resolved Hide resolved
Docker-dev-build Outdated
# display specs of upcoming spack installation:
spack spec octopus@${OCT_VERSION} ~mpi+netcdf+parmetis+arpack+cgal+pfft+python+likwid+libyaml+elpa+nlopt~debug~cuda~metis~scalapack && \
# run the spack installation (adding it to the environment):
spack dev-build octopus@${OCT_VERSION} ~mpi+netcdf+parmetis+arpack+cgal+pfft+python+likwid+libyaml+elpa+nlopt~debug~cuda~metis~scalapack && \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment A: I think the spack dev-build does not install Octopus but just the dependencies. Is that correct? (See Comment B below.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, spack dev-build would be identical to spack install except that instead of spack fetching the source, we manually provide the source and the build directory. In the end all the dependencies are installed, octopus is compiled with these dependencies and the resulting octopus binary is added to the spack's database. See here

Docker-dev-build Show resolved Hide resolved
Docker-dev-build Show resolved Hide resolved
Docker-dev-build Outdated
tar --strip-components=1 -xf octopus-12.1.tar.gz -C ./&& \
# install the serial version of octopus:
# display specs of upcoming spack installation:
spack spec octopus@${OCT_VERSION} ~mpi+netcdf+parmetis+arpack+cgal+pfft+python+likwid+libyaml+elpa+nlopt~debug~cuda~metis~scalapack && \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long list of flags is repeated in line 81. And then again (most of them) in line 112 and 114. Perhaps define them once in the beginning in a variable and only change here what is different for each case?


# # MPI version

RUN . $SPACK_ROOT/share/spack/setup-env.sh && \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as for serial block.

python3 /home/user/check_buildlog.py mpi_check-long.log foss2021a-mpi || /bin/true; \
spack env deactivate

# Provide bash in case the image is meant to be used interactively
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a summary of passed and failed tests (if easily done) for serial and mpi. So that you can scroll to the end and find the most interesting information there.

Or perhaps just re-run the check_buildog.py script at the very end.

Copy link
Collaborator Author

@iamashwin99 iamashwin99 Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently check_buildog.py is the last thing thats run for each variant.
It gives the summary in the following format:

-+-+-+-+ Parsing serial_check-short.log
Unexpected failures :
periodic_systems/24-hartree_fock_1D.test, functionals/18-mgga.test, finite_systems_3d/37-sternheimer_polarized.test, finite_systems_3d/06-restart.test, finite_systems_1d/01-hydrogen.test
Built with tool chain : foss2021a-serial
-+-+-+-+ Parsing serial_check-long.log
Unexpected failures :
real_time/14-absorption-spinors.test, periodic_systems/21-magnon.test, periodic_systems/07-mgga.test, linear_response/01-casida.test, finite_systems_3d/16-scfinlcao_std.test
Built with tool chain : foss2021a-serial

------------------ Build log and check log for MPI variant ---------------


+|+|+|+ Parsing mpi_check-short.log
Unexpected failures :
real_time/12-absorption.test, periodic_systems/24-hartree_fock_1D.test, functionals/18-mgga.test, finite_systems_3d/37-sternheimer_polarized.test, finite_systems_3d/06-restart.test
Built with tool chain : foss2021a-mpi
+|+|+|+ Parsing mpi_check-long.log
Unexpected failures :
real_time/14-absorption-spinors.test, periodic_systems/07-mgga.test, linear_response/01-casida.test, finite_systems_3d/16-scfinlcao_std.test
Built with tool chain : foss2021a-mpi

Did you want it in some other format?

@iamashwin99
Copy link
Collaborator Author

The new order of RUN fails to install even though the logs says its installed:

203701   │ 2022-11-29T16:08:03.6034265Z make[1]: Leaving directory '/home/user/dev-build-serial'
203702   │ 2022-11-29T16:08:05.2377959Z ==> octopus: Successfully installed octopus-12.1-molat6lf7auebazyxtjedgt6ifotszbf
203703   │ 2022-11-29T16:08:05.2378517Z   Fetch: 0.00s.  Build: 8m 36.60s.  Total: 8m 36.60s.
203704   │ 2022-11-29T16:08:05.2379594Z [+] /home/user/spack/opt/spack/linux-debian11-broadwell/gcc-10.2.1/octopus-12.1-molat6lf7auebazyxtjedgt6ifotszbf
203705   │ 2022-11-29T16:13:49.1110735Z Removing intermediate container 7f978c987401
203706   │ 2022-11-29T16:13:49.1119733Z  ---> c9ad4fd4804e
203707   │ 2022-11-29T16:13:49.1120626Z Step 25/40 : RUN . $SPACK_ROOT/share/spack/setup-env.sh &&       spack env activate octopus-serial &&       spack test run --alias test_serial octopus &&       spack test results -l test_serial
203708   │ 2022-11-29T16:13:49.2796493Z  ---> Running in 8b1755d479d4
203709   │ 2022-11-29T16:13:56.2173645Z ==> Warning: No installed packages match spec octopus
203710   │ 2022-11-29T16:13:56.2174028Z ==> Spack test test_serial
203711   │ 2022-11-29T16:13:56.2174267Z ============================= 0 passed of 0 specs ==============================

@iamashwin99
Copy link
Collaborator Author

The dev-build has issues with the number of jobs requested :
https://github.com/fangohr/octopus-in-spack/actions/runs/4364482894/jobs/7631855756

These are the last lines of stderr:

----------------------------------------
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

  /usr/bin/nice

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.

@fangohr
Copy link
Owner

fangohr commented Mar 13, 2023

I think this means we are trying to run an MPI process with 4 processes, but github cannot provide that? Can we tell the octopus tests to only use 2 processes for MPI tests? That may (!) solve the issue.

@fangohr
Copy link
Owner

fangohr commented Mar 11, 2024

Discussion today (with @iamashwin99 @glaweh @lang-m)

  • long command lines in Dockerfile should be shorter by moving into bash script

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants