Action with Pangea-3 installation reproduction and ppc64le emulation #257
- rename DOCKERFILE variable as TPL_DOCKERFILE to avoid conflicts with run-on-action variable
- call a ppc64le ubuntu20 to install docker and call docker build to build the suitable TPL image
1620a6a to 17a52a9
Preliminary Remarks
Job Failure
Evaluation of emulation layer slowdown
Perspectives
Even if we succeed in building the TPLs in a suitable time, as the GEOS CUDA build is a lot slower (
Nevertheless, we can list some improvement paths for the current PR and the TPL build:
From a more global perspective (GEOS project), thanks to @sframba's suggestions:
Hello @Algiane thank you for your comments. The timing issue is surely something to keep in mind, but before getting to this, I'd like to get a little more information about the process.
Hi @TotoGaz , On The geos
For now, the test of the executable on P3 is tweaked. I:
Please let me know if you need more tests.
Best
For that specific purpose, you can run
@Algiane Is it fair to state that now the issue is really a timing issue? That if we had a very very powerful machine, that would work OK? Cross compiling is something that can be very challenging. Furthermore, cross compiling the
Thanks for the For me, with this method we have 2 issues:
For now, since the emulation seems to be a dead end and we still don't have a solution to test the P3 configuration, I will leave this PR as a draft and try to see if we can connect a
Best
We have a powerful self-hosted machine. Do you think that could do it?
I'm surprised that this gets so big. E.g. https://hub.docker.com/r/geosx/pecan-gpu-gcc8.2.0-openmpi4.0.1-mkl2019.5-cuda11.5.119/tags is ~4.4GB (still very big, but half). Do you know what makes it so big? We use the multi-stage approach a lot to remove the temporaries. Are you doing the same? Also, if we manage to run it on a comfortable self-hosted machine, would the size issue become secondary?
Maybe: it depends on the time needed to build the TPLs and GEOS on this machine. We can multiply these times by 15 to get an order of magnitude of the times needed with the emulation layer.
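To make that rule of thumb concrete, here is a minimal shell sketch. The factor 15 is the order of magnitude quoted above, not a measured constant, and the function names and the `SLOWDOWN` variable are hypothetical helpers introduced only for illustration.

```shell
# Rough estimate of an emulated build time from a native build time.
# SLOWDOWN defaults to the factor 15 quoted in this thread (assumption,
# to be replaced by a measured value for a given machine).
SLOWDOWN="${SLOWDOWN:-15}"

# emulated_minutes NATIVE_MINUTES
# Prints the estimated duration (in minutes) under the emulation layer.
emulated_minutes() {
  local native_minutes="$1"
  echo $(( native_minutes * SLOWDOWN ))
}

# exceeds_limit NATIVE_MINUTES LIMIT_MINUTES
# Succeeds (exit 0) when the emulated estimate is above the given limit.
exceeds_limit() {
  local estimate
  estimate="$(emulated_minutes "$1")"
  [ "$estimate" -gt "$2" ]
}
```

For example, `emulated_minutes 30` prints `450`, and `exceeds_limit 30 360` succeeds, meaning a 30-minute native build would not fit in a 360-minute job under this assumption.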
The image has about the same size on DockerHub, but that size is compressed. Once pulled, for example, the
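One way to check the uncompressed size locally and to see which build steps account for it is sketched below. It only relies on the standard `docker image inspect` and `docker history` commands; the function names are hypothetical, and the image must already have been pulled on the machine.

```shell
# on_disk_mb IMAGE
# Prints the uncompressed size of a locally available image, in MB.
# `docker image inspect` reports .Size in bytes.
on_disk_mb() {
  local bytes
  bytes="$(docker image inspect --format '{{.Size}}' "$1")" || return 1
  echo $(( bytes / 1000000 ))
}

# layer_sizes IMAGE
# Lists every layer with its size and the instruction that created it,
# which helps to spot temporaries that a multi-stage build could drop.
layer_sizes() {
  docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' "$1"
}
```

Calling `on_disk_mb <image>` after a `docker pull` gives the number to compare with the compressed size shown on DockerHub, and `layer_sizes <image>` shows where the space goes.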
@sframba : I have tested the connection of a |
… prepare the split of build and push steps.
…just before push and do not logout on streak2.
3a5d6ba to 8701d91
8701d91 to 08c41c3
…on P3.Dockerfile.
702f8be to fc924a5
New job that:
- emulates a ppc64le architecture (using the `docker/setup-qemu-action`, which relies on `qemu` through the `qemu-user-static` image);
- deploys an AlmaLinux-8 image on which the TPLs' dependencies are installed with respect to the pangea3 modules needed to build the TPLs. The Dockerfile used to build this image is provided in `docker/TotalEnergies/Pangea3-base.Dockerfile`, and the image is available on my DockerHub account under the `pangea-almalinux8-gcc9.4-openmpi4.1.2-cuda11.5.0-openblas0.3.18` name with tag `4`: `7g8efcehpff/pangea-almalinux8-gcc9.4-openmpi4.1.2-cuda11.5.0-openblas0.3.18:4`;
- adds a `docker/TotalEnergies/Pangea3.Dockerfile` file that builds the ppc64le docker image with the TPLs for geos built and installed;
- adds a `RUNS_ON` matrix variable to the job matrix to allow the use of different runners (a self-hosted runner more powerful than the default GitHub runners is needed because of the slowdown introduced by the emulation layer);
- removes the push step from the `docker_build_and_push.sh` script and renames this script `docker_build.sh`;
- moves the authentication to docker to just before the attempt to push the docker image and does not log out on `streak2`: this solves errors when pushing images (`access denied`) due to a race condition between jobs (if 2 jobs run at the same time on the machine, one job may remove the login credentials between the moment the other job logs in to docker and the moment it attempts to push the image);
- adds a dedicated step for the `docker push` command.

Linked to EPIC TTE Builds and Geos PR 3159
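
The login-then-push ordering described above can be sketched as follows. This is a minimal illustration and not the actual workflow step of this PR: the `push_image` function and the `DOCKER_USERNAME` / `DOCKER_PASSWORD` variable names are assumptions.

```shell
# push_image IMAGE
# Authenticates immediately before pushing so that the window in which
# another job on the same machine could remove the credentials is as
# short as possible.
# DOCKER_USERNAME and DOCKER_PASSWORD are hypothetical variable names.
push_image() {
  local image="$1"
  printf '%s' "$DOCKER_PASSWORD" \
    | docker login --username "$DOCKER_USERNAME" --password-stdin \
    || return 1
  docker push "$image"
  # Deliberately no `docker logout` here: on a shared runner, logging out
  # would remove credentials that a concurrent job may still need.
}
```

A stricter alternative, also an assumption and not what this PR does, would be to give each job its own credential store by pointing the `DOCKER_CONFIG` environment variable at a per-job directory.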