Deterministic behavior with scotch 7 #87

Open
sblauth opened this issue Jul 17, 2024 · 4 comments
@sblauth

sblauth commented Jul 17, 2024

Hello everyone,

I have a question about (or possibly an issue with) scotch 7. I was recently able to update the dependencies of my code to use scotch 7, but since then I have experienced problems when running my code in parallel. I suspect this is related to ptscotch / libptscotch, but I am not entirely sure.
I am using FEniCS, which in turn uses scotch for mesh partitioning and graph reordering. Since switching to scotch 7, some tests at https://github.com/sblauth/cashocs/ fail non-deterministically when run in parallel. I know that this problem is related to scotch, because switching the mesh partitioner to ParMETIS (which FEniCS also supports) makes the failures disappear. Due to licensing issues, I would prefer to stick with scotch as the mesh partitioning tool.

I have looked into the recipe for the conda-forge build, and it seems to me that the previous version (6.0.9), which works fine for me, sets some build flags for deterministic behavior, whereas version 7.0.4, with which I see the issues, does not.

As far as I can tell, scotch 7 now allows this to be configured at run time (using contexts). However, as I am using FEniCS from Python, I have no idea how to do so; it seems that this cannot be done with environment variables.

Are my observations regarding determinism in the conda-forge build correct? Is there any way for me, as someone who uses scotch via FEniCS, to restore determinism in parallel runs? Or would it be possible to provide a deterministic conda-forge scotch build?
I am also happy to provide further information if required.

Thanks a lot in advance,
Sebastian

@Tobias-Fischer
Contributor

Hi there, thanks for the bug report! Can you please point us to the flags that you are talking about, and check whether they still exist in scotch 7? If they do, we would be happy to add them to the builds here.

@sblauth
Author

sblauth commented Jul 17, 2024

Thanks for the quick reply. I will try to point you to the flags that I believe are the ones causing the problems, but I am really no expert in compiling things - I usually let conda do the work for me (sorry about that).

Looking at the recipe directory in the 6.0.9 PR https://github.com/regro-cf-autotick-bot/scotch-feedstock/tree/4532f8f5ec7e4094d7df9f6e317c24eb01f1eaf7/recipe it seems to me that the build.sh file used to build scotch takes its compile flags from Makefile.inc. There, the flag -DCOMMON_RANDOM_FIXED_SEED is set at https://github.com/regro-cf-autotick-bot/scotch-feedstock/blob/4532f8f5ec7e4094d7df9f6e317c24eb01f1eaf7/recipe/Makefile.inc#L20

If I see this correctly, this option is not set in the current build script https://github.com/conda-forge/scotch-feedstock/blob/main/recipe/build-scotch.sh
I've looked in the recipe folder and could not find these flags being applied anywhere.

Moreover, it seems that the flags -DCOMMON_PTHREAD and -DSCOTCH_PTHREAD are no longer set in the scotch 7 build, whereas they were set in 6.0.9.

I guess that these are the compiler flags which are responsible for the (non)-determinism of parallel runs.
Based on the change log, these flags should still be available for scotch 7.
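
A comparison like the one described above can also be scripted. The following is only a minimal sketch: it assumes the recipe files have been saved locally as text, the helper name `find_defines` is hypothetical, and the sample line is made up for illustration rather than copied from either recipe.

```python
import re

# Preprocessor defines discussed in this thread; the compiler sees them as -D<NAME>.
FLAGS = ("COMMON_RANDOM_FIXED_SEED", "COMMON_PTHREAD", "SCOTCH_PTHREAD")


def find_defines(text, flags=FLAGS):
    """Return {flag: bool}, True if -D<flag> occurs in text as a whole token."""
    found = {}
    for flag in flags:
        # The lookahead rejects longer names that merely start with the flag,
        # e.g. -DCOMMON_PTHREAD_AFFINITY_LINUX must not count as -DCOMMON_PTHREAD.
        pattern = r"(?<![\w-])-D" + re.escape(flag) + r"(?![A-Za-z0-9_])"
        found[flag] = re.search(pattern, text) is not None
    return found


# Hypothetical sample resembling a CFLAGS line (not taken from the actual recipe).
sample = "CFLAGS = -O3 -DCOMMON_PTHREAD -DCOMMON_RANDOM_FIXED_SEED -DSCOTCH_RENAME"

if __name__ == "__main__":
    for name, present in find_defines(sample).items():
        print(name, "set" if present else "not set")
```

To check a real file, read it with `open(path).read()` and pass the text to `find_defines`. Note that this only inspects the build files; it says nothing about which defines a given binary package was actually compiled with.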

@Tobias-Fischer
Contributor

In the cmake build that we use, these flags always seem to be set: https://github.com/live-clones/scotch/blob/82ec87f558f4acb7ccb69a079f531be380504c92/src/CMakeLists.txt#L49

So I’m not sure what’s causing the issue. Maybe someone else knows - I’m not overly familiar with scotch.

@sblauth
Author

sblauth commented Jul 17, 2024

Okay, thanks a lot. The -DCOMMON_PTHREAD and -DSCOTCH_PTHREAD flags also seem to be set there, so this is not the issue.
If anyone else has an idea what could cause this non-determinism, I would be really grateful. I can also provide some examples with FEniCS that show the difference in behavior between scotch 7.0.4 and 6.0.9.
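
One way to make such examples easy to compare is to hash the result of repeated runs and check whether the hashes agree. The following is only a sketch: the two `*_partition` functions are hypothetical stand-ins built on Python's `random` module, not calls into FEniCS or scotch, and would need to be replaced by code that returns the actual partition (for example one integer per mesh cell).

```python
import hashlib
import random


def digest(values):
    """Hash a sequence of integers, e.g. a per-cell partition assignment."""
    h = hashlib.sha256()
    for v in values:
        h.update(str(int(v)).encode() + b",")
    return h.hexdigest()


def is_reproducible(make_partition, runs=5):
    """Call make_partition several times and report whether all results match."""
    digests = {digest(make_partition()) for _ in range(runs)}
    return len(digests) == 1


# Hypothetical stand-in: a fixed seed gives the same assignment on every call.
def fixed_seed_partition():
    rng = random.Random(0)
    return [rng.randrange(4) for _ in range(100)]


# Hypothetical stand-in: a generator seeded from OS entropy keeps advancing,
# so successive calls give different assignments.
_unseeded = random.Random()


def unseeded_partition():
    return [_unseeded.randrange(4) for _ in range(100)]


if __name__ == "__main__":
    print("fixed seed reproducible:", is_reproducible(fixed_seed_partition))
    print("unseeded reproducible:", is_reproducible(unseeded_partition))
```

In a parallel run the check would have to be done per rank, or on data gathered to a single rank, since each process only holds its own part of the mesh.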


2 participants