Erroneous values in the velocity field, possible issue with the parallelization? #562
-
Dear NekRS community,

I'm testing a fairly simple CHT channel case on current master (4f87e0e), with MPT 2.26 and GCC 12.2.0 on HLRS HAWK, using OCCA mode: serial. I'm unable to reproduce the issue with OCCA mode: CUDA, but I also can't run on GPU with this number of ranks. For reference, the t-mesh has 35200 elements and the v-mesh has 28800 elements.

In the output (snapshot files, opened with ParaView 5.10) I notice spurious values at some element corners. It looks like a gather-scatter operation gone wrong. (These values are emphasized during a "move" operation in ParaView, so they are pretty easy to spot.) The location and number of these spurious values depend on the number of ranks. Here's a histogram of the streamwise component of the streamwise velocity gradient after 10 timesteps (all values should be in the (-10, 10) bin):
That is, the number of spurious values roughly quadruples when the number of ranks is increased by a factor of 4, which suggests an issue with the parallelization. Some additional observations:
Here's my case for reference. Did anyone else encounter such an issue, and what was your solution?
Replies: 1 comment
-
Looks like I copied a bug from the examples. This loop ought to start at 0:

nekRS/examples/turbChannel/turbChannel.udf, line 46 in 4f87e0e

Then it's obvious why the number of spurious values scales with the number of ranks! I don't really understand why they don't disappear over time, though. Anyway, the issue is probably solved.