
[sombrero] Handle systems where number of cores doesn't divide lattice volume #250

Merged: 3 commits merged into main on Dec 14, 2023

Conversation

@giordano (Member) commented Dec 4, 2023

Should fix #237 (comment). I followed the suggestion in #237 (comment). This probably isn't the most efficient algorithm out there, but it takes only a few microseconds on my laptop, so hopefully it won't be a major performance bottleneck:

```
In [7]: %%timeit
   ...: max_num_tasks(112, LATTICE_VOLUME)
   ...:
3.75 µs ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

CC: @mirenradia.
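The helper itself isn't shown in the excerpts below, so here is a minimal sketch of what `max_num_tasks` might look like, assuming it returns the largest task count not exceeding the core limit that evenly divides the lattice volume (the actual implementation in the PR may differ):

```python
# Hypothetical sketch of max_num_tasks: find the largest n <= limit
# such that n evenly divides volume, by scanning downwards from limit.
def max_num_tasks(limit, volume):
    for n in range(limit, 0, -1):
        if volume % n == 0:
            return n
    return 1

LATTICE_VOLUME = 32 * 24 * 24 * 24  # 442368 = 2^14 * 3^3

# 112 cores don't divide the volume (112 has a factor of 7),
# so we fall back to the nearest smaller count that does.
print(max_num_tasks(112, LATTICE_VOLUME))  # -> 108
```

A linear scan is O(limit) in the worst case, which matches the "microseconds on a laptop" timing quoted above for realistic core counts.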

```diff
@@ -122,4 +137,7 @@ def set_up_from_parameters(self):

     @run_after('setup')
     def setup_num_tasks(self):
-        self.num_tasks = self.current_partition.processor.num_cores * 64
+        self.num_tasks = max_num_tasks(
+            self.current_partition.processor.num_cores * 64,
```
giordano (Member, Author):

I'm not sure whether 64 should multiply self.current_partition.processor.num_cores or the result of max_num_tasks. In the latter case we'd greatly oversubscribe the node, right? But that was already the case before, so perhaps that was the intent?

mirenradia (Contributor):

> In the latter case we'd greatly oversubscribe the node, right? But that was already the case before, so probably that was it?

I think this test was intended to run on 64 nodes so it shouldn't be oversubscribed. Maybe we should use the num_tasks_per_node option instead and calculate the max_num_tasks according to LATTICE_VOLUME / 64 to ensure even load balancing and that we actually use 64 nodes?

giordano (Member, Author):

Agreed, setting num_tasks_per_node is cleaner in this case, I did that, thanks!
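For illustration, the num_tasks_per_node approach agreed on above could look roughly like this (a sketch with assumed names and an illustrative core count; the max_num_tasks sketch is repeated here so the snippet is self-contained):

```python
def max_num_tasks(limit, volume):
    # largest task count <= limit that evenly divides volume
    for n in range(limit, 0, -1):
        if volume % n == 0:
            return n
    return 1

LATTICE_VOLUME = 32 * 24 * 24 * 24
num_nodes = 64
num_cores = 112  # illustrative per-node core count

# Balance each node's share of the lattice across its cores, so the job
# uses exactly num_nodes nodes without oversubscribing any of them.
num_tasks_per_node = max_num_tasks(num_cores, LATTICE_VOLUME // num_nodes)
num_tasks = num_tasks_per_node * num_nodes
print(num_tasks_per_node, num_tasks)  # -> 108 6912
```

Because each node gets the same task count, the load is balanced across all 64 nodes, which is the point of the suggestion above.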


```python
from benchmarks.apps.sombrero import case_filter
from benchmarks.modules.reframe_extras import scaling_config
from benchmarks.modules.utils import SpackTest

# Fixed lattice volume in ITT benchmarks
LATTICE_VOLUME = 32 * 24 * 24 * 24
```
mirenradia (Contributor):

Looking at the Sombrero README, I think this is only the lattice volume when -s small is passed to the executable. For the 64-node test, -s medium is currently passed, so the lattice volume is $48^3 \cdot 64$. It's possible these might be changed as well in order to resolve #246.

Having said that, I think this will probably still work fine, since the prime factor decompositions of both numbers are of the form $2^p 3^q$ and both are much bigger than the number of cores on even the largest nodes.
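The factorization claim above is easy to check numerically; a quick sketch (volumes taken from the comment, helper name is illustrative):

```python
def strip_2s_and_3s(n):
    # Divide out all factors of 2 and 3; the result is 1
    # exactly when n is of the form 2^p * 3^q.
    for f in (2, 3):
        while n % f == 0:
            n //= f
    return n

small = 32 * 24 ** 3   # -s small lattice volume: 2^14 * 3^3
medium = 48 ** 3 * 64  # -s medium lattice volume: 2^18 * 3^3

print(strip_2s_and_3s(small), strip_2s_and_3s(medium))  # -> 1 1
```

Since both volumes have only 2s and 3s in their factorization, any power-of-two core count (and many others) divides them cleanly.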

giordano (Member, Author):

Thanks, I defined LATTICE_VOLUME_SMALL and LATTICE_VOLUME_MEDIUM to distinguish the two cases.

@mirenradia (Contributor):

> Probably this isn't the most efficient algorithm out there, but this takes a bunch of microseconds on my laptop, so hopefully it won't be a major performance bottleneck.

I agree that this is extremely unlikely to be a problem so I have no issues with it.

@mirenradia (Contributor) left a review:

I'm happy with these changes other than my one small comment.

I have tested that the ITT-sn one works on the CSD3 Icelakes, but unfortunately I can't test the ITT-64n one as it exceeds the max job size (I did verify it works with 64 replaced by 48).

Comment on lines 142 to 144:

```python
    LATTICE_VOLUME_MEDIUM // 64,
)
self.num_tasks = self.num_tasks_per_node * 64
```
mirenradia (Contributor):

One very tiny comment: could you replace the two uses of the magic number 64 with a variable, e.g. num_nodes?

giordano (Member, Author):

Done! Can you please check it works for you as expected?

@mirenradia (Contributor) commented Dec 14, 2023

LGTM (but I can't approve it until you re-request a review).

I have tested this works on the CSD3 Icelakes as expected (by passing -S num_nodes=48 for the ITT-64n test).

@giordano (Member, Author):

Thanks!

@giordano giordano merged commit 364fdb8 into main Dec 14, 2023
4 checks passed
@giordano giordano deleted the mg/sombrero-ntasks branch December 14, 2023 14:26

Successfully merging this pull request may close these issues.

Multiple issues with Sombrero benchmark