prov/verbs;ofi_rxm: FI_HMEM fails to be detected when out of cache #9759

Open
thomasgillis opened this issue Jan 23, 2024 · 2 comments

@thomasgillis
Contributor

Hi all,

I have a question that might also point to an issue in verbs;ofi_rxm:
when the MR cache size is exceeded (the default is 1024), fi_writemsg interprets the address as a system address by default instead of a device address. Increasing the cache size with FI_MR_CACHE_MAX_COUNT solves the problem.
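
For reference, here is a minimal sketch of that workaround done from inside the application rather than from the shell. The helper name `set_mr_cache_max_count` is hypothetical (it is not part of libfabric), and the sketch assumes the variable is read when libfabric initializes, so the call has to happen before the first libfabric call such as fi_getinfo.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not a libfabric API: export FI_MR_CACHE_MAX_COUNT
 * so that the MR cache may hold up to `count` entries.
 * Assumption: libfabric reads the variable at initialization, so this
 * must be called before any libfabric call (e.g. fi_getinfo).
 * Returns 0 on success, -1 on failure. */
static int set_mr_cache_max_count(unsigned long count)
{
    char value[32];

    /* Format the count as a decimal string for the environment. */
    int n = snprintf(value, sizeof(value), "%lu", count);
    if (n < 0 || (size_t)n >= sizeof(value))
        return -1;

    /* The last argument (1) overwrites any value that is already set. */
    return setenv("FI_MR_CACHE_MAX_COUNT", value, 1);
}
```

Calling `set_mr_cache_max_count(4096)` at the very start of `main` should have the same effect as exporting `FI_MR_CACHE_MAX_COUNT=4096` in the job script, under the assumption stated above.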

Is this a direct consequence of FI_MR_HMEM combined with the limit on the cache size, or is the provider failing to detect the pointer type?

Thanks for your time and your help :-)

@nikhilnanal
Contributor

Hi @thomasgillis. I would like some more information about the usage. Even better would be a reproducer you could share, which would help us understand the usage. Does the application register a buffer, send, and then deregister the MR before the next one, or does it register all the buffers at once and then send before deregistering? Does the application deregister at all? Are any other settings or flags set for this test? What device are you using for the test?

@thomasgillis
Contributor Author

Hi @nikhilnanal, sorry, I am just seeing this now.
I haven't touched the code since April (I changed jobs), but if I recall correctly we would register 2048 messages and then execute them all.
The reproducer is public, here is the link: https://github.com/pmodels/rmem
