Underflow in timing code on aarch64 #1459

Open
manopapad opened this issue Apr 14, 2023 · 5 comments

@manopapad
Contributor

Originally reported by @1193749292 on nv-legate/legate#667. Bug occurs on an Ubuntu 20.04.5 LTS aarch64 system, on a debug build.

The following assertion is failing: https://github.com/StanfordLegion/legion/blob/stable/runtime/realm/timers.inl#L72. I asked @1193749292 to add some instrumentation around that code:

native = 125710472513989694 nanoseconds = 1645055 absolute = 0 zero_time = 1681439899479082722 LLONG_MAX = 9223372036854775807 uint64_t(LLONG_MAX) = 9223372036854775807
native = 125710472514051510 nanoseconds = 1665654 absolute = 0 zero_time = 1681439899479082722 LLONG_MAX = 9223372036854775807 uint64_t(LLONG_MAX) = 9223372036854775807
...(Several similar messages)

Signal 6legion_python: /root/xxx/legion-control_replication/runtime/realm/timers.inl:80: static long long int Realm::Clock::current_time_in_nanoseconds(bool): Assertion `nanoseconds <= uint64_t(LLONG_MAX)' failed.
 uint64_t(LLONG_MAX) = 9223372036854775807
9223372036854775807
legion_python: /root/xxx/legion-control_replication/runtime/realm/timers.inl:80: static long long int Realm::Clock::current_time_in_nanoseconds(bool): Assertion `nanoseconds <= uint64_t(LLONG_MAX)' failed.
 received by node 0, process 

It appears as if the code computing nanoseconds from native is producing a much smaller number than expected.

@lightsighter
Contributor

Does this happen deterministically or randomly?

@streichler
Contributor

None of those messages have a value of nanoseconds that would trigger the assert. Can we get the contents of the ... there, to see if maybe one of the snipped messages is not actually similar?
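
To make that check concrete, here is a minimal C++ sketch that tests the logged `nanoseconds` values against the condition quoted in the assertion message, `nanoseconds <= uint64_t(LLONG_MAX)`. The helper names (`violates_assert`, `wrapping_sub`, `to_signed_checked`) are hypothetical and are not part of Realm. The subtraction shows only one way an unsigned value above `LLONG_MAX` could arise, in line with the "underflow" in the issue title; nothing in this thread establishes that Realm's timing code performs such a subtraction.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper (not Realm code): true when `nanoseconds` would fail
// the check `nanoseconds <= uint64_t(LLONG_MAX)` from the assertion message.
inline bool violates_assert(uint64_t nanoseconds)
{
  return nanoseconds > uint64_t(LLONG_MAX);
}

// Hypothetical illustration of unsigned underflow: for unsigned types,
// a - b is computed modulo 2^64, so it wraps to a huge value when b > a.
inline uint64_t wrapping_sub(uint64_t a, uint64_t b)
{
  return a - b;
}

// Hypothetical guard-then-convert step mirroring the shape of the failing
// assertion: only values that fit in a signed 64-bit integer pass.
inline long long to_signed_checked(uint64_t nanoseconds)
{
  assert(!violates_assert(nanoseconds));
  return static_cast<long long>(nanoseconds);
}
```

With the values from the instrumented output, `violates_assert(1645055)` and `violates_assert(1665654)` are both false, so neither logged line can be the one that fired the assert. By contrast, `violates_assert(wrapping_sub(1645055, 1681439899479082722))` is true, because the wrapped difference exceeds `LLONG_MAX`.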

@1193749292

1193749292 commented Apr 14, 2023 via email

@streichler
Contributor

Can you attach the entire output? There's at least one partial logging message tangled up in the assert messages, so the other half of it should be somewhere in the blob of text.

@1193749292

@streichler

All the output of the legion build, legate.core installation and running is here, especially the legate.core installation, which does end in

/root/miniconda3/envs/legion-xxx/lib/python3.9/site-packages/setuptools/command/egg_info.py:643: SetuptoolsDeprecationWarning: Custom 'build_py' does not implement 'get_data_files_without_manifest'.
Please extend command classes from setuptools instead of distutils.
  warnings.warn(

And then there's no
