Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error handling for LibC.clock_gettime(CLOCK_MONOTONIC) calls #15309

Conversation

compumike
Copy link
Contributor

POSIX clock_gettime returns 0 on success and -1 on error. See https://man7.org/linux/man-pages/man3/clock_gettime.3.html#RETURN_VALUE

The code for Crystal::System::Time.compute_utc_seconds_and_nanoseconds correctly does a raise RuntimeError.from_errno(...) unless ret == 0. However, the code for Crystal::System::Time.monotonic incorrectly does a raise ... if ret == 1 (checking for positive one return value, not a negative one!). So any errors will not be detected, and the uninitialized timespec struct will be used by the caller.

This PR fixes this to make the monotonic return value checking logic be the same.

(My context is that in 1.14.0 I am still occasionally seeing an unreproducible segfault in CI related to Time.monotonic https://forum.crystal-lang.org/t/how-to-diagnose-occasional-segfault-in-time-monotonic-within-timecop-travel/6099 and came across this as I was looking more closely at the code.)

@@ -51,7 +50,8 @@ module Crystal::System::Time
info = mach_timebase_info
LibC.mach_absolute_time &* info.numer // info.denom
{% else %}
LibC.clock_gettime(LibC::CLOCK_MONOTONIC, out tp)
ret = LibC.clock_gettime(LibC::CLOCK_MONOTONIC, out tp)
raise RuntimeError.from_errno("clock_gettime(CLOCK_MONOTONIC)") unless ret == 0
Copy link
Contributor

@ysbaddaden ysbaddaden Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific method mustn't raise. It could make sure to initialize tp to zero. That's fine.

@ysbaddaden
Copy link
Contributor

I'm not sure we really need to check for errno. Reading the manage, it looks impossible to reach a failure (we do control all the values).

@compumike
Copy link
Contributor Author

That's good to know, thank you @ysbaddaden ! I've added a second commit to reduce the scope. This PR now only fixes the one explicitly incorrect return value check in Crystal::System::Time.monotonic.

I have not yet been able to consistently reproduce my Invalid memory access (signal 11) issue. I also don't have a great theory for how even a EINTR error from LibC.clock_gettime might possibly cause it. But earlier today, on 1.14.0, I had:

Invalid memory access (signal 11) at address 0x4
[0x5b4473081736] *Exception::CallStack::print_backtrace:Nil +118 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b4472fea116] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +310 in /app/.cache/crystal/crystal-run-spec.tmp
[0x7e75d2eb8050] ?? +139044514922576 in /lib/x86_64-linux-gnu/libc.so.6
[0x5b4473241f91] *Timecop::TimeStackItem#monotonic:Time::Span +17 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b4473203b43] *Time::monotonic:Time::Span +67 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b447348feee] *IdleGC::IdleDetection::fiber_yield_time:Time::Span +46 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b447348fde9] *IdleGC::IdleDetection::fiber_yield_time_ns:UInt64 +9 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b447348fd97] *IdleGC::IdleDetection::process_is_idle?:Bool +135 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b4472ff6b10] ~procProc(Nil) +208 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b4473151605] *Fiber#run:(IO::FileDescriptor | Nil) +117 in /app/.cache/crystal/crystal-run-spec.tmp
[0x5b4472fea7a6] ~proc3Proc(Fiber, (IO::FileDescriptor | Nil)) +6 in /app/.cache/crystal/crystal-run-spec.tmp
[0x0] ???
Command exited with non-zero status 11

and we have these errors randomly every few weeks on our CI builds. (I've never been able to reproduce it on demand, locally or in CI, despite trying.)

This made me take a closer look at the Time.monotonic code and discover the == 1 bug, which is fixed in this PR. But it sure seems something strange is going on that involves both Time.monotonic and a LibC::SiginfoT, so I'd sure like to make sure that the error handling picks up a EINTR just in case that's what's happening.

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:runtime topic:stdlib:system and removed topic:stdlib:runtime labels Dec 23, 2024
@straight-shoota straight-shoota changed the title Fix error handling for LibC.clock_gettime(CLOCK_MONOTONIC) calls Fix error handling for LibC.clock_gettime(CLOCK_MONOTONIC) calls Dec 23, 2024
@ysbaddaden
Copy link
Contributor

@compumike It's an issue in timecop.

@straight-shoota straight-shoota added this to the 1.15.0 milestone Dec 23, 2024
@straight-shoota straight-shoota merged commit eb56b55 into crystal-lang:master Dec 26, 2024
69 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants