Vector stops with panic at to_unix_timestamp format:nanoseconds VRL function #978

Open
r3code opened this issue Aug 5, 2024 · 5 comments · May be fixed by #979
Labels: type: bug (a code related bug), vrl: stdlib (changes to the standard library)

Comments

r3code commented Aug 5, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Vector panics during the timestamp conversion in to_unix_timestamp and crashes. Log messages:

vector[582913]: thread 'vector-worker' panicked at 'value can not be represented in a timestamp with nanosecond precision.', /cargo/registry/src/index.crates>
vector[582913]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
ox2-vctr-srv01 vector[582913]: thread 'vector-worker' panicked at 'internal error: entered unreachable code: join error or bad poll', src/topology/builder.rs:890:30
ERROR transform{component_kind="transform" component_id=check_unix_timestamp component_type=remap component_n
INFO vector: Vector has stopped.

We found that the cause was a very early date in some of the logs. This one message crashed all three of our aggregators. We have Kafka between the agents and the aggregators: the message crashed the first Vector instance and was not consumed, then after rebalancing the other two tried to consume the same message and crashed as well. All three aggregators kept restarting every 10 seconds until we applied our fix in VRL.

Configuration

Vector 0.33.1
Streaming model (agent -> Kafka -> aggregators)

if exists(.ts) {
  timestampTs, err = parse_timestamp(del(.ts), "%+")
  if err != null {
    .err = err
  } else {
    .Timestamp = to_unix_timestamp(timestampTs, "nanoseconds")
  }
}

Version

0.33.1

Debug Output

No response

Example Data

Input:
"ts": "1677-09-21 00:12:43.145224192Z" - OK. VRL Playground
"ts": "1677-09-21 00:12:43.145224191Z" - Fail. thread 'vector-worker' panicked at 'value can not be represented in a timestamp with nanosecond precision.' VRL Playground
"ts": "0000-09-21 00:12:43.145224192Z" - Fail. thread 'vector-worker' panicked at 'value can not be represented in a timestamp with nanosecond precision.' VRL Playground

Reproduced in VRL Playground
[image]

Additional Context

No response

References

fn to_unix_timestamp(value: Value, unit: Unit) -> Resolved {

r3code added the type: bug label Aug 5, 2024
r3code (Author) commented Aug 5, 2024

We have some workarounds:

  1. Replace nanoseconds with milliseconds; this avoids the panic. VRL Playground Workaround Demo, with 0000 year
  2. Since we expect dates in RFC 3339, the year comes first, and since we work with logs, a log cannot legitimately come from the distant past, so we can check the year up front.
    Vrl Playground Workaround Demo

But we still want to use nanosecond precision.

iFurySt commented Aug 5, 2024

This is intentional; see chronotope/chrono#1123

1s = 1e9ns

i64 range: -2^63 ~ 2^63 - 1 = -9223372036854775808 ~ 9223372036854775807

range in ts:
-9223372036854775808/1e9=-9223372036.854775808s=1677-09-21T00:12:43.145224192Z
9223372036854775807/1e9=9223372036.854775807s=2262-04-11T23:47:16.854775807Z

The timestamp counts the time elapsed since 1970-01-01T00:00:00Z, so an i64 can only represent a timestamp in ns from 1677-09-21T00:12:43.145224192Z to 2262-04-11T23:47:16.854775807Z. This means we can only count up or down about 292 years from 1970-01-01T00:00:00Z 🙂

The next Y2K bug will strike in 2262. 🤣
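
For anyone who wants to check the range arithmetic above, here is a minimal std-only Rust sketch. It does not use chrono; the helper names (split_nanos, civil_from_days, format_utc) are made up for this illustration, and the date conversion is the commonly used "civil from days" algorithm for the proleptic Gregorian calendar.

```rust
/// Split a signed nanosecond count since the Unix epoch into
/// (whole seconds, sub-second nanoseconds in 0..1_000_000_000).
/// Euclidean division keeps the sub-second part non-negative,
/// which matters for timestamps before 1970.
fn split_nanos(total_ns: i64) -> (i64, u32) {
    const NS_PER_SEC: i64 = 1_000_000_000;
    (
        total_ns.div_euclid(NS_PER_SEC),
        total_ns.rem_euclid(NS_PER_SEC) as u32,
    )
}

/// Convert days since 1970-01-01 into a proleptic Gregorian (year, month, day).
/// Hypothetical helper for this sketch, based on the "civil from days" algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Render a nanosecond Unix timestamp as an RFC 3339 style UTC string.
fn format_utc(total_ns: i64) -> String {
    let (secs, ns) = split_nanos(total_ns);
    let days = secs.div_euclid(86_400);
    let sod = secs.rem_euclid(86_400); // second of day
    let (y, m, d) = civil_from_days(days);
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:09}Z",
        y,
        m,
        d,
        sod / 3_600,
        sod % 3_600 / 60,
        sod % 60,
        ns
    )
}
```

Calling format_utc(i64::MIN) gives 1677-09-21T00:12:43.145224192Z and format_utc(i64::MAX) gives 2262-04-11T23:47:16.854775807Z. The lower bound is exactly the last "OK" input in the Example Data of the issue.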

iFurySt commented Aug 5, 2024

But it may be worth throwing errors rather than panic in vector. @jszwedko WDYT?

jszwedko (Member) commented Aug 6, 2024

> But it may be worth throwing errors rather than panic in vector. @jszwedko WDYT?

Agreed, I think we should catch this and return an error from to_unix_timestamp.

I'll transfer this issue to the VRL repo.

jszwedko transferred this issue from vectordotdev/vector Aug 6, 2024
jszwedko added the vrl: stdlib label Aug 6, 2024
pront (Collaborator) commented Aug 6, 2024

Good catch, thanks! I will prepare a fix. I will also do a grep for usages of timestamp_nanos. All those need to be replaced with timestamp_nanos_opt.
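
To illustrate the difference between a panicking conversion and a checked one, here is a rough std-only Rust sketch. It is only a model of the idea, not the chrono implementation and not the actual fix; unix_nanos_checked and to_unix_timestamp_nanos are hypothetical names invented for this example, and the error message is made up.

```rust
use std::convert::TryFrom;

const NS_PER_SEC: i128 = 1_000_000_000;

/// Checked conversion of (seconds since epoch, sub-second nanoseconds)
/// into a single i64 nanosecond count.
/// Returns None when the value does not fit, instead of panicking.
/// The intermediate math is done in i128 because `secs * 1e9` alone can
/// overflow i64 even when the final sum would still be representable.
pub fn unix_nanos_checked(secs: i64, subsec_nanos: u32) -> Option<i64> {
    let total = i128::from(secs) * NS_PER_SEC + i128::from(subsec_nanos);
    i64::try_from(total).ok()
}

/// Fallible wrapper: the caller gets an error value to handle
/// (for example in an `err` branch) rather than a crashed process.
pub fn to_unix_timestamp_nanos(secs: i64, subsec_nanos: u32) -> Result<i64, String> {
    unix_nanos_checked(secs, subsec_nanos).ok_or_else(|| {
        format!(
            "timestamp (secs={}, subsec_nanos={}) cannot be represented in nanoseconds as i64",
            secs, subsec_nanos
        )
    })
}
```

With this shape, the failing inputs from the Example Data would end up in the error branch: unix_nanos_checked(-9223372037, 145224192) returns Some(i64::MIN), while the same call with 145224191 returns None.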

pront self-assigned this Aug 6, 2024