Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load timestamp-data as timestamp_tz #224

Open
larsrinn opened this issue Jul 16, 2024 · 5 comments
Open

Load timestamp-data as timestamp_tz #224

larsrinn opened this issue Jul 16, 2024 · 5 comments

Comments

@larsrinn
Copy link

larsrinn commented Jul 16, 2024

I have a tap which generates data with DateTimeType. Unfortunately, when loaded into Snowflake, the timezone information vanishes and data is stored as timestamp_ntz. Am I doing something wrong? I would like to have it as timestamp_tz

@edgarrmondragon
Copy link
Member

I think that's hardcoded here:

TypeMap(eq, TIMESTAMP_NTZ(), "date-time"),

Perhaps we could add a new target setting to switch to mapping from JSON date-time strings to TIMESTAMP_TZ.

PRs welcome!

@larsrinn
Copy link
Author

How would that work? Wouldn't it be required for the tap's schema to claim a given column has datetime-values with timezone information? I didn't find such an option in the Singer-SDK.

@edgarrmondragon
Copy link
Member

How would that work? Wouldn't it be required for the tap's schema to claim a given column has datetime-values with timezone information? I didn't find such an option in the Singer-SDK.

In short, what I mean is we could add a user-level setting, e.g. use_timestamp_tz to decide if the target uses TIMESTAMP_NTZ (the default) or TIMESTAMP_TZ.


The JSON Schema spec, which the target uses to determine column types, expects rfc3339 date-times1 but by default there's enforcement of that specific format, so expecting a full rfc3339 string by default might break some users' pipelines.

Footnotes

  1. https://json-schema.org/understanding-json-schema/reference/string#dates-and-times

@larsrinn
Copy link
Author

So what you're saying is:

  • The JSON-schema-spec defines date-time as timestamps with timezone information
  • For whatever reason (?) target-snowflake maps this to timestamp_ntz columns and hence, throwing away the timezone information
  • Changing the default parameter would break user's pipelines. Hence, you propose to introduce a new parameter to determine whether date-time should be mapped to timestamp_tz or timestamp_ntz, defaulting to timestamp_ntz?

@edgarrmondragon
Copy link
Member

All correct. In short, I worry that timestamp_tz might cast datetime strings differently and might not accept something like 2020-01-01T00:00:00 (i.e. without a tz component), which users might be currently sending to the target.

I worry that timestamp_tz might cast datetime strings differently and might not accept something like 2020-01-01T00:00:00

Is that something you could confirm? Essentially I want to validate how snowflake handles a few cases:

-- date
'2020-01-01'::TIMESTAMP_NTZ
'2020-01-01'::TIMESTAMP_TZ

-- date-time without tz
'2020-01-01T00:00:00'::TIMESTAMP_NTZ
'2020-01-01T00:00:00'::TIMESTAMP_TZ

-- date-time with offset
'2020-01-01T00:00:00+00:00'::TIMESTAMP_NTZ
'2020-01-01T00:00:00+00:00'::TIMESTAMP_TZ

-- date-time with Z
'2020-01-01T00:00:00+00:00Z'::TIMESTAMP_NTZ
'2020-01-01T00:00:00+00:00Z'::TIMESTAMP_TZ

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants