bug: Stream map defined using record or _ syntax fails to look up correct data type #2656

Open
dluo-sig opened this issue Sep 9, 2024 · 2 comments

dluo-sig commented Sep 9, 2024

Another similar issue in CustomStreamMap:

                existing_schema: dict = (
                    # Use transformed schema if available
                    transformed_schema["properties"].get(prop_key, {})
                    # ...or original schema for passthrough
                    or self.raw_schema["properties"].get(prop_def, {})
                )

If the stream map is defined using _['columnName'] or record['columnName'], the type falls back to string. The lookup does not extract the underlying column name; it treats the entire expression as the property name to search for in the schema. prop_def should be checked for this syntax so that the underlying column name is passed to the lookup.
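
A minimal sketch of the failing lookup, outside the SDK. The schemas and the stream map entry below are made-up examples, and the plain variables stand in for self.raw_schema and the loop variables in the snippet above:

```python
# Hypothetical schemas for illustration only.
raw_schema = {"properties": {"ColumnName": {"type": ["integer", "null"]}}}
transformed_schema = {"properties": {}}

# A stream map entry that renames a column using the bracket syntax.
prop_key, prop_def = "column_name", "_['ColumnName']"

existing_schema = (
    # Nothing has been transformed under the new name yet...
    transformed_schema["properties"].get(prop_key, {})
    # ...and the raw schema has no property named "_['ColumnName']".
    or raw_schema["properties"].get(prop_def, {})
)

print(existing_schema)  # {}
```

Because the lookup comes back empty, the original integer type is not available when the new property's type is decided.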

Originally posted by @dluo-sig in #1695 (comment)

dluo-sig commented Sep 9, 2024

Also note that when renaming a column and setting the original to __NULL__, the __NULL__ entry currently must be defined second for the renamed column to pick up the data type here. Otherwise, the original property is popped from the schema before its type can be read. One potential solution is to sort the stream map entries so that __NULL__ assignments are processed last, though I am not sure whether this has other downstream effects.

for prop_key, prop_def in sorted(stream_map.items(), key=lambda item: item[1] == "__NULL__"):
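
To show what that sort key does on its own, here is a small sketch with a made-up stream map in which the __NULL__ entry is defined first. It relies only on Python's sorted being stable and on False ordering before True:

```python
# Hypothetical stream map: drop the original column, keep a renamed copy.
stream_map = {
    "ColumnName": "__NULL__",
    "column_name": "ColumnName",
}

# Entries whose definition is "__NULL__" get key True and move to the end;
# all other entries keep their relative order.
ordered = sorted(stream_map.items(), key=lambda item: item[1] == "__NULL__")

print(ordered)  # [('column_name', 'ColumnName'), ('ColumnName', '__NULL__')]
```

With this ordering the rename is processed while ColumnName is still in the schema, and the removal happens afterwards.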

dluo-sig commented Sep 9, 2024

Here is also the transformation I applied to prop_def to get the actual column name. Again, this has not been extensively tested yet.

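import re

# Extract the property name from prop_def if the whole expression is
# _['ColumnName'] or record['ColumnName']; otherwise return prop_def unchanged.
def extract_property_name(prop_def):
    # (?:_|record) matches either prefix; "[_record]+" would be a character class.
    match = re.fullmatch(r"\s*(?:_|record)\[(['\"])([^'\"\[\]]+)\1\]\s*", prop_def)
    return match.group(2) if match else prop_def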
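
A rough sketch of how the extracted name could feed the schema lookup. Everything here is illustrative: resolve_existing_schema and BRACKET_LOOKUP are hypothetical names, the schemas are made up, and the pattern is an assumption of this sketch that only accepts an expression consisting of a single bracket lookup on _ or record:

```python
import re

# Assumed pattern: the whole expression must be one bracket lookup on `_` or
# `record`, quoted with either single or double quotes.
BRACKET_LOOKUP = re.compile(r"(?:_|record)\[(['\"])([^'\"\[\]]+)\1\]")


def resolve_existing_schema(prop_key, prop_def, transformed_schema, raw_schema):
    """Hypothetical helper: find the existing schema for a stream map entry."""
    match = BRACKET_LOOKUP.fullmatch(prop_def.strip())
    # Fall back to the expression itself for plain passthrough names.
    source_name = match.group(2) if match else prop_def
    return (
        transformed_schema["properties"].get(prop_key, {})
        or raw_schema["properties"].get(source_name, {})
    )


raw_schema = {"properties": {"ColumnName": {"type": ["integer", "null"]}}}
transformed_schema = {"properties": {}}

for expression in ("_['ColumnName']", 'record["ColumnName"]', "ColumnName"):
    found = resolve_existing_schema(
        "column_name", expression, transformed_schema, raw_schema
    )
    print(expression, "->", found)
```

Computed expressions such as _['a'] + _['b'] deliberately do not match, since their result type is not necessarily the type of either column.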
