feat: Add a new built-in test for taps to verify that fields and streams names are normalized #2631

Open
edgarrmondragon opened this issue Aug 29, 2024 · 0 comments

Feature scope

Taps (catalog, state, stream maps, tests, etc.)

Description

Special characters and punctuation in stream and field names should be normalized to make everyone's lives easier:

  • campaign:id -> campaign_id
  • running+shoes -> running_shoes
  • Growth (%) -> growth_pct

I propose a new built-in test that checks stream and field names against a normalization spec. Perhaps:

import re

NORMAL = re.compile(r"[^a-zA-Z0-9_]")

SPECIAL_STRING = {
    "-1": "minus_one",
    "%": "pct",
    "(": "",
    ")": "",
}


def normalize_string(input_string):
    """Normalize and convert to lower snake case.

    Also replace special strings with their normalized version.
    """
    result = input_string.lower()
    for k, v in SPECIAL_STRING.items():
        result = result.replace(k, v)

    return NORMAL.sub("_", result)


ex1 = "Example 123"
assert normalize_string(ex1) == "example_123"

ex2 = "campaign.id"
assert normalize_string(ex2) == "campaign_id"

ex3 = "Growth (%)"
assert normalize_string(ex3) == "growth_pct"

ex4 = "Revenue (USD)"
assert normalize_string(ex4) == "revenue_usd"
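
To illustrate how a test could use this, here is a rough sketch of the check itself. It is not tied to any SDK API: `find_unnormalized_names` and its input shape (a dict mapping stream names to lists of field names) are hypothetical, chosen only to keep the example standalone.

```python
import re

NORMAL = re.compile(r"[^a-zA-Z0-9_]")
SPECIAL_STRING = {"-1": "minus_one", "%": "pct", "(": "", ")": ""}


def normalize_string(input_string):
    """Same normalization as proposed above, repeated so the sketch runs on its own."""
    result = input_string.lower()
    for k, v in SPECIAL_STRING.items():
        result = result.replace(k, v)
    return NORMAL.sub("_", result)


def find_unnormalized_names(streams):
    """Return (stream, field, suggestion) tuples for names that are not normalized.

    `streams` maps stream names to lists of field names (hypothetical shape).
    A field of None means the stream name itself needs normalizing.
    """
    problems = []
    for stream_name, fields in streams.items():
        expected = normalize_string(stream_name)
        if stream_name != expected:
            problems.append((stream_name, None, expected))
        for field in fields:
            expected = normalize_string(field)
            if field != expected:
                problems.append((stream_name, field, expected))
    return problems


# A test would fail if this list is non-empty, and could print the suggestions.
problems = find_unnormalized_names(
    {"campaigns": ["campaign:id", "name"], "running+shoes": ["Growth (%)"]}
)
```

Returning the suggested name alongside each offender would let the failure message tell the developer exactly what to rename.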

If this is done, users would appreciate it if we:

  • Make it easy to normalize names automatically and predictably
  • Make it easy to turn the test off in case they don't care about name normalization
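
One possible shape for those two points, as a sketch only: `build_rename_map` and its `enabled` flag are hypothetical names, not existing SDK settings. The collision check is an added assumption about what "predictably" might require, since two different source names can normalize to the same result.

```python
def build_rename_map(names, normalize, enabled=True):
    """Map each original name to its normalized form.

    `normalize` is any callable from str to str, such as the proposed
    normalize_string. With enabled=False, names pass through unchanged
    (the opt-out). Raises ValueError when two distinct names would
    collapse to the same normalized name.
    """
    if not enabled:
        return {name: name for name in names}

    rename_map = {}
    seen = {}  # normalized name -> original name that produced it
    for name in names:
        normalized = normalize(name)
        if normalized in seen and seen[normalized] != name:
            raise ValueError(
                f"{name!r} and {seen[normalized]!r} both normalize to {normalized!r}"
            )
        seen[normalized] = name
        rename_map[name] = normalized
    return rename_map


# Stand-in normalizer for the example; the real one would be normalize_string.
simple = lambda s: s.lower().replace(" ", "_")
renames = build_rename_map(["Campaign ID", "name"], simple)
```

Failing loudly on a collision, rather than silently picking one name, keeps the automatic renaming predictable.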