
Support alternate character encodings for text data formats #135

Open
martin-traverse opened this issue Apr 26, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@martin-traverse
Contributor

Feature Request

Description of Problem:

  1. When storing/retrieving data via the platform API, support alternate encodings for text formats. Currently only UTF-8 and UTF-16 are supported when storing data, and only UTF-8 when retrieving it.

  2. Also in the runtime, it should be possible to specify the encoding for input/output datasets. This is relevant when running in dev mode, i.e. when input/output datasets are accessed directly by developers rather than being passed back into the platform.

  3. Nice-to-have: setting the encoding on datasets stored in the TRAC platform. These are not normally visible to users, since format translation happens when data is presented through the platform or model APIs, but this could be useful for integration, e.g. if direct read access is granted to reporting systems. Since items 1 and 2 already provide all the required encoding translations, enabling configurable encodings in the storage layer is a matter of choice rather than necessity.

Potential Solutions:

On the platform side, encoding should be available as a format option in data read/write/query requests and passed into the data codecs.
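As a rough illustration of the platform-side proposal, here is a minimal sketch of a CSV codec that takes its encoding from a request's format options and falls back to UTF-8, the current behaviour. The `format_options` dict, the `"encoding"` key and the function names are hypothetical and not part of the TRAC API; the sketch uses Python's standard `csv` and `codecs` modules.

```python
import codecs
import csv
import io

# Current platform behaviour: UTF-8 unless the request says otherwise
DEFAULT_ENCODING = "utf-8"


def resolve_encoding(format_options=None):
    """Pick the encoding from (hypothetical) request format options."""
    name = (format_options or {}).get("encoding", DEFAULT_ENCODING)
    # codecs.lookup normalises aliases (e.g. "latin-1" -> "iso8859-1")
    # and raises LookupError for unknown names, so bad requests fail early
    return codecs.lookup(name).name


def decode_csv(raw: bytes, format_options=None):
    """Decode incoming CSV bytes into rows using the requested encoding."""
    text = raw.decode(resolve_encoding(format_options))
    return list(csv.reader(io.StringIO(text)))


def encode_csv(rows, format_options=None) -> bytes:
    """Encode rows as CSV bytes in the requested encoding."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode(resolve_encoding(format_options))
```

For example, `decode_csv("café,1\n".encode("latin-1"), {"encoding": "latin-1"})` yields `[["café", "1"]]`, while omitting the options keeps today's UTF-8 handling.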

On the runtime, for dev mode, encoding should be passed in as a config option. This could be part of the storage config, or a separate config item under dev mode settings.
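A sketch of how the runtime might resolve the dev-mode encoding, supporting both placements mentioned above. The config layout (`"devMode"` and `"storage"` sections, each with an optional `"encoding"` key) is an assumption for illustration only; here the dev-mode setting takes precedence over the storage config, with UTF-8 as the default.

```python
import codecs
from typing import Any, Mapping

DEFAULT_ENCODING = "utf-8"


def runtime_encoding(sys_config: Mapping[str, Any]) -> str:
    """Resolve the text encoding for dev-mode input/output datasets.

    Hypothetical layout: "encoding" may appear under "devMode" or under
    "storage"; the dev-mode value wins if both are present.
    """
    dev_mode = sys_config.get("devMode") or {}
    storage = sys_config.get("storage") or {}
    name = dev_mode.get("encoding") or storage.get("encoding") or DEFAULT_ENCODING
    return codecs.lookup(name).name  # normalise aliases, reject unknown names


def read_dev_input(path: str, sys_config: Mapping[str, Any]) -> str:
    """Read a local input dataset using the configured encoding."""
    with open(path, "r", encoding=runtime_encoding(sys_config), newline="") as f:
        return f.read()


def write_dev_output(path: str, text: str, sys_config: Mapping[str, Any]) -> None:
    """Write a local output dataset using the configured encoding."""
    with open(path, "w", encoding=runtime_encoding(sys_config), newline="") as f:
        f.write(text)
```

Keeping the lookup in one function means either config placement can be chosen later without touching the read/write paths.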

To set encoding in internal storage, the encoding would need to be set as part of the storage config, which gets passed to data codecs in both the platform and runtime.
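If the storage layer had its own configured encoding, the codecs would need to translate between it and whatever encoding a client supplies or requests. A minimal sketch, assuming a hypothetical `StorageConfig` with an `encoding` field; the translation itself is just a decode from one encoding followed by an encode into the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical storage config; UTF-8 matches the current default."""
    encoding: str = "utf-8"


def transcode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return raw.decode(source_encoding).encode(target_encoding)


def read_for_client(stored: bytes, storage: StorageConfig, requested: str = "utf-8") -> bytes:
    """Translate stored data into the encoding the client asked for."""
    return transcode(stored, storage.encoding, requested)


def write_from_client(incoming: bytes, storage: StorageConfig, supplied: str = "utf-8") -> bytes:
    """Translate client data into the storage encoding before it is saved."""
    return transcode(incoming, supplied, storage.encoding)
```

Because every read and write already passes through a translation step, the storage encoding stays invisible to platform clients, which is why this item is optional.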

@martin-traverse martin-traverse added the enhancement New feature or request label Apr 26, 2022
@martin-traverse
Contributor Author

@greg-wiltshire is this still needed? I would suggest that we can certainly ditch output encodings for now and have the platform always return UTF-8, which is the current behavior. Input encodings might be needed though, to handle a variety of source data.
