Support alternate character encodings for text data formats #135

martin-traverse · 2022-04-26T10:53:33Z

Feature Request

Description of Problem:

1. When storing/retrieving data via the platform API, support alternate encodings for text formats. Currently only UTF-8 / UTF-16 are supported for storing, and only UTF-8 for retrieving.
2. In the runtime, it should also be possible to specify the encoding for input/output datasets. This is relevant when running in dev mode, i.e. when input/output datasets are accessed directly by developers rather than passed back into the platform.
3. Nice-to-have: setting the encoding on datasets stored in the TRAC platform. These are not normally visible to users, as format translation happens when data is presented through the platform or model APIs, but it could be useful for integration, e.g. if direct read access is granted to reporting systems. Since 1 & 2 provide all the required encoding translations, it is really a choice whether to enable configurable encodings in the storage layer or not.

Potential Solutions:

On the platform side, encoding should be available as a format option in data read/write/query requests and passed into the data codecs.
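
To illustrate what passing an encoding into a codec could look like, here is a minimal Python sketch. It is not the TRAC codec API: the `format_options` dict, the `"encoding"` key and the `decode_csv` / `encode_csv` function names are all assumptions made up for this example, and UTF-8 is assumed as the default when no encoding is given.

```python
import csv
import io
from typing import Dict, List, Optional

DEFAULT_ENCODING = "utf-8"  # assumed default, matching the current behavior


def decode_csv(raw: bytes, format_options: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Decode CSV bytes using the encoding named in the (hypothetical) format options."""
    options = format_options or {}
    encoding = options.get("encoding", DEFAULT_ENCODING)
    text = raw.decode(encoding)  # raises UnicodeDecodeError on invalid input
    return list(csv.DictReader(io.StringIO(text, newline="")))


def encode_csv(rows: List[Dict[str, str]], format_options: Optional[Dict[str, str]] = None) -> bytes:
    """Encode rows as CSV bytes using the encoding named in the (hypothetical) format options."""
    if not rows:
        return b""
    options = format_options or {}
    encoding = options.get("encoding", DEFAULT_ENCODING)
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode(encoding)
```

With something like this, a request that carries `{"encoding": "latin-1"}` would round-trip non-ASCII text through the codec, while requests without the option keep using UTF-8.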

On the runtime, for dev mode, encoding should be passed in as a config option. This could be part of the storage config, or a separate config item under dev mode settings.

To set encoding in internal storage, the encoding would need to be set as part of the storage config, which gets passed to data codecs in both the platform and runtime.
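
As a rough sketch of how the two config options could fit together, the Python below resolves an effective encoding from a per-storage setting and a dev mode default. The `StorageConfig` and `DevModeSettings` classes, their fields and the precedence order (storage setting first, then dev mode default, then UTF-8) are assumptions for illustration only, not the existing TRAC config model.

```python
import codecs
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageConfig:
    """Hypothetical storage config entry with an optional encoding."""
    storage_key: str
    encoding: Optional[str] = None


@dataclass
class DevModeSettings:
    """Hypothetical dev mode settings with a default encoding for local datasets."""
    default_encoding: Optional[str] = None


def resolve_encoding(storage: StorageConfig, dev_mode: Optional[DevModeSettings] = None) -> str:
    """Pick the encoding to hand to the data codecs.

    Assumed precedence: storage config, then dev mode default, then UTF-8.
    """
    dev_default = dev_mode.default_encoding if dev_mode else None
    candidate = storage.encoding or dev_default or "utf-8"
    # codecs.lookup raises LookupError for unknown names, so a bad config fails early
    return codecs.lookup(candidate).name
```

Validating the name at config load time means a typo in the encoding is reported before any data is read or written.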

martin-traverse · 2022-12-09T12:22:40Z

@greg-wiltshire is this still needed? I would suggest that we can certainly ditch output encodings for now and have the platform always return UTF-8, which is the current behavior. Input encodings might be needed though, to handle a variety of source data.
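
For the input-only variant, a minimal sketch could transcode incoming text to UTF-8 at the point of ingestion, so everything downstream keeps assuming UTF-8. The `to_utf8` helper and its `source_encoding` parameter are hypothetical names for this example.

```python
def to_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Re-encode input bytes from the declared source encoding to UTF-8.

    Decoding is strict, so bytes that are invalid in the declared
    encoding raise UnicodeDecodeError instead of being silently replaced.
    """
    return raw.decode(source_encoding).encode("utf-8")
```

For example, `to_utf8("café".encode("cp1252"), "cp1252")` yields the same bytes as `"café".encode("utf-8")`.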

martin-traverse added the enhancement label · Apr 26, 2022
