Use entire dataset to infer schema on array importing #204

efirs · 2023-02-13T17:26:22Z

To improve detection of specific types, schema inference should use as many documents as possible. In the case of arrays, unlike streams, we have the luxury of access to the entire dataset.

Currently, for both streams and arrays, the inference depth is equal to the batch size.
Since arrays are already loaded into memory, we can process the entire array twice: the first time for schema inference and the second time for importing the data.
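
A minimal sketch of the proposed two-pass approach, written in Python for illustration only. All names here (`infer_type`, `merge_types`, `infer_schema`, `import_array`, `write_batch`) are hypothetical and are not taken from the Tigris code base, and the type-widening rules are an assumption about how conflicting types could be reconciled:

```python
import json


def infer_type(value):
    """Map a decoded JSON value to a coarse type name (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def merge_types(a, b):
    """Widen two observed types into one (assumed rules, for illustration)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "number"}:
        return "number"
    raise ValueError(f"incompatible types: {a} and {b}")


def infer_schema(docs):
    """Infer a flat field -> type mapping from every document given."""
    schema = {}
    for doc in docs:
        for field, value in doc.items():
            observed = infer_type(value)
            schema[field] = (
                merge_types(schema[field], observed) if field in schema else observed
            )
    return schema


def import_array(raw_json, batch_size, write_batch):
    """Two passes over an in-memory array: infer the schema, then import."""
    docs = json.loads(raw_json)  # the whole array is already in memory

    # Pass 1: inference depth is the entire array, not a single batch.
    schema = infer_schema(docs)

    # Pass 2: import in batches using the schema inferred from all documents.
    for start in range(0, len(docs), batch_size):
        write_batch(schema, docs[start:start + batch_size])
    return schema
```

With a batch size of 2 and the input `[{"price": 1}, {"price": 2}, {"price": 2.5}]`, inferring from the first batch alone would type `price` as an integer, while inferring from the whole array widens it to a number before any document is written.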

efirs added this to Tigris Feb 13, 2023

ovaistariq moved this to 🆕 New in Tigris Feb 14, 2023

ovaistariq added the data import label Feb 14, 2023
