Use entire dataset to infer schema on array importing #204

Open
efirs opened this issue Feb 13, 2023 · 0 comments
Comments

@efirs
Collaborator

efirs commented Feb 13, 2023

To improve detection of specific types, schema inference should use as many documents as possible. With arrays, unlike streams, we have the luxury of access to the entire dataset.

Currently, for both streams and arrays, the inference depth is equal to the batch size.
Since arrays are already loaded into memory, we can process the entire array twice: first for schema inference, then for importing the data.
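
A minimal sketch of the two-pass idea, in Python for brevity. It is not the project's actual code: `infer_type`, `merge_types`, `infer_schema`, `import_array`, and the `write_batch` callback are hypothetical names, and the type-widening rule (integer widens to number) is an assumption chosen only to show why inference over a single batch can pick a type that is too narrow.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Schema = Dict[str, str]


def infer_type(value: Any) -> str:
    """Map a JSON value to a simplified type name (hypothetical type set)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def merge_types(a: str, b: str) -> str:
    """Combine two observed types for one field (assumed widening rules)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "number"}:
        return "number"  # widen integer to number
    raise ValueError(f"incompatible types: {a} and {b}")


def infer_schema(docs: List[Doc]) -> Schema:
    """Infer a flat field -> type schema from every document passed in."""
    schema: Schema = {}
    for doc in docs:
        for field, value in doc.items():
            observed = infer_type(value)
            schema[field] = (
                merge_types(schema[field], observed) if field in schema else observed
            )
    return schema


def import_array(
    docs: List[Doc],
    batch_size: int,
    write_batch: Callable[[Schema, List[Doc]], None],
) -> Schema:
    """Pass 1: infer the schema from the whole array. Pass 2: import in batches."""
    schema = infer_schema(docs)
    for start in range(0, len(docs), batch_size):
        write_batch(schema, docs[start:start + batch_size])
    return schema
```

With `docs = [{"id": 1, "price": 10}, {"id": 2, "price": 10.5}]` and a batch size of 1, inference limited to the first batch would type `price` as an integer, while inference over the full array widens it to a number before any batch is written.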

@efirs efirs added this to Tigris Feb 13, 2023
@ovaistariq ovaistariq moved this to 🆕 New in Tigris Feb 14, 2023