Select specific TSV files for import #7
Comments
That's a good idea. I can see it being useful when running on small rented VMs in the cloud or similar. The tricky bit is how to surface it in the CLI in a way that is self-explanatory and self-evident. For instance, can we assume:
Just loading all the files, as is currently done, skirts those problems completely, because it offers a consistent dataset with everything a user could possibly want to extract. Cherry-picking opens a can of worms from a user's perspective. If we at least assume the user has read the readme for this project, then they have a mental model of how things relate, and should be able to figure out from the diagram which tables they need in order to get the data they're after. It would then follow that an option for specifying a subset of table names is the preferable approach. The program would then fetch just the corresponding TSV files and create the subset of relations that the data subset allows for. What are your thoughts on a solution along those lines?
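To make the idea concrete, here is a minimal sketch of how a table-subset option could map onto the TSV files to fetch. The `--only` flag, the table names, and the table-to-file mapping are all assumptions made up for illustration; only the IMDb file names themselves are real.

```python
import argparse

# Hypothetical table names mapped to IMDb dataset files. The names on the
# left are illustrative assumptions, not necessarily the project's tables.
TABLE_FILES = {
    "titles": "title.basics.tsv.gz",
    "akas": "title.akas.tsv.gz",
    "episodes": "title.episode.tsv.gz",
    "ratings": "title.ratings.tsv.gz",
    "crew": "title.crew.tsv.gz",
    "people": "name.basics.tsv.gz",
}


def parse_tables(value):
    """Turn 'titles,akas' into a de-duplicated list of known table names."""
    names = list(dict.fromkeys(n.strip() for n in value.split(",") if n.strip()))
    unknown = sorted(set(names) - set(TABLE_FILES))
    if unknown:
        raise argparse.ArgumentTypeError(
            "unknown table(s): %s (choose from: %s)"
            % (", ".join(unknown), ", ".join(TABLE_FILES))
        )
    return names


def build_parser():
    parser = argparse.ArgumentParser(description="Import a subset of tables")
    # Hypothetical flag: defaults to every table, i.e. today's behaviour.
    parser.add_argument(
        "--only",
        type=parse_tables,
        default=list(TABLE_FILES),
        metavar="TABLES",
        help="comma-separated table names to import (default: all)",
    )
    return parser


def files_to_fetch(argv):
    """Return the TSV files needed for the tables selected in argv."""
    args = build_parser().parse_args(argv)
    return [TABLE_FILES[name] for name in args.only]
```

With this sketch, `files_to_fetch(["--only", "titles,akas,episodes"])` would select the three files mentioned in the original request, and omitting the flag keeps the current import-everything behaviour.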
I also think that this should be done as intuitively as possible. And then I see two additional options: Ratings and Crew/People. When I think about it, I would suggest a total of four options for the import process:
If you give users these four options to choose from, they don't even need to know the dependencies.
Yes, presenting a sane set for people to choose from would seem the most intuitive. I'll see what I can do when I find some time to work on this. Thanks Marc.
Hello,
As you already mentioned in your description, the database becomes very large. To keep the database smaller, it would be nice to be able to select which TSV files should be imported. In my case, for example, only the files title.basics.tsv.gz, title.akas.tsv.gz, and title.episode.tsv.gz are of interest (at least for now). Perhaps there is the possibility to implement this as a parameter.
Thank you.
Best regards,
Marc