Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow importing with foreign keys #5097

Closed
ericmock opened this issue Jan 5, 2023 · 12 comments
Closed

Slow importing with foreign keys #5097

ericmock opened this issue Jan 5, 2023 · 12 comments

Comments

@ericmock
Copy link

ericmock commented Jan 5, 2023

While I have not done any explicit tests, it appears that the time it takes when importing data to a table with foreign keys grows like N^2. This isn't an issue when importing tables with thousands of rows, but it becomes very slow when importing tables with hundreds of thousands of rows. Are there optimizations that could be made?

@fulghum
Copy link
Contributor

fulghum commented Jan 5, 2023

Hey @ericmock, thanks for reaching out about this. We have recently done some work on data import (e.g. import performance overview and benchmarks, optimizing blob writes in imports), however, unique key constraints and foreign key constraints are still known to be expensive to check during larger imports.

Our recommendation right now is to split up the data import into two steps – 1) importing all the data first, and then 2) applying unique key and foreign key constraints. Alternatively, you can also disable foreign key checks temporarily while you run the import (i.e. SET FOREIGN_KEY_CHECKS=0;).

Let us know what you think and if any of those suggestions are helpful.

@ericmock
Copy link
Author

ericmock commented Jan 5, 2023

I'd been breaking things into smaller chunks as a workaround. It still seemed to get bogged down. A garbage collection seemed to help a bit.

I was not aware you could turn off the foreign key checks. Is that persistent across imports?

@fulghum
Copy link
Contributor

fulghum commented Jan 9, 2023

SET foreign_key_checks = 0; will turn off foreign key checks for only the current session. You would typically put this into your SQL batch file at the start to disable foreign key checks while you run all the import statements.

If you want this behavior to be global in a dolt sql-server mode (i.e. for all user sessions on a Dolt sql-server), you could run SET @@global.foreign_key_checks = 0;. You could also have the setting persist across dolt sql and dolt sql-server restarts with SET @@PERSIST.foreign_key_checks = 0;.

Just note that when you turn foreign_key_checks back on, it does not go back over any existing data and perform constraint checks – it will only perform those constraint checks on new changes going forward.

Glad you found the garbage collection tip! We're working on some GC enhancements that should remove the need to manually run GC, but until then, explicitly running GC after an import is a great practice.

btw... in case you didn't already find this... we have a guide for importing data dumps into Dolt, and there are a few best practices/tips listed at the end (including GC and a small mention of foreign keys). If you have any feedback on how that guide could have helped you more, we'd be happy to improve those docs.

@ericmock
Copy link
Author

ericmock commented Jan 9, 2023

Thank you for the great information. I had never scrolled to the end of that importing guide. That said, it might be good to put those tips at the top of the page. I suspect most users will find the section for the type of data they want to import and not read further down.

If I import with foreign key checking off, add data, then turn checking back on, dolt does not rescan the database to determine if constraints are violated, right? What if I try to merge that data with foreign-key violations with a database that has foreign-key checking turned on?

@timsehn
Copy link
Contributor

timsehn commented Jan 9, 2023

@ericmock
Copy link
Author

ericmock commented Jan 9, 2023

Thank you. Would there be any performance benefit to turning off constraint verification during import and then ensuring nothing was violated with the above command afterward?

@timsehn
Copy link
Contributor

timsehn commented Jan 10, 2023

It will be way faster. Turning off foreign key checking is very common on import.

@ericmock
Copy link
Author

ericmock commented Jan 10, 2023

Thanks. That's great to know. How would you (or can you) turn foreign-key checking off when using dolt table import? Or is it only possible when importing with load data through the SQL server?

@fulghum
Copy link
Contributor

fulghum commented Jan 10, 2023

Yup, you can also turn off foreign-key checks when using dolt table import – just pass the --disable-fk-checks flag.

The --disable-fk-checks flag is very simple to use, but here's the reference documentation for dolt table import just in case you want to dig into any other options.

Thanks for the good feedback on the import docs (tracking issue)! Let us know how the rest of your import experience goes and if there's any other feedback or questions you have or anything we can do to help.

@ericmock
Copy link
Author

Thank you for the feedback. I'm embarrassed that, after doing so many imports, I didn't notice the --disable-fk-checks flag.

@fulghum
Copy link
Contributor

fulghum commented Jan 10, 2023

No need to be embarrassed!! We got some good feedback from you on the import experience and got some takeaways to improve the Data Import guide, so this was definitely useful for us. 😄

@fulghum
Copy link
Contributor

fulghum commented Jan 13, 2023

I'm gunna close this one out since I think we got everything sorted out now and we've got a separate issue for some import doc updates. Don't be shy to reopen or ping us on Discord if you still need anything here!

@fulghum fulghum closed this as completed Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants