Ancestry with ActiveImport #433
Comments
@Brotakuu are you doing a one-time import, or is this a common process? There is a built in method called:
It is tricky to populate a bunch of ancestry parents. It also sometimes gets botched if you don't populate the parents first, then their children, then their children's children.
I wonder if it is possible to write a SQL statement that, run a few times, builds up the ancestry values entirely in the db. This is back-of-the-napkin coding:
-- update the children of root nodes
update comments as c
set ancestry = parent.id
from comments as parent
where parent.id = c.parent_id
and c.ancestry is null -- where I haven't been processed yet
and parent.parent_id is null -- my parent is a root
Then keep going until nothing gets updated:
-- update each level of nodes
update comments as c
set ancestry = concat(parent.ancestry, '/', parent.id)
from comments as parent
where parent.id = c.parent_id
and c.ancestry is null -- where I haven't been processed yet
and parent.ancestry is not null -- but my parent has been processed
If this looks like it is working, maybe we can get it into the product, since updating the hierarchy is a pretty common task.
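For anyone who wants to check the pass-by-pass idea before touching the database, here is a minimal sketch in plain Ruby over in-memory rows. It is not part of the ancestry gem: `build_ancestry_by_level` is a hypothetical helper, and the rows are assumed to be plain hashes with `:id` and `:parent_id`. Each loop iteration plays the role of one run of the update statement: it only assigns rows whose parent already has a value from an earlier pass, and it stops once a pass assigns nothing.

```ruby
# Hypothetical helper, not an ancestry gem API.
# rows: array of hashes with :id and :parent_id (nil for roots).
# Returns { id => ancestry string }, with nil for roots.
def build_ancestry_by_level(rows)
  parent_of = rows.to_h { |r| [r[:id], r[:parent_id]] }

  # Roots are "processed" from the start; their ancestry stays nil.
  ancestry = {}
  parent_of.each { |id, pid| ancestry[id] = nil if pid.nil? }

  loop do
    pass = {}
    parent_of.each do |id, pid|
      next if ancestry.key?(id)       # already processed
      next unless ancestry.key?(pid)  # parent not processed yet
      # Same shape as concat(parent.ancestry, '/', parent.id) in SQL.
      pass[id] = [ancestry[pid], pid].compact.join("/")
    end
    break if pass.empty?              # nothing updated: we are done
    ancestry.merge!(pass)
  end

  ancestry
end

rows = [
  { id: 3, parent_id: 2 },
  { id: 1, parent_id: nil },
  { id: 2, parent_id: 1 },
  { id: 4, parent_id: 1 }
]
result = build_ancestry_by_level(rows)
```

Rows whose `parent_id` points at a missing record are never assigned and are left out of the result, which mirrors the SQL version leaving their `ancestry` untouched.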
Thanks @kbrock. I'm using PostgreSQL, and this is a weekly process of importing 300k comments which only have
During the bulk import process, there's no way to set the parent_id, since the parent comment is created at the same time as its descendants. So
So you're suggesting running a recursive SQL query to build the tree from the top down? Do you think this would be faster than what I'm currently running:
@Brotakuu I think your code is pretty similar to
It sounds like you are building the whole tree of comments. If the tree is more than one level deep (more than just parents and children), then it is important to update the records in a particular order. #429 explains how just setting the
You need to make sure the parent's
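One way to get that order, sketched in plain Ruby under the assumption that the imported rows are available as hashes with `:id` and `:parent_id`: compute each row's depth and sort by it, so every parent comes before its children. `parents_first` is a hypothetical helper, not something the ancestry gem provides, and it assumes the parent links contain no cycles.

```ruby
# Hypothetical helper, not an ancestry gem API.
# Orders rows so that every parent appears before its children.
# Assumes the parent_id links form a tree (no cycles).
def parents_first(rows)
  parent_of = rows.to_h { |r| [r[:id], r[:parent_id]] }

  # Memoized depth: roots are 0, everything else is parent depth + 1.
  # A parent_id that is not in the batch is treated like a root.
  depth = Hash.new do |memo, id|
    pid = parent_of[id]
    memo[id] = pid.nil? ? 0 : memo[pid] + 1
  end

  rows.sort_by { |r| depth[r[:id]] }
end

rows = [
  { id: 3, parent_id: 2 },
  { id: 2, parent_id: 1 },
  { id: 1, parent_id: nil }
]
ordered_ids = parents_first(rows).map { |r| r[:id] }
```

Processing the records in this order means a parent's ancestry value is already in place by the time any of its children is updated.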
The performance difference will be <1 second vs. an N+1. For us it was a few minutes vs. under a second. If you look at the Ruby,
so if you know either of those cases, you can simplify the resulting SQL. So right now we have
given that parent is a self join, and the parent either has a nil parent id and nil ancestry, or a non-nil parent id and non-nil ancestry. In the other cases, the parent hasn't been updated yet and we need to handle it in a separate pass. You can probably get some ideas from
shows us starting at the bottom and going deeper.
@d-m-u do you have bandwidth to fix this? I'm thinking we have too many loose ends right now to approach. (And this is from 2019, so while lots of people need this, including us, it may unfortunately take a backseat.)
@kbrock I think it's worth a shot.
I'm using active import to import comment trees. Is there any way to create the ancestry tree during the import (when the primary key id is not created yet)? Currently, I'm importing all the comments, then looping through each comment to add the parent, and it is very slow even with indexes: