-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow queries caused by large QueuedJobDescriptor table, and lack of indexes #336
Comments
Related issue: silverstripe/silverstripe-contentreview#143 |
Putting an index on Implementation is pretty trivial @mateusz can you see any downside to this? |
An index on a column that is 255 character limit could potentially cause the Database index size to balloon significantly. For a better performance approach, an alternative "identifier" which is smaller should be considered (e.g. varchar(10)). This could be achieved through hashing the currently Implementation into a 10character string, or perhaps some other type of identification method. |
That ClassName in the original description is 63 chars long, it's probably a pretty typical length. This doesn't seem like it's a cause for concern at all? We do index ClassName on every other table, though that's stored as an enum instead of a varchar, not sure if that affects the index size.
I not too keen on this at all. We could add a new column for this, though then there's the issue of all the existing data that would have null values for this. We'd end up having to do something like
And HashColumn will still be null lots of the time, so it's kind of solved nothing Also I'm not aware of this pattern being followed anywhere else in Silverstripe, and I'm not keen to add in extra complexity for the sake of a problem (oversized indexes) that may not even exist |
For me the main thing is to hear and see reflected in docs, if hundreds of thousands of QueuedJobDescriptor records is considered "normal" for this module. If it's normal - then need to find ways to get the module actually perform, and index is some kind of a solution. If it's not normal and the module can't handle it - I'd like to see the cleanup job enabled, and running by default. Brett might be right about the index size ballooning, although I don't know enough to offer readily usable advice. For comparison, URLSegment index (b-tree) on SiteTree (varchar(255)) on 20k records in this particular database is 1.5MB. That would suggest 75 MB for 1 mil records. Theoretical maximum however is 255b * 3 (for UTF) * 1 mil = 730 MB, indeed that's a lot. For non-unique indexes, I've seen somewhere that you can create indexes that are shorter than the column, e.g. create a 10-byte index on top of a VARCHAR(255), which will only take first ten bytes into account. This could be good enough for sorting, or then maybe not - if all entries in the table have the same In the end it's back to the main point - is QueuedJobDescriptor table a historical record, or a working table that's supposed to be relatively short? |
My understanding is that it's a working table, not a historical table. That said I'd like a 2nd opinion - @chillu Do you think QueuedJobDescriptor would ever be used for auditing purposes? |
SiteTreeContentReview extension does a global order by query filtered by a where on a non-indexed column (see example query (1) a the end). This query can take minutes to run if the descriptor table is large (in our case, 184k rows), and is triggered by dev/build on all classess with this extension applied, making dev/build take half an hour.
This behaviour can be potentially triggered from other, unexpected places, taking sites down. Our situation occurred, because the CleanupJob was not ran, as it has to be manually triggered. Manual trigger is not very good because the person deploying must be aware they need to go to the CMS and trigger it, which often doesn't happen.
We'd like to reduce the likelihood for this common failure scenario from occuring and requiring support to step in :)
Suggested solutions:
(1) example query we have seen in the wild, on top of a non-indexed Implementation column. I can confirm the issue is caused by order by. Running similar query on an indexed column is fast. Running similar query without order by is fast as well.
Internal problem record references PRB-160, PRB-161.
The text was updated successfully, but these errors were encountered: