Currently, running a query gets very slow if an indexing operation is also in progress. This is (probably) because of how disk queues work: indexing is very disk-heavy and saturates the disk with reads of the new files to index.
In practice, indexing new files is less important than responding to queries quickly. Ideally, running a query should always have priority. I think we can solve this with Linux's IO priority: https://www.kernel.org/doc/html/latest/block/ioprio.html.
Things to do:
- Create a benchmark: measure query performance without indexing, and during indexing. It doesn't have to be very precise, but it must show that performance during indexing is significantly degraded.
- Investigate whether it's possible to use the ioprio_set/ioprio_get syscalls to work around this issue per worker.
- Run the benchmark again, and make sure the query performance is better (and that indexing performance is not hugely impacted, though I don't expect it to be).
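For the ioprio_set/ioprio_get item, here is a minimal sketch of what the calls could look like from C++. glibc has no wrapper for these syscalls, so the constants are written out by hand to mirror the kernel's ioprio interface. All names here (`set_caller_io_priority`, `kClassIdle`, and so on) are made up for illustration and are not part of ursadb.

```cpp
#include <sys/syscall.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hand-written copies of the kernel's ioprio constants (assumption: they
// match the running kernel's headers; glibc does not export them).
const int kIoprioClassShift = 13;
const int kIoprioWhoProcess = 1;  // IOPRIO_WHO_PROCESS
enum IoClass { kClassNone = 0, kClassRealtime = 1, kClassBestEffort = 2, kClassIdle = 3 };

// Packs a scheduling class and a level (0 = highest, 7 = lowest) into the
// single integer that ioprio_set expects.
inline int ioprio_value(int io_class, int level) {
    assert(level >= 0 && level <= 7);
    return (io_class << kIoprioClassShift) | level;
}

// Inverse helpers for decoding what ioprio_get returns.
inline int ioprio_class(int value) { return value >> kIoprioClassShift; }
inline int ioprio_level(int value) { return value & ((1 << kIoprioClassShift) - 1); }

// Sets the IO priority of the caller (who = 0). Returns 0 on success,
// otherwise the errno value reported by the kernel.
inline int set_caller_io_priority(int io_class, int level) {
    long rc = syscall(SYS_ioprio_set, kIoprioWhoProcess, 0,
                      ioprio_value(io_class, level));
    return rc == 0 ? 0 : errno;
}

// Returns the caller's packed IO priority, or -1 on error.
inline int get_caller_io_priority() {
    return static_cast<int>(syscall(SYS_ioprio_get, kIoprioWhoProcess, 0));
}
```

An indexing worker would call something like `set_caller_io_priority(kClassIdle, 0)` before it starts reading files. Whether the priority has any effect also depends on the IO scheduler in use, so this needs to be checked with the benchmark rather than assumed.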
Hopefully this solves the issue, but if not, we can consider other measures (for example, pausing all indexing workers while a query is being processed).
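For the benchmark item, a rough timing harness might be enough, since the numbers don't have to be precise. The sketch below times an arbitrary callable and reports the median latency; the callable stands in for "run one query against the database", and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Runs `query` `runs` times and returns the median wall-clock latency in
// milliseconds (the upper middle sample when `runs` is even).
inline double median_latency_ms(const std::function<void()>& query, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        query();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// A ratio above 1 means queries were slower while indexing was running.
inline double slowdown(double idle_ms, double indexing_ms) {
    assert(idle_ms > 0.0);
    return indexing_ms / idle_ms;
}
```

The idea would be to call `median_latency_ms` once on an idle database and once while indexing is in progress, then compare the two with `slowdown` before and after any IO priority change.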
Yeah, on average the database got slower. But I realised that's because IO priority is per process, not per thread. And ursadb is a single (multi-threaded) process. So I can't actually do what I hoped to do.
But that's not all - I've tried to work around this by running a second ursadb process ("slow" process for compacting), and the results are
Also, I suspect that a big part of the slowdown comes from the OS's disk cache being filled with useless (never used again) data. Maybe I should experiment with MADV_DONTNEED instead?
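A minimal sketch of the MADV_DONTNEED experiment, assuming the files to index are read through mmap (that is an assumption for the example, not a statement about how ursadb reads files). The function name is hypothetical, and the byte sum only stands in for real indexing work.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Reads a whole file through a read-only mapping, then tells the kernel the
// mapped pages are no longer needed. Returns the sum of all bytes, or -1 if
// the file is missing, empty, or cannot be mapped.
inline long long scan_file_and_release(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = static_cast<size_t>(st.st_size);

    void* map = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) return -1;

    const unsigned char* bytes = static_cast<const unsigned char*>(map);
    long long sum = 0;
    for (size_t i = 0; i < len; ++i) sum += bytes[i];

    // Hint that this range will not be touched again. The return value is
    // ignored here because a failed hint does not affect correctness.
    madvise(map, len, MADV_DONTNEED);
    munmap(map, len);
    return sum;
}
```

This is only a hint about the mapping, so whether it actually relieves pressure on the disk cache would have to be measured with the same benchmark.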
Anyway, this approach may be more challenging than I expected. Looks like I need to ponder this topic a bit more 🤔