More optimized data structures in indexes & more granular storage parts #760

novoj · 2024-12-06T08:32:54Z

Currently the database cannot handle high-cardinality entities in some of the indexes: write operations become progressively slower as the cardinality increases. The database has been optimised for reading, and everything else was subordinated to that goal. We often use simple arrays in our indexes, which must be reallocated on every change, and this becomes a bottleneck once an array grows to hundreds of thousands of elements or more.

A very simple spike test confirmed this. Inserting 1 million elements into a simple array, where each insertion allocates a new array of size array.length + 1 and copies the existing contents into it, gets about 50% slower with every 100k elements: inserting 1 million elements takes almost 3 minutes on my machine, while inserting the same number of elements into CompositeIntArray takes a few hundred milliseconds.
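
For illustration, a spike of this kind could look roughly like the sketch below. It is not the original test: `ArrayGrowthSketch` and `ChunkedIntArray` are hypothetical names, `ChunkedIntArray` is only a stand-in and does not reproduce the real CompositeIntArray API, and the element count is reduced to 100k (an assumption) to keep the run short. The point it shows is that growing an array by one slot per insert copies every existing element each time, so the total work grows quadratically, while appending into fixed-size chunks copies nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical spike: per-insert reallocation versus chunked growth.
class ArrayGrowthSketch {

    // Appends one value by allocating a new array of length + 1 and copying all elements.
    static int[] appendByReallocation(int[] array, int value) {
        int[] grown = Arrays.copyOf(array, array.length + 1);
        grown[array.length] = value;
        return grown;
    }

    // Stand-in for a chunked int structure (NOT the real CompositeIntArray API).
    static final class ChunkedIntArray {
        private static final int CHUNK_SIZE = 512; // assumption: arbitrary chunk size
        private final List<int[]> chunks = new ArrayList<>();
        private int size;

        void add(int value) {
            int offset = size % CHUNK_SIZE;
            if (offset == 0) {
                // Allocate one new fixed-size chunk; existing elements are never copied.
                chunks.add(new int[CHUNK_SIZE]);
            }
            chunks.get(size / CHUNK_SIZE)[offset] = value;
            size++;
        }

        int get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index + ", size " + size);
            }
            return chunks.get(index / CHUNK_SIZE)[index % CHUNK_SIZE];
        }

        int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        int count = 100_000; // assumption: reduced from 1 million to keep the run short

        long start = System.nanoTime();
        int[] plain = new int[0];
        for (int i = 0; i < count; i++) {
            plain = appendByReallocation(plain, i);
        }
        long reallocMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        ChunkedIntArray chunked = new ChunkedIntArray();
        for (int i = 0; i < count; i++) {
            chunked.add(i);
        }
        long chunkedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("reallocating array: " + reallocMs + " ms for " + plain.length + " elements");
        System.out.println("chunked array:      " + chunkedMs + " ms for " + chunked.size() + " elements");
    }
}
```

Absolute timings depend on the machine and JVM, so the numbers printed here are not expected to match the ones quoted above.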

It's time to switch to B+ trees in our indexes in places that use simple arrays (like InvertedIndex) to speed up writes. Other changes may be found necessary along the way, but this issue is the spark that should ignite this movement.
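
To make the idea concrete, here is a minimal sketch of why bounded leaves help. It is deliberately not a full B+ tree: `LeafSplitSketch` is a hypothetical class that keeps only the leaf level (sorted, fixed-capacity blocks that split when full) and locates the target leaf by binary search over a plain list, where a real B+ tree would use inner nodes. Under these assumptions, an insert shifts at most one leaf's worth of keys instead of copying the whole array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a sorted int set stored in bounded leaves (leaf level of a B+ tree only).
class LeafSplitSketch {
    static final int LEAF_CAPACITY = 64; // assumption: arbitrary small capacity

    static final class Leaf {
        final int[] keys = new int[LEAF_CAPACITY];
        int count;
    }

    private final List<Leaf> leaves = new ArrayList<>();
    private int size;

    LeafSplitSketch() {
        leaves.add(new Leaf());
    }

    // Finds the last leaf whose first key is <= key, or leaf 0 if there is none.
    private int findLeaf(int key) {
        int lo = 0;
        int hi = leaves.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (leaves.get(mid).keys[0] <= key) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Inserts the key in sorted position; returns false if it is already present.
    boolean insert(int key) {
        int leafIndex = findLeaf(key);
        Leaf leaf = leaves.get(leafIndex);
        int pos = Arrays.binarySearch(leaf.keys, 0, leaf.count, key);
        if (pos >= 0) {
            return false;
        }
        int insertAt = -pos - 1;
        if (leaf.count == LEAF_CAPACITY) {
            // Leaf is full: move the upper half into a new right sibling.
            int half = LEAF_CAPACITY / 2;
            Leaf right = new Leaf();
            System.arraycopy(leaf.keys, half, right.keys, 0, LEAF_CAPACITY - half);
            right.count = LEAF_CAPACITY - half;
            leaf.count = half;
            leaves.add(leafIndex + 1, right);
            if (insertAt > half) {
                leaf = right;
                insertAt -= half;
            }
        }
        // Shift only the keys inside this one leaf.
        System.arraycopy(leaf.keys, insertAt, leaf.keys, insertAt + 1, leaf.count - insertAt);
        leaf.keys[insertAt] = key;
        leaf.count++;
        size++;
        return true;
    }

    boolean contains(int key) {
        Leaf leaf = leaves.get(findLeaf(key));
        return Arrays.binarySearch(leaf.keys, 0, leaf.count, key) >= 0;
    }

    int size() {
        return size;
    }

    // Concatenates all leaves into one sorted array.
    int[] toArray() {
        int[] out = new int[size];
        int offset = 0;
        for (Leaf leaf : leaves) {
            System.arraycopy(leaf.keys, 0, out, offset, leaf.count);
            offset += leaf.count;
        }
        return out;
    }
}
```

The list of leaves is itself an array-backed list here, so inserting a new leaf still shifts leaf references; replacing that list with inner nodes is what turns this sketch into an actual B+ tree.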

Another change that should take place is a revision of the granularity of the index storage parts. Currently, when a single indexed attribute changes in a transaction, the entire attribute index must be replaced in storage at the end of the transaction. This quickly pollutes the storage layer with superseded data and triggers a vacuuming process that rewrites the contents of the entire file. If only the changed part of the index needed to be stored, this would greatly reduce the load on the system and extend the history available in the time machine.
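
One possible shape of finer-grained storage parts is sketched below, purely as an illustration. `PartitionedIndexSketch`, its fixed `PART_SIZE`, and the `commit()` method are all hypothetical and do not reflect the actual storage layer. The sketch splits an index into fixed-size parts, marks a part as dirty when it is modified, and at commit time reports only the dirty parts as needing to be written.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: an index split into fixed-size parts, of which only modified ones are flushed.
class PartitionedIndexSketch {
    static final int PART_SIZE = 1024; // assumption: arbitrary part size

    private final int[][] parts;
    private final BitSet dirty = new BitSet();

    PartitionedIndexSketch(int capacity) {
        int partCount = (capacity + PART_SIZE - 1) / PART_SIZE;
        parts = new int[partCount][PART_SIZE];
    }

    // Writes a value and remembers which part was touched.
    void set(int position, int value) {
        int part = position / PART_SIZE;
        parts[part][position % PART_SIZE] = value;
        dirty.set(part);
    }

    int get(int position) {
        return parts[position / PART_SIZE][position % PART_SIZE];
    }

    int partCount() {
        return parts.length;
    }

    // Simulates the end of a transaction: returns the ids of the parts that
    // would be written to storage and clears the dirty flags.
    List<Integer> commit() {
        List<Integer> written = new ArrayList<>();
        for (int p = dirty.nextSetBit(0); p >= 0; p = dirty.nextSetBit(p + 1)) {
            written.add(p);
        }
        dirty.clear();
        return written;
    }

    public static void main(String[] args) {
        PartitionedIndexSketch index = new PartitionedIndexSketch(10_000);
        index.set(5, 1);
        index.set(2050, 2);
        System.out.println("parts written: " + index.commit() + " of " + index.partCount());
    }
}
```

In this sketch a transaction touching two positions rewrites two parts out of ten, where a single monolithic part would have been rewritten in full.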

novoj added the enhancement label (New feature or request) on Dec 6, 2024

novoj added this to the Beta milestone Dec 6, 2024

novoj self-assigned this Dec 6, 2024

novoj mentioned this issue Dec 6, 2024

Better constraints on time machine space allocation #761

Open
