Currently the database cannot handle high-cardinality entities well in some of its indexes: write operations become slower and slower as the cardinality grows. The database has been optimised for reading, and every design decision so far has favoured that goal. We often use plain arrays in our indexes, which must be reallocated on every change, and this becomes a bottleneck once an array grows to hundreds of thousands of elements or more.
A very simple spike test confirmed this. Inserting 1 million elements into a plain array, where each iteration allocates a new array of array.length + 1 and copies the old contents into it, gets roughly 50% slower with every 100k elements - inserting 1M elements takes almost 3 minutes on my machine, while inserting the same number of elements into CompositeIntArray takes a few hundred milliseconds.
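A minimal sketch of such a spike test (not the original one, just an illustration of the effect) could look like this - the first loop reallocates the whole array per insert, the second appends into fixed-size chunks, which roughly approximates what CompositeIntArray does:

```java
import java.util.ArrayList;

public class ArrayGrowthSpike {

	public static void main(String[] args) {
		final int count = 1_000_000;

		// Naive approach: allocate array.length + 1 and copy on every insert
		// (quadratic copying overall - expect this pass to take minutes).
		long start = System.nanoTime();
		int[] array = new int[0];
		for (int i = 0; i < count; i++) {
			int[] copy = new int[array.length + 1];
			System.arraycopy(array, 0, copy, 0, array.length);
			copy[array.length] = i;
			array = copy;
		}
		System.out.printf("Per-insert reallocation: %d ms%n", (System.nanoTime() - start) / 1_000_000);

		// Chunked approach: append into fixed-size chunks, no full copies.
		start = System.nanoTime();
		ArrayList<int[]> chunks = new ArrayList<>();
		int[] chunk = new int[8192];
		int offset = 0;
		for (int i = 0; i < count; i++) {
			if (offset == chunk.length) {
				chunks.add(chunk);
				chunk = new int[8192];
				offset = 0;
			}
			chunk[offset++] = i;
		}
		chunks.add(chunk);
		System.out.printf("Chunked appends: %d ms%n", (System.nanoTime() - start) / 1_000_000);
	}
}
```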
It's time to switch to B+ trees in the index places that currently use plain arrays (such as InvertedIndex) to speed up writes. Other changes may turn out to be necessary along the way, but this issue is the spark that should ignite the effort.
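To show the contrast, here is a hypothetical sketch (the names are illustrative, not the actual InvertedIndex API, and java.util.TreeSet merely stands in for the eventual B+ tree implementation): the array-backed posting list pays a full copy on every write, while a tree-backed one pays O(log n).

```java
import java.util.Arrays;
import java.util.TreeSet;

public class PostingListSketch {

	/** Array-backed variant: keeps the array sorted but copies it entirely on each insert. */
	static int[] insertIntoSortedArray(int[] recordIds, int recordId) {
		int position = Arrays.binarySearch(recordIds, recordId);
		if (position >= 0) {
			return recordIds; // already present
		}
		int insertAt = -position - 1;
		int[] copy = new int[recordIds.length + 1]; // O(n) allocation + copy on every write
		System.arraycopy(recordIds, 0, copy, 0, insertAt);
		copy[insertAt] = recordId;
		System.arraycopy(recordIds, insertAt, copy, insertAt + 1, recordIds.length - insertAt);
		return copy;
	}

	/** Tree-backed variant: ordered iteration is preserved, inserts stay O(log n). */
	static void insertIntoTree(TreeSet<Integer> recordIds, int recordId) {
		recordIds.add(recordId);
	}

	public static void main(String[] args) {
		int[] array = new int[0];
		TreeSet<Integer> tree = new TreeSet<>();
		for (int recordId = 0; recordId < 10_000; recordId++) {
			array = insertIntoSortedArray(array, recordId);
			insertIntoTree(tree, recordId);
		}
		System.out.println(array.length + " / " + tree.size());
	}
}
```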
Another change that should follow is revising the granularity of the index storage parts. Currently, when a single indexed attribute changes within a transaction, the entire attribute index must be replaced in storage at the end of the transaction. This quickly pollutes the storage layer and triggers a vacuuming process that rewrites the contents of the entire file. If only the changed part of the index had to be stored, it would greatly reduce the load on the system and prolong the time window available to the time machine.
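As a rough illustration of the direction (none of these names come from the code base, and the real storage call is only a placeholder): split the index into fixed-size fragments, track which fragments a transaction touched, and persist only those at commit instead of replacing the whole attribute index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FragmentedIndexStore {
	private static final int FRAGMENT_SIZE = 4_096;

	private final Map<Integer, int[]> fragments = new HashMap<>();
	private final Set<Integer> dirtyFragments = new HashSet<>();

	/** Record a change; only the fragment that owns the record id becomes dirty. */
	public void markChanged(int recordId) {
		int fragmentId = recordId / FRAGMENT_SIZE;
		fragments.computeIfAbsent(fragmentId, id -> new int[FRAGMENT_SIZE]);
		dirtyFragments.add(fragmentId);
	}

	/** At commit, persist only the dirty fragments instead of the whole index. */
	public void flush() {
		for (Integer fragmentId : dirtyFragments) {
			persistFragment(fragmentId, fragments.get(fragmentId));
		}
		dirtyFragments.clear();
	}

	private void persistFragment(int fragmentId, int[] data) {
		// placeholder for the real storage-layer write
		System.out.println("writing fragment " + fragmentId + " (" + data.length + " ints)");
	}
}
```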