Follow-up of #1502: Embed field and value in hashTypeEntry #1551

zuiderkwast · 2025-01-12T21:18:57Z

When field + value fit within a cache line, embed both of them in the same allocation. We use fewer allocations and save the memory of a pointer we save a memory access when reading the value.

Follow-up of #1502.

ranshid · 2025-01-13T19:06:30Z

Overall I think it would be great to embed the value in the hashEntryType. We had some internal discussion which I would like to bring the main highlights here:

in order to distinguish if an entries has embedded value, we have to tag it somehow. in the current implementation (since the entry starts with the value pointer (sds) we need to locate a place to place the tag. an option to do so would be to use the pointer 3 LSB bits to indicate if is has embedded value or not.
There is also an alternative option to 1. In the current implementation we keep a pointer to the start of the hashTypeEntry entry allocation which starts with a pointer to the value (sds) followed by the field sds header size followed by the sds embedded data. Note that the field sds is always copied from the parsed field argument of the command (or module). AFAIK it is always a stringObject which means it would always be sds with headerType sds8 or larger, which means we will always have 5 extra bits in the field sds. we can use the field sds bits in order to mark different metadata existence (eg value existence, embedded value, ttl etc...). Doing so we can make the hashTypeEntry a pointer to the field sds data.

the hashTypeEntry* is just a pointer to the start of the field sds string (like every ordinary sds)

lets distinguish between the 2 cases:

non embedded entry:

+----------------------------------------+
| value         |   field                |
| ptr (8 bytes) |   sds header | string  |
+----------------------------------------+

in order to access the data we just need to understand if the value is embedded (check the flags at s[-1])
and then go to sdsAllocPtr(entry) - sizeof(sds).
in order to free/defrag, we need to get to the entry buffer allocation address which is also located at sdsAllocPtr(entry) - sizeof(sds)

embedded entry:

I would suggest that we will decide to embed the value only in case the complete entry size (field sds + value sds) is less than cache line size (eg 64 bytes). we can take this decision during hashTypeCreateEntry, but we might have to revise it every time we change the value (which is a point to consider). I think that given the fact that we set the threashold at 64 bytes we can always ensure the value length is not more than 256 bytes (it would be smaller than 64 bytes) so we can always use sds8 thus simplify the code and reduce the need to maintain a header_size bytes between the field and value.

+------------------------------------------------+
|  field                |  value                 |
|  sds header | string  |  sds8 header | string  |
+------------------------------------------------+

in order to access the data we just need to understand if the value is embedded (check the flags at s[-1])
and then go to entry + sdslen(entry) + sizeof(sds8).
in order to free/defrag, we need to get to the entry buffer allocation address which is also located at sdsAllocPtr(entry)

overall this way we would even reduce the current implementation memory allocation by 6 bytes when embedding the value.

small update: in case of conversion from listpack --> hash we currently use hashTypeCurrentObjectNewSds which will not guarantee the field sds will have sds8 header. This will probably have to be handled in some way (eg force sds8 on the field or some other solution)

small update 2: the same is also for modules. apparently modules are using createRawStringObject for some reason, which can also produce sds5 type strings.

ranshid assigned ranshid and unassigned ranshid Jan 13, 2025

ranshid mentioned this issue Jan 15, 2025

Extra memory consumed by embedded sds in new HashObject fields #1567

Open

zuiderkwast self-assigned this Jan 21, 2025

zuiderkwast linked a pull request Jan 21, 2025 that will close this issue

Embed hash value in hash type entry #1579

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up of #1502: Embed field and value in hashTypeEntry #1551

Follow-up of #1502: Embed field and value in hashTypeEntry #1551

zuiderkwast commented Jan 12, 2025

ranshid commented Jan 13, 2025 •

edited by zuiderkwast

Loading

Follow-up of #1502: Embed field and value in hashTypeEntry #1551

Follow-up of #1502: Embed field and value in hashTypeEntry #1551

Comments

zuiderkwast commented Jan 12, 2025

ranshid commented Jan 13, 2025 • edited by zuiderkwast Loading

ranshid commented Jan 13, 2025 •

edited by zuiderkwast

Loading