Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow-up of #1502: Embed field and value in hashTypeEntry #1551

Open
zuiderkwast opened this issue Jan 12, 2025 · 1 comment · May be fixed by #1579
Open

Follow-up of #1502: Embed field and value in hashTypeEntry #1551

zuiderkwast opened this issue Jan 12, 2025 · 1 comment · May be fixed by #1579
Assignees

Comments

@zuiderkwast
Copy link
Contributor

When field + value fit within a cache line, embed both of them in the same allocation. We use fewer allocations and save the memory of a pointer we save a memory access when reading the value.

Follow-up of #1502.

@ranshid
Copy link
Member

ranshid commented Jan 13, 2025

Overall I think it would be great to embed the value in the hashEntryType. We had some internal discussion which I would like to bring the main highlights here:

  1. in order to distinguish if an entries has embedded value, we have to tag it somehow. in the current implementation (since the entry starts with the value pointer (sds) we need to locate a place to place the tag. an option to do so would be to use the pointer 3 LSB bits to indicate if is has embedded value or not.

  2. There is also an alternative option to 1. In the current implementation we keep a pointer to the start of the hashTypeEntry entry allocation which starts with a pointer to the value (sds) followed by the field sds header size followed by the sds embedded data. Note that the field sds is always copied from the parsed field argument of the command (or module). AFAIK it is always a stringObject which means it would always be sds with headerType sds8 or larger, which means we will always have 5 extra bits in the field sds. we can use the field sds bits in order to mark different metadata existence (eg value existence, embedded value, ttl etc...). Doing so we can make the hashTypeEntry a pointer to the field sds data.

the hashTypeEntry* is just a pointer to the start of the field sds string (like every ordinary sds)

lets distinguish between the 2 cases:

non embedded entry:

+----------------------------------------+
| value         |   field                |
| ptr (8 bytes) |   sds header | string  |
+----------------------------------------+  
  • in order to access the data we just need to understand if the value is embedded (check the flags at s[-1])
    and then go to sdsAllocPtr(entry) - sizeof(sds).
  • in order to free/defrag, we need to get to the entry buffer allocation address which is also located at sdsAllocPtr(entry) - sizeof(sds)

embedded entry:

I would suggest that we will decide to embed the value only in case the complete entry size (field sds + value sds) is less than cache line size (eg 64 bytes). we can take this decision during hashTypeCreateEntry, but we might have to revise it every time we change the value (which is a point to consider). I think that given the fact that we set the threashold at 64 bytes we can always ensure the value length is not more than 256 bytes (it would be smaller than 64 bytes) so we can always use sds8 thus simplify the code and reduce the need to maintain a header_size bytes between the field and value.

+------------------------------------------------+
|  field                |  value                 |
|  sds header | string  |  sds8 header | string  |
+------------------------------------------------+  
  • in order to access the data we just need to understand if the value is embedded (check the flags at s[-1])
    and then go to entry + sdslen(entry) + sizeof(sds8).
  • in order to free/defrag, we need to get to the entry buffer allocation address which is also located at sdsAllocPtr(entry)

overall this way we would even reduce the current implementation memory allocation by 6 bytes when embedding the value.

small update: in case of conversion from listpack --> hash we currently use hashTypeCurrentObjectNewSds which will not guarantee the field sds will have sds8 header. This will probably have to be handled in some way (eg force sds8 on the field or some other solution)

small update 2: the same is also for modules. apparently modules are using createRawStringObject for some reason, which can also produce sds5 type strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants