
Hash Data Structure #33

Open · wants to merge 1 commit into main

Conversation

jabolina (Member):

Adding the initial design draft for the Hash data structure. I think there are some unknown unknowns for me here, but I tried to cover at least a bit of everything.

[source,java]
.Hash.java
----
public interface Hash<K, V> { // <1>
Contributor:

How is this different from a cache?

Member:

Because it is stored as a single entry in a cache.
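To illustrate the point (a sketch with assumed names, not the proposed API): the whole hash lives under a single cache key, whereas a plain cache would spread each field across separate entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: the entire hash is stored as ONE entry in
// the cache, keyed by the hash's name, rather than one cache entry per field.
public class SingleEntrySketch {
    // The backing cache: hash name -> whole field map (a single entry per hash).
    static final Map<String, Map<String, String>> cache = new HashMap<>();

    static void putHash(String name, Map<String, String> fields) {
        cache.put(name, new HashMap<>(fields)); // one cache entry holds the whole hash
    }

    static String getField(String name, String field) {
        Map<String, String> hash = cache.get(name);
        return hash == null ? null : hash.get(field);
    }

    public static void main(String[] args) {
        putHash("users:42", Map.of("auth", "token", "role", "user"));
        System.out.println(getField("users:42", "role")); // prints "user"
    }
}
```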

Contributor:

ok, now I get it

Member:

Looking at this, it looks awfully close to our MultiMap; just instead of a Collection it is a Map. Could we not build on top of that?

Member Author:

We could build upon MultiMap without duplicates, storing key-value pairs. But I am unsure if we can compare for equality based only on the key. On the other hand, maybe we could abstract some of the MultiMap implementations which work over a Collection to work on another structure instead of creating everything from scratch. I would need to look at that, though.

Member:

Isn't it the same as FineGrainedAtomicMap, which we dropped and replaced with the Grouping API?

----

<1> The Hash data structure is a generic interface over `K` and `V`. Should we extend `Map<K, V>`?
This way we could provide a wide range of operations over the Hash.
Member:

I would avoid extending Map as that has a million methods that have made it a pain in the past for our Cache.

Contributor:

Yes, why not use Map directly if we do that?

Member:

Some further information:

  • Map is a sync-only API, so this does not help with the async and mutiny variants
  • Some of the Map methods have effects, such as returning the previous value on put and remove, which are costly in remote scenarios.
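To make the cost concrete (a hypothetical sketch, not the proposed interface): splitting put into a void variant and an explicit getAndPut lets the common case skip the remote read entirely, and a narrow interface can also offer async shapes that java.util.Map cannot express.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Sketch only: all names here are assumptions. A void put avoids reading the
// previous value remotely; getAndPut is an explicit opt-in to that cost.
public class NarrowHash<K, V> {
    private final Map<K, V> data = new HashMap<>();

    public void put(K key, V value) {        // fire-and-forget: no previous value returned
        data.put(key, value);
    }

    public V getAndPut(K key, V value) {     // opt-in to the costly read-before-write
        return data.put(key, value);
    }

    public CompletionStage<V> getAsync(K key) { // async shape, unavailable on java.util.Map
        return CompletableFuture.completedFuture(data.get(key));
    }
}
```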

To create and use a `HashCache<?>`:

```java
EmbeddedCacheManager cm = ...
```

Contributor:

new API?


```java
EmbeddedCacheManager cm = ...
HashCacheManager hashCacheManager = EmbeddedHashCacheManagerFactory.from(cm);
```

Contributor:

This is the legacy API; we need this in the new API only.

Member:

Let's not implement anything on the legacy API please.




A `Hash<?, ?>` structure is local to a node, meaning that operations applied to the local copy must be submitted
through the public API before taking effect. After that, the `Hash<?, ?>` is distributed/replicated according to
the configuration.


What is the intended use case?
The user operates on a Hash object and then uses a HashCache object to submit the changes?

Member:

I would say that individual operations are applied immediately to the underlying storage (remote cache). When using tx/batching, these would be collected and sent in a single operation.

Member Author:

> The user operates on a Hash object and then uses a HashCache object to submit the changes?

Yeah, I was thinking of something of that sort. I'll update the documentation to include single operations without transferring the whole object. For multiple operations, I was thinking of keeping track of the local changes, and once the user submits the changes to the server, we merge the local Hash with the server's Hash.

I'll update the docs.
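One way the change-tracking described above could look (purely a sketch with assumed names; the draft leaves this open): the local Hash records the writes, and submitting replays just that delta against the server's copy instead of shipping the whole object.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: LocalHash and submitTo are assumed names, not the proposed API.
// The local copy records writes so that a submit can replay just the delta,
// with local changes taking precedence over the server's content.
public class LocalHash {
    private final Map<String, String> pending = new HashMap<>();

    public void put(String field, String value) {
        pending.put(field, value); // tracked locally, not yet visible remotely
    }

    // Merge the tracked changes into the server's copy; local changes win.
    public Map<String, String> submitTo(Map<String, String> serverHash) {
        serverHash.putAll(pending);
        pending.clear();
        return serverHash;
    }
}
```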

jabolina (Member Author):

Thanks, everyone, for the comments. I updated the document with all the suggestions. While at it, I described more about how we handle the operations. A point of attention is how we should handle operations that directly reference a Hash, for example `HashCache.put("hash", "property", "value")`. I added the two options I see for that in the document.

----

<1> Returning a `Map<HK, HV>` to contain all the varargs keys.
<2> The `submit` method is used to apply the local changes to the server's Hash. The server's `Hash<?, ?>` is returned.


Do we really need the return value? Hash could be very big. Returning on request or only if submit fails?

Member Author:

Yeah, but not strictly. I'll add some void methods, too. In one case, for example, the RESP connector needs to return what has changed. But I was aiming at something more general instead of focusing only on the RESP needs.

<1> The Hash data structure is a generic interface over `K` and `V`. We do not extend `Map<K, V>`, instead we add methods
as needed.

A `Hash<?, ?>` structure is local to a node. Meaning that operations applied to the local copy must be submitted


I feel a disturbance about Hash. I mean, if it's detached from the cache, isn't it just a Map? Do we need a new type? On the other side, if individual operations are submitted immediately, then some methods seem to overlap with HashCache.

Member:

I agree, I did not like the naming on the RESP commands either. A Hash to me is not a Map structure but rather a numeric value derived from some other fields.

Member Author:

To be completely honest, even after writing the word "Hash" so many times, it still sounds weird as a structure name. I thought the same, i.e., a hash is a value after applying a hashing function (or even the function)... Regarding naming, I would not mind changing it to Map(ping), Document, Structure, Container, or something else.

And the Hash structure, which the user interacts with, would help to keep track of the operations. For example, if we return a Map and the user removes one of the keys, we wouldn't notice that when merging with the server's content. Or, to better reflect that, rename the structure to HashView, HashCopy, or something of that sort.

----
public interface HashCache<K> {

<HK, HV> HV put(K key, HK hashKey, HV hashValue);
Member:

We should probably name this method getAndPut and have just a put method that has no return value.

[source, java]
.HashCache.java
----
public interface HashCache<K> {
Member:

This entire interface is blocking; we probably need NonBlocking versions first.

Reading further it looks like we would have all 3 of the variants 👍


<HK, HV> HV put(K key, HK hashKey, HV hashValue);

<HK, HV> Hash<HK, HV> put(K key, Map<HK, HV> values);
Member:

Same here, we really need just a put and getAndPut.

* Enable/Disable rebalancing
* Availability get/set
* Replace not supported
* Listeners support to enable interactions with near-caching
Member:

Listeners are going to be very, very expensive for this without some sort of delta notification, which we do not have for listeners. Also, the listener API was never solidified for the new API, sadly :(

Member Author:

I'll mark it optional. When we establish the listener API, we can come back to this.

HashCache<String> users = hashes.get("users");

users.put("42", Map.of("auth", "token", "role", "user"));
System.out.println(hashCache.get("42")); // {"auth": "token", "role": "user"}
Member:

Is hashCache here supposed to be users?

Also, from this code I assume that updates to the returned HashCache instance are immediately applied remotely to the Hash's actual contents? We will need to make sure the interface documents that it works that way.

Only the Hash interface is the one that isn't propagated, correct?

jabolina (Member Author, Apr 18, 2023):

> Is hashCache here supposed to be users?

Yes, I'll fix it.

> Also, from this code I assume that updates to the returned HashCache instance are immediately applied remotely to the Hash's actual contents? We will need to make sure the interface documents that it works that way.
>
> Only the Hash interface is the one that isn't propagated, correct?

Correct. Only the operations directly on the HashCache take effect immediately. When operating on a Hash, it needs to be sent to the HashCache to take effect.

I've been thinking more about this and I'll do an update to the HashCache API.

Hashes hashes = ispn.sync().hashes();
HashCache<String> users = hashes.get("users");

Hash<String, String> hash = users.get("42");
Member:

So the Hash returned will always be non-null, I assume?

Member:

This makes me wonder if we should have some sort of putHashIfAbsent type of operation, where you can do an initial request to populate the Hash without having to do an initial get round trip.
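A sketch of what that could look like (putHashIfAbsent is only the name floated in the comment above; everything here is assumed): a single call either installs the initial fields or returns the existing hash, avoiding the initial get round trip.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: putHashIfAbsent is a hypothetical name from the review comment.
// One round trip either creates the hash with the given initial content or
// returns the hash that already exists, ignoring the initial values.
public class HashStore {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    public Map<String, String> putHashIfAbsent(String name, Map<String, String> initial) {
        return store.computeIfAbsent(name, k -> new HashMap<>(initial));
    }
}
```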

as needed.

A `Hash<?, ?>` structure is local to a node. Meaning that operations applied to the local copy must be submitted
through the public API before taking effect. After that, the `Hash<?, ?>` is distributed/replicated according to
Member:

I am debating about this. It seems we could also implement this in a slightly different fashion where the key is just a name of a cache and the underlying Hash is just a normal Cache which itself can be distributed etc. This would then give us delta changes between keys and we can use all of the existing features of Caches. I am not quite sold yet, especially as I have no idea what the query API would look like or listeners.

hash.put("auth", "token");
hash.put("role", "user");

users.submit(hash);
Member:

So will this completely overwrite the hash if it already exists, or only apply the new values? It isn't clear, or doesn't seem defined yet, what this would do. I am guessing it completely overwrites those values, in which case you would lose any concurrent writes without a respective remove.

Member Author:

For this one, I was thinking of implementing it with compute and merging the values. We keep track of the operations with the Hash, but effectively all the local changes take precedence. I'll update the docs and API.
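A possible shape for that merge (assumed semantics, not the final design: local puts and removes both take precedence over concurrent server changes, while untouched server fields survive):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch only. Tracking removals separately means a submit does NOT blindly
// overwrite the server hash: fields the client never touched survive, while
// local puts and removes win over concurrent server writes.
public class TrackedMerge {
    static Map<String, String> merge(Map<String, String> server,
                                     Map<String, String> localPuts,
                                     Set<String> localRemoves) {
        Map<String, String> result = new HashMap<>(server);
        result.keySet().removeAll(localRemoves); // local removes win
        result.putAll(localPuts);                // local puts win
        return result;
    }
}
```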

jabolina (Member Author):

@rigazilla and @wburns, thanks for the comments! I am going to update the doc and the HashCache API, introducing your suggestions. I'll try to document the intents of the API better, making clear that the Hash needs to be sent to take effect. And rename that submit method to something meaningful.

----
public interface Hash<K, V> { // <1>

String getName();
pruivo (Member, Apr 19, 2023):

What's the name? I don't see any reference to it.

through the public API before taking effect. After that, the `Hash<?, ?>` is distributed/replicated according to
the configuration.

We follow a state-based CRDT approach, merging the local Hash with the server's Hash. Since the `Hash<?, ?>` structures
Member:

How are we going to merge when network partitions happen? Are we going to support cross-site?
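For context on why the merge function matters here: a state-based merge needs to be deterministic, commutative, and idempotent so both sides converge after a partition heals. A last-writer-wins-per-field map is one common instance; the sketch below is purely illustrative, since the draft does not pick a merge function.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a last-writer-wins merge per field. Each field carries a
// timestamp; on merge, the newer write wins. Because the rule is commutative
// and idempotent, both partitions converge regardless of merge order.
public class LwwHash {
    record Stamped(String value, long timestamp) {}

    static Map<String, Stamped> merge(Map<String, Stamped> a, Map<String, Stamped> b) {
        Map<String, Stamped> result = new HashMap<>(a);
        b.forEach((field, theirs) -> result.merge(field, theirs,
                (mine, other) -> other.timestamp() > mine.timestamp() ? other : mine));
        return result;
    }
}
```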
