-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash Data Structure #33
base: main
Are you sure you want to change the base?
Conversation
[source,java] | ||
.Hash.java | ||
---- | ||
public interface Hash<K, V> { // <1> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from cache ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is stored as a single entry in a cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, now I get it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, it looks awfully close our MultiMap, just instead of a Collection it is a Map. Could we not build on top of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could build upon MultiMap without duplicates, storing key-value pairs. But I am unsure if we can compare for equality based only on the key. On the other hand, maybe we could abstract some of the MultiMap implementations which work over a Collection to work on another structure instead of creating everything from scratch. I would need to look at that, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it the same as FineGrainedAtomicMap
that we dropped and replaced with the Grouping API
?
Cache-Hash-Data-Structure.adoc
Outdated
---- | ||
|
||
<1> The Hash data structure is a generic interface over `K` and `V`. Should we extend `Map<K, V>`? | ||
This way we could provide a wide range of operations over the Hash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid extending Map
as that has a million methods that have made it a pain in the past for our Cache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, why not using map directly if we do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further information:
Map
is a sync-only API, so this does not help with the async and mutiny variants- Some of the
Map
methods have effects, such as returning the previous value onput
andremove
which are costly in remote scenarios.
Cache-Hash-Data-Structure.adoc
Outdated
To create and use a `HashCache<?>`: | ||
|
||
```java | ||
EmbeddedCacheManager cm = ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new api ?
Cache-Hash-Data-Structure.adoc
Outdated
|
||
```java | ||
EmbeddedCacheManager cm = ... | ||
HashCacheManager hashCacheManager = EmbeddedHashCacheManagerFactory.from(cm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the legacy API, we need this in the new API only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not implement anything on the legacy API please.
Cache-Hash-Data-Structure.adoc
Outdated
---- | ||
|
||
<1> The Hash data structure is a generic interface over `K` and `V`. Should we extend `Map<K, V>`? | ||
This way we could provide a wide range of operations over the Hash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, why not using map directly if we do that.
[source,java] | ||
.Hash.java | ||
---- | ||
public interface Hash<K, V> { // <1> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, now I get it
|
||
A `Hash<?, ?>` structure is local to a node. Meaning that operations applied to the local copy must be submitted | ||
through the public API before taking effect. After that, the `Hash<?, ?>` is distributed/replicated according to | ||
the configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the intended use case?
The user operates on an Hash object and then uses an HashCache obj to submit the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that individual operations are applied immediately to the underlying storage (remote cache). When using tx/batching, these would be collected and sent in a single operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user operates on an Hash object and then uses an HashCache obj to submit the changes?
Yeah, I was thinking of something of that sort. I'll update the documentation to include single operations without transferring the whole object. For multiple operations, I was thinking of keeping track of the local changes, and once the user submits the changes to the server, we merge the local Hash with the server's Hash.
I'll update the docs.
a3af275
to
fc6679a
Compare
Thanks, everyone, for the comments. I updated the document with all the suggestions. While at it, I described more about how we handle the operations. A point of attention is how we should handle operations that directly reference a Hash, for example, |
---- | ||
|
||
<1> Returning a `Map<HK, HV>` to contain all the vargs keys. | ||
<2> The `submit` method is used to apply the local changes to the server's Hash. The server's `Hash<?, ?>` is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the return value? Hash could be very big. Returning on request or only if submit fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but not strictly. I'll add some void methods, too. In one case, for example, the RESP connector needs to return what has changed. But I was aiming at something more general instead of focusing only on the RESP needs.
<1> The Hash data structure is a generic interface over `K` and `V`. We do not extend `Map<K, V>`, instead we add methods | ||
as needed. | ||
|
||
A `Hash<?, ?>` structure is local to a node. Meaning that operations applied to the local copy must be submitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a disturbance about Hash. I mean, if it's detached from cache isn't it just a Map do we need a new type? On the other side if individual operations are submitted immediately then some methods seems to overlap with HashCache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I did not like the naming on the RESP commands either. A Hash to me is not a Map structure but rather a numeric derived from some other fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be completely honest, even after writing the word "Hash" so many times, it still sounds weird as a structure name. I thought the same, i.e., a hash is a value after applying a hashing function (or even the function)... Regarding naming, I would not mind changing it to Map(ping), Document, Structure, Container, or something else.
And as the structure Hash, which the user interacts with, would help to keep track of the operations. For example, if we return a Map and the user removes one of the keys, we wouldn't notice that when merging with the server's content. Or, to better reflect, rename the structure HashView, HashCopy, or something of that sort.
<1> The Hash data structure is a generic interface over `K` and `V`. We do not extend `Map<K, V>`, instead we add methods | ||
as needed. | ||
|
||
A `Hash<?, ?>` structure is local to a node. Meaning that operations applied to the local copy must be submitted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I did not like the naming on the RESP commands either. A Hash to me is not a Map structure but rather a numeric derived from some other fields.
[source,java] | ||
.Hash.java | ||
---- | ||
public interface Hash<K, V> { // <1> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, it looks awfully close our MultiMap, just instead of a Collection it is a Map. Could we not build on top of that?
---- | ||
public interface HashCache<K> { | ||
|
||
<HK, HV> HV put(K key, HK hashKey, HV hashValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably name this method getAndPut
and have just a put
method that has no returned.
[source, java] | ||
.HashCache.java | ||
---- | ||
public interface HashCache<K> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire interface is blocking, we probably need NonBlocking versions first.
Reading further it looks like we would have all 3 of the variants 👍
|
||
<HK, HV> HV put(K key, HK hashKey, HV hashValue); | ||
|
||
<HK, HV> Hash<HK, HV> put(K key, Map<HK, HV> values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, we really need just a put
and getAndPut
.
* Enable/Disable rebalancing | ||
* Availability get/set | ||
* Replace not supported | ||
* Listeners support to enable interactions with near-caching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Listeners are going to be very very very expensive for this without some sort of delta notification, which we do not have for listeners. Also listener API was never solidified for the new API sadly :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll mark it optional. When we establish the listener API, we can come back to this.
HashCache<String> users = hashes.get("users"); | ||
|
||
users.put("42", Map.of("auth", "token", "role", "user")); | ||
System.out.println(hashCache.get("42")); // {"auth": "token", "role": "user"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is hashCache
here supposed to be users
?
Also from this code, I assume that an updates to the returned HashCache
instance are immediately updated remotely and applied to the Hashes
actual contents? We will need to make sure the interface is documented that that is done that way.
Only the Hash
interface is the one that isn't propagated correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
hashCache
here supposed to beusers
?
Yes, I'll fix it.
Also from this code, I assume that an updates to the returned
HashCache
instance are immediately updated remotely and applied to theHashes
actual contents? We will need to make sure the interface is documented that that is done that way.Only the
Hash
interface is the one that isn't propagated correct?
Correct. Only the operations directly over the HashCache
takes place immediately. If operating over a Hash
it needs to be sent to the HashCache
to take effect.
I've been thinking more about this and I'll do an update to the HashCache
API.
Hashes hashes = ispn.sync().hashes(); | ||
HashCache<String> users = hashes.get("users"); | ||
|
||
Hash<String, String> hash = users.get("42"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the Hash
returned will always be not null I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me wonder if we should have some sort of putHashIfAbsent type of operation, where you can do an initial request to populate the Hash without having to do an initial get round trip.
as needed. | ||
|
||
A `Hash<?, ?>` structure is local to a node. Meaning that operations applied to the local copy must be submitted | ||
through the public API before taking effect. After that, the `Hash<?, ?>` is distributed/replicated according to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am debating about this. It seems we could also implement this in a slightly different fashion where the key is just a name of a cache and the underlying Hash is just a normal Cache which itself can be distributed etc. This would then give us delta changes between keys and we can use all of the existing features of Caches. I am not quite sold yet, especially as I have no idea what the query API would look like or listeners.
hash.put("auth", "token"); | ||
hash.put("role", "user"); | ||
|
||
users.submit(hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So will this completely overwrite the hash if it is already exists or only apply the new values? It isn't clear or doesn't seem defined yet as to what this would do. I am guessing it completely overwrites those values, in which case you would lose any concurrent writes without a respective remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, I was thinking to implement with compute and merge the values. We keep track of the operations with the Hash
, but effectively all the local changes take precedence. I'll do an update in the docs and API.
@rigazilla and @wburns, thanks for the comments! I am going to update the doc and the |
---- | ||
public interface Hash<K, V> { // <1> | ||
|
||
String getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the name? don't see any reference to it.
through the public API before taking effect. After that, the `Hash<?, ?>` is distributed/replicated according to | ||
the configuration. | ||
|
||
We follow a state-based CRDT approach, merging the local Hash with the server's Hash. Since the `Hash<?, ?>` structures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we going to merge when network partitions happen? Are we going to support cross-site?
Adding the initial design draft for the Hash data structure. I think there are some unknown unknowns for me here, but I tried to cover at least a bit of everything.