replace deprecated statefulMapConcat #133
Cassandra only supports DISTINCT queries on the partition key. Since this is part of the reconciler, which has to be called manually and is already marked as a rather expensive operation, I think this change is not too risky.
lgtm
core/src/main/scala/org/apache/pekko/persistence/cassandra/reconciler/AllTags.scala
@nvollmar I haven't tried it, but if the query result could be sorted, the duplicate check would only need to remember the last element instead of keeping a full set of visited tags. Do you think Cassandra is likely to allow this query to be sorted?
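To illustrate the idea in the comment above, here is a minimal sketch in plain Scala collections (not Pekko Streams, and not code from this PR). The object `SortedDedup` and the method `dedupSorted` are hypothetical names. It assumes the input is already sorted, which is exactly the property the question is about; on unsorted input it only removes adjacent duplicates.

```scala
// Hypothetical helper, not from the PR: dedupe an already-sorted sequence
// by remembering only the previously seen element.
object SortedDedup {
  def dedupSorted(sortedTags: Seq[String]): Seq[String] = {
    // The state is just the last element seen (None before the first element).
    val (_, result) =
      sortedTags.foldLeft((Option.empty[String], Vector.empty[String])) {
        case ((last, acc), tag) =>
          if (last.contains(tag)) (last, acc) // same as previous element: skip it
          else (Some(tag), acc :+ tag)        // new value: emit it and remember it
      }
    result
  }
}
```

The state here is a single `Option[String]`, so memory use for the duplicate check does not grow with the number of distinct tags.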
@pjfanning Cassandra does not allow sorting by arbitrary columns. You can define a clustering order for a table, but that also has limitations. A more "Cassandra way" to solve this would be to use a dedicated table that keeps all unique tags, for example.
LGTM, only one style suggestion
core/src/main/scala/org/apache/pekko/persistence/cassandra/reconciler/AllTags.scala
…onciler/AllTags.scala Co-authored-by: AndyChen(Jingzhang) <[email protected]>
I'm not sure if this is better but statefulMapConcat is deprecated.
The code is trying to take a Source[String, _] and remove duplicates.
Both approaches involve building sets and those sets will consume a lot of memory if there is a lot of data. I don't think this is avoidable.
I am not a Cassandra expert but I think we might be able to get Cassandra to run 'DISTINCT' on the query. That could mean that we could remove the statefulMapConcat/statefulMap stage.
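As a rough sketch of the set-based approach described above, the following plain Scala example (no Pekko dependency, and not the code from this PR) separates the duplicate check into a state transition of the shape `(state, element) => (newState, output)`. That is the general shape a stateful map stage works with, but the actual wiring into the stream is not shown and may differ. `SetDedup`, `step`, and `dedup` are hypothetical names.

```scala
// Hypothetical sketch, not from the PR: set-based de-duplication written as
// an explicit state transition, then applied to a plain Iterator.
object SetDedup {
  // (state, element) => (new state, optional element to emit)
  def step(seen: Set[String], tag: String): (Set[String], Option[String]) =
    if (seen.contains(tag)) (seen, None) // already emitted once: drop it
    else (seen + tag, Some(tag))         // first occurrence: emit and record it

  def dedup(tags: Iterator[String]): Iterator[String] = {
    // The set grows with the number of distinct tags, which is the memory
    // cost mentioned in the description above.
    var seen = Set.empty[String]
    tags.flatMap { tag =>
      val (next, out) = step(seen, tag)
      seen = next
      out
    }
  }
}
```

Unlike a last-element check, this works on unsorted input, at the price of holding every distinct tag in memory until the stream completes.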