DCC Schema Refresh Fix #1
Open: chrisw-instaclustr wants to merge 15 commits into 1.5-without-insta-changes from 1.5
15 commits
462db47 schema refreshing (smiklosovic)
f287be2 removed sleep in reinitializer (smiklosovic)
8b4bb41 [release] Stable parent 1.5.0.Final for release (debezium-builder)
8007015 [maven-release-plugin] prepare for next development iteration (debezium-builder)
e39c738 [maven-release-plugin] prepare release v1.5.0.Final (debezium-builder)
47e4316 [release] New parent 1.6.0-SNAPSHOT for development (debezium-builder)
72f2516 Merge branch 'dd' of github.com:instaclustr/debezium-connector-cassan…
844f4c7 rewritten cdc tracking without schema refreshment (smiklosovic)
3ae29c1 fixed bug (smiklosovic)
14d9f2b rewritten to build parallel in-memory schema structure from schema ch… (smiklosovic)
5083434 rewritten to build parallel in-memory schema structure from schema ch… (smiklosovic)
0999ce9 Fixed up the C* version and made an initial attempt to deal with the … (chrisw-instaclustr)
ecc11ce Fixed up the C* version and made an initial attempt to deal with the … (chrisw-instaclustr)
e876f6c applied comments in review (smiklosovic)
47229ac Merge branch 'dd2' of github.com:instaclustr/debezium-connector-cassa…
@@ -0,0 +1,152 @@
== Debezium integration with Cassandra report - part 2

_by Stefan Miklosovic / stefan dot miklosovic at instaclustr dot com_

=== Introduction

In this document, we look back at what we did in part 1 and
build on it to improve the solution.

=== Nature of the problem

The problem with solution 1 is that it is not deterministic. The reason for the
non-determinism seems to be that it is more or less unpredictable when
Cassandra flushes / persists schema changes (e.g. `cdc = true / false`) to disk, so
our refresh of the schema may or may not pick these changes up.

Hence we might see the following:

1) A table is created with `cdc = true`.
2) We detect the change in the schema change listener of the driver and refresh the schema in Debezium via Cassandra's own mechanics.
3) We think that our internal structure is indeed refreshed, but our `cdc` flag is still `false`.

This means that even though the Cassandra process is internally aware that we have
enabled cdc on a particular table, Debezium does not reflect this. Sometimes the
changes are flushed / persisted in time, but sometimes they take a while
to propagate, and by then it is too late for Debezium because the listener was already invoked.
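
The race described above can be illustrated with a small, self-contained simulation. This is only a sketch under our own assumptions: `PersistedSchema`, `alterCdc`, `flush` and `refreshOnNotification` are hypothetical names made up for illustration and are not part of Cassandra, the driver or Debezium. It only shows that a refresh triggered by the listener sees the new `cdc` value when the flush happened first, and the stale value otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: every type and method here is a hypothetical stand-in,
// not a Cassandra, driver or Debezium API.
final class SchemaRefreshRaceSketch {

    /** Models schema state that becomes visible to a reader only after flush(). */
    static final class PersistedSchema {
        private final Map<String, Boolean> inMemory = new HashMap<>();
        private final Map<String, Boolean> onDisk = new HashMap<>();

        void alterCdc(String table, boolean cdc) {
            inMemory.put(table, cdc);
        }

        void flush() {
            onDisk.putAll(inMemory);
        }

        boolean readCdcFromDisk(String table) {
            return onDisk.getOrDefault(table, false);
        }
    }

    /** Returns the cdc flag a refresh sees when it runs at listener invocation time. */
    static boolean refreshOnNotification(boolean flushedBeforeListener) {
        PersistedSchema cassandra = new PersistedSchema();
        cassandra.alterCdc("ks.tbl", true);
        if (flushedBeforeListener) {
            cassandra.flush();
        }
        // The listener is invoked here and we refresh from the persisted state.
        boolean seen = cassandra.readCdcFromDisk("ks.tbl");
        // A flush that arrives after the listener no longer changes what we saw.
        cassandra.flush();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("flushed in time: " + refreshOnNotification(true));
        System.out.println("flushed too late: " + refreshOnNotification(false));
    }
}
```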

=== Possible solutions

There are two solutions in general to this problem:

1) Faking in Debezium what Cassandra does, so that Debezium holds the same data structures.

This is a rather delicate operation / topic to deal with, but it is possible and we chose to go with
this solution for the time being.

It merges two main concepts:

a) Debezium is informed about schema changes via a schema change listener registered on the driver.
b) Once a respective method on the listener is invoked, we mimic the same code Cassandra would invoke, but
in such a way that the parts which would be erroneous (because Debezium does not run Cassandra) are
skipped.

By doing b), we internally hold a logical copy of what the real Cassandra holds, and we keep
Cassandra's internal structures (keyspaces, tables ...) in sync by registering a
schema change listener and applying the same changes to "Cassandra" in the Debezium process.
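
To make the idea of a listener-driven logical copy more concrete, here is a minimal sketch. It assumes a simplified model: `TableInfo`, `Listener` and `Mirror` are hypothetical types invented for this example and only loosely resemble a driver schema change listener. The point is that the local copy is updated from the notification payload itself, without re-reading anything from disk.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: TableInfo, Listener and Mirror are hypothetical,
// not driver or Cassandra classes.
final class SchemaMirrorSketch {

    /** Simplified description of a table as carried by a notification. */
    static final class TableInfo {
        final String keyspace;
        final String name;
        final boolean cdc;

        TableInfo(String keyspace, String name, boolean cdc) {
            this.keyspace = keyspace;
            this.name = name;
            this.cdc = cdc;
        }

        String key() {
            return keyspace + "." + name;
        }
    }

    /** Simplified listener contract for table-level schema changes. */
    interface Listener {
        void onTableAdded(TableInfo table);

        void onTableChanged(TableInfo current, TableInfo previous);

        void onTableRemoved(TableInfo table);
    }

    /** Local logical copy that applies each change as it is announced. */
    static final class Mirror implements Listener {
        private final Map<String, TableInfo> tables = new ConcurrentHashMap<>();

        @Override
        public void onTableAdded(TableInfo table) {
            tables.put(table.key(), table);
        }

        @Override
        public void onTableChanged(TableInfo current, TableInfo previous) {
            tables.put(current.key(), current);
        }

        @Override
        public void onTableRemoved(TableInfo table) {
            tables.remove(table.key());
        }

        boolean isCdcEnabled(String keyspace, String name) {
            TableInfo table = tables.get(keyspace + "." + name);
            return table != null && table.cdc;
        }
    }

    public static void main(String[] args) {
        Mirror mirror = new Mirror();
        TableInfo created = new TableInfo("ks", "tbl", false);
        mirror.onTableAdded(created);
        mirror.onTableChanged(new TableInfo("ks", "tbl", true), created);
        System.out.println("cdc enabled: " + mirror.isCdcEnabled("ks", "tbl"));
    }
}
```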

Let's go through the core of this logic, starting with `onKeyspaceAdded`:

[source,java]
----
schemaChangeListener = new NoOpSchemaChangeListener() {
    @Override
    public void onKeyspaceAdded(final KeyspaceMetadata keyspace) {
        // translate the driver's keyspace metadata into Cassandra's own KeyspaceMetadata
        Schema.instance.setKeyspaceMetadata(org.apache.cassandra.schema.KeyspaceMetadata.create(
                keyspace.getName(),
                KeyspaceParams.create(keyspace.isDurableWrites(),
                                      keyspace.getReplication())));
        // "open" the keyspace so that Cassandra's internal structures know about it
        Keyspace.openWithoutSSTables(keyspace.getName());
        logger.info("added keyspace {}", keyspace.asCQLQuery());
    }
----

Here we fake that we opened a keyspace. This populates some of Cassandra's internal structures, so
the Cassandra code running inside Debezium "knows" which keyspace was added.

On a keyspace update, we do:

[source,java]
----
    @Override
    public void onKeyspaceChanged(KeyspaceMetadata current,
                                  KeyspaceMetadata previous) {
        Schema.instance.updateKeyspace(current.getName(),
                                       KeyspaceParams.create(current.isDurableWrites(),
                                                             current.getReplication()));
    }
----

When a keyspace is removed, we do:

[source,java]
----
    @Override
    public void onKeyspaceRemoved(final KeyspaceMetadata keyspace) {
        schemaHolder.removeKeyspace(keyspace.getName());
        // the KeyspaceMetadata created here is Cassandra's class, not the driver's one used for the method argument
        Schema.instance.clearKeyspaceMetadata(org.apache.cassandra.schema.KeyspaceMetadata.create(
                keyspace.getName(),
                KeyspaceParams.create(keyspace.isDurableWrites(),
                                      keyspace.getReplication())));
    }
----

We remove the keyspace from our schema holder too. If a whole keyspace is dropped
with `DROP KEYSPACE abc`, all of its tables are removed as well, so we get rid of all tables of that keyspace
in our schema holder too.
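
The effect of dropping a keyspace on the schema holder can be sketched as follows. This is not the connector's actual schema holder: `SchemaHolderSketch` and its methods are hypothetical, and the structure (a map of keyspaces to their tables) is merely an assumption that makes "removing a keyspace removes all of its tables" a single operation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a hypothetical holder, not the connector's real schema holder.
final class SchemaHolderSketch {

    // keyspace name -> (table name -> cdc flag)
    private final Map<String, Map<String, Boolean>> tablesByKeyspace = new ConcurrentHashMap<>();

    void addOrUpdateTable(String keyspace, String table, boolean cdc) {
        tablesByKeyspace
                .computeIfAbsent(keyspace, k -> new ConcurrentHashMap<>())
                .put(table, cdc);
    }

    void removeTable(String keyspace, String table) {
        Map<String, Boolean> tables = tablesByKeyspace.get(keyspace);
        if (tables != null) {
            tables.remove(table);
        }
    }

    /** Dropping a keyspace forgets every table that belonged to it. */
    void removeKeyspace(String keyspace) {
        tablesByKeyspace.remove(keyspace);
    }

    Set<String> tablesOf(String keyspace) {
        Map<String, Boolean> tables = tablesByKeyspace.get(keyspace);
        if (tables == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(tables.keySet());
    }

    public static void main(String[] args) {
        SchemaHolderSketch holder = new SchemaHolderSketch();
        holder.addOrUpdateTable("abc", "t1", true);
        holder.addOrUpdateTable("abc", "t2", false);
        holder.removeKeyspace("abc");
        System.out.println("tables left in abc: " + holder.tablesOf("abc").size());
    }
}
```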

We leave the last three methods of the listener (`onTableAdded`, `onTableChanged` and `onTableRemoved`)
for the reader to go through. The code you see is more or less what Cassandra does internally, but
it is refactored in such a way that the parts which are not needed (nor desired) are skipped.

Please follow this https://github.com/instaclustr/debezium-connector-cassandra/blob/dd2/src/main/java/io/debezium/connector/cassandra/SchemaProcessor.java#L81-L168[link].

Once we put this into action, the metadata is populated correctly, including the `cdc` flag on `TableParams`.
A `Mutation` is handled properly as well, because its deserialisation reaches into the ColumnFamily's metadata, which
has `cdc = true` because we were notified about this change in the listener and updated
that table in the Cassandra code. As a result, the following code, which is called during
deserialisation of a `Mutation`, will not throw:

[source,java]
----
public static class Serializer
{
    public void serialize(CFMetaData metadata, DataOutputPlus out, int version) throws IOException
    {
        UUIDSerializer.serializer.serialize(metadata.cfId, out, version);
    }

    public CFMetaData deserialize(DataInputPlus in, int version) throws IOException
    {
        UUID cfId = UUIDSerializer.serializer.deserialize(in, version);
        // the lookup goes through the same Schema instance that the listener keeps up to date
        CFMetaData metadata = Schema.instance.getCFMetaData(cfId);
        if (metadata == null)
        {
            String message = String.format("Couldn't find table for cfId %s. If a table was just " +
                                           "created, this is likely due to the schema not being fully propagated. Please wait for schema " +
                                           "agreement on table creation.", cfId);
            throw new UnknownColumnFamilyException(message, cfId);
        }

        return metadata;
    }
}
----

Keep in mind that we are not "initialising" Cassandra in any way. When Debezium starts,
the Cassandra internals already read the tables on disk, so they are fully populated,
but they are never notified about what happens afterwards (for example, that cdc was changed from false to true). For
that reason there is a schema change listener which keeps them in sync. It is true that we were notified via a listener before as well,
but the schema refresh does not always help: we would end up being notified about changes without having any
way to make them visible to Cassandra's internal code. Only invoking core Cassandra structures and emulating
that we run in a proper Cassandra node makes deserialisation of a mutation possible. Previously our
cdc flag was always false (it was not being updated), so the handling of such a mutation was effectively skipped.
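
The two failure modes mentioned above (an unknown table during deserialisation, and a known table whose `cdc` flag is stale) can be sketched with a tiny registry. Everything here is hypothetical: `TableLookupSketch`, `TableMeta`, `register`, `lookup` and `shouldProcess` are illustrative names and do not correspond to Cassandra's `Schema` or `CFMetaData` API. The sketch assumes that the listener is the only writer of the registry.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: hypothetical names, not Cassandra's Schema / CFMetaData API.
final class TableLookupSketch {

    static final class TableMeta {
        final String name;
        final boolean cdc;

        TableMeta(String name, boolean cdc) {
            this.name = name;
            this.cdc = cdc;
        }
    }

    private final Map<UUID, TableMeta> byId = new ConcurrentHashMap<>();

    /** Called from the schema change listener when a table is added or changed. */
    void register(UUID id, TableMeta meta) {
        byId.put(id, meta);
    }

    /** Lookup-or-throw, in the same shape as the deserializer shown earlier. */
    TableMeta lookup(UUID id) {
        TableMeta meta = byId.get(id);
        if (meta == null) {
            throw new IllegalStateException("Couldn't find table for id " + id);
        }
        return meta;
    }

    /** A mutation is handled only if its table is known and has cdc enabled. */
    boolean shouldProcess(UUID id) {
        return lookup(id).cdc;
    }

    public static void main(String[] args) {
        TableLookupSketch registry = new TableLookupSketch();
        UUID id = UUID.randomUUID();
        registry.register(id, new TableMeta("ks.tbl", false));
        System.out.println("before update: " + registry.shouldProcess(id));
        registry.register(id, new TableMeta("ks.tbl", true));
        System.out.println("after update: " + registry.shouldProcess(id));
    }
}
```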

The second solution consists of turning Debezium into an agent, which means that it would see the same data structures as Cassandra
by definition. The problem we see with this solution is that it is rather tricky to do, because
Debezium would suddenly have the same lifecycle as Cassandra (or the other way around, Cassandra
would have the same lifecycle as Debezium), as they would be inherently tied together.

Another problem we see is that the dependencies which Debezium uses are not compatible with
what Cassandra uses, and it would simply not be possible to "merge" them. By merely checking,
the probability this would be the case is quite high, there is Cassandra connector of
@@ -4,20 +4,20 @@
     <parent>
         <groupId>io.debezium</groupId>
         <artifactId>debezium-parent</artifactId>
-        <version>1.5.0-SNAPSHOT</version>
+        <version>1.5.0.Final</version>
     </parent>

     <modelVersion>4.0.0</modelVersion>
     <artifactId>debezium-connector-cassandra</artifactId>
     <name>Debezium Connector for Cassandra</name>
-    <version>1.5.0-SNAPSHOT</version>
+    <version>1.5.0.Final</version>
     <packaging>jar</packaging>

     <scm>
         <connection>scm:git:git@github.com:debezium/debezium-connector-cassandra.git</connection>
         <developerConnection>scm:git:git@github.com:debezium/debezium-connector-cassandra.git</developerConnection>
         <url>https://github.com/debezium/debezium-connector-cassandra</url>
-        <tag>HEAD</tag>
+        <tag>v1.5.0.Final</tag>
     </scm>

     <properties>

@@ -119,6 +119,7 @@
         <dependency>
             <groupId>org.apache.cassandra</groupId>
             <artifactId>cassandra-all</artifactId>
+            <version>3.11.4</version>
             <exclusions>
                 <exclusion>
                     <groupId>ch.qos.logback</groupId>
Not sure why these `.Final`s are showing up as diff - they are the same on the target branch.