diff --git a/docs/modules/integrate/pages/cdc-connectors.adoc b/docs/modules/integrate/pages/cdc-connectors.adoc
index 0f71ab95c..1a9c89bd8 100644
--- a/docs/modules/integrate/pages/cdc-connectors.adoc
+++ b/docs/modules/integrate/pages/cdc-connectors.adoc
@@ -1,6 +1,8 @@
 = CDC Connector
 [.enterprise]*Enterprise*
+NOTE: This page refers to Hazelcast's {enterprise-product-name} CDC connectors. For more information on {open-source-product-name} CDC connectors, see xref:integrate:legacy-cdc-connectors.adoc[].
+
 Change Data Capture (CDC) refers to the process of observing changes made to a database and extracting them in a form usable by other systems, for the purposes of replication, analysis and many more.
@@ -15,7 +17,7 @@ which can handle CDC events from link:https://debezium.io/documentation/referenc
 However, we're also striving to make CDC sources first class citizens in Hazelcast, as we have done already for MySQL and PostgreSQL.
-== Installing the Connector
+== Install the CDC connector
 This connector is included in the full distribution of Hazelcast {enterprise-product-name}.
@@ -134,7 +136,7 @@ Follow the provided xref:pipelines:cdc.adoc[] tutorial to see how CDC processes
 [NOTE]
 ====
-Remember you have to have database up and running before CDC job is started, including e.g. additional CDC agents required (like DB2 does require).
+Remember to ensure your database is up and running before a CDC job is started, including any additional required CDC agents (as required by DB2), for example.
 ====
 === Common source builder functions
@@ -158,7 +160,7 @@ where the map entry's key is the key of `SourceRecord` in JSON format, and the v
 but also more widely used and tested. Use this engine for the most stable results (for example, no async offset restore). For MySQL and PostgreSQL especially this engine makes the most sense, as MySQL and PostgreSQL Debezium connectors are single-threaded only.
 |withAsyncEngine()
-|Sets the preferred engine to the async one. This engine is multithreaded (if supported by the connector), but you must be aware of the async nature; for example, offset restore may occur asynchronously after the restart is done, leading to sometimes confusing results.
+|Sets the preferred engine to the async one. This engine is multithreaded (if supported by the connector), but be aware of the async nature; for example, offset restore may occur asynchronously after the restart is done, leading to sometimes confusing results.
 |setProperty(String, String)
 |Sets connector property to given value. There are multiple overloads, allowing to
@@ -166,7 +168,7 @@ set the value to `long`, `String` or `boolean`.
 |===
-=== Fault Tolerance
+=== Fault tolerance
 CDC sources offer at least-once processing guarantees. The source periodically saves the database write ahead log offset for which it had
@@ -211,12 +213,11 @@ If user code has to be used, then the problem can be solved with the help of the
 == Data types
-Hazelcast relies on Debezium, which in turn uses Kafka Connect API such as `Struct` objects. Hazelcast makes conversion to `Map` and `POJO` s easier by providing abstractions such as `RecordPart`. Despite that, it's worth knowing how some database types can or will be mapped to Java types.
+Hazelcast relies on Debezium, which in turn uses the Kafka Connect API, including `Struct` objects for example. Hazelcast makes conversion to `Map` and `POJO`s easier by providing abstractions such as `RecordPart`. Despite this, it's worth knowing how some database types can or will be mapped to Java types.
 [NOTE]
 ====
-Each database type has it's own database type-to-struct type mappings. For specific mappings of this type, please
-check out Debezium documentation, for example: link:https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-data-types[MySQL], link:https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-data-types[PostgreSQL], link:https://debezium.io/documentation/reference/stable/connectors/db2.html#db2-data-types[DB2], etc..
+Each database type has its own database type-to-struct type mappings. For specific mappings of this type, see the Debezium documentation, for example: link:https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-data-types[MySQL], link:https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-data-types[PostgreSQL], link:https://debezium.io/documentation/reference/stable/connectors/db2.html#db2-data-types[DB2], etc..
 ====
 === Common datatypes mapping.
@@ -253,11 +254,11 @@ Using `time.precision.mode=connect` uses `java.util.Date` to represent dates, ti
 |===
-== Migration Tips
+== Migration tips
 Hazelcast {open-source-product-name} has a Debezium CDC connector, but it's based on an older version of Debezium. Migration to the new connector is straightforward but be aware of the following changes:
 * You should use the `com.hazelcast.enterprise.jet.cdc` package instead of `com.hazelcast.jet.cdc`.
 * Artifact names are now `hazelcast-enterprise-cdc-debezium`, `hazelcast-enterprise-cdc-mysql` and `hazelcast-enterprise-cdc-postgres` (instead of `hazelcast-jet-...`).
- * Debezium replaced all `whitelist`s with `include list`s and `blacklist`s with `exclude list`s, which we have replicated in our naming; so, for example, use `setTableIncludeList` instead of `setTableWhitelist`. If you are not sure what are the new names the Debezium is using, you can check out their link:https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-connector-properties[MySQL] and link:https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-connector-properties[PostgreSQL] documentation.
\ No newline at end of file
+ * Debezium renamed certain terms, which we have also replicated in our code. For example, `include list` replaces `whitelist`, `exclude list` replaces `blacklist`. This means, for example, you need to use `setTableIncludeList` instead of `setTableWhitelist`. For more detail on new Debezium names, see their link:https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-connector-properties[MySQL] and link:https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-connector-properties[PostgreSQL] documentation.
\ No newline at end of file
diff --git a/docs/modules/integrate/pages/legacy-cdc-connectors.adoc b/docs/modules/integrate/pages/legacy-cdc-connectors.adoc
index 6a27f4d6f..d97b75533 100644
--- a/docs/modules/integrate/pages/legacy-cdc-connectors.adoc
+++ b/docs/modules/integrate/pages/legacy-cdc-connectors.adoc
@@ -1,5 +1,8 @@
 = Legacy CDC Connector
+
+NOTE: This page refers to Hazelcast's {open-source-product-name} CDC connectors, also known as legacy CDC connectors. For more information on {enterprise-product-name} CDC connectors, see xref:integrate:cdc-connectors.adoc[].
+
 Change Data Capture (CDC) refers to the process of observing changes made to a database and extracting them in a form usable by other systems, for the purposes of replication, analysis and many more.
@@ -8,17 +11,17 @@ Change Data Capture is especially important to Hazelcast, because it allows for
 the _streaming of changes from databases_, which can be efficiently processed by the Jet engine.
-Implementation of CDC in Hazelcast {open-source-product-name} is based on
+The implementation of CDC in Hazelcast {open-source-product-name} is based on
 link:https://debezium.io/[Debezium, window=_blank]. Hazelcast offers a generic Debezium source
-which can handle CDC events from link:https://debezium.io/documentation/reference/stable/connectors/index.html[any database supported by Debezium, window=_blank].
+that can handle CDC events from link:https://debezium.io/documentation/reference/stable/connectors/index.html[any database supported by Debezium, window=_blank].
 However, we're also striving to make CDC sources first class citizens in Hazelcast, as we have done already for MySQL and PostgreSQL.
-== Installing the Connector
+== Install the CDC connector
 This connector is included in the full distribution of Hazelcast {open-source-product-name}.
-== CDC as a Source
+== CDC as a source
 We have the following types of CDC sources:
@@ -26,10 +29,10 @@ We have the following types of CDC sources:
 a generic source for all databases supported by Debezium
 * link:https://docs.hazelcast.org/docs/{full-version}/javadoc/com/hazelcast/jet/cdc/mysql/MySqlCdcSources.html[MySqlCdcSources, window=_blank]: a specific, first class Jet CDC source for MySQL databases (also based
- on Debezium, but with the additional benefits provided by Hazelcast
+ on Debezium, but with the additional benefits provided by Hazelcast)
 * link:https://docs.hazelcast.org/docs/{full-version}/javadoc/com/hazelcast/jet/cdc/postgres/PostgresCdcSources.html[PostgresCdcSources, window=_blank]: a specific, first class CDC source for PostgreSQL databases (also based
- on Debezium, but with the additional benefits provided by Hazelcast
+ on Debezium, but with the additional benefits provided by Hazelcast)
 To set up a streaming source of CDC data, define it using the following configuration:
@@ -50,9 +53,9 @@ pipeline.readFrom(
 .writeTo(Sinks.logger());
 ----
-For an example of how to use CDC data see xref:pipelines:cdc.adoc[our tutorial].
+For an example of how to use CDC data, see the xref:pipelines:cdc.adoc[] tutorial.
-=== Fault Tolerance
+=== Fault tolerance
 CDC sources offer _at least once_ processing guarantees. The source periodically saves the database write ahead log offset for which it has
diff --git a/docs/modules/pipelines/pages/cdc.adoc b/docs/modules/pipelines/pages/cdc.adoc
index 2c165e2d3..3645ca029 100644
--- a/docs/modules/pipelines/pages/cdc.adoc
+++ b/docs/modules/pipelines/pages/cdc.adoc
@@ -153,7 +153,7 @@ mysql> SELECT * FROM customers;
 If you already have Hazelcast and you skipped the above steps, make sure to follow from here on.
-. Make sure the MySQL CDC plugin is in the `lib/` directory. You must manually download the MySQL CDC plugin from Hazelcast's Maven link:https://repo1.maven.org/maven2/com/hazelcast/jet/hazelcast-enterprise-cdc-mysql/{full-version}/hazelcast-enterprise-cdc-mysql-{full-version}-jar-with-dependencies.jar[repository, window=_blank] and then copy it to the `lib/` directory.
+. Make sure the MySQL CDC plugin is in the `lib/` directory. You must manually download the MySQL CDC plugin from link:https://repo1.maven.org/maven2/com/hazelcast/jet/hazelcast-enterprise-cdc-mysql/{full-version}/hazelcast-enterprise-cdc-mysql-{full-version}-jar-with-dependencies.jar[Hazelcast's Maven repository, window=_blank] and then copy it to the `lib/` directory.
 +
 [source,bash]
 ----
@@ -166,7 +166,7 @@ You should see the following jars:
 * hazelcast-enterprise-cdc-mysql-{full-version}-jar-with-dependencies.jar
 * hazelcast-enterprise-cdc-postgres-{full-version}-jar-with-dependencies.jar
 +
-WARNING: If you have Hazelcast {enterprise-product-name} Edition, you need to manually download the MySQL CDC plugin from Hazelcast's Maven https://repo1.maven.org/maven2/com/hazelcast/jet/hazelcast-jet-cdc-mysql/{full-version}/hazelcast-jet-cdc-mysql-{full-version}-jar-with-dependencies.jar[repository] and then copy it to the `lib/` directory.
+WARNING: If you have Hazelcast {enterprise-product-name}, you need to manually download the MySQL CDC plugin from https://repo1.maven.org/maven2/com/hazelcast/jet/hazelcast-jet-cdc-mysql/{full-version}/hazelcast-jet-cdc-mysql-{full-version}-jar-with-dependencies.jar[Hazelcast's Maven repository] and then copy it to the `lib/` directory.
 . Start Hazelcast.
 +