diff --git a/advocacy_docs/edb-postgres-ai/ai-ml/using-tech-preview/working-with-ai-data-in-postgres.mdx b/advocacy_docs/edb-postgres-ai/ai-ml/using-tech-preview/working-with-ai-data-in-postgres.mdx index 288c2eb0d7d..592ceee6a4f 100644 --- a/advocacy_docs/edb-postgres-ai/ai-ml/using-tech-preview/working-with-ai-data-in-postgres.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-ml/using-tech-preview/working-with-ai-data-in-postgres.mdx @@ -118,7 +118,7 @@ __OUTPUT__ (5 rows) ``` -### Working without auto embedding +## Working without auto embedding You can now create a retriever without auto embedding. This means that the application has control over when the embeddings computation occurs. It also means that the computation is a bulk operation. For demonstration you can simply create a second retriever for the same products table that you just previously created the first retriever for, but setting `auto_embedding` to false. diff --git a/advocacy_docs/edb-postgres-ai/overview/latest-release-news.mdx b/advocacy_docs/edb-postgres-ai/overview/latest-release-news/2024q2release.mdx similarity index 98% rename from advocacy_docs/edb-postgres-ai/overview/latest-release-news.mdx rename to advocacy_docs/edb-postgres-ai/overview/latest-release-news/2024q2release.mdx index 430a765f389..7f596fc727d 100644 --- a/advocacy_docs/edb-postgres-ai/overview/latest-release-news.mdx +++ b/advocacy_docs/edb-postgres-ai/overview/latest-release-news/2024q2release.mdx @@ -1,8 +1,8 @@ --- -title: EDB Postgres AI Overview - Latest release news -navTitle: Latest release news +title: "EDB Postgres AI Q2 2024 release highlights" +navTitle: Q2 2024 release highlights description: The latest features released and updated in EDB Postgres AI. -deepToC: true +date: 2024-05-23 --- **May 23, 2024** diff --git a/advocacy_docs/edb-postgres-ai/overview/latest-release-news/2024q3release.mdx b/advocacy_docs/edb-postgres-ai/overview/latest-release-news/2024q3release.mdx new file mode 100644 index 00000000000..f8ac86824d5 --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/overview/latest-release-news/2024q3release.mdx @@ -0,0 +1,88 @@ +--- +title: "EDB Postgres AI Q3 2024 release highlights" +navTitle: Q3 2024 release highlights +description: The latest features released and updated in EDB Postgres AI. +date: 2024-08-28 +--- + +** August 28, 2024 ** + +This [release roundup](https://www.enterprisedb.com/blog/release-radar-edb-postgresr-ai-q3-release-highlights) originally appeared on the EDB blog. + +There’s a lot of energy at EDB following the Q2 announcement of EDB Postgres AI, an intelligent platform for unified management of transactional, analytical, and AI workloads, last quarter. The features we’re unveiling today build on last quarter’s EDB Postgres AI announcement by further enhancing the value delivered across our platform to support next gen applications, while further strengthening our core transactional database capabilities. + +Let’s take a closer look at these features. + +## Trusted Postgres Architect release improves HA cluster deployment verification and PEM & EFM integration + +With Trusted Postgres Architect (TPA), we continuously seek to create useful default configurations for our users to minimize the manual work required to tweak the cluster after deployment. We’ve done that and more with our latest features. + +TPA Version 23.24 introduces clearer and more concise CLI output, helping operators verify that deployment was successful. 
Today’s release includes a new module that processes the raw Ansible output into a more concise form before streaming it to the operator, making it easier to spot important information. + +For clusters that include EDB [Postgres Enterprise Manager](https://www.enterprisedb.com/products/postgres-enterprise-manager) (PEM), operators can now specify additional options to pass to the “register-agent” command, as well as provide their own SSL certificates to be used by the PEM web server. This enhancement improves upon earlier processes requiring the use of custom hooks or manual configuration. For [Enterprise Failover Manager](https://www.enterprisedb.com/docs/efm/latest/) (EFM) clusters, TPA now supports the new configuration parameters introduced with [EFM 4.9](https://www.enterprisedb.com/docs/efm/latest/efm_rel_notes/01_efm_49_rel_notes/). In addition, the configuration files created by TPA now include all the formatting and comments provided in the default EFM configuration files, making it easier for any operator wanting to manually inspect these files. TPA now allows users to specify whether TPA uses host names or IP addresses in EFM configuration, where previously only IP addresses were supported. + +Find out more about [Trusted Postgres Architect](https://www.enterprisedb.com/docs/tpa/latest/). If you’re new to TPA, try out this [TPA tutorial to spin up your first cluster](https://www.enterprisedb.com/docs/tpa/latest/firstclusterdeployment/). + +## Postgres Enterprise Manager advances database and server monitoring + +Postgres Enterprise Manager provides tools to keep databases running smoothly, continuously monitoring database and server health with real-time graphical dashboards and automatic alerts. When issues are detected, PEM makes it easier to pinpoint and fix performance bottlenecks with integrated query profiling, performance, and log analysis tools. Highlights of Postgres Enterprise Manager Version 9.7.0 include the following: + +- **EDB Postgres Distributed information in the “Core Usage” report:** This release enhances the PEM core usage report by adding information about EDB Postgres Distributed (PGD), giving customers a more complete view of their EDB license usage. As a result, the core usage report shows how many cores are running each version of Postgres across all PEM-monitored servers, including PGD.  +- **Copy notification settings:** This release extends the copy feature used by PEM to synchronize settings across multiple servers to now include notification settings, making it more powerful and reducing the need for additional manual steps or coding using the REST API.  +- **Dynamic probe scheduling to avoid showing outdated information:** PEM Agents collect data from monitored servers periodically, so changes can take a short time to reflect in the data. To mitigate this, PEM will now force some key agent tasks (known as probes) to run immediately after a server is added or changed in PEM, reducing the lag between adding or modifying a Postgres instance and the information being reflected in PEM monitoring data.  + +To get the latest PEM and enjoy all these benefits, [update your PEM today](https://www.enterprisedb.com/docs/pem/latest/upgrading/upgrading_pem_installation/). 
+ +## EDB Database Server Updates for PostgreSQL Community + +As part of EDB’s support for the open source community’s quarterly release schedule, we completed new software releases in the EDB repositories of PostgreSQL, EDB Postgres Extended (PGE) Server, and EDB Postgres Advanced Server (EPAS), including the following:  + +| Database Distributions | Versions Released | +| ---------------------- | ----------------- | +| PostgreSQL | 16.4, 15.8, 14.13, 13.16 and 12.20 | +| EDB Postgres Extended Server | 16.4.1, 15.8.1, 14.13.1, 13.16 and 12.20 | +| EDB Postgres Advanced Server | 16.4.1, 15.8.1, 14.13.1, 13.16.22 and 12.20.25 | + +The PostgreSQL minor releases were made available by the PostgreSQL Global Development Group on August 8th, addressing a security vulnerability. EDB repositories were simultaneously updated with new, minor releases of PostgreSQL, PGE, and EPAS, incorporating upstream fixes and additional feature enhancements. Complete [PGE release notes](https://www.enterprisedb.com/docs/pge/latest/release_notes/) and [EPAS release notes](https://www.enterprisedb.com/docs/epas/latest/epas_rel_notes/) are available. For details on the security fix and other improvements in PostgreSQL, please see [the Postgres release announcement](https://www.postgresql.org/about/news/postgresql-164-158-1413-1316-1220-and-17-beta-3-released-2910/). + + +## Seamlessly Convert Postgres Databases into a RESTful API with PostgREST Extension Support + +The popular open source tool [PostgREST](https://docs.postgrest.org/en/latest/) has been added to EDB’s supported [open source software list](https://www.enterprisedb.com/sites/default/files/pdf/edb_supported_open_source_software_20240515.pdf). This update unlocks more efficient and scalable Web services by enabling customers to seamlessly convert their Postgres database into a RESTful API. + +With EDB-supported software, customers can deploy with confidence knowing that these solutions are not only packaged by EDB but also come with the assurance of thorough EDB review and dedicated support for any issues encountered. + +## New Migration Toolkit command line options ease migration criteria specification + +Operators have long relied on the [EDB Migration Toolkit](https://www.enterprisedb.com/products/migration-toolkit-move-oracle-postgresql) command-line tool to help migrate tables and data from legacy DBMS to PostgreSQL or [EDB Postgres Advanced Server](https://www.enterprisedb.com/products/edb-postgres-advanced-server). By adding the ability to enable command line options to be specified via a file for configuration, EDB Migration Toolkit reduces administration risks and increases the predictability of Oracle-to-Postgres migrations. + +The new version of the Migration Toolkit is available for download from the EDB repositories or the [EDB software downloads page](https://www.enterprisedb.com/software-downloads-postgres#migration-toolkit). + +For more information about the new command line options and other enhancements and fixes in this release, see the [Migration Toolkit documentation](https://www.enterprisedb.com/docs/migration_toolkit/latest/). + +## EDB Query Advisor updates provide actionable index recommendations  + +EDB customers can leverage [EDB Query Advisor](https://www.enterprisedb.com/docs/pg_extensions/query_advisor/) with PostgreSQL, PGE, and EPAS to get index recommendations based on the running database workload. 
This soon-to-be-released version builds on that capability by analyzing the gap between estimated and actual rows in query plans. With this insight, EDB Query Advisor provides actionable recommendations for extended table statistics, resulting in more accurate query plans, improved performance, and enhanced overall database efficiency. + +## EDB Postgres for Kubernetes 1.24 released + +[EDB Postgres for Kubernetes](https://www.enterprisedb.com/docs/postgres_for_kubernetes/latest/) adds speed, efficiency, and protection for your k8s infrastructure modernization, with an enterprise-grade operator for Postgres. EDB Postgres for Kubernetes brings automation, security, and reliability to cloud-native data infrastructures. + +Our new EDB Postgres for Kubernetes upstream release is merged with CloudNativePG 1.24.0.  + +For more CloudNativePG information, check out the [1.24 Release Notes](https://cloudnative-pg.io/documentation/1.24/release_notes/v1.24/). + +## Barman 3.11 release leverages PostgreSQL 17 incremental backup and more + +[Barman](https://pgbarman.org/) provides robust backup and recovery solutions for PostgreSQL. We recently detailed how the combination of Barman 3.11 and PostgreSQL 17 combine to deliver seamless, enterprise-grade backup strategies in this EDB blog: [_Why PostgreSQL 17's Incremental Backup Feature is a Game-Changer_](https://www.enterprisedb.com/blog/why-postgresql-17s-incremental-backup-feature-game-changer). + +Learn more about enhanced Barman 3.11 features focused on saving resources and adding configuration options in the [Release Notes](https://github.com/EnterpriseDB/barman/blob/release/3.11.1/NEWS). + +Take a closer look at [Barman on EDB Docs](https://www.enterprisedb.com/docs/supported-open-source/barman/). + +## There’s more to come from EDB in Q3  + +We’re not done sharing news about our EDB Postgres AI enhancements. As a leading PostgreSQL community contributor, we already have [expert-level analysis](https://iw-resources.informationweek.com/free/w_defa6615/) into our favorite features in the upcoming PostgreSQL17 release. We’ll share news on how EDB database platforms build on this important community update as the September 15th release date arrives. Plus, stay tuned for new EDB announcements focused on our fully managed EDB Postgres AI Cloud Service and EDB Postgres Distributed database solutions.  + +For more information about our Q3 releases, [contact us today](https://www.enterprisedb.com/contact). + diff --git a/advocacy_docs/edb-postgres-ai/overview/latest-release-news/index.mdx b/advocacy_docs/edb-postgres-ai/overview/latest-release-news/index.mdx new file mode 100644 index 00000000000..f18a86349bc --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/overview/latest-release-news/index.mdx @@ -0,0 +1,92 @@ +--- +title: EDB Postgres AI Overview - Latest release news +navTitle: Release News +indexCards: simple +iconName: Earth +navigation: +- 2024q3release +- 2024q2release +--- + +** August 28, 2024 ** + +This [release roundup](https://www.enterprisedb.com/blog/release-radar-edb-postgresr-ai-q3-release-highlights) originally appeared on the EDB blog. + +There’s a lot of energy at EDB following the [Q2 announcement of EDB Postgres AI](2024q2release), an intelligent platform for unified management of transactional, analytical, and AI workloads, last quarter. 
The features we’re unveiling today build on last quarter’s EDB Postgres AI announcement by further enhancing the value delivered across our platform to support next gen applications, while further strengthening our core transactional database capabilities. + +Let’s take a closer look at these features. + +## Trusted Postgres Architect release improves HA cluster deployment verification and PEM & EFM integration + +With Trusted Postgres Architect (TPA), we continuously seek to create useful default configurations for our users to minimize the manual work required to tweak the cluster after deployment. We’ve done that and more with our latest features. + +TPA Version 23.24 introduces clearer and more concise CLI output, helping operators verify that deployment was successful. Today’s release includes a new module that processes the raw Ansible output into a more concise form before streaming it to the operator, making it easier to spot important information. + +For clusters that include EDB [Postgres Enterprise Manager](https://www.enterprisedb.com/products/postgres-enterprise-manager) (PEM), operators can now specify additional options to pass to the “register-agent” command, as well as provide their own SSL certificates to be used by the PEM web server. This enhancement improves upon earlier processes requiring the use of custom hooks or manual configuration. For [Enterprise Failover Manager](https://www.enterprisedb.com/docs/efm/latest/) (EFM) clusters, TPA now supports the new configuration parameters introduced with [EFM 4.9](https://www.enterprisedb.com/docs/efm/latest/efm_rel_notes/01_efm_49_rel_notes/). In addition, the configuration files created by TPA now include all the formatting and comments provided in the default EFM configuration files, making it easier for any operator wanting to manually inspect these files. TPA now allows users to specify whether TPA uses host names or IP addresses in EFM configuration, where previously only IP addresses were supported. + +Find out more about [Trusted Postgres Architect](https://www.enterprisedb.com/docs/tpa/latest/). If you’re new to TPA, try out this [TPA tutorial to spin up your first cluster](https://www.enterprisedb.com/docs/tpa/latest/firstclusterdeployment/). + +## Postgres Enterprise Manager advances database and server monitoring + +Postgres Enterprise Manager provides tools to keep databases running smoothly, continuously monitoring database and server health with real-time graphical dashboards and automatic alerts. When issues are detected, PEM makes it easier to pinpoint and fix performance bottlenecks with integrated query profiling, performance, and log analysis tools. Highlights of Postgres Enterprise Manager Version 9.7.0 include the following: + +- **EDB Postgres Distributed information in the “Core Usage” report:** This release enhances the PEM core usage report by adding information about EDB Postgres Distributed (PGD), giving customers a more complete view of their EDB license usage. As a result, the core usage report shows how many cores are running each version of Postgres across all PEM-monitored servers, including PGD.  +- **Copy notification settings:** This release extends the copy feature used by PEM to synchronize settings across multiple servers to now include notification settings, making it more powerful and reducing the need for additional manual steps or coding using the REST API.  
+- **Dynamic probe scheduling to avoid showing outdated information:** PEM Agents collect data from monitored servers periodically, so changes can take a short time to reflect in the data. To mitigate this, PEM will now force some key agent tasks (known as probes) to run immediately after a server is added or changed in PEM, reducing the lag between adding or modifying a Postgres instance and the information being reflected in PEM monitoring data.  + +To get the latest PEM and enjoy all these benefits, [update your PEM today](https://www.enterprisedb.com/docs/pem/latest/upgrading/upgrading_pem_installation/). + +## EDB Database Server Updates for PostgreSQL Community + +As part of EDB’s support for the open source community’s quarterly release schedule, we completed new software releases in the EDB repositories of PostgreSQL, EDB Postgres Extended (PGE) Server, and EDB Postgres Advanced Server (EPAS), including the following:  + +| Database Distributions | Versions Released | +| ---------------------- | ----------------- | +| PostgreSQL | 16.4, 15.8, 14.13, 13.16 and 12.20 | +| EDB Postgres Extended Server | 16.4.1, 15.8.1, 14.13.1, 13.16 and 12.20 | +| EDB Postgres Advanced Server | 16.4.1, 15.8.1, 14.13.1, 13.16.22 and 12.20.25 | + +The PostgreSQL minor releases were made available by the PostgreSQL Global Development Group on August 8th, addressing a security vulnerability. EDB repositories were simultaneously updated with new, minor releases of PostgreSQL, PGE, and EPAS, incorporating upstream fixes and additional feature enhancements. Complete [PGE release notes](https://www.enterprisedb.com/docs/pge/latest/release_notes/) and [EPAS release notes](https://www.enterprisedb.com/docs/epas/latest/epas_rel_notes/) are available. For details on the security fix and other improvements in PostgreSQL, please see [the Postgres release announcement](https://www.postgresql.org/about/news/postgresql-164-158-1413-1316-1220-and-17-beta-3-released-2910/). + + +## Seamlessly Convert Postgres Databases into a RESTful API with PostgREST Extension Support + +The popular open source tool [PostgREST](https://docs.postgrest.org/en/latest/) has been added to EDB’s supported [open source software list](https://www.enterprisedb.com/sites/default/files/pdf/edb_supported_open_source_software_20240515.pdf). This update unlocks more efficient and scalable Web services by enabling customers to seamlessly convert their Postgres database into a RESTful API. + +With EDB-supported software, customers can deploy with confidence knowing that these solutions are not only packaged by EDB but also come with the assurance of thorough EDB review and dedicated support for any issues encountered. + +## New Migration Toolkit command line options ease migration criteria specification + +Operators have long relied on the [EDB Migration Toolkit](https://www.enterprisedb.com/products/migration-toolkit-move-oracle-postgresql) command-line tool to help migrate tables and data from legacy DBMS to PostgreSQL or [EDB Postgres Advanced Server](https://www.enterprisedb.com/products/edb-postgres-advanced-server). By adding the ability to enable command line options to be specified via a file for configuration, EDB Migration Toolkit reduces administration risks and increases the predictability of Oracle-to-Postgres migrations. + +The new version of the Migration Toolkit is available for download from the EDB repositories or the [EDB software downloads page](https://www.enterprisedb.com/software-downloads-postgres#migration-toolkit). 
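As a rough illustration of how an options file can simplify repeated runs, the hypothetical sketch below keeps recurring migration criteria in a file and passes it to `runMTK.sh`. The file name, its contents, and the flag used to reference it are assumptions for illustration only; check the Migration Toolkit documentation for the exact option name and syntax.

```
$ cat migration.options        # hypothetical options file holding recurring criteria
-sourcedbtype oracle
-targetdbtype enterprisedb
-allTables

$ ./runMTK.sh -optionsFile migration.options HR   # flag name is an assumption; see the MTK docs
```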
+ +For more information about the new command line options and other enhancements and fixes in this release, see the [Migration Toolkit documentation](https://www.enterprisedb.com/docs/migration_toolkit/latest/). + +## EDB Query Advisor updates provide actionable index recommendations  + +EDB customers can leverage [EDB Query Advisor](https://www.enterprisedb.com/docs/pg_extensions/query_advisor/) with PostgreSQL, PGE, and EPAS to get index recommendations based on the running database workload. This soon-to-be-released version builds on that capability by analyzing the gap between estimated and actual rows in query plans. With this insight, EDB Query Advisor provides actionable recommendations for extended table statistics, resulting in more accurate query plans, improved performance, and enhanced overall database efficiency. + +## EDB Postgres for Kubernetes 1.24 released + +[EDB Postgres for Kubernetes](https://www.enterprisedb.com/docs/postgres_for_kubernetes/latest/) adds speed, efficiency, and protection for your k8s infrastructure modernization, with an enterprise-grade operator for Postgres. EDB Postgres for Kubernetes brings automation, security, and reliability to cloud-native data infrastructures. + +Our new EDB Postgres for Kubernetes upstream release is merged with CloudNativePG 1.24.0.  + +For more CloudNativePG information, check out the [1.24 Release Notes](https://cloudnative-pg.io/documentation/1.24/release_notes/v1.24/). + +## Barman 3.11 release leverages PostgreSQL 17 incremental backup and more + +[Barman](https://pgbarman.org/) provides robust backup and recovery solutions for PostgreSQL. We recently detailed how the combination of Barman 3.11 and PostgreSQL 17 combine to deliver seamless, enterprise-grade backup strategies in this EDB blog: [_Why PostgreSQL 17's Incremental Backup Feature is a Game-Changer_](https://www.enterprisedb.com/blog/why-postgresql-17s-incremental-backup-feature-game-changer). + +Learn more about enhanced Barman 3.11 features focused on saving resources and adding configuration options in the [Release Notes](https://github.com/EnterpriseDB/barman/blob/release/3.11.1/NEWS). + +Take a closer look at [Barman on EDB Docs](https://www.enterprisedb.com/docs/supported-open-source/barman/). + +## There’s more to come from EDB in Q3  + +We’re not done sharing news about our EDB Postgres AI enhancements. As a leading PostgreSQL community contributor, we already have [expert-level analysis](https://iw-resources.informationweek.com/free/w_defa6615/) into our favorite features in the upcoming PostgreSQL17 release. We’ll share news on how EDB database platforms build on this important community update as the September 15th release date arrives. Plus, stay tuned for new EDB announcements focused on our fully managed EDB Postgres AI Cloud Service and EDB Postgres Distributed database solutions.  + +For more information about our Q3 releases, [contact us today](https://www.enterprisedb.com/contact). + + diff --git a/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx b/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx index 99abec3e4e4..7b636735100 100644 --- a/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx +++ b/product_docs/docs/efm/4/04_configuring_efm/01_cluster_properties.mdx @@ -1237,9 +1237,9 @@ lock.dir= -Use the `log.dir` property to specify the location to write agent log files. Failover Manager attempts to create the directory if the directory doesn't exist. 
The `log.dir` parameter defined in the efm.properties file determines the directory path where the EFM logs are stored. Please note that this parameter applies exclusively to the EFM logs and does not affect the logging configuration for any other components or services. +Use the `log.dir` property to specify the location to write agent log files. Failover Manager attempts to create the directory if the directory doesn't exist. The `log.dir` parameter defined in the `efm.properties` file determines the directory path where the EFM logs are stored. This parameter applies exclusively to the EFM logs and doesn't affect the logging configuration for any other components or services. -To change the startup log location for EFM, you need to modify the `runefm.sh` script located in the EFM's bin directory. Specifically, you can set the `LOG` parameter within this script to define the desired log file location. +To change the startup log location for EFM, modify the `runefm.sh` script located in the EFM bin directory. Set the `LOG` parameter in this script to define the desired log file location. ```ini # Specify the directory of agent logs on the node. If the path diff --git a/product_docs/docs/efm/4/efm_rel_notes/01_efm_410_rel_notes.mdx b/product_docs/docs/efm/4/efm_rel_notes/01_efm_410_rel_notes.mdx index 158f39a997c..c1cc0830f88 100644 --- a/product_docs/docs/efm/4/efm_rel_notes/01_efm_410_rel_notes.mdx +++ b/product_docs/docs/efm/4/efm_rel_notes/01_efm_410_rel_notes.mdx @@ -7,7 +7,7 @@ Enhancements, bug fixes, and other changes in EFM 4.10 include: | Type | Description | | ---- |------------ | | Enhancement | Failover Manager was upgraded to use the Bouncy Castle cryptographic library version 1.0.2.5. | -| Bug Fix | Improve handling of rare case where the standby to promote becomes unavailable during a switchover. [Support ticket: #37266] | -| Bug Fix | The `efm upgrade-conf` command will not lose .nodes file information when the `-source` and destination directories are the same. [Support ticket: #37479] | +| Bug Fix | Improved handling of rare case where the standby to promote becomes unavailable during a switchover. [Support ticket: #37266] | +| Bug Fix | The `efm upgrade-conf` command now doesn't lose `.nodes` file information when the `-source` and destination directories are the same. [Support ticket: #37479] | | Bug Fix | Fixed an issue where the `efm cluster-status` command hid connection errors if every database connection failed. [Support ticket: #39108] | -| Bug Fix | At startup, if an agent with a primary database sees that there is already a primary in the cluster, it will drop the VIP if applicable when fencing off the database. | +| Bug Fix | At startup, if an agent with a primary database sees that there is already a primary in the cluster, it now drops the VIP, if applicable, when fencing off the database. | diff --git a/product_docs/docs/eprs/7/05_smr_operation/03_creating_subscription/02_adding_subscription_database.mdx b/product_docs/docs/eprs/7/05_smr_operation/03_creating_subscription/02_adding_subscription_database.mdx index 0b37be48271..375bf4c8913 100644 --- a/product_docs/docs/eprs/7/05_smr_operation/03_creating_subscription/02_adding_subscription_database.mdx +++ b/product_docs/docs/eprs/7/05_smr_operation/03_creating_subscription/02_adding_subscription_database.mdx @@ -4,7 +4,7 @@ title: "Adding a subscription database" -You must identify to Replication Server the database for subscriptions. You do this buy creating a subscription database definition. 
+To allow Replication Server to identify an instance as a database for subscription, you must create a subscription database definition. After you create the subscription database definition, a Subscription Database node representing that subscription database definition appears in the replication tree of the Replication Server console. Subscriptions created subordinate to this subscription database definition have their publications replicated to the database identified by the subscription database definition. diff --git a/product_docs/docs/eprs/7/05_smr_operation/08_optimizing_performance/02_optimize_sync_replication/using_bacth_lob.mdx b/product_docs/docs/eprs/7/05_smr_operation/08_optimizing_performance/02_optimize_sync_replication/using_bacth_lob.mdx index 84715ba32cf..79d61be285a 100644 --- a/product_docs/docs/eprs/7/05_smr_operation/08_optimizing_performance/02_optimize_sync_replication/using_bacth_lob.mdx +++ b/product_docs/docs/eprs/7/05_smr_operation/08_optimizing_performance/02_optimize_sync_replication/using_bacth_lob.mdx @@ -2,11 +2,11 @@ title: "Using batch synchronization for LOB objects" --- -When synchronizing tables with LOB-type columns (with BLOB, CLOB, BYTEA etc. objects), Replication Server copies the rows by grouping 5 rows in a batch by default. This is to avoid causing out-of-memory errors when an LOB column has a large amount of data. You can customize the batch size to increase or decrease the number of rows copied in a batch. +When synchronizing tables with LOB-type columns (with BLOB, CLOB, BYTEA objects, and so on), Replication Server copies the rows by grouping 5 rows in a batch by default. This is to avoid causing out-of-memory errors when a LOB column has a large amount of data. You can customize the batch size to increase or decrease the number of rows copied in a batch. -To improve the performance of synchronization procedures, reduce network roundtrips and speed up data replication, you can copy rows with LOBs in larger batches. Configure larger batches only if the instance hosting the replication server has enough memory. +To improve the performance of synchronization procedures, reduce network roundtrips, and speed up data replication, you can copy rows with LOBs in larger batches. Configure larger batches only if the instance hosting the replication server has enough memory. -You can configure the replication of up to 1000 rows per batch. +You can configure the replication of up to 1000 rows per batch. !!!note Before increasing the number of rows to synchronize per batch, consider resource availability. Batch procedures increase memory consumption during replication processes. @@ -16,7 +16,7 @@ You can configure the replication of up to 1000 rows per batch. ### Prerequisites -You have upgraded your instances to a Replication Server version that includes this feature. See [Release Notes](../../../eprs_rel_notes) for an overview of the available release versions and the included enhancements. +Your instances must use a Replication Server version that includes this feature. See [Release notes](../../../eprs_rel_notes) for an overview of the available release versions and the included enhancements. ### Altering the number of rows per batch @@ -26,12 +26,12 @@ You have upgraded your instances to a Replication Server version that includes t 1. In the `xdb_pubserver.conf` file, look for the `syncLOBBatchSize` value. -1. Adapt the `syncLOBBatchSize` value according to your needs. If not uncommented already, uncomment the line to override the default. 
In this example, the Replication Server will synchronize 150 rows with LOB data per batch: +1. Adapt the `syncLOBBatchSize` value according to your needs. If not uncommented already, uncomment the line to override the default. This example sets Replication Server to synchronize 150 rows with LOB data per batch: ``` syncLOBBatchSize=150 ``` -1. Reload the configuration file as specified in [Reloading the configuration file](../../../08_xdb_cli/03_xdb_cli_commands/52_reload_conf_file). +1. [Reload the configuration file](../../../08_xdb_cli/03_xdb_cli_commands/52_reload_conf_file). - Now, each time you perform a synchronization, the Replication Server will copy 150 rows with LOB objects per network roundtrip. + Now, each time you perform a synchronization, the Replication Server copies 150 rows with LOB objects per network roundtrip. diff --git a/product_docs/docs/eprs/7/07_common_operations/09_offline_snapshot.mdx b/product_docs/docs/eprs/7/07_common_operations/09_offline_snapshot.mdx index 6149a51d10f..62cde3723dc 100644 --- a/product_docs/docs/eprs/7/07_common_operations/09_offline_snapshot.mdx +++ b/product_docs/docs/eprs/7/07_common_operations/09_offline_snapshot.mdx @@ -1,5 +1,6 @@ --- title: "Loading tables from an external data source (offline snapshot)" +deepToC: true --- @@ -87,31 +88,38 @@ The default value is `true`. You can use an offline snapshot to first load the subscription tables of a single-master replication system. For a publication that's intended to have multiple subscriptions, you can create some of the subscriptions using the default Replication Server snapshot replication process as described in [Performing snapshot replication](../05_smr_operation/04_on_demand_replication/01_perform_replication/#perform_replication). You can create other subscriptions from an offline snapshot. -To create a subscription from an offline snapshot: +### Preparing the publication and subscription server configuration: + +Perform these steps before creating any subscriptions: 1. Register the publication server, add the publication database definition, and create the publication as described in [Creating a publication](../05_smr_operation/02_creating_publication/#creating_publication). 1. Register the subscription server and add the subscription database definition as described in [Registering a subscription server](../05_smr_operation/03_creating_subscription/01_registering_subscription_server/#registering_subscription_server) and [Adding a subscription database](../05_smr_operation/03_creating_subscription/02_adding_subscription_database/#adding_subscription_database). - !!! Note - You must perform steps 3 and 4 before creating the subscription. You can repeat steps 5 through 9 each time you want to create another subscription from an offline snapshot. - 1. Modify the publication server configuration file if these options aren't already set as described by the following: - Change the `offlineSnapshot` option to `true`. When you restart the publication server or reload the publication server's configuration via reloadconf, `offlineSnapshot` set to `true` has two effects. One is that creating a subscription doesn't create the schema and subscription table definitions in the subscription database as is done with the default setting. The other is that creating a subscription sets a column in the control schema indicating an offline snapshot is used to load this subscription. 
- Set the `batchInitialSync` option to the appropriate setting for your situation as discussed at the end of [Non-batch mode synchronization](#non_batch_mode_sync). -1. If you modified the publication server configuration file in Step 3, reload configuration of the publication server. See ["Reloading the Publication or Subscription Server Configuration File (reloadconf)"](../05_smr_operation/02_creating_publication/01_registering_publication_server/#registering_publication_server) for directions on reloading the publication server's configuration. +1. If you modified the publication server configuration, reload the configuration. See [Reloading the Publication or Subscription Server Configuration File (reloadconf)](../05_smr_operation/02_creating_publication/01_registering_publication_server/#registering_publication_server) for directions on reloading the publication server's configuration. -1. In the subscription database, create the schema and the subscription table definitions, and load the subscription tables from your offline data source. The subscription database user name used in [Adding a subscription database](../05_smr_operation/03_creating_subscription/02_adding_subscription_database/#adding_subscription_database) must have full privileges over the database objects created in this step. Also review the beginning of [Adding a subscription database](../05_smr_operation/03_creating_subscription/02_adding_subscription_database/#adding_subscription_database) regarding the rules as to how Replication Server creates the subscription definitions from the publication for each database type. You must follow these same conventions when you create the target definitions manually. +### Creating subscription servers + +Execute these steps to create a subscription from an offline snapshot. Repeat them for each additional subscription. 1. Add the subscription as described in [Adding a subscription](../05_smr_operation/03_creating_subscription/03_adding_subscription/#adding_subscription). +1. In the subscription database, create the schema and the subscription table definitions, and load the subscription tables from your offline data source. The subscription database user name used in [Adding a subscription database](../05_smr_operation/03_creating_subscription/02_adding_subscription_database/#adding_subscription_database) must have full privileges over the database objects created in this step. Also review the beginning of [Adding a subscription database](../05_smr_operation/03_creating_subscription/02_adding_subscription_database/#adding_subscription_database) regarding the rules as to how Replication Server creates the subscription definitions from the publication for each database type. You must follow these same conventions when you create the target definitions manually. + + !!!note + Ensure you don't load the offline data source from the source Publication database until after you complete the creation of a subscription. Otherwise, certain changes from the source database won't be replicated. + !!! + 1. Perform an on-demand synchronization replication. See [Performing synchronization replication](../05_smr_operation/04_on_demand_replication/02_perform_sync_replication/#perform_sync_replication) to learn how to perform an on-demand synchronization replication. 1. 
If you aren't planning to load any other subscriptions using an offline snapshot at this time, change the `offlineSnapshot` option back to `false` and the `batchInitialSync` option to `true` in the publication server configuration file. -1. If you modified the publication server configuration file in step 8, reload configuration of the publication server. +1. If you modified the publication server configuration, reload the configuration file. ## Multi-master replication offline snapshot @@ -120,12 +128,11 @@ You can use an offline snapshot to first load the primary nodes of a multi-maste !!! Note Offline snapshots aren't supported for a multi-master replication system that's actively in use. Any changes on an active primary node are lost during the offline snapshot process of dumping or restoring the data of another node. -To create a primary node from an offline snapshot: +### Preparing the publication and subscription server configurations: -1. Register the publication server, add the primary definition node, and create the publication as described in [Creating a publication](../06_mmr_operation/02_creating_publication_mmr/#creating_publication_mmr). +Perform these steps before adding primary nodes: - !!! Note - You must perform the steps 3 and 4 before adding a primary node to be loaded by an offline snapshot. You can repeat Steps 5 through 10 each time you want to create another primary node from an offline snapshot. +1. Register the publication server, add the primary definition node, and create the publication as described in [Creating a publication](../06_mmr_operation/02_creating_publication_mmr/#creating_publication_mmr). 1. Be sure there's no schedule defined on the replication system. If there is, remove the schedule until you complete this process. See [Removing a schedule](03_managing_schedule/#remove_schedule) for details. @@ -134,16 +141,24 @@ To create a primary node from an offline snapshot: - Set the `offlineSnapshot` option to `true`. When you restart the publication server or reload the publication server's configuration via reloadconf, this setting has the effect that adding a primary node sets a column in the control schema indicating an offline snapshot is used to load this primary node. - Set the `batchInitialSync` option to the appropriate setting for your situation as discussed at the end of [Non-batch mode synchronization](#non_batch_mode_sync). -1. If you modified the publication server configuration file in step 3, reload configuration of the publication server. See ["Reloading the Publication or Subscription Server Configuration File (reloadconf)" ](../05_smr_operation/02_creating_publication/01_registering_publication_server/#registering_publication_server) for directions to reload the publication server's configuration. +1. If you modified the publication server configuration file, reload the configuration. See [Reloading the Publication or Subscription Server Configuration File (reloadconf)](../05_smr_operation/02_creating_publication/01_registering_publication_server/#registering_publication_server) for directions to reload the publication server's configuration. -1. In the database to use as the new primary node, create the schema and the table definitions, and load the tables from your offline data source. +### Adding primary nodes + +Execute these steps to add a primary node from an offline snapshot. Repeat them for each additional primary node. 1. 
Add the primary node as described in [Creating more primary nodes](../06_mmr_operation/03_creating_primary_nodes/#creating_primary_nodes) with the options **Replicate Publication Schema** and **Perform Initial Snapshot** cleared. +1. In the database to use as the new primary node, create the schema and the table definitions, and load the tables from your offline data source. + + !!!note + Ensure you don't load the offline data source from the source Publication database until after you add the target node in the MMR cluster. Otherwise, certain changes from the source database won't be replicated. + !!! + 1. Perform an initial on-demand synchronization. See [Performing synchronization replication](../06_mmr_operation/05_on_demand_replication_mmr/#perform_synchronization_replication_mmr) to learn how to perform an on demand-synchronization. 1. If you aren't planning to load any other primary nodes using an offline snapshot at this time, change the `offlineSnapshot` option back to `false` and the `batchInitialSync` option to `true` in the publication server configuration file. -1. If you modified the publication server configuration file in step 8, reload configuration of the publication server. +1. If you modified the publication server configuration, reload the configuration file. 1. Add the schedule again if you removed it. See [Creating a schedule](02_creating_schedule/#creating_schedule) to learn how to create a schedule. diff --git a/product_docs/docs/pgd/5/consistency/crdt.mdx b/product_docs/docs/pgd/5/consistency/crdt.mdx deleted file mode 100644 index b6feabf40bf..00000000000 --- a/product_docs/docs/pgd/5/consistency/crdt.mdx +++ /dev/null @@ -1,553 +0,0 @@ ---- -navTitle: CRDT data types -title: Conflict-free replicated data types -redirects: - - /pgd/latest/bdr/crdt/ ---- - -Conflict-free replicated data types (CRDT) support merging values from concurrently modified rows instead of discarding one of the rows as traditional resolution does. - -!!! Note Permissions required -PGD CRDT requires usage access to CRDT types. Therefore any user needing to access CRDT types must have at least the [bdr_application](../security/pgd-predefined-roles/#bdr_application) role assigned to them. -!!! - -Each CRDT type is implemented as a separate PostgreSQL data type with an extra callback added to the `bdr.crdt_handlers` catalog. The merge process happens inside the PGD writer on the apply side without any user -action needed. - -CRDTs require the table to have column-level conflict resolution enabled, as described in [CLCD](column-level-conflicts). - -The only action you need to take is to use a particular data type in CREATE/ALTER TABLE rather than standard built-in data types such as integer. For example, consider the following table with one regular integer counter and a single row: - -``` -CREATE TABLE non_crdt_example ( - id integer PRIMARY KEY, - counter integer NOT NULL DEFAULT 0 -); - -INSERT INTO non_crdt_example (id) VALUES (1); -``` - -Suppose you issue the following SQL on two nodes at same time: - -``` -UPDATE non_crdt_example - SET counter = counter + 1 -- "reflexive" update - WHERE id = 1; -``` - -After both updates are applied, you can see the resulting values using this query: - -``` -SELECT * FROM non_crdt_example WHERE id = 1; - id | counter - -----+----------- - 1 | 1 -(1 row) -``` - -This code shows that you lost one of the increments due to the `update_if_newer` conflict resolver. 
If you use the CRDT counter data type instead, the result looks like this: - -``` -CREATE TABLE crdt_example ( - id integer PRIMARY KEY, - counter bdr.crdt_gcounter NOT NULL DEFAULT 0 -); - -ALTER TABLE crdt_example REPLICA IDENTITY FULL; - -SELECT bdr.alter_table_conflict_detection('crdt_example', - 'column_modify_timestamp', 'cts'); - -INSERT INTO crdt_example (id) VALUES (1); -``` - -Again issue the following SQL on two nodes at same time, and then wait for the changes to be applied: - -``` -UPDATE crdt_example - SET counter = counter + 1 -- "reflexive" update - WHERE id = 1; - -SELECT id, counter FROM crdt_example WHERE id = 1; - id | counter - -----+----------- - 1 | 2 -(1 row) -``` - -This example shows that CRDTs correctly allow accumulator columns to work, even in the face of asynchronous concurrent updates that otherwise conflict. - -The `crdt_gcounter` type is an example of state-based CRDT types that work only with reflexive UPDATE SQL, such as `x = x + 1`, as the example shows. - -The `bdr.crdt_raw_value` configuration option determines whether queries return the current value or the full internal state of the CRDT type. By default, only the current numeric value is returned. When set to `true`, queries return representation of the full state. You can use the special hash operator -(`#`) to request only the current numeric value without using the special operator (the default behavior). If the full state is dumped using `bdr.crdt_raw_value = on`, then the value can reload only with `bdr.crdt_raw_value = on`. - -!!! Note - The `bdr.crdt_raw_value` applies formatting only of data returned to clients, that is, simple column references in the select list. Any column references in other parts of the query (such as `WHERE` clause or even expressions in the select list) might still require use of the `#` operator. - -Another class of CRDT data types is referred to as *delta CRDT* types. These are a special subclass of operation-based CRDTs. - -With delta CRDTs, any update to a value is compared to the previous value on the same node. Then a change is applied as a delta on all other nodes. - -``` -CREATE TABLE crdt_delta_example ( - id integer PRIMARY KEY, - counter bdr.crdt_delta_counter NOT NULL DEFAULT 0 -); - -ALTER TABLE crdt_delta_example REPLICA IDENTITY FULL; - -SELECT bdr.alter_table_conflict_detection('crdt_delta_example', - 'column_modify_timestamp', 'cts'); - -INSERT INTO crdt_delta_example (id) VALUES (1); -``` - -Suppose you issue the following SQL on two nodes at same time: - -``` -UPDATE crdt_delta_example - SET counter = 2 -- notice NOT counter = counter + 2 - WHERE id = 1; -``` - -After both updates are applied, you can see the resulting values using this query: - -``` -SELECT id, counter FROM crdt_delta_example WHERE id = 1; - id | counter - -----+--------- - 1 | 4 -(1 row) -``` - -With a regular `integer` column, the result is `2`. But when you update the row with a delta CRDT counter, you start with the OLD row version, make a NEW row version, and send both to the remote node. There, compare them with the version found there (e.g., the LOCAL version). Standard CRDTs merge the NEW and the LOCAL version, while delta CRDTs compare the OLD and NEW versions and apply the delta -to the LOCAL version. - -The CRDT types are installed as part of `bdr` into the `bdr` schema. For convenience, the basic operators (`+`, `#` and `!`) and a number of common aggregate functions (`min`, `max`, `sum`, and `avg`) are created in `pg_catalog`. 
Thus they are available without having to tweak `search_path`. - -An important question is how query planning and optimization works with these new data types. CRDT types are handled transparently. Both `ANALYZE` and the optimizer work, so estimation and query planning works fine without having to do anything else. - -## State-based and operation-based CRDTs - -Following the notation from [1], both operation-based and state-based CRDTs are implemented. - -### Operation-based CRDT types (CmCRDT) - -The implementation of operation-based types is trivial because the operation isn't transferred explicitly but computed from the old and new row received from the remote node. - -Currently, these operation-based CRDTs are implemented: - -- `crdt_delta_counter` — `bigint` counter (increments/decrements) -- `crdt_delta_sum` — `numeric` sum (increments/decrements) - -These types leverage existing data types with a little bit of code to compute the delta. For example, `crdt_delta_counter` is a domain on a `bigint`. - -This approach is possible only for types for which the method for computing the delta is known, but the result is simple and cheap (both in terms of space and CPU) and has a couple of added benefits. For example, it can leverage operators/syntax for the underlying data type. - -The main disadvantage is that you can't reset this value reliably in an asynchronous and concurrent environment. - -!!! Note - Implementing more complicated operation-based types by creating custom data types is possible, storing the state and the last operation. (Every change is decoded and transferred, so multiple - operations aren't needed). But at that point, the main benefits (simplicity, reuse of existing data types) are lost without gaining any advantage compared to state-based types (for example, still no capability to reset) except for the space requirements. (A per-node state isn't needed.) - -### State-based CRDT types (CvCRDT) - -State-based types require a more complex internal state and so can't use the regular data types directly the way operation-based types do. - -Currently, four state-based CRDTs are implemented: - -- `crdt_gcounter` — `bigint` counter (increment-only) -- `crdt_gsum` — `numeric` sum/counter (increment-only) -- `crdt_pncounter` — `bigint` counter (increments/decrements) -- `crdt_pnsum` — `numeric` sum/counter (increments/decrements) - -The internal state typically includes per-node information, increasing the on-disk size but allowing added benefits. The need to implement custom data types implies more code (in/out functions and operators). - -The advantage is the ability to reliably reset the values, a somewhat self-healing nature in the presence of lost changes (which doesn't happen in a cluster that operates properly), and the ability to receive changes from other than source nodes. - -Consider, for example, that a value is modified on node A, and the change gets replicated to B but not C due to network issue between A and C. If B modifies the value and this change gets replicated to C, it -includes even the original change from A. With operation-based CRDTs, node C doesn't receive the change until the A-C network connection starts working again. - -The main disadvantages of CvCRDTs are higher costs in terms of disk space and CPU usage. A bit of information for each node is needed, including nodes that were already removed from the cluster. The complex nature of the state (serialized into varlena types) means increased CPU use. 
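As a minimal sketch of the practical difference between the two families (table and column names are illustrative only), the following declares one column of each kind. The state-based `crdt_pncounter` keeps per-node state and supports a reliable reset with the `!` operator, while the operation-based `crdt_delta_counter` behaves like a plain `bigint` domain and has no reliable reset:

```
CREATE TABLE crdt_families (
    id        integer PRIMARY KEY,
    op_cnt    bdr.crdt_delta_counter NOT NULL DEFAULT 0, -- operation-based (domain over bigint)
    state_cnt bdr.crdt_pncounter     NOT NULL DEFAULT 0  -- state-based (per-node internal state)
);

-- CRDT columns require column-level conflict detection
ALTER TABLE crdt_families REPLICA IDENTITY FULL;
SELECT bdr.alter_table_conflict_detection('crdt_families',
       'column_modify_timestamp', 'cts');

INSERT INTO crdt_families (id) VALUES (1);

-- Both families accept reflexive increments and decrements
UPDATE crdt_families SET op_cnt = op_cnt + 1, state_cnt = state_cnt + 1 WHERE id = 1;

-- Only the state-based column can be reset reliably
UPDATE crdt_families SET state_cnt = !state_cnt WHERE id = 1;

-- Read the current values; the state-based column needs a cast or the # operator
SELECT id, op_cnt, state_cnt::bigint FROM crdt_families WHERE id = 1;
```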
- -## Disk-space requirements - -An important consideration is the overhead associated with CRDT types, particularly the on-disk size. - -For operation-based types, this is trivial because the types are merely domains on top of other types. They have the same disk space requirements no matter how many nodes are there: - -- `crdt_delta_counter` — Same as `bigint` (8 bytes) -- `crdt_delta_sum` — Same as `numeric` (variable, depending on precision - and scale) - -There's no dependency on the number of nodes because operation-based CRDT types don't store any per-node information. - -For state-based types, the situation is more complicated. All the types are variable length (stored essentially as a `bytea` column) and consist of a header and a certain amount of per-node information for each node that modified the value. - -For the `bigint` variants, formulas computing approximate size are: - -- `crdt_gcounter` — `32B (header) + N * 12B (per-node)` -- `crdt_pncounter` -—`48B (header) + N * 20B (per-node)` - - `N` denotes the number of nodes that modified this value. - -For the `numeric` variants, there's no exact formula because both the header and per-node parts include `numeric` variable-length values. To give you an idea of how many such values you need to keep: - -- `crdt_gsum` - - fixed: `20B (header) + N * 4B (per-node)` - - variable: `(2 + N)` `numeric` values -- `crdt_pnsum` - - fixed: `20B (header) + N * 4B (per-node)` - - variable: `(4 + 2 * N)` `numeric` values - -!!! Note - It doesn't matter how many nodes are in the cluster if the values are never updated on multiple nodes. It also doesn't matter whether the updates were concurrent (causing a conflict). - - In addition, it doesn't matter how many of those nodes were already removed from the cluster. There's no way to compact the state yet. - -## CRDT types versus conflicts handling - -As tables can contain both CRDT and non-CRDT columns (most columns are expected to be non-CRDT), you need to do both the regular conflict resolution and CRDT merge. - -The conflict resolution happens first and is responsible for deciding the tuple to keep (applytuple) and the one to discard. The merge phase happens next, merging data for CRDT columns from the discarded -tuple into the applytuple. - -!!! Note - This handling makes CRDT types somewhat more expensive compared to plain conflict resolution because the merge needs to happen every time. This is the case even when the conflict resolution can use one of the fast paths (such as those modified in the current transaction). - -## CRDT types versus conflict reporting - -By default, detected conflicts are individually reported. Without CRDT types, this makes sense because the conflict resolution essentially throws away half of the available information (local or remote row, depending on configuration). This presents a data loss. - -CRDT types allow both parts of the information to be combined without throwing anything away, eliminating the data loss issue. This approach makes the conflict reporting unnecessary. - -For this reason, conflict reporting is skipped when the conflict can be fully resolved by CRDT merge. Each column must meet at least one of these two conditions: - -- The values in local and remote tuple are the same (NULL or equal). -- It uses a CRDT data type and so can be merged. - -!!! Note - Conflict reporting is also skipped when there are no CRDT columns but all values in local/remote tuples are equal. 
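To illustrate these two conditions, here's a brief sketch (names are illustrative) in which concurrent updates change only a CRDT column. The non-CRDT `status` column stays equal in the local and remote rows, and the `amount` column is mergeable, so the conflict is resolved entirely by the CRDT merge and isn't reported:

```
CREATE TABLE order_totals (
    order_id integer PRIMARY KEY,
    status   text,                              -- regular, non-CRDT column
    amount   bdr.crdt_pnsum NOT NULL DEFAULT 0  -- CRDT column, mergeable
);

ALTER TABLE order_totals REPLICA IDENTITY FULL;
SELECT bdr.alter_table_conflict_detection('order_totals',
       'column_modify_timestamp', 'cts');

INSERT INTO order_totals (order_id, status) VALUES (1, 'open');

-- Issued concurrently on two different nodes: only the CRDT column changes,
-- so both conditions above are met and no conflict is reported.
UPDATE order_totals SET amount = amount + 10.00 WHERE order_id = 1;  -- node A
UPDATE order_totals SET amount = amount + 5.50  WHERE order_id = 1;  -- node B
```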
- -## Resetting CRDT values - -Resetting CRDT values is possible but requires special handling. The asynchronous nature of the -cluster means that different nodes might see the reset operation at different places in the change stream no matter how it's implemented. Different nodes might also initiate a reset concurrently, that is, before -observing the reset from the other node. - -In other words, to make the reset operation behave correctly, it needs to be commutative with respect to the regular operations. Many naive ways to reset a value that might work well on a single-node fail -for this reason. - -For example, the simplest approach to resetting a value might be: - -``` -UPDATE crdt_table SET cnt = 0 WHERE id = 1; -``` - -With state-based CRDTs this doesn't work. It throws away the state for the other nodes but only locally. It's added back by merge functions on remote nodes, causing diverging values and eventually receiving it -back due to changes on the other nodes. - -With operation-based CRDTs, this might seem to work because the update is interpreted as a subtraction of `-cnt`. But it works only in the absence of concurrent resets. Once two nodes attempt to do a reset at the same time, the delta is applied twice, getting a negative value (which isn't expected from a reset). - -It might also seem that you can use `DELETE + INSERT` as a reset, but this approach has a couple of weaknesses, too. If the row is reinserted with the same key, it's not guaranteed that all nodes see it at the same position in the stream of operations with respect to changes from other nodes. PGD specifically discourages reusing the same primary key value since it can lead to data anomalies in concurrent cases. - -State-based CRDT types can reliably handle resets using a special `!` operator like this: - -``` -UPDATE tab SET counter = !counter WHERE ...; -``` - -"Reliably" means the values don't have the two issues of multiple concurrent resets and divergence. - -Operation-based CRDT types can be reset reliably only using [Eager Replication](eager), since this avoids multiple concurrent resets. You can also use Eager Replication to set either kind of CRDT to a specific -value. - -## Implemented CRDT data types - -Currently, six CRDT data types are implemented: - -- Grow-only counter and sum -- Positive-negative counter and sum -- Delta counter and sum - -The counters and sums behave mostly the same, except that the counter types are integer based (`bigint`), while the sum types are decimal-based (`numeric`). - -Additional CRDT types, described at [1], might be implemented later. - -You can list the currently implemented CRDT data types with the following query: - -```sql -SELECT n.nspname, t.typname -FROM bdr.crdt_handlers c -JOIN (pg_type t JOIN pg_namespace n ON t.typnamespace = n.oid) - ON t.oid = c.crdt_type_id; -``` - -### grow-only counter (`crdt_gcounter`) - -- Supports only increments with nonnegative values (`value + int` and `counter + bigint` operators). - -- You can obtain the current value of the counter either using `#` operator or by casting it to `bigint`. - -- Isn't compatible with simple assignments like `counter = value` (which is common pattern when the new value is computed somewhere in the application). - -- Allows simple reset of the counter using the `!` operator ( `counter = !counter` ). - -- You can inspect the internal state using `crdt_gcounter_to_text`. 
- -``` -CREATE TABLE crdt_test ( - id INT PRIMARY KEY, - cnt bdr.crdt_gcounter NOT NULL DEFAULT 0 -); - -INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 -INSERT INTO crdt_test VALUES (2, 129824); -- initialized to 129824 -INSERT INTO crdt_test VALUES (3, -4531); -- error: negative value - --- enable CLCD on the table -ALTER TABLE crdt_test REPLICA IDENTITY FULL; -SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); - --- increment counters -UPDATE crdt_test SET cnt = cnt + 1 WHERE id = 1; -UPDATE crdt_test SET cnt = cnt + 120 WHERE id = 2; - --- error: minus operator not defined -UPDATE crdt_test SET cnt = cnt - 1 WHERE id = 1; - --- error: increment has to be non-negative -UPDATE crdt_test SET cnt = cnt + (-1) WHERE id = 1; - --- reset counter -UPDATE crdt_test SET cnt = !cnt WHERE id = 1; - --- get current counter value -SELECT id, cnt::bigint, cnt FROM crdt_test; - --- show internal structure of counters -SELECT id, bdr.crdt_gcounter_to_text(cnt) FROM crdt_test; -``` - -### grow-only sum (`crdt_gsum`) - -- Supports only increments with nonnegative values (`sum + numeric`). - -- You can obtain the current value of the sum either by using the `#` operator or by casting it to `numeric`. - -- Isn't compatible with simple assignments like `sum = value`, which is the common pattern when the new value is computed somewhere in the application. - -- Allows simple reset of the sum using the `!` operator (`sum = !sum`). - -- Can inspect internal state using `crdt_gsum_to_text`. - -``` -CREATE TABLE crdt_test ( - id INT PRIMARY KEY, - gsum bdr.crdt_gsum NOT NULL DEFAULT 0.0 -); - -INSERT INTO crdt_test VALUES (1, 0.0); -- initialized to 0 -INSERT INTO crdt_test VALUES (2, 1298.24); -- initialized to 1298.24 -INSERT INTO crdt_test VALUES (3, -45.31); -- error: negative value - --- enable CLCD on the table -ALTER TABLE crdt_test REPLICA IDENTITY FULL; -SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); - --- increment sum -UPDATE crdt_test SET gsum = gsum + 11.5 WHERE id = 1; -UPDATE crdt_test SET gsum = gsum + 120.33 WHERE id = 2; - --- error: minus operator not defined -UPDATE crdt_test SET gsum = gsum - 15.2 WHERE id = 1; - --- error: increment has to be non-negative -UPDATE crdt_test SET gsum = gsum + (-1.56) WHERE id = 1; - --- reset sum -UPDATE crdt_test SET gsum = !gsum WHERE id = 1; - --- get current sum value -SELECT id, gsum::numeric, gsum FROM crdt_test; - --- show internal structure of sums -SELECT id, bdr.crdt_gsum_to_text(gsum) FROM crdt_test; -``` - -### positive-negative counter (`crdt_pncounter`) - -- Supports increments with both positive and negative values (through `counter + int` and `counter + bigint` operators). - -- You can obtain the current value of the counter either by using the `#` operator or by casting to `bigint`. - -- Isn't compatible with simple assignments like `counter = value`, which is the common pattern when the new value is computed somewhere in the application. - -- Allows simple reset of the counter using the `!` operator (`counter = !counter`). - -- You can inspect the internal state using `crdt_pncounter_to_text`. 
- -``` -CREATE TABLE crdt_test ( - id INT PRIMARY KEY, - cnt bdr.crdt_pncounter NOT NULL DEFAULT 0 -); - -INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 -INSERT INTO crdt_test VALUES (2, 129824); -- initialized to 129824 -INSERT INTO crdt_test VALUES (3, -4531); -- initialized to -4531 - --- enable CLCD on the table -ALTER TABLE crdt_test REPLICA IDENTITY FULL; -SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); - --- increment counters -UPDATE crdt_test SET cnt = cnt + 1 WHERE id = 1; -UPDATE crdt_test SET cnt = cnt + 120 WHERE id = 2; -UPDATE crdt_test SET cnt = cnt + (-244) WHERE id = 3; - --- decrement counters -UPDATE crdt_test SET cnt = cnt - 73 WHERE id = 1; -UPDATE crdt_test SET cnt = cnt - 19283 WHERE id = 2; -UPDATE crdt_test SET cnt = cnt - (-12) WHERE id = 3; - --- get current counter value -SELECT id, cnt::bigint, cnt FROM crdt_test; - --- show internal structure of counters -SELECT id, bdr.crdt_pncounter_to_text(cnt) FROM crdt_test; - --- reset counter -UPDATE crdt_test SET cnt = !cnt WHERE id = 1; - --- get current counter value after the reset -SELECT id, cnt::bigint, cnt FROM crdt_test; -``` - -### positive-negative sum (`crdt_pnsum`) - -- Supports increments with both positive and negative values through `sum + numeric`. - -- You can obtain the current value of the sum either by using then `#` operator or by casting to `numeric`. - -- Isn't compatible with simple assignments like `sum = value`, which is the common pattern when the new value is computed somewhere in the application. - -- Allows simple reset of the sum using the `!` operator (`sum = !sum`). - -- You can inspect the internal state using `crdt_pnsum_to_text`. - -``` -CREATE TABLE crdt_test ( - id INT PRIMARY KEY, - pnsum bdr.crdt_pnsum NOT NULL DEFAULT 0 -); - -INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 -INSERT INTO crdt_test VALUES (2, 1298.24); -- initialized to 1298.24 -INSERT INTO crdt_test VALUES (3, -45.31); -- initialized to -45.31 - --- enable CLCD on the table -ALTER TABLE crdt_test REPLICA IDENTITY FULL; -SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); - --- increment sums -UPDATE crdt_test SET pnsum = pnsum + 1.44 WHERE id = 1; -UPDATE crdt_test SET pnsum = pnsum + 12.20 WHERE id = 2; -UPDATE crdt_test SET pnsum = pnsum + (-24.34) WHERE id = 3; - --- decrement sums -UPDATE crdt_test SET pnsum = pnsum - 7.3 WHERE id = 1; -UPDATE crdt_test SET pnsum = pnsum - 192.83 WHERE id = 2; -UPDATE crdt_test SET pnsum = pnsum - (-12.22) WHERE id = 3; - --- get current sum value -SELECT id, pnsum::numeric, pnsum FROM crdt_test; - --- show internal structure of sum -SELECT id, bdr.crdt_pnsum_to_text(pnsum) FROM crdt_test; - --- reset sum -UPDATE crdt_test SET pnsum = !pnsum WHERE id = 1; - --- get current sum value after the reset -SELECT id, pnsum::numeric, pnsum FROM crdt_test; -``` - -### delta counter (`crdt_delta_counter`) - -- Is defined a `bigint` domain, so works exactly like a `bigint` column. - -- Supports increments with both positive and negative values. - -- Is compatible with simple assignments like `counter = value`, which is common when the new value is computed somewhere in the application. - -- There's no simple way to reset the value reliably. 
- -``` -CREATE TABLE crdt_test ( - id INT PRIMARY KEY, - cnt bdr.crdt_delta_counter NOT NULL DEFAULT 0 -); - -INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 -INSERT INTO crdt_test VALUES (2, 129824); -- initialized to 129824 -INSERT INTO crdt_test VALUES (3, -4531); -- initialized to -4531 - --- enable CLCD on the table -ALTER TABLE crdt_test REPLICA IDENTITY FULL; -SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); - --- increment counters -UPDATE crdt_test SET cnt = cnt + 1 WHERE id = 1; -UPDATE crdt_test SET cnt = cnt + 120 WHERE id = 2; -UPDATE crdt_test SET cnt = cnt + (-244) WHERE id = 3; - --- decrement counters -UPDATE crdt_test SET cnt = cnt - 73 WHERE id = 1; -UPDATE crdt_test SET cnt = cnt - 19283 WHERE id = 2; -UPDATE crdt_test SET cnt = cnt - (-12) WHERE id = 3; - --- get current counter value -SELECT id, cnt FROM crdt_test; -``` - -### delta sum (`crdt_delta_sum`) - -- Is defined as a `numeric` domain so works exactly like a `numeric` column. - -- Supports increments with both positive and negative values. - -- Is compatible with simple assignments like `sum = value`, which is common when the new value is computed somewhere in the application. - -- There's no simple way to reset the value reliably. - -``` -CREATE TABLE crdt_test ( - id INT PRIMARY KEY, - dsum bdr.crdt_delta_sum NOT NULL DEFAULT 0 -); - -INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 -INSERT INTO crdt_test VALUES (2, 129.824); -- initialized to 129824 -INSERT INTO crdt_test VALUES (3, -4.531); -- initialized to -4531 - --- enable CLCD on the table -ALTER TABLE crdt_test REPLICA IDENTITY FULL; -SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); - --- increment counters -UPDATE crdt_test SET dsum = dsum + 1.32 WHERE id = 1; -UPDATE crdt_test SET dsum = dsum + 12.01 WHERE id = 2; -UPDATE crdt_test SET dsum = dsum + (-2.4) WHERE id = 3; - --- decrement counters -UPDATE crdt_test SET dsum = dsum - 7.33 WHERE id = 1; -UPDATE crdt_test SET dsum = dsum - 19.83 WHERE id = 2; -UPDATE crdt_test SET dsum = dsum - (-1.2) WHERE id = 3; - --- get current counter value -SELECT id, cnt FROM crdt_test; -``` - -[1] diff --git a/product_docs/docs/pgd/5/consistency/crdt/00_crdt_overview.mdx b/product_docs/docs/pgd/5/consistency/crdt/00_crdt_overview.mdx new file mode 100644 index 00000000000..bdcb2c8687b --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/crdt/00_crdt_overview.mdx @@ -0,0 +1,17 @@ +--- +navTitle: Overview +title: CRDTs Overview +--- + +## Introduction to CRDTs + +Conflict-free replicated data types (CRDTs) support merging values from concurrently modified rows instead of discarding one of the rows as the traditional resolution does. + +Each CRDT type is implemented as a separate PostgreSQL data type with an extra callback added to the `bdr.crdt_handlers` catalog. The merge process happens inside the PGD writer on the apply side without any user action needed. + +CRDTs require the table to have column-level conflict resolution enabled, as described in [Column-level conflict resolution](../column-level-conflicts/02_enabling_disabling.mdx). + +## CRDTs in PostgreSQL + +[The CRDTs](06_crdt-implemented) are installed as part of `bdr` into the `bdr` schema. For convenience, the basic operators (`+`, `#` and `!`) and a number of common aggregate functions (`min`, `max`, `sum`, and `avg`) are created in `pg_catalog`. Thus they are available without having to tweak `search_path`. 
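+
+For example, here's a minimal sketch, assuming a hypothetical table `stock` with a `bdr.crdt_gcounter` column named `sold`. The aggregates resolve without schema qualification or any `search_path` changes:
+
+```sql
+-- sum() and max() are the CRDT-aware aggregates installed in pg_catalog,
+-- so they apply directly to the counter column.
+SELECT sum(sold) AS total_sold,
+       max(sold) AS largest_counter
+FROM stock;
+```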
+
diff --git a/product_docs/docs/pgd/5/consistency/crdt/01_crdt_usage.mdx b/product_docs/docs/pgd/5/consistency/crdt/01_crdt_usage.mdx
new file mode 100644
index 00000000000..33fa62e3c68
--- /dev/null
+++ b/product_docs/docs/pgd/5/consistency/crdt/01_crdt_usage.mdx
@@ -0,0 +1,147 @@
+---
+navTitle: Using CRDTs
+title: Using CRDTs
+---
+
+## Using CRDTs in tables
+
+!!! Note Permissions required
+PGD CRDTs require usage access to CRDT types. Therefore, any user needing to access CRDT types must have at least the [bdr_application](../../security/pgd-predefined-roles/#bdr_application) role assigned to them.
+!!!
+
+To use CRDTs, you need to use a particular data type in CREATE/ALTER TABLE rather than standard built-in data types such as `integer`. For example, consider the following table with one regular integer counter and a single row:
+
+### Non-CRDT example
+
+```sql
+CREATE TABLE non_crdt_example (
+    id       integer  PRIMARY KEY,
+    counter  integer  NOT NULL DEFAULT 0
+);
+
+INSERT INTO non_crdt_example (id) VALUES (1);
+```
+
+Suppose you issue the following SQL on two different nodes at the same time:
+
+```sql
+UPDATE non_crdt_example
+   SET counter = counter + 1   -- "reflexive" update
+ WHERE id = 1;
+```
+
+After both updates are applied, you can see the resulting values using this query:
+
+```sql
+SELECT * FROM non_crdt_example WHERE id = 1;
+   id | counter
+ -----+-----------
+    1 |       1
+(1 row)
+```
+
+This output shows that you lost one of the increments due to the `update_if_newer` conflict resolver.
+
+### CRDT example
+
+To use a CRDT counter data type instead, follow these steps:
+
+Create the table, but with a CRDT (`bdr.crdt_gcounter`) as the counter's data type:
+
+```sql
+CREATE TABLE crdt_example (
+    id       integer            PRIMARY KEY,
+    counter  bdr.crdt_gcounter  NOT NULL DEFAULT 0
+);
+```
+
+Configure the table for column-level conflict resolution:
+
+```sql
+ALTER TABLE crdt_example REPLICA IDENTITY FULL;
+
+SELECT bdr.alter_table_conflict_detection('crdt_example',
+               'column_modify_timestamp', 'cts');
+```
+
+Then insert a row for this example:
+
+```sql
+INSERT INTO crdt_example (id) VALUES (1);
+```
+
+Now issue, as before, the same SQL on two nodes at the same time:
+
+```sql
+UPDATE crdt_example
+   SET counter = counter + 1   -- "reflexive" update
+ WHERE id = 1;
+```
+
+Once the changes are applied, you find that the counter has correctly merged the concurrent updates:
+
+```sql
+SELECT id, counter FROM crdt_example WHERE id = 1;
+   id | counter
+ -----+-----------
+    1 |       2
+(1 row)
+```
+
+This example shows that the CRDT correctly allows the accumulator columns to work, even in the face of asynchronous concurrent updates that otherwise conflict.
+
+## Configuration options
+
+The `bdr.crdt_raw_value` configuration option determines whether queries return the current value or the full internal state of the CRDT type. By default, only the current numeric value is returned. When set to `true`, queries return a representation of the full state. Even then, you can use the special hash operator
+(`#`) to request only the current numeric value, which is the default behavior. If the full state is dumped using `bdr.crdt_raw_value = on`, then the value can be reloaded only with `bdr.crdt_raw_value = on`.
+
+!!! Note
+    The `bdr.crdt_raw_value` setting affects only the formatting of data returned to clients, that is, simple column references in the select list.
Any column references in other parts of the query (such as the `WHERE` clause or even expressions in the select list) might still require use of the `#` operator.
+
+## Different types of CRDTs
+
+The `crdt_gcounter` type is an example of state-based CRDT types that work only with reflexive UPDATE SQL, such as `x = x + 1`, as the example shows.
+
+Another class of CRDTs is *delta CRDT* types. These are a special subclass of [operation-based CRDT](02_state-op-crdts#operation-based-crdt-types-cmcrdt).
+
+With delta CRDTs, any update to a value is compared to the previous value on the same node. Then a change is applied as a delta on all other nodes.
+
+```sql
+CREATE TABLE crdt_delta_example (
+    id       integer   PRIMARY KEY,
+    counter  bdr.crdt_delta_counter NOT NULL DEFAULT 0
+);
+
+ALTER TABLE crdt_delta_example REPLICA IDENTITY FULL;
+
+SELECT bdr.alter_table_conflict_detection('crdt_delta_example',
+               'column_modify_timestamp', 'cts');
+
+INSERT INTO crdt_delta_example (id) VALUES (1);
+```
+
+Suppose you issue the following SQL on two nodes at the same time:
+
+```sql
+UPDATE crdt_delta_example
+   SET counter = 2          -- notice NOT counter = counter + 2
+ WHERE id = 1;
+```
+
+After both updates are applied, you can see the resulting values using this query:
+
+```sql
+SELECT id, counter FROM crdt_delta_example WHERE id = 1;
+   id | counter
+ -----+---------
+    1 |       4
+(1 row)
+```
+
+With a regular `integer` column, the result is `2`. But when you update the row with a delta CRDT counter, you start with the OLD row version, make a NEW row version, and send both to the remote node. There, they're compared with the version found on that node (the LOCAL version). Standard CRDTs merge the NEW and the LOCAL version, while delta CRDTs compare the OLD and NEW versions and apply the delta
+to the LOCAL version.
+
+## Query planning and optimization
+
+An important question is how query planning and optimization work with these new data types. CRDT types are handled transparently. Both `ANALYZE` and the optimizer work, so estimation and query planning work fine without you having to do anything else.
+
diff --git a/product_docs/docs/pgd/5/consistency/crdt/02_state-op-crdts.mdx b/product_docs/docs/pgd/5/consistency/crdt/02_state-op-crdts.mdx
new file mode 100644
index 00000000000..842b4b0dd8a
--- /dev/null
+++ b/product_docs/docs/pgd/5/consistency/crdt/02_state-op-crdts.mdx
@@ -0,0 +1,43 @@
+---
+navTitle: Operation and state-based CRDTs
+title: Operation-based and state-based CRDTs
+---
+
+## Operation-based CRDT types (CmCRDT)
+
+The implementation of operation-based types is trivial because the operation isn't transferred explicitly but computed from the old and new row received from the remote node.
+
+Currently, these operation-based CRDTs are implemented:
+
+- [`crdt_delta_counter`](06_crdt-implemented/#delta-counter-crdt_delta_counter) — `bigint` counter (increments/decrements)
+- [`crdt_delta_sum`](06_crdt-implemented/#delta-sum-crdt_delta_sum) — `numeric` sum (increments/decrements)
+
+These types leverage existing data types with a little bit of code to compute the delta. For example, `crdt_delta_counter` is a domain on a `bigint`.
+
+This approach is possible only for types for which the method for computing the delta is known, but the result is simple and cheap (both in terms of space and CPU) and has a couple of added benefits. For example, it can leverage operators/syntax for the underlying data type.
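+
+As a rough sketch of what that means in practice, assuming the `crdt_delta_example` table from [Using CRDTs](01_crdt_usage): because the column is a domain over `bigint`, ordinary comparisons and the standard aggregates apply to it without casts or special operators.
+
+```sql
+-- Plain bigint syntax works directly on the delta counter column.
+SELECT count(*)     AS rows_with_activity,
+       sum(counter) AS combined_count
+FROM crdt_delta_example
+WHERE counter > 0;
+```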
+
+The main disadvantage is that you can't reset this value reliably in an asynchronous and concurrent environment.
+
+!!! Note
+    It's possible to implement more complicated operation-based types by creating custom data types that store the state and the last operation. (Every change is decoded and transferred, so multiple
+    operations aren't needed.) But at that point, the main benefits (simplicity, reuse of existing data types) are lost without gaining any advantage compared to state-based types (for example, still no capability to reset) except for the space requirements. (A per-node state isn't needed.)
+
+## State-based CRDT types (CvCRDT)
+
+State-based types require a more complex internal state and so can't use the regular data types directly the way operation-based types do.
+
+Currently, four state-based CRDTs are implemented:
+
+- [`crdt_gcounter`](06_crdt-implemented/#grow-only-counter-crdt_gcounter) — `bigint` counter (increment-only)
+- [`crdt_gsum`](06_crdt-implemented/#grow-only-sum-crdt_gsum) — `numeric` sum/counter (increment-only)
+- [`crdt_pncounter`](06_crdt-implemented/#positive-negative-counter-crdt_pncounter) — `bigint` counter (increments/decrements)
+- [`crdt_pnsum`](06_crdt-implemented/#positive-negative-sum-crdt_pnsum) — `numeric` sum/counter (increments/decrements)
+
+The internal state typically includes per-node information, increasing the on-disk size but allowing added benefits. The need to implement custom data types implies more code (in/out functions and operators).
+
+The advantage is the ability to reliably reset the values, a somewhat self-healing nature in the presence of lost changes (which doesn't happen in a cluster that operates properly), and the ability to receive changes from nodes other than the source node.
+
+Consider, for example, that a value is modified on node A, and the change gets replicated to B but not C due to a network issue between A and C. If B modifies the value and this change gets replicated to C, it
+also includes the original change from A. With operation-based CRDTs, node C doesn't receive the change until the A-C network connection starts working again.
+
+The main disadvantages of CvCRDTs are higher costs in terms of [disk space and CPU usage](03_crdt-disk-reqs/#state-based-crdt-disk-space-reqs). A bit of information for each node is needed, including nodes that were already removed from the cluster. The complex nature of the state (serialized into varlena types) means increased CPU use.
\ No newline at end of file
diff --git a/product_docs/docs/pgd/5/consistency/crdt/03_crdt-disk-reqs.mdx b/product_docs/docs/pgd/5/consistency/crdt/03_crdt-disk-reqs.mdx
new file mode 100644
index 00000000000..af4798dba82
--- /dev/null
+++ b/product_docs/docs/pgd/5/consistency/crdt/03_crdt-disk-reqs.mdx
@@ -0,0 +1,41 @@
+---
+navTitle: Disk-space requirements
+title: CRDT disk-space requirements
+---
+
+An important consideration is the overhead associated with CRDT types, particularly the on-disk size.
+
+## Operation-based CRDT disk-space reqs
+
+For [operation-based types](02_state-op-crdts/#operation-based-crdt-types-cmcrdt), this is trivial because the types are merely domains on top of other types. They have the same disk space requirements no matter how many nodes there are:
+
+- `crdt_delta_counter` — Same as `bigint` (8 bytes)
+- `crdt_delta_sum` — Same as `numeric` (variable, depending on precision
+  and scale)
+
+There's no dependency on the number of nodes because operation-based CRDT types don't store any per-node information.
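+
+One way to see the actual footprint on a node is `pg_column_size()`, which reports the stored size of an individual value. Here's a minimal sketch, assuming one of the `crdt_test` tables from [Implemented CRDTs](06_crdt-implemented):
+
+```sql
+-- A bdr.crdt_delta_counter value reports 8 bytes here, while the
+-- state-based types grow with the number of nodes that modified them.
+SELECT id, pg_column_size(cnt) AS value_bytes
+FROM crdt_test;
+```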
+
+## State-based CRDT disk-space reqs
+
+For [state-based types](02_state-op-crdts/#state-based-crdt-types-cvcrdt), the situation is more complicated. All the types are variable length (stored essentially as a `bytea` column) and consist of a header and a certain amount of per-node information for each node that modified the value.
+
+For the `bigint` variants, formulas computing approximate size are:
+
+- `crdt_gcounter` — `32B (header) + N * 12B (per-node)`
+- `crdt_pncounter` — `48B (header) + N * 20B (per-node)`
+
+   `N` denotes the number of nodes that modified this value.
+
+For the `numeric` variants, there's no exact formula because both the header and per-node parts include `numeric` variable-length values. To give you an idea of how many such values you need to keep:
+
+- `crdt_gsum`
+  - fixed: `20B (header) + N * 4B (per-node)`
+  - variable: `(2 + N)` `numeric` values
+- `crdt_pnsum`
+  - fixed: `20B (header) + N * 4B (per-node)`
+  - variable: `(4 + 2 * N)` `numeric` values
+
+!!! Note
+    It doesn't matter how many nodes are in the cluster if the values are never updated on multiple nodes. It also doesn't matter whether the updates were concurrent (causing a conflict).
+
+    In addition, it doesn't matter how many of those nodes were already removed from the cluster. There's no way to compact the state yet.
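+
+As a rough worked example of these formulas, a `crdt_pncounter` value that's been modified on 10 nodes takes approximately `48 + 10 * 20 = 248` bytes, whereas the operation-based `crdt_delta_counter` stays at 8 bytes regardless of the number of nodes involved.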
\ No newline at end of file diff --git a/product_docs/docs/pgd/5/consistency/crdt/05_crdt-reset.mdx b/product_docs/docs/pgd/5/consistency/crdt/05_crdt-reset.mdx new file mode 100644 index 00000000000..9156da909cb --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/crdt/05_crdt-reset.mdx @@ -0,0 +1,39 @@ +--- +navTitle: Resetting CRDT values +title: Resetting CRDT values +--- + +Resetting CRDT values is possible but requires special handling. The asynchronous nature of the +cluster means that different nodes might see the reset operation at different places in the change stream no matter how it's implemented. Different nodes might also initiate a reset concurrently, that is, before +observing the reset from the other node. + +In other words, to make the reset operation behave correctly, it needs to be commutative with respect to the regular operations. Many naive ways to reset a value that might work well on a single-node fail +for this reason. + +## Challenges when resetting CRDT values + +For example, the simplest approach to resetting a value might be: + +``` +UPDATE crdt_table SET cnt = 0 WHERE id = 1; +``` + +With state-based CRDTs this doesn't work. It throws away the state for the other nodes but only locally. It's added back by merge functions on remote nodes, causing diverging values and eventually receiving it +back due to changes on the other nodes. + +With operation-based CRDTs, this might seem to work because the update is interpreted as a subtraction of `-cnt`. But it works only in the absence of concurrent resets. Once two nodes attempt to do a reset at the same time, the delta is applied twice, getting a negative value (which isn't expected from a reset). + +It might also seem that you can use `DELETE + INSERT` as a reset, but this approach has a couple of weaknesses, too. If the row is reinserted with the same key, it's not guaranteed that all nodes see it at the same position in the stream of operations with respect to changes from other nodes. PGD specifically discourages reusing the same primary key value since it can lead to data anomalies in concurrent cases. + +## How to reliably handle resetting CRDT values + +State-based CRDT types can reliably handle resets using a special `!` operator like this: + +``` +UPDATE tab SET counter = !counter WHERE ...; +``` + +"Reliably" means the values don't have the two issues of multiple concurrent resets and divergence. + +Operation-based CRDT types can be reset reliably only using [Eager Replication](../eager), since this avoids multiple concurrent resets. You can also use Eager Replication to set either kind of CRDT to a specific +value. diff --git a/product_docs/docs/pgd/5/consistency/crdt/06_crdt-implemented.mdx b/product_docs/docs/pgd/5/consistency/crdt/06_crdt-implemented.mdx new file mode 100644 index 00000000000..13d0a7860ba --- /dev/null +++ b/product_docs/docs/pgd/5/consistency/crdt/06_crdt-implemented.mdx @@ -0,0 +1,288 @@ +--- +navTitle: Implemented CRDTs +title: Implemented CRDTs +deepToC: true +--- + +Currently, six CRDT data types are implemented: + +- Grow-only counter and sum +- Positive-negative counter and sum +- Delta counter and sum + +The counters and sums behave mostly the same, except that the counter types are integer based (`bigint`), while the sum types are decimal-based (`numeric`). 
+ +You can list the currently implemented CRDT data types with the following query: + +```sql +SELECT n.nspname, t.typname +FROM bdr.crdt_handlers c +JOIN (pg_type t JOIN pg_namespace n ON t.typnamespace = n.oid) + ON t.oid = c.crdt_type_id; +``` + +## Grow-only counter (`crdt_gcounter`) + +- Supports only increments with nonnegative values (`value + int` and `counter + bigint` operators). + +- You can obtain the current value of the counter either using `#` operator or by casting it to `bigint`. + +- Isn't compatible with simple assignments like `counter = value` (which is common pattern when the new value is computed somewhere in the application). + +- Allows simple reset of the counter using the `!` operator ( `counter = !counter` ). + +- You can inspect the internal state using `crdt_gcounter_to_text`. + +```sql +CREATE TABLE crdt_test ( + id INT PRIMARY KEY, + cnt bdr.crdt_gcounter NOT NULL DEFAULT 0 +); + +INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 +INSERT INTO crdt_test VALUES (2, 129824); -- initialized to 129824 +INSERT INTO crdt_test VALUES (3, -4531); -- error: negative value + +-- enable CLCD on the table +ALTER TABLE crdt_test REPLICA IDENTITY FULL; +SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); + +-- increment counters +UPDATE crdt_test SET cnt = cnt + 1 WHERE id = 1; +UPDATE crdt_test SET cnt = cnt + 120 WHERE id = 2; + +-- error: minus operator not defined +UPDATE crdt_test SET cnt = cnt - 1 WHERE id = 1; + +-- error: increment has to be non-negative +UPDATE crdt_test SET cnt = cnt + (-1) WHERE id = 1; + +-- reset counter +UPDATE crdt_test SET cnt = !cnt WHERE id = 1; + +-- get current counter value +SELECT id, cnt::bigint, cnt FROM crdt_test; + +-- show internal structure of counters +SELECT id, bdr.crdt_gcounter_to_text(cnt) FROM crdt_test; +``` + +## Grow-only sum (`crdt_gsum`) + +- Supports only increments with nonnegative values (`sum + numeric`). + +- You can obtain the current value of the sum either by using the `#` operator or by casting it to `numeric`. + +- Isn't compatible with simple assignments like `sum = value`, which is the common pattern when the new value is computed somewhere in the application. + +- Allows simple reset of the sum using the `!` operator (`sum = !sum`). + +- Can inspect internal state using `crdt_gsum_to_text`. 
+ +```sql +CREATE TABLE crdt_test ( + id INT PRIMARY KEY, + gsum bdr.crdt_gsum NOT NULL DEFAULT 0.0 +); + +INSERT INTO crdt_test VALUES (1, 0.0); -- initialized to 0 +INSERT INTO crdt_test VALUES (2, 1298.24); -- initialized to 1298.24 +INSERT INTO crdt_test VALUES (3, -45.31); -- error: negative value + +-- enable CLCD on the table +ALTER TABLE crdt_test REPLICA IDENTITY FULL; +SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); + +-- increment sum +UPDATE crdt_test SET gsum = gsum + 11.5 WHERE id = 1; +UPDATE crdt_test SET gsum = gsum + 120.33 WHERE id = 2; + +-- error: minus operator not defined +UPDATE crdt_test SET gsum = gsum - 15.2 WHERE id = 1; + +-- error: increment has to be non-negative +UPDATE crdt_test SET gsum = gsum + (-1.56) WHERE id = 1; + +-- reset sum +UPDATE crdt_test SET gsum = !gsum WHERE id = 1; + +-- get current sum value +SELECT id, gsum::numeric, gsum FROM crdt_test; + +-- show internal structure of sums +SELECT id, bdr.crdt_gsum_to_text(gsum) FROM crdt_test; +``` + +## Positive-negative counter (`crdt_pncounter`) + +- Supports increments with both positive and negative values (through `counter + int` and `counter + bigint` operators). + +- You can obtain the current value of the counter either by using the `#` operator or by casting to `bigint`. + +- Isn't compatible with simple assignments like `counter = value`, which is the common pattern when the new value is computed somewhere in the application. + +- Allows simple reset of the counter using the `!` operator (`counter = !counter`). + +- You can inspect the internal state using `crdt_pncounter_to_text`. + +```sql +CREATE TABLE crdt_test ( + id INT PRIMARY KEY, + cnt bdr.crdt_pncounter NOT NULL DEFAULT 0 +); + +INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 +INSERT INTO crdt_test VALUES (2, 129824); -- initialized to 129824 +INSERT INTO crdt_test VALUES (3, -4531); -- initialized to -4531 + +-- enable CLCD on the table +ALTER TABLE crdt_test REPLICA IDENTITY FULL; +SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); + +-- increment counters +UPDATE crdt_test SET cnt = cnt + 1 WHERE id = 1; +UPDATE crdt_test SET cnt = cnt + 120 WHERE id = 2; +UPDATE crdt_test SET cnt = cnt + (-244) WHERE id = 3; + +-- decrement counters +UPDATE crdt_test SET cnt = cnt - 73 WHERE id = 1; +UPDATE crdt_test SET cnt = cnt - 19283 WHERE id = 2; +UPDATE crdt_test SET cnt = cnt - (-12) WHERE id = 3; + +-- get current counter value +SELECT id, cnt::bigint, cnt FROM crdt_test; + +-- show internal structure of counters +SELECT id, bdr.crdt_pncounter_to_text(cnt) FROM crdt_test; + +-- reset counter +UPDATE crdt_test SET cnt = !cnt WHERE id = 1; + +-- get current counter value after the reset +SELECT id, cnt::bigint, cnt FROM crdt_test; +``` + +## Positive-negative sum (`crdt_pnsum`) + +- Supports increments with both positive and negative values through `sum + numeric`. + +- You can obtain the current value of the sum either by using then `#` operator or by casting to `numeric`. + +- Isn't compatible with simple assignments like `sum = value`, which is the common pattern when the new value is computed somewhere in the application. + +- Allows simple reset of the sum using the `!` operator (`sum = !sum`). + +- You can inspect the internal state using `crdt_pnsum_to_text`. 
+ +```sql +CREATE TABLE crdt_test ( + id INT PRIMARY KEY, + pnsum bdr.crdt_pnsum NOT NULL DEFAULT 0 +); + +INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 +INSERT INTO crdt_test VALUES (2, 1298.24); -- initialized to 1298.24 +INSERT INTO crdt_test VALUES (3, -45.31); -- initialized to -45.31 + +-- enable CLCD on the table +ALTER TABLE crdt_test REPLICA IDENTITY FULL; +SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); + +-- increment sums +UPDATE crdt_test SET pnsum = pnsum + 1.44 WHERE id = 1; +UPDATE crdt_test SET pnsum = pnsum + 12.20 WHERE id = 2; +UPDATE crdt_test SET pnsum = pnsum + (-24.34) WHERE id = 3; + +-- decrement sums +UPDATE crdt_test SET pnsum = pnsum - 7.3 WHERE id = 1; +UPDATE crdt_test SET pnsum = pnsum - 192.83 WHERE id = 2; +UPDATE crdt_test SET pnsum = pnsum - (-12.22) WHERE id = 3; + +-- get current sum value +SELECT id, pnsum::numeric, pnsum FROM crdt_test; + +-- show internal structure of sum +SELECT id, bdr.crdt_pnsum_to_text(pnsum) FROM crdt_test; + +-- reset sum +UPDATE crdt_test SET pnsum = !pnsum WHERE id = 1; + +-- get current sum value after the reset +SELECT id, pnsum::numeric, pnsum FROM crdt_test; +``` + +## Delta counter (`crdt_delta_counter`) + +- Is defined a `bigint` domain, so works exactly like a `bigint` column. + +- Supports increments with both positive and negative values. + +- Is compatible with simple assignments like `counter = value`, which is common when the new value is computed somewhere in the application. + +- There's no simple way to reset the value reliably. + +```sql +CREATE TABLE crdt_test ( + id INT PRIMARY KEY, + cnt bdr.crdt_delta_counter NOT NULL DEFAULT 0 +); + +INSERT INTO crdt_test VALUES (1, 0); -- initialized to 0 +INSERT INTO crdt_test VALUES (2, 129824); -- initialized to 129824 +INSERT INTO crdt_test VALUES (3, -4531); -- initialized to -4531 + +-- enable CLCD on the table +ALTER TABLE crdt_test REPLICA IDENTITY FULL; +SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts'); + +-- increment counters +UPDATE crdt_test SET cnt = cnt + 1 WHERE id = 1; +UPDATE crdt_test SET cnt = cnt + 120 WHERE id = 2; +UPDATE crdt_test SET cnt = cnt + (-244) WHERE id = 3; + +-- decrement counters +UPDATE crdt_test SET cnt = cnt - 73 WHERE id = 1; +UPDATE crdt_test SET cnt = cnt - 19283 WHERE id = 2; +UPDATE crdt_test SET cnt = cnt - (-12) WHERE id = 3; + +-- get current counter value +SELECT id, cnt FROM crdt_test; +``` + +## Delta sum (`crdt_delta_sum`) + +- Is defined as a `numeric` domain so works exactly like a `numeric` column. + +- Supports increments with both positive and negative values. + +- Is compatible with simple assignments like `sum = value`, which is common when the new value is computed somewhere in the application. + +- There's no simple way to reset the value reliably. 
+
+```sql
+CREATE TABLE crdt_test (
+    id       INT PRIMARY KEY,
+    dsum     bdr.crdt_delta_sum NOT NULL DEFAULT 0
+);
+
+INSERT INTO crdt_test VALUES (1, 0);       -- initialized to 0
+INSERT INTO crdt_test VALUES (2, 129.824); -- initialized to 129.824
+INSERT INTO crdt_test VALUES (3, -4.531);  -- initialized to -4.531
+
+-- enable CLCD on the table
+ALTER TABLE crdt_test REPLICA IDENTITY FULL;
+SELECT bdr.alter_table_conflict_detection('crdt_test', 'column_modify_timestamp', 'cts');
+
+-- increment sums
+UPDATE crdt_test SET dsum = dsum + 1.32 WHERE id = 1;
+UPDATE crdt_test SET dsum = dsum + 12.01 WHERE id = 2;
+UPDATE crdt_test SET dsum = dsum + (-2.4) WHERE id = 3;
+
+-- decrement sums
+UPDATE crdt_test SET dsum = dsum - 7.33 WHERE id = 1;
+UPDATE crdt_test SET dsum = dsum - 19.83 WHERE id = 2;
+UPDATE crdt_test SET dsum = dsum - (-1.2) WHERE id = 3;
+
+-- get current sum value
+SELECT id, dsum FROM crdt_test;
+```
diff --git a/product_docs/docs/pgd/5/consistency/crdt/index.mdx b/product_docs/docs/pgd/5/consistency/crdt/index.mdx
new file mode 100644
index 00000000000..70292e7018c
--- /dev/null
+++ b/product_docs/docs/pgd/5/consistency/crdt/index.mdx
@@ -0,0 +1,22 @@
+---
+navTitle: CRDTs
+title: Conflict-free replicated data types
+redirects:
+  - /pgd/latest/bdr/crdt/
+---
+
+Conflict-free replicated data types (CRDTs) support merging values from concurrently modified rows instead of discarding one of the rows as the traditional resolution does.
+
+- [Overview](00_crdt_overview) provides an introduction to CRDTs and how they're made available in PGD.
+
+- [Using CRDTs](01_crdt_usage) shows how to use CRDTs in tables, reviews the relevant configuration options, and works through examples of CRDTs in action.
+
+- [Operation-based and state-based CRDTs](02_state-op-crdts) reviews the differences between operation-based and state-based CRDTs.
+
+- [Disk-space requirements](03_crdt-disk-reqs) covers disk-size considerations for CRDTs, especially state-based CRDTs.
+
+- [CRDTs vs conflict handling/reporting](04_crdt-vs-conflict) explains how conflict handling and reporting works with CRDTs.
+
+- [Resetting CRDT values](05_crdt-reset) discusses the challenges of resetting CRDT values and provides some guidance on doing so successfully.
+
+- [Implemented CRDTs](06_crdt-implemented) details each of the six CRDTs available in PGD, with implementation examples.
\ No newline at end of file
diff --git a/product_docs/docs/pgd/5/postgres-configuration.mdx b/product_docs/docs/pgd/5/postgres-configuration.mdx
index ebda1bc0641..38de2bd3905 100644
--- a/product_docs/docs/pgd/5/postgres-configuration.mdx
+++ b/product_docs/docs/pgd/5/postgres-configuration.mdx
@@ -36,7 +36,7 @@ which vary according to the size and scale of the cluster:
   - One per database on that instance
   - Four per PGD-enabled database
   - One per peer node in the PGD group
-  - One for each writer-enabled per peer node in the PGD group
+  - For each peer node in the PGD group, the number of writers (`bdr.num_writers`) plus one

  You might need more worker processes temporarily when a node is being removed from a PGD group.

- `max_wal_senders` — Two needed for every peer node.
@@ -51,10 +51,10 @@ N slots and WAL senders. During synchronization, PGD temporarily uses another
N-1 slots and WAL senders, so be careful to set the parameters high enough for
this occasional peak demand.

-With parallel apply turned on, the number of slots must be increased to
-N slots from the formula \* writers. 
This is because the `max_replication_slots` +With Parallel Apply turned on, the number of slots must be increased to +N slots from the formula \* writers. This is because `max_replication_slots` also sets the maximum number of replication origins, and some of the functionality -of parallel apply uses extra origin per writer. +of Parallel Apply uses an extra origin per writer. When the [decoding worker](node_management/decoding_worker/) is enabled, this process requires one extra replication slot per PGD group. @@ -79,4 +79,4 @@ timestamp to use. ### `max_prepared_transactions` -Needs to be set high enough to cope with the maximum number of concurrent prepared transactions across the cluster due to explicit two-phase commits, CAMO, or Eager transactions. Exceeding the limit prevents a node from running a local two-phase commit or CAMO transaction and prevents all Eager transactions on the cluster. This parameter can only be set at Postgres server start. +Needs to be set high enough to cope with the maximum number of concurrent prepared transactions across the cluster due to explicit two-phase commits, CAMO, or Eager transactions. Exceeding the limit prevents a node from running a local two-phase commit or CAMO transaction and prevents all Eager transactions on the cluster. This parameter can be set only at Postgres server start. diff --git a/product_docs/docs/pgd/5/reference/repsets-ddl-filtering.mdx b/product_docs/docs/pgd/5/reference/repsets-ddl-filtering.mdx index 968f06956ba..6437e1137b6 100644 --- a/product_docs/docs/pgd/5/reference/repsets-ddl-filtering.mdx +++ b/product_docs/docs/pgd/5/reference/repsets-ddl-filtering.mdx @@ -3,16 +3,16 @@ title: DDL replication filtering indexdepth: 3 --- -See also [DDL replication filtering](../repsets#ddl-replication-filtering) +See also [DDL replication filtering](../repsets#ddl-replication-filtering). ### `bdr.replication_set_add_ddl_filter` This function adds a DDL filter to a replication set. Any DDL that matches the given filter is replicated to any node that's -subscribed to that set. This also affects replication of PGD admin functions. +subscribed to that set. This function also affects replication of PGD admin functions. -This doesn't prevent execution of DDL on any node. It only +This function doesn't prevent execution of DDL on any node. It only alters whether DDL is replicated to other nodes. Suppose two nodes have a replication filter between them that excludes all index commands. Index commands can still be executed freely by directly connecting to @@ -20,7 +20,7 @@ each node and executing the desired DDL on that node. The DDL filter can specify a `command_tag` and `role_name` to allow replication of only some DDL statements. The `command_tag` is the same as those -used by [EVENT TRIGGERs](https://www.postgresql.org/docs/current/static/event-trigger-matrix.html) +used by [event triggers](https://www.postgresql.org/docs/current/static/event-trigger-matrix.html) for regular PostgreSQL commands. A typical example might be to create a filter that prevents additional index commands on a logical standby from being replicated to all other nodes. @@ -30,7 +30,7 @@ qualified function name. For example, `bdr.replication_set_add_table` is the command tag for the function of the same name. In this case, this tag allows all PGD functions to be filtered using `bdr.*`. -The `role_name` is used for matching against the current role that is executing +The `role_name` is used for matching against the current role that's executing the command. 
Both `command_tag` and `role_name` are evaluated as regular expressions, which are case sensitive. @@ -48,29 +48,29 @@ bdr.replication_set_add_ddl_filter(set_name name, #### Parameters -- `set_name` — name of the replication set; if NULL then the PGD - group default replication set is used -- `ddl_filter_name` — name of the DDL filter; this must be unique across the - whole PGD group -- `command_tag` — regular expression for matching command tags; NULL means - match everything -- `role_name` — regular expression for matching role name; NULL means - match all roles -- `base_relation_name` — reserved for future use, must be NULL -- `query_match` — regular expression for matching the query; NULL means - match all queries -- `exclusive` — if true, other matched filters are not taken into - consideration (that is, only the exclusive filter is applied), when multiple - exclusive filters match, we throw an error. This is useful for routing - specific commands to specific replication set, while keeping the default +- `set_name` — Name of the replication set. If NULL then the PGD + group default replication set is used. +- `ddl_filter_name` — Name of the DDL filter. This name must be unique across the + whole PGD group. +- `command_tag` — Regular expression for matching command tags. NULL means + match everything. +- `role_name` — Regular expression for matching role name. NULL means + match all roles. +- `base_relation_name` — Reserved for future use. Must be NULL. +- `query_match` — Regular expression for matching the query. NULL means + match all queries. +- `exclusive` — If true, other matched filters aren't taken into + consideration (that is, only the exclusive filter is applied). When multiple + exclusive filters match, an error is thrown. This parameter is useful for routing + specific commands to a specific replication set, while keeping the default replication through the main replication set. #### Notes This function uses the same replication mechanism as `DDL` statements. This means -that the replication is affected by the [ddl filters](../repsets#ddl-replication-filtering) -configuration. This also means that replication of changes to ddl -filter configuration is affected by the existing ddl filter configuration. +that the replication is affected by the [DDL filters](../repsets#ddl-replication-filtering) +configuration. This also means that replication of changes to DDL +filter configuration is affected by the existing DDL filter configuration. The function takes a `DDL` global lock. @@ -88,7 +88,7 @@ To include only PGD admin functions, define a filter like this: SELECT bdr.replication_set_add_ddl_filter('mygroup', 'mygroup_admin', $$bdr\..*$$); ``` -To exclude everything apart from index DDL: +To exclude everything except for index DDL: ```sql SELECT bdr.replication_set_add_ddl_filter('mygroup', 'index_filter', @@ -96,7 +96,7 @@ SELECT bdr.replication_set_add_ddl_filter('mygroup', 'index_filter', ``` To include all operations on tables and indexes but exclude all others, add -two filters: one for tables, one for indexes. This shows that +two filters: one for tables and one for indexes. This example shows that multiple filters provide the union of all allowed DDL commands: ```sql @@ -109,7 +109,7 @@ SELECT bdr.replication_set_add_ddl_filter('bdrgroup','table_filter', '^((?!TABLE This function removes the DDL filter from a replication set. Replication of this command is affected by DDL replication configuration, -including DDL filtering settings themselves. 
+including the DDL filtering settings. #### Synopsis @@ -128,7 +128,7 @@ bdr.replication_set_remove_ddl_filter(set_name name, This function uses the same replication mechanism as `DDL` statements. This means that the replication is affected by the -[ddl filters](../repsets#ddl-replication-filtering) configuration. +[DDL filters](../repsets#ddl-replication-filtering) configuration. This also means that replication of changes to the DDL filter configuration is affected by the existing DDL filter configuration. diff --git a/product_docs/docs/pgd/5/reference/repsets-management.mdx b/product_docs/docs/pgd/5/reference/repsets-management.mdx index 9bfc3a7b070..e0a67b55812 100644 --- a/product_docs/docs/pgd/5/reference/repsets-management.mdx +++ b/product_docs/docs/pgd/5/reference/repsets-management.mdx @@ -13,7 +13,7 @@ apply to them, if that's currently active. See [DDL replication](../ddl). This function creates a replication set. -Replication of this command is affected by DDL replication configuration +Replication of this command is affected by DDL replication configuration, including DDL filtering settings. ### Synopsis @@ -49,7 +49,7 @@ bdr.create_replication_set(set_name name, ### Notes By default, new replication sets don't replicate DDL or PGD administration -function calls. See [ddl filters](../repsets#ddl-replication-filtering) for how to set +function calls. See [DDL filters](../repsets#ddl-replication-filtering) for how to set up DDL replication for replication sets. A preexisting DDL filter is set up for the default group replication set that replicates all DDL and admin function calls. It's created when the group is created but can be dropped @@ -57,7 +57,7 @@ in case you don't want the PGD group default replication set to replicate DDL or the PGD administration function calls. This function uses the same replication mechanism as `DDL` statements. This means -that the replication is affected by the [ddl filters](../repsets#ddl-replication-filtering) +that the replication is affected by the [DDL filters](../repsets#ddl-replication-filtering) configuration. The function takes a `DDL` global lock. @@ -103,7 +103,7 @@ before. ### Notes This function uses the same replication mechanism as `DDL` statements. This means -the replication is affected by the [ddl filters](../repsets#ddl-replication-filtering) +the replication is affected by the [DDL filters](../repsets#ddl-replication-filtering) configuration. The function takes a `DDL` global lock. @@ -143,7 +143,7 @@ transaction. !!! Warning Don't drop a replication set that's being used by at least - another node, because doing so stops replication on that + another node because doing so stops replication on that node. If that happens, unsubscribe the affected node from that replication set. For the same reason, don't drop a replication set with @@ -154,7 +154,7 @@ transaction. local to each node, so that you can configure it on a node before it joins the group. -You can manage replication set subscription for a node using `alter_node_replication_sets`. +You can manage replication set subscriptions for a node using `alter_node_replication_sets`. ## `bdr.alter_node_replication_sets` @@ -169,7 +169,7 @@ bdr.alter_node_replication_sets(node_name name, ### Parameters -- `node_name` — The node to modify. Currently has to be local node. +- `node_name` — The node to modify. Currently must be a local node. - `set_names` — Array of replication sets to replicate to the specified node. 
An empty array results in the use of the group default replication set. @@ -181,17 +181,16 @@ The replication sets listed aren't checked for existence, since this function is designed to execute before the node joins. Be careful to specify replication set names correctly to avoid errors. -This allows for calling the function not only on the node that's part of the +This behavior allows for calling the function not only on the node that's part of the PGD group but also on a node that hasn't joined any group yet. This approach limits the data synchronized during the join. However, the schema is always fully synchronized without regard to the replication sets setting. All tables are copied across, not just the ones specified in the replication set. You can drop unwanted tables by referring to -the `bdr.tables` catalog table. These might be removed automatically in later -versions of PGD. This is currently true even if the [ddl filters](../repsets#ddl-replication-filtering) -configuration otherwise prevent replication of DDL. +the `bdr.tables` catalog table. (These might be removed automatically in later +versions of PGD.) This is currently true even if the [DDL filters](../repsets#ddl-replication-filtering) +configuration otherwise prevents replication of DDL. The replication sets that the node subscribes to after this call are published by the other nodes for actually replicating the changes from those nodes to the node where this function is executed. - diff --git a/product_docs/docs/pgd/5/reference/repsets-membership.mdx b/product_docs/docs/pgd/5/reference/repsets-membership.mdx index 373ae1d4be7..dedee1a0f60 100644 --- a/product_docs/docs/pgd/5/reference/repsets-membership.mdx +++ b/product_docs/docs/pgd/5/reference/repsets-membership.mdx @@ -8,10 +8,9 @@ indexdepth: 2 This function adds a table to a replication set. -This adds a table to a replication set and starts replicating changes -from this moment (or rather transaction commit). Any existing data the table +This function adds a table to a replication set and starts replicating changes +from the committing of the transaction that contains the call to the function. Any existing data the table might have on a node isn't synchronized. - Replication of this command is affected by DDL replication configuration, including DDL filtering settings. @@ -30,8 +29,8 @@ bdr.replication_set_add_table(relation regclass, - `set_name` — Name of the replication set. If NULL (the default), then the PGD group default replication set is used. - `columns` — Reserved for future use (currently does nothing and must be NULL). -- `row_filter` — SQL expression to be used for filtering the replicated rows. - If this expression isn't defined (that is, set to NULL, the default) then all rows are sent. +- `row_filter` — SQL expression to use for filtering the replicated rows. + If this expression isn't defined (that is, it's set to NULL, the default) then all rows are sent. The `row_filter` specifies an expression producing a Boolean result, with NULLs. Expressions evaluating to True or Unknown replicate the row. A False value @@ -53,11 +52,11 @@ You can replicate just some columns of a table. See ### Notes This function uses the same replication mechanism as `DDL` statements. This means -that the replication is affected by the [ddl filters](../repsets#ddl-replication-filtering) +that the replication is affected by the [DDL filters](../repsets#ddl-replication-filtering) configuration. 
-The function takes a `DML` global lock on the relation that's being -added to the replication set if the `row_filter` isn't NULL. Otherwise +If the `row_filter` isn't NULL, the function takes a `DML` global lock on the relation that's being +added to the replication set. Otherwise it takes just a `DDL` global lock. This function is transactional. You can roll back the effects with the @@ -87,7 +86,7 @@ bdr.replication_set_remove_table(relation regclass, ### Notes This function uses the same replication mechanism as `DDL` statements. This means -the replication is affected by the [ddl filters](../repsets#ddl-replication-filtering) +the replication is affected by the [DDL filters](../repsets#ddl-replication-filtering) configuration. The function takes a `DDL` global lock. @@ -95,4 +94,3 @@ The function takes a `DDL` global lock. This function is transactional. You can roll back the effects with the `ROLLBACK` of the transaction. The changes are visible to the current transaction. - diff --git a/product_docs/docs/pgd/5/reference/routing.mdx b/product_docs/docs/pgd/5/reference/routing.mdx index 76bdaaed46d..600be7e26fd 100644 --- a/product_docs/docs/pgd/5/reference/routing.mdx +++ b/product_docs/docs/pgd/5/reference/routing.mdx @@ -23,12 +23,12 @@ bdr.create_proxy(proxy_name text, node_group text, proxy_mode text); | `node_group` | text | | Name of the group to be used by the proxy. | | `proxy_mode` | text | `'default'` | Mode of the proxy. It can be `'default'` (listen_port connections follow write leader, no read_listen_port), `'read-only'` (no listen_port, read_listen_port connections follow read-only nodes), or `'any'` (listen_port connections follow write_leader, read_listen_port connections follow read-only nodes). Default is `'default'`. | -When proxy_mode is set to `'default'`, all read options in the proxy config will be set to NULL. When it's set to `'read-only'`, all write options in the proxy config will be set to NULL. When set to `'any'` all options will be set to their defaults. +When proxy_mode is set to `'default'`, all read options in the proxy config are set to NULL. When it's set to `'read-only'`, all write options in the proxy config are set to NULL. When set to `'any'` all options are set to their defaults. ### `bdr.alter_proxy_option` -Change a proxy configuration +Change a proxy configuration. #### Synopsis @@ -40,23 +40,23 @@ bdr.alter_proxy_option(proxy_name text, config_key text, config_value text); | Name | Type | Default | Description | |----------------|------|---------|-----------------------------------------------| -| `proxy_name` | text | | Name of the proxy to be changed. | -| `config_key` | text | | Key of the option in the proxy to be changed. | -| `config_value` | text | | New value to be set for the given key. | +| `proxy_name` | text | | Name of the proxy to change. | +| `config_key` | text | | Key of the option in the proxy to change. | +| `config_value` | text | | New value to set for the given key. | The table shows the proxy options (`config_key`) that can be changed using this function. 
| Option | Description | |-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `listen_address` | Address for the proxy to listen on. Default is '{0.0.0.0}' | -| `listen_port` | Port for the proxy to listen on. Default is '6432' in 'default' or 'any' mode and '0' in 'read-only' mode which disables the write leader following port. | +| `listen_address` | Address for the proxy to listen on. Default is '{0.0.0.0}'. | +| `listen_port` | Port for the proxy to listen on. Default is '6432' in 'default' or 'any' mode and '0' in 'read-only' mode, which disables the write leader following port. | | `max_client_conn` | Maximum number of connections for the proxy to accept. Default is '32767'. | | `max_server_conn` | Maximum number of connections the proxy can make to the Postgres node. Default is '32767'. | | `server_conn_timeout` | Connection timeout for server connections. Default is '2' (seconds). | | `server_conn_keepalive` | Keepalive interval for server connections. Default is '10' (seconds). | | `consensus_grace_period` | Duration for which proxy continues to route even upon loss of a Raft leader. If set to 0s, proxy stops routing immediately. Default is generally '6' (seconds) for local proxies and '12' (seconds) for global proxies. These values will be overridden if `raft_response_timeout`, `raft_global_election_timeout`, or `raft_group_election_timeout` are changed from their defaults. | | `read_listen_address` | Address for the read-only proxy to listen on. Default is '{0.0.0.0}'. | -| `read_listen_port` | Port for the read-only proxy to listen on. Default is '6433' in 'read-only' or 'any' mode and '0' in 'default' mode which disables the read-only port . | +| `read_listen_port` | Port for the read-only proxy to listen on. Default is '6433' in 'read-only' or 'any' mode and '0' in 'default' mode, which disables the read-only port. | | `read_max_client_conn` | Maximum number of connections for the read-only proxy to accept. Default is '32767'. | | `read_max_server_conn` | Maximum number of connections the read-only proxy can make to the Postgres node. Default is '32767'. | | `read_server_conn_keepalive` | Keepalive interval for read-only server connections. Default is '10' (seconds). | @@ -79,7 +79,7 @@ bdr.drop_proxy(proxy_name text); | Name | Type | Default | Description | |--------------|------|---------|-----------------------------------------------| -| `proxy_name` | text | | Name of the proxy to be dropped. | +| `proxy_name` | text | | Name of the proxy to drop. | ### `bdr.routing_leadership_transfer` @@ -102,4 +102,3 @@ bdr.routing_leadership_transfer(node_group_name text, | `leader_name` | text | | Name of node that will become write leader. | | `transfer_method` | text | `'strict'` | Type of the transfer. It can be `'fast'` or the default, `'strict'`, which checks the maximum lag. | | `transfer_timeout` | interval | '10s' | Timeout of the leadership transfer. Default is 10 seconds. 
| - diff --git a/product_docs/docs/pgd/5/reference/sequences.mdx b/product_docs/docs/pgd/5/reference/sequences.mdx index a0e12b71ecf..5168a53447a 100644 --- a/product_docs/docs/pgd/5/reference/sequences.mdx +++ b/product_docs/docs/pgd/5/reference/sequences.mdx @@ -20,7 +20,7 @@ Once set, `seqkind` is visible only by way of the `bdr.sequences` view. In all other ways, the sequence appears as a normal sequence. PGD treats this function as `DDL`, so DDL replication and global locking applies, -if it's currently active. See [DDL Replication](../ddl). +if it's currently active. See [DDL replication](../ddl). #### Synopsis @@ -48,7 +48,7 @@ ALTER SEQUENCE seq_name START starting_value RESTART ``` This function uses the same replication mechanism as `DDL` statements. This means -that the replication is affected by the [ddl filters](../repsets#ddl-replication-filtering) +that the replication is affected by the [DDL filters](../repsets#ddl-replication-filtering) configuration. The function takes a global `DDL` lock. It also locks the sequence locally. @@ -57,14 +57,14 @@ This function is transactional. You can roll back the effects with the `ROLLBACK` of the transaction. The changes are visible to the current transaction. -Only the owner of the sequence can execute the `bdr.alter_sequence_set_kind` function +Only the owner of the sequence can execute the `bdr.alter_sequence_set_kind` function, unless `bdr.backwards_compatibility` is -set is set to 30618 or lower. +set to 30618 or lower. ### `bdr.extract_timestamp_from_snowflakeid` This function extracts the timestamp component of the `snowflakeid` sequence. -The return value is of type timestamptz. +The return value is of type `timestamptz`. #### Synopsis ```sql @@ -72,7 +72,7 @@ bdr.extract_timestamp_from_snowflakeid(snowflakeid bigint) ``` #### Parameters - - `snowflakeid` — Value of a snowflakeid sequence. + - `snowflakeid` — Value of a `snowflakeid` sequence. #### Notes @@ -88,7 +88,7 @@ bdr.extract_nodeid_from_snowflakeid(snowflakeid bigint) ``` #### Parameters - - `snowflakeid` — Value of a snowflakeid sequence. + - `snowflakeid` — Value of a `snowflakeid` sequence. #### Notes @@ -104,7 +104,7 @@ bdr.extract_localseqid_from_snowflakeid(snowflakeid bigint) ``` #### Parameters - - `snowflakeid` — Value of a snowflakeid sequence. + - `snowflakeid` — Value of a `snowflakeid` sequence. #### Notes @@ -112,13 +112,13 @@ This function executes only on the local node. ### `bdr.timestamp_to_snowflakeid` -This function converts a timestamp value to a dummy snowflakeid sequence value. +This function converts a timestamp value to a dummy `snowflakeid` sequence value. This is useful for doing indexed searches or comparisons of values in the -snowflakeid column and for a specific timestamp. +`snowflakeid` column and for a specific timestamp. For example, given a table `foo` with a column `id` that's using a `snowflakeid` -sequence, we can get the number of changes since yesterday midnight like this: +sequence, you can get the number of changes since yesterday midnight like this: ``` SELECT count(1) FROM foo WHERE id > bdr.timestamp_to_snowflakeid('yesterday') @@ -132,7 +132,7 @@ bdr.timestamp_to_snowflakeid(ts timestamptz) ``` #### Parameters - - `ts` — Timestamp to use for the snowflakeid sequence generation. + - `ts` — Timestamp to use for the `snowflakeid` sequence generation. #### Notes @@ -141,7 +141,7 @@ This function executes only on the local node. 
### `bdr.extract_timestamp_from_timeshard` This function extracts the timestamp component of the `timeshard` sequence. -The return value is of type timestamptz. +The return value is of type `timestamptz`. #### Synopsis @@ -151,7 +151,7 @@ bdr.extract_timestamp_from_timeshard(timeshard_seq bigint) #### Parameters -- `timeshard_seq` — Value of a timeshard sequence. +- `timeshard_seq` — Value of a `timeshard` sequence. #### Notes @@ -169,7 +169,7 @@ bdr.extract_nodeid_from_timeshard(timeshard_seq bigint) #### Parameters -- `timeshard_seq` — Value of a timeshard sequence. +- `timeshard_seq` — Value of a `timeshard` sequence. #### Notes @@ -187,7 +187,7 @@ bdr.extract_localseqid_from_timeshard(timeshard_seq bigint) #### Parameters -- `timeshard_seq` — Value of a timeshard sequence. +- `timeshard_seq` — Value of a `timeshard` sequence. #### Notes @@ -195,13 +195,13 @@ This function executes only on the local node. ### `bdr.timestamp_to_timeshard` -This function converts a timestamp value to a dummy timeshard sequence value. +This function converts a timestamp value to a dummy `timeshard` sequence value. This is useful for doing indexed searches or comparisons of values in the -timeshard column and for a specific timestamp. +`timeshard` column and for a specific timestamp. For example, given a table `foo` with a column `id` that's using a `timeshard` -sequence, we can get the number of changes since yesterday midnight like this: +sequence, you can get the number of changes since yesterday midnight like this: ``` SELECT count(1) FROM foo WHERE id > bdr.timestamp_to_timeshard('yesterday') @@ -217,13 +217,13 @@ bdr.timestamp_to_timeshard(ts timestamptz) #### Parameters -- `ts` — Timestamp to use for the timeshard sequence generation. +- `ts` — Timestamp to use for the `timeshard` sequence generation. #### Notes This function executes only on the local node. -## KSUUID v2 Functions +## KSUUID v2 functions Functions for working with `KSUUID` v2 data, K-Sortable UUID data. See also [KSUUID in the sequences documentation](../sequences/#ksuuids). @@ -269,7 +269,7 @@ This function executes only on the local node. ### `bdr.extract_timestamp_from_ksuuid_v2` This function extracts the timestamp component of `KSUUID` v2. -The return value is of type timestamptz. +The return value is of type `timestamptz`. #### Synopsis @@ -329,7 +329,7 @@ This function executes only on the local node. ### `bdr.extract_timestamp_from_ksuuid` This function extracts the timestamp component of `KSUUID` v1 or `UUIDv1` values. -The return value is of type timestamptz. +The return value is of type `timestamptz`. #### Synopsis @@ -344,10 +344,3 @@ bdr.extract_timestamp_from_ksuuid(uuid) #### Notes This function executes on the local node. - - - - - - - diff --git a/product_docs/docs/pgd/5/reference/testingandtuning.mdx b/product_docs/docs/pgd/5/reference/testingandtuning.mdx index 7fe121b1886..1e43b60b817 100644 --- a/product_docs/docs/pgd/5/reference/testingandtuning.mdx +++ b/product_docs/docs/pgd/5/reference/testingandtuning.mdx @@ -80,7 +80,7 @@ The complete list of options (pgd_bench and pgbench) follow. #### Options to select what to run - `-b, --builtin=NAME[@W]` — Add built-in script NAME weighted at W. The default is 1. Use `-b list` to list available scripts. - `-f, --file=FILENAME[@W]` — Add script `FILENAME` weighted at W. The default is 1. -- `-N, --skip-some-updates` — Updates of pgbench_tellers and pgbench_branches. Same as `-b simple-update`. 
+- `-N, --skip-some-updates` — Updates of pgbench_tellers and pgbench_branches. Same as `-b simple-update`. - `-S, --select-only` — Perform SELECT-only transactions. Same as `-b select-only`. #### Benchmarking options @@ -107,7 +107,7 @@ The complete list of options (pgd_bench and pgbench) follow. - `--max-tries=NUM` — Max number of tries to run transaction. The default is `1`. - `--progress-timestamp` — Use Unix epoch timestamps for progress. - `--random-seed=SEED` — Set random seed (`time`, `rand`, `integer`). -- `--retry` — Retry transactions on failover, used with `-m`. +- `--retry` — Retry transactions on failover. Used with `-m`. - `--sampling-rate=NUM` — Fraction of transactions to log, for example, 0.01 for 1%. - `--show-script=NAME` — Show built-in script code, then exit. - `--verbose-errors` — Print messages of all errors. diff --git a/product_docs/docs/pgd/5/repsets.mdx b/product_docs/docs/pgd/5/repsets.mdx index bd266860a1d..afcf3cf7bab 100644 --- a/product_docs/docs/pgd/5/repsets.mdx +++ b/product_docs/docs/pgd/5/repsets.mdx @@ -1,19 +1,19 @@ --- title: Replication sets -description: Grouping tables to enable more complex replication topologies +description: Grouping tables to enable more complex replication topologies. redirects: - ../bdr/repsets --- A replication set is a group of tables that a PGD node can subscribe to. You can use replication sets to create more complex replication topologies -than regular symmetric multi-master where each node is an exact copy of the other +than regular symmetric multi-master topologies where each node is an exact copy of the other nodes. Every PGD group creates a replication set with the same name as the group. This replication set is the default replication set, which is used for all user tables and DDL replication. All nodes are subscribed to it. -In other words, by default all user tables are replicated between all nodes. +In other words, by default, all user tables are replicated between all nodes. ## Using replication sets @@ -75,12 +75,12 @@ logic doesn't have to be executed. ## Behavior with foreign keys -A foreign key constraint ensures that each row in the referencing table matches +A foreign-key constraint ensures that each row in the referencing table matches a row in the referenced table. Therefore, if the referencing table is a member of a replication set, the referenced table must also be a member of the same replication set. -The current version of PGD doesn't automatically check for or enforce this +The current version of PGD doesn't check for or enforce this condition. When adding a table to a replication set, the database administrator must make sure that all the tables referenced by foreign keys are also added. @@ -105,7 +105,7 @@ SELECT t1.relname, ); ``` -The output of this query looks like the following: +The output of this query looks like this: ```sql relname | nspname | conname | set_name @@ -140,7 +140,7 @@ handled by DDL replication set filters (see [DDL replication filtering](#ddl-rep The replication uses the table membership in replication sets with the node replication sets configuration to determine the actions to -replicate to which node. The decision is done using the union of all the +replicate and the node to replicate them to. The decision is done using the union of all the memberships and replication set options. Suppose that a table is a member of replication set A that replicates only INSERT actions and replication set B that replicates only UPDATE actions. 
Both INSERT and UPDATE actions are replicated if the @@ -233,13 +233,13 @@ PGD group replication set. This replication is achieved using a DDL filter with as the PGD group. This filter is added to the default PGD group replication set when the PGD group is created. -You can adjust this by changing the DDL replication filters for all existing +You can adjust this behavior by changing the DDL replication filters for all existing replication sets. These filters are independent of table membership in the replication sets. Just like data changes, each DDL statement is replicated only once, even if it's matched by multiple filters on multiple replication sets. -You can list existing DDL filters with the following query, which shows for each -filter the regular expression applied to the command tag and to the role name: +You can list existing DDL filters with the following query, which shows, for each +filter, the regular expression applied to the command tag and to the role name: ```sql SELECT * FROM bdr.ddl_replication; @@ -249,29 +249,29 @@ You can use [`bdr.replication_set_add_ddl_filter`](/pgd/latest/reference/repsets They're considered to be `DDL` and are therefore subject to DDL replication and global locking. -## Selective Replication Example +## Selective replication example -In this example, we configure EDB Postgres Distributed to selectively replicate tables to particular groups of nodes. +This example configures EDB Postgres Distributed to selectively replicate tables to particular groups of nodes. -### Cluster Configuration +### Cluster configuration -This example assumes we have a cluster of six data nodes, `data-a1` to `data-a3` and `data-b1` to `data-b3` in two locations, represented by them being members of the `region_a` and `region_b` groups. +This example assumes you have a cluster of six data nodes, `data-a1` to `data-a3` and `data-b1` to `data-b3` in two locations. The two locations they're members of are represented as `region_a` and `region_b` groups. -There's also, as we recommend, a witness node, named `witness` in `region-c`, but we won't be needing to mention that in this example. The cluster itself will be called `sere`. +There's also, as we recommend, a witness node named `witness` in `region-c` that isn't mentioned in this example. The cluster is called `sere`. This configuration looks like this: ![Multi-Region 3 Nodes Configuration](./planning/images/always-on-2x3-aa-updated.png) -This is the standard Always-on multiregion configuration as discussed in the [Choosing your architecture](planning/architectures) section. +This is the standard Always-on Multi-region configuration discussed in [Choosing your architecture](planning/architectures). -### Application Requirements +### Application requirements -For this example, we are going to work with an application which record the opinions of people who attended performances of musical works. There is a table for attendees, a table for the works and an opinion table which records which attendee saw which work, where, when and how they scored the work. Because of data regulation, the example assumes that opinion data must stay only in the region where the opinion was recorded. +This example works with an application that records the opinions of people who attended performances of musical works. There's a table for attendees, a table for the works, and an opinion table. The opinion table records each work each attendee saw, where and when they saw it, and how they scored the work. 
Because of data regulation, the example assumes that opinion data must stay only in the region where the opinion was recorded. ### Creating tables -The first step is to create appropriate tables. +The first step is to create appropriate tables: ```sql CREATE TABLE attendee ( @@ -297,9 +297,9 @@ CREATE TABLE opinion ( ### Viewing groups and replication sets -By default, EDB Postgres Distributed is configured to replicate each table in its entireity to each and every node. This is managed through Replication Sets. +By default, EDB Postgres Distributed is configured to replicate each table in its entirety to each and every node. This is managed through replication sets. -To view the initial configuration's default replication sets run: +To view the initial configuration's default replication sets, run: ```sql SELECT node_group_name, default_repset, parent_group_name @@ -314,13 +314,13 @@ FROM bdr.node_group_summary; region_c | region_c | sere ``` -In the output, you can see there is the top level group, `sere` with a default replication set named `sere`. Each of the three subgroups has a replication set with the same name as the subgroup; the `region_a` group has a `region_a` default replication set. +In the output, you can see there's the top-level group, `sere`, with a default replication set named `sere`. Each of the three subgroups has a replication set with the same name as the subgroup. The `region_a` group has a `region_a` default replication set. By default, all existing tables and new tables become members of the replication set of the top-level group. ### Adding tables to replication sets -The next step in this process is to add tables to the replication sets belonging to the groups that represent our regions. As previously mentioned, all new tables are automatically added to the `sere` replication set. We can confirm that by running: +The next step is to add tables to the replication sets belonging to the groups that represent the regions. As previously mentioned, all new tables are automatically added to the `sere` replication set. You can confirm that by running: ```sql SELECT relname, set_name FROM bdr.tables ORDER BY relname, set_name; @@ -334,20 +334,20 @@ SELECT relname, set_name FROM bdr.tables ORDER BY relname, set_name; (3 rows) ``` -We want the `opinion` table to be replicated only within `region_a`, and separately only within `region_b`. To do that, we add the table to the replica sets of each region. +You want the `opinion` table to be replicated only in `region_a` and, separately, only in `region_b`. To do that, you add the table to the replica sets of each region: ```sql SELECT bdr.replication_set_add_table('opinion', 'region_a'); SELECT bdr.replication_set_add_table('opinion', 'region_b'); ``` -But, we are not done, because `opinion` is still a member of the `sere` replication set. When a table is a member of multiple replication sets, it is replicated within each. This doesn't impact performance though as each row in only replicated once on each target node. We don't want `opinion` replicated across all nodes, so we need to remove it from the top-level group's replication set: +But you're not done, because `opinion` is still a member of the `sere` replication set. When a table is a member of multiple replication sets, it's replicated in each. This doesn't affect performance, though, as each row is replicated only once on each target node. 
You don't want `opinion` replicated across all nodes, so you need to remove it from the top-level group's replication set: ```sql SELECT bdr.replication_set_remove_table('opinion', 'sere'); ``` -We can now review these changes: +You can now review these changes: ```sql SELECT relname, set_name FROM bdr.tables ORDER BY relname, set_name; @@ -362,11 +362,11 @@ SELECT relname, set_name FROM bdr.tables ORDER BY relname, set_name; (4 rows) ``` -This should provide the selective replication we desired. The next step is to test it. +This process should provide the selective replication you wanted. To verify whether it did, use the next step to test it. -### Testing Selective Replication +### Testing selective replication -Let's create some test data, two works and an attendee. We'll connect directly to data-a1 to run this next code: +First create some test data: two works and an attendee. Connect directly to `data-a1` to run this next code: ```sql INSERT INTO work VALUES (1, 'Aida', 'Verdi'); @@ -374,7 +374,7 @@ INSERT INTO work VALUES (2, 'Lohengrin', 'Wagner'); INSERT INTO attendee (email) VALUES ('gv@example.com'); ``` -Now that there is some data in these tables, we can insert into the `opinion` table without violating foreign key constraints. +Now that there's some data in these tables, you can insert into the `opinion` table without violating foreign key constraints: ```sql INSERT INTO opinion (work_id, attendee_id, country, day, score) @@ -384,7 +384,7 @@ SELECT work.id, attendee.id, 'Italy', '1871-11-19', 3 AND attendee.email = 'gv@example.com'; ``` -Once inserted, we can validate the contents of the database on the same node: +Once you've done the insert, you can validate the contents of the database on the same node: ```sql SELECT a.email @@ -404,9 +404,9 @@ JOIN attendee a ON a.id = o.attendee_id; (1 row) ``` -If we now connect to nodes `data-a2` and `data-a3` and run the same query, we will get the same result. The data is being replicated in `region_a`. If we connect to `data-b1`, `data-b2` or `data-b3`, the query will return no rows. That's because, although the `attendee` and `work` tables are populated, there's no `opinion` row that could be selected. That, in turn, is because the replication of `opinion` on `region_a` only happens in that region. +If you now connect to nodes `data-a2` and `data-a3` and run the same query, you get the same result. The data is being replicated in `region_a`. If you connect to `data-b1`, `data-b2`, or `data-b3`, the query returns no rows. That's because, although the `attendee` and `work` tables are populated, there's no `opinion` row to select. That, in turn, is because the replication of `opinion` on `region_a` happens only in that region. -If we now connect to `data-b1` and insert an opinion on there like so: +Now connect to `data-b1` and insert an opinion there: ```sql INSERT INTO attendee (email) VALUES ('fb@example.com'); @@ -418,7 +418,7 @@ SELECT work.id, attendee.id, 'Germany', '1850-08-27', 9 AND attendee.email = 'fb@example.com'; ``` -This opinion will only be replicated on `region_b`. On `data-b1`, `data-b2` and `data-b3`, you can run: +This opinion is replicated only on `region_b`. On `data-b1`, `data-b2`, and `data-b3`, you can run: ```sql SELECT a.email @@ -438,9 +438,9 @@ JOIN attendee a ON a.id = o.attendee_id; (1 row) ``` -You will see the same result on each of the `region_b` data nodes. Run the query on `region_a` nodes and you will not see this particular entry. +You see the same result on each of the `region_b` data nodes. 
Run the query on `region_a` nodes, and you don't see this particular entry. -Finally, we should note that the `attendee` table is shared identically across all nodes; on any node, running the query: +Finally, notice that the `attendee` table is shared identically across all nodes. On any node, run the query: ```sql SELECT * FROM attendee; @@ -452,6 +452,3 @@ SELECT * FROM attendee; 904261037006536704 | fb@example.com (2 rows) ``` - - - diff --git a/product_docs/docs/pgd/5/scaling.mdx b/product_docs/docs/pgd/5/scaling.mdx index fae54ea26c3..2f064cee977 100644 --- a/product_docs/docs/pgd/5/scaling.mdx +++ b/product_docs/docs/pgd/5/scaling.mdx @@ -13,9 +13,9 @@ dropping partitions. You can create new partitions regularly and then drop them when the data retention period expires. -PGD management is primarily accomplished by functions that can be called by SQL. +You perform PGD management primarily by using functions that can be called by SQL. All functions in PGD are exposed in the `bdr` schema. Unless you put it into -your search_path, you need to schema-qualify the name of each function. +your search_path, you need to schema qualify the name of each function. ## Auto creation of partitions @@ -24,9 +24,9 @@ function to create or alter the definition of automatic range partitioning for a no definition exists, it's created. Otherwise, later executions will alter the definition. -PGD Autopartition in PGD 5.5 and later leverages underlying Postgres features that allow a +PGD AutoPartition in PGD 5.5 and later leverages underlying Postgres features that allow a partition to be attached or detached/dropped without locking the rest of the -table. Versions of PGD prior to 5.5 don't support this feature and will lock the tables. +table. Versions of PGD earlier than 5.5 don't support this feature and lock the tables. An error is raised if the table isn't RANGE partitioned or a multi-column partition key is used. @@ -39,8 +39,8 @@ Raft. You can change this behavior by setting `managed_locally` to `true`. In that case, all partitions are managed locally on each node. Managing partitions -locally is useful when the partitioned table isn't a replicated table, in which -case you might not need or want to have all partitions on all nodes. For +locally is useful when the partitioned table isn't a replicated table. In that +case, you might not need or want to have all partitions on all nodes. For example, the built-in [`bdr.conflict_history`](/pgd/latest/reference/catalogs-visible#bdrconflict_history) table isn't a replicated table. It's managed by AutoPartition locally. Each node @@ -87,8 +87,8 @@ bdr.autopartition('Orders', '1000000000', ## RANGE-partitioned tables -A new partition is added for every `partition_increment` range of values, with -lower and upper bound `partition_increment` apart. For tables with a partition +A new partition is added for every `partition_increment` range of values. +Lower and upper bound are `partition_increment` apart. For tables with a partition key of type `timestamp` or `date`, the `partition_increment` must be a valid constant of type `interval`. For example, specifying `1 Day` causes a new partition to be added each day, with partition bounds that are one day apart. @@ -114,11 +114,11 @@ value. The system always tries to have a certain minimum number of advance partitions. To decide whether to create new partitions, it uses the specified `partition_autocreate_expression`. 
This can be an expression that can be -evaluated by SQL, which is evaluated every time a check is performed. For -example, for a partitioned table on column type `date`, if +evaluated by SQL that's evaluated every time a check is performed. For +example, for a partitioned table on column type `date`, suppose `partition_autocreate_expression` is specified as -`DATE_TRUNC('day',CURRENT_DATE)`, `partition_increment` is specified as `1 Day` -and `minimum_advance_partitions` is specified as `2`, then new partitions are +`DATE_TRUNC('day',CURRENT_DATE)`, `partition_increment` is specified as `1 Day`, +and `minimum_advance_partitions` is specified as `2`. New partitions are then created until the upper bound of the last partition is less than `DATE_TRUNC('day', CURRENT_DATE) + '2 Days'::interval`. @@ -134,11 +134,11 @@ the `partcol` so that the query runs efficiently. If you don't specify the `integer`, `smallint`, or `bigint`, then the system sets it to `max(partcol)`. If the `data_retention_period` is set, partitions are dropped after this period. -Partitions are dropped at the same time as new partitions are added, to minimize -locking. If this value isn't set, you must drop the partitions manually. +To minimize locking, partitions are dropped at the same time as new partitions are added. +If you don't set this value, you must drop the partitions manually. -The `data_retention_period` parameter is supported only for timestamp (and -related) based partitions. The period is calculated by considering the upper +The `data_retention_period` parameter is supported only for timestamp-based (and +related) partitions. The period is calculated by considering the upper bound of the partition. The partition is dropped if the given period expires, relative to the upper bound. @@ -161,7 +161,7 @@ partitioned table name and a partition key column value and waits until the partition that holds that value is created. The function waits only for the partitions to be created locally. It doesn't -guarantee that the partitions also exists on the remote nodes. +guarantee that the partitions also exist on the remote nodes. To wait for the partition to be created on all PGD nodes, use the [`bdr.autopartition_wait_for_partitions_on_all_nodes()`](/pgd/latest/reference/autopartition#bdrautopartition_wait_for_partitions_on_all_nodes) @@ -172,9 +172,9 @@ waits until the partition is created everywhere. Use the [`bdr.autopartition_find_partition()`](/pgd/latest/reference/autopartition#bdrautopartition_find_partition) -function to find the partition for the given partition key value. If partition -to hold that value doesn't exist, then the function returns NULL. Otherwise Oid -of the partition is returned. +function to find the partition for the given partition key value. If a partition +to hold that value doesn't exist, then the function returns NULL. Otherwise it returns the Oid +of the partition. ## Enabling or disabling autopartitioning @@ -184,5 +184,3 @@ to enable autopartitioning on the given table. If autopartitioning is already enabled, then no action occurs. Similarly, use [`bdr.autopartition_disable()`](/pgd/latest/reference/autopartition#bdrautopartition_disable) to disable autopartitioning on the given table. 
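As a minimal sketch, assuming a partitioned table named `measurement` that's already under AutoPartition management, toggling this setting looks like the following:

```sql
-- Hypothetical table name; pause automatic partition management on it...
SELECT bdr.autopartition_disable('measurement');

-- ...and resume it later
SELECT bdr.autopartition_enable('measurement');
```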
- - diff --git a/product_docs/docs/pgd/5/sequences.mdx b/product_docs/docs/pgd/5/sequences.mdx index a72a9a77113..78e9dfd457d 100644 --- a/product_docs/docs/pgd/5/sequences.mdx +++ b/product_docs/docs/pgd/5/sequences.mdx @@ -55,8 +55,8 @@ useful property of recording the timestamp when the values were created. SnowflakeId sequences have the restriction that they work only for 64-bit BIGINT -datatypes and produce values up to 19 digits long, which might be too long for -use in some host language datatypes, such as Javascript Integer types. +datatypes and produce values up to 19 digits long. This might be too long for +use in some host language datatypes, such as JavaScript Number types. Globally allocated sequences allocate a local range of values that can be replenished as needed by inter-node consensus, making them suitable for either BIGINT or INTEGER sequences. @@ -66,20 +66,20 @@ function. This function takes a standard PostgreSQL sequence and marks it as a PGD global sequence. It can also convert the sequence back to the standard PostgreSQL sequence. -PGD also provides the configuration variable [`bdr.default_sequence_kind`](/pgd/latest/reference/pgd-settings/#bdrdefault_sequence_kind), which +PGD also provides the configuration variable [`bdr.default_sequence_kind`](/pgd/latest/reference/pgd-settings/#bdrdefault_sequence_kind). This variable determines the kind of sequence to create when the `CREATE SEQUENCE` command is executed or when a `serial`, `bigserial`, or `GENERATED BY DEFAULT AS IDENTITY` column is created. Valid settings are: -- `local`, meaning that newly created +- `local` — Newly created sequences are the standard PostgreSQL (local) sequences. -- `galloc`, which always creates globally allocated range sequences. -- `snowflakeid`, which creates global sequences for BIGINT sequences that +- `galloc` — Always creates globally allocated range sequences. +- `snowflakeid` — Creates global sequences for BIGINT sequences that consist of time, nodeid, and counter components. You can't use it with INTEGER sequences (so you can use it for `bigserial` but not for `serial`). -- `timeshard`, which is the older version of SnowflakeId sequence and is provided for +- `timeshard` — The older version of SnowflakeId sequence. Provided for backward compatibility only. The SnowflakeId is preferred. -- `distributed` (the default), which is a special value that you can use only for +- `distributed` (default) — A special value that you can use only for [`bdr.default_sequence_kind`](reference/pgd-settings/#global-sequence-parameters). It selects `snowflakeid` for `int8` sequences (that is, `bigserial`) and `galloc` sequence for `int4` (that is, `serial`) and `int2` sequences. @@ -94,7 +94,7 @@ The ids generated by SnowflakeId sequences are loosely time ordered so you can use them to get the approximate order of data insertion, like standard PostgreSQL sequences. Values generated within the same millisecond might be out of order, even on one node. The property of loose time ordering means they're suitable -for use as range partition keys. +for use as range-partition keys. SnowflakeId sequences work on one or more nodes and don't require any inter-node communication after the node-join process completes. So you can continue to @@ -158,16 +158,16 @@ Timeshard sequences are provided for backward compatibility with existing installations but aren't recommended for new application use. We recommend using the SnowflakeId sequence instead. 
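As a hedged sketch of that recommendation, converting an existing `bigint` sequence (here assumed to be named `order_id_seq`) to the SnowflakeId kind uses the function described earlier:

```sql
-- Assumed sequence name; marks an existing bigint sequence as a SnowflakeId sequence
SELECT bdr.alter_sequence_set_kind('order_id_seq'::regclass, 'snowflakeid');
```

Remember that `snowflakeid` works only with `bigint` sequences, so `serial` columns need a different kind, such as `galloc`.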
-Timeshard is very similar to SnowflakeId but has different limits and fewer -protections and slower performance. +Timeshard is very similar to SnowflakeId but has different limits, fewer +protections, and slower performance. -The differences between timeshard and SnowflakeId are as following: +The differences between timeshard and SnowflakeId are as follows: - Timeshard can generate up to 16384 per millisecond (about 16 million per second), which is more than SnowflakeId. However, there's no protection against wraparound within a given millisecond. Schemas using the timeshard sequence must protect the use of the `UNIQUE` constraint when using timeshard values - for given column. + for a given column. - The timestamp component of timeshard sequence runs out of values in the year 2050 and, if used in combination with bigint, the values wrap to negative numbers in the year 2033. This means that sequences generated @@ -181,7 +181,7 @@ The differences between timeshard and SnowflakeId are as following: The globally allocated range (or `galloc`) sequences allocate ranges (chunks) of values to each node. When the local range is used up, a new range is -allocated globally by consensus amongst the other nodes. This uses the key +allocated globally by consensus among the other nodes. This behavior uses the key space efficiently but requires that the local node be connected to a majority of the nodes in the cluster for the sequence generator to progress when the currently assigned local range is used up. @@ -189,18 +189,18 @@ currently assigned local range is used up. Unlike SnowflakeId sequences, `galloc` sequences support all sequence data types provided by PostgreSQL: `smallint`, `integer`, and `bigint`. This means that you can use `galloc` sequences in environments where 64-bit sequences are -problematic. Examples include using integers in javascript, since that supports only +problematic. Examples include using integers in JavaScript, since that supports only 53-bit values, or when the sequence is displayed on output with limited space. -The range assigned by each voting is currently predetermined based on the +The range assigned by each voting node is currently predetermined based on the datatype the sequence is using: - smallint — 1 000 numbers - integer — 1 000 000 numbers - bigint — 1 000 000 000 numbers -Each node allocates two chunks of seq_chunk_size, one for the current use -plus a reserved chunk for future usage, so the values generated from any one +Each node allocates two chunks of seq_chunk_size—one for the current use +plus a reserved chunk for future use—so the values generated from any one node increase monotonically. However, viewed globally, the values generated aren't ordered at all. This might cause a loss of performance due to the effects on b-tree indexes and typically means that generated @@ -217,14 +217,14 @@ with `galloc` sequences. However, you need to set them before transforming the sequence to the `galloc` kind. The `INCREMENT BY` option also works correctly. However, you can't assign an increment value that's equal to or more than the above ranges assigned for each sequence datatype. -`setval()` doesn't reset the global state for `galloc` sequences; don't use it. +`setval()` doesn't reset the global state for `galloc` sequences. Don't use it. A few limitations apply to `galloc` sequences. PGD tracks `galloc` sequences in a special PGD catalog [bdr.sequence_alloc](/pgd/latest/reference/catalogs-visible/#bdrsequence_alloc). 
This catalog is required to track the currently allocated chunks for the `galloc` -sequences. The sequence name and namespace is stored in this catalog. Since the +sequences. The sequence name and namespace is stored in this catalog. The sequence chunk allocation is managed by Raft, whereas any changes to the -sequence name/namespace is managed by the replication stream, PGD currently doesn't +sequence name/namespace is managed by the replication stream. So PGD currently doesn't support renaming `galloc` sequences or moving them to another namespace or renaming the namespace that contains a `galloc` sequence. Be mindful of this limitation while designing application schema. @@ -236,14 +236,14 @@ prerequisites. ##### 1. Verify that sequence and column data type match -Check that the sequence's data type matches the data type of the column with +Check that the sequence's data type matches the datatype of the column with which it will be used. For example, you can create a `bigint` sequence and assign an `integer` column's default to the `nextval()` returned by that sequence. With galloc sequences, which for `bigint` are allocated in blocks of 1 000 000 000, this quickly results in the values returned by `nextval()` exceeding the `int4` range if more than two nodes are in use. -The following example shows what can happen: +This example shows what can happen: ```sql CREATE SEQUENCE int8_seq; @@ -281,7 +281,7 @@ SELECT * FROM seqtest; However, attempting the same operation on a third node fails with an `integer out of range` error, as the sequence generated the value -`4000000002`. +`4000000002`. !!! Tip You can retrieve the current data type of a sequence from the PostgreSQL @@ -296,10 +296,10 @@ When the sequence kind is altered to `galloc`, it's rewritten and restarts from the defined start value of the local sequence. If this happens on an existing sequence in a production database, you need to query the current value and then set the start value appropriately. To help with this use case, PGD -allows users to pass a starting value with the function [`bdr.alter_sequence_set_kind()`](reference/sequences/#bdralter_sequence_set_kind). +lets you pass a starting value with the function [`bdr.alter_sequence_set_kind()`](reference/sequences/#bdralter_sequence_set_kind). If you're already using offset and you have writes from multiple nodes, you need to check what's the greatest used value and restart the sequence to at least -the next value. +the next value: ```sql -- determine highest sequence value across all nodes @@ -321,7 +321,7 @@ ranges allocated around the whole cluster. In this example, the sequence starts at `333`, and the cluster has two nodes. The number of allocation is 4, which is 2 per node, -and the chunk size is 1000000 that's related to an integer sequence. +and the chunk size is 1000000, which is related to an integer sequence. ```sql SELECT * FROM bdr.sequence_alloc @@ -387,9 +387,9 @@ sequences that can be used with PGD. For example: - Local sequences with a different offset per node (i.e., manual) - An externally coordinated natural key -PGD applications can't use other methods safely: -counter-table-based approaches relying on `SELECT ... FOR UPDATE`, `UPDATE ... RETURNING ...` -or similar for sequence generation doesn't work correctly in PGD because PGD +PGD applications can't use other methods safely. +Counter-table-based approaches relying on `SELECT ... FOR UPDATE`, `UPDATE ... 
RETURNING ...` +or similar for sequence generation don't work correctly in PGD because PGD doesn't take row locks between nodes. The same values are generated on more than one node. For the same reason, the usual strategies for "gapless" sequence generation don't work with PGD. In most cases, the application @@ -418,14 +418,14 @@ Also, not all applications cope well with `UUID` keys. ### KSUUIDs -PGD provides functions for working with a K-Sortable variant of `UUID` data, -known as KSUUID, which generates values that can be stored using the PostgreSQL +PGD provides functions for working with a K-sortable variant of `UUID` data. +Known as KSUUID, it generates values that can be stored using the PostgreSQL standard `UUID` data type. A `KSUUID` value is similar to `UUIDv1` in that it stores both timestamp and random data, following the `UUID` standard. -The difference is that `KSUUID` is K-Sortable, meaning that it's weakly +The difference is that `KSUUID` is K-sortable, meaning that it's weakly sortable by timestamp. This makes it more useful as a database key, as it -produces more compact `btree` indexes, which improves -the effectiveness of search, and allows natural time-sorting of result data. +produces more compact `btree` indexes. This behavior improves +the effectiveness of search and allows natural time-sorting of result data. Unlike `UUIDv1`, `KSUUID` values don't include the MAC of the computer on which they were generated, so there are no security concerns from using them. @@ -495,7 +495,7 @@ per-node offsets on such step/offset sequences. #### Composite keys A variant on step/offset sequences is to use a composite key composed of -`PRIMARY KEY (node_number, generated_value)`, where the +`PRIMARY KEY (node_number, generated_value)`. The node number is usually obtained from a function that returns a different number on each node. You can create such a function by temporarily disabling DDL replication and creating a constant SQL function. Alternatively, you can use diff --git a/product_docs/docs/pgd/5/striggers.mdx b/product_docs/docs/pgd/5/striggers.mdx index 4b27b6707b3..377e0a44d2e 100644 --- a/product_docs/docs/pgd/5/striggers.mdx +++ b/product_docs/docs/pgd/5/striggers.mdx @@ -22,7 +22,7 @@ to them. Stream triggers are designed to be trigger-like in syntax. They leverage the PostgreSQL BEFORE trigger architecture and are likely to have similar -performance characteristics as PostgreSQL BEFORE Triggers. +performance characteristics as PostgreSQL BEFORE triggers. Multiple trigger definitions can use one trigger function, just as with normal PostgreSQL triggers. @@ -52,10 +52,10 @@ Also, any DML that's applied while executing a stream trigger isn't replicated to other PGD nodes and doesn't trigger the execution of standard local triggers. This is intentional. You can use it, for example, to log changes or conflicts captured by a -stream trigger into a table that is crash-safe and specific to that +stream trigger into a table that's crash-safe and specific to that node. See [Stream triggers examples](#stream-triggers-examples) for a working example. -## Trigger execution during Apply +## Trigger execution during apply Transform triggers execute first—once for each incoming change in the triggering table. These triggers fire before we attempt to locate a @@ -98,7 +98,7 @@ with the default value. However, when replicating from a node having the new schema version to a node having the old one, the column is missing from the target table. 
The `ignore_if_null` resolver isn't appropriate for a -rolling upgrade because it breaks replication as soon as the user +rolling upgrade because it breaks replication as soon as a user inserts a tuple with a non-NULL value in the new column in any of the upgraded nodes. @@ -137,13 +137,13 @@ The default `ignore_if_null` resolver isn't affected by this risk because any row replicated to node 2 has `col=NULL`. -Based on this example, we recommend running LiveCompare against the +Based on this example, we recommend running [LiveCompare](/livecompare/latest) against the whole cluster at the end of a rolling schema upgrade where the `ignore` resolver was used. This practice helps to ensure that you detect and fix any divergence. ## Terminology of row-types -We use these row-types: +PGD uses these row-types: - `SOURCE_OLD` is the row before update, that is, the key. - `SOURCE_NEW` is the new row coming from another node. @@ -173,9 +173,7 @@ resolution occurs once on each node and can occur with a significant time difference between them. As a result, communication between the multiple executions of the conflict trigger isn't possible. It's the responsibility of the author of the conflict trigger to ensure that the trigger gives exactly the same result for all related events. -Otherwise, data divergence occurs. Technical Support recommends that you formally test all conflict -triggers using the isolationtester tool supplied with -PGD. +Otherwise, data divergence occurs. !!! Warning - You can specify multiple conflict triggers on a single table, but @@ -200,8 +198,8 @@ In some cases, timestamp conflict detection doesn't detect a conflict at all. For example, in a concurrent `UPDATE`/`DELETE` where the `DELETE` occurs just after the `UPDATE`, any nodes that see first the `UPDATE` and then the `DELETE` don't see any conflict. If no conflict is seen, -the conflict trigger are never called. In the same situation but using -row version conflict detection, a conflict is seen, which a conflict trigger can then +the conflict trigger is never called. In the same situation but using +row-version conflict detection, a conflict is seen, which a conflict trigger can then handle. The trigger function has access to additional state information as well as @@ -218,7 +216,7 @@ You can use the function `bdr.trigger_get_row()` to retrieve `SOURCE_OLD`, `SOUR or `TARGET` rows, if a value exists for that operation. Changes to conflict triggers happen transactionally and are protected by -global DML locks during replication of the configuration change, similarly +global DML locks during replication of the configuration change. This behavior is similar to how some variants of `ALTER TABLE` are handled. If primary keys are updated inside a conflict trigger, it can @@ -238,7 +236,7 @@ Transform triggers execute in alphabetical order. A transform trigger can filter away rows, and it can do additional operations as needed. It can alter the values of any column or set them to `NULL`. The -return value decides the further action taken: +return value decides the next action taken: - If the trigger function returns a row, it's applied to the target. - If the trigger function returns a `NULL` row, there's no further action to @@ -263,7 +261,7 @@ important differences: BEFORE triggers aren't called at all for `UPDATE` and `DELETE` changes if a matching row in a table isn't found. -- Transform triggers are called before partition table routing occurs. +- Transform triggers are called before partition-table routing occurs. 
- Transform triggers have access to the lookup key via `SOURCE_OLD`, which isn't available to normal SQL triggers. @@ -361,7 +359,7 @@ END; $$; ``` -This example shows a conflict trigger that implements trusted source +This example shows a conflict trigger that implements trusted-source conflict detection, also known as trusted site, preferred node, or Always Wins resolution. It uses the `bdr.trigger_get_origin_node_id()` function to provide a solution that works with three or more nodes. diff --git a/product_docs/docs/pgd/5/terminology.mdx b/product_docs/docs/pgd/5/terminology.mdx index efbc63e12c5..2e1a76092f7 100644 --- a/product_docs/docs/pgd/5/terminology.mdx +++ b/product_docs/docs/pgd/5/terminology.mdx @@ -11,7 +11,7 @@ A type of replication that copies data to other PGD cluster members after the tr #### Commit scopes -Rules for managing how transactions are committed between the nodes and groups of a PGD cluster. Used to configure [synchronous replication](#synchronous-replication), [Group Commit](#group-commit), [CAMO](#camo-or-commit-at-most-once), [Eager](#eager), lag control, and other PGD features. +Rules for managing how transactions are committed between the nodes and groups of a PGD cluster. Used to configure [synchronous replication](#synchronous-replication), [Group Commit](#group-commit), [CAMO](#camo-or-commit-at-most-once), [Eager](#eager), Lag Control, and other PGD features. #### CAMO or commit-at-most-once @@ -102,7 +102,7 @@ The ability of a system to handle increasing read workloads. For example, PGD ca #### Subscription -PGD nodes will publish changes being made to data to nodes that are interested. Other PGD nodes will ask to subscribe to those changes. This creates a subscription and is the mechanism by which each node is updated. PGD nodes bidirectionally subscribe to other PGD node's changes. +PGD nodes will publish changes being made to data to nodes that are interested. Other PGD nodes will ask to subscribe to those changes. This behavior creates a subscription and is the mechanism by which each node is updated. PGD nodes bidirectionally subscribe to other PGD nodes' changes. #### Switchover @@ -134,7 +134,7 @@ A traditional computing approach of increasing a resource (CPU, memory, storage, #### Witness nodes -Witness nodes primarily serve to help the cluster establish a consensus. An odd number of data nodes are needed to establish a consensus and, where resources are limited, a witness node can be used to participate in cluster decisions but not replicate the data. Not holding the data means it can't operate as a standby server or provide majorities in synchronous commits. +Witness nodes primarily serve to help the cluster establish a consensus. An odd number of data nodes is needed to establish a consensus. Where resources are limited, a witness node can be used to participate in cluster decisions but not replicate the data. Not holding the data means it can't operate as a standby server or provide majorities in synchronous commits. #### Write leader @@ -142,5 +142,4 @@ In an Always-on architecture, a node is selected as the correct connection endpo #### Writer - When a [subscription](#subscription) delivers data changes to a PGD node, the database server tasks a worker process called a writer with getting those changes applied. - + When a [subscription](#subscription) delivers data changes to a PGD node, the database server tasks a worker process, called a writer, with getting those changes applied. 
diff --git a/product_docs/docs/pge/15/release_notes/rel_notes15.8.1.mdx b/product_docs/docs/pge/15/release_notes/rel_notes15.8.1.mdx index fadf09ef3c3..074ffb27630 100644 --- a/product_docs/docs/pge/15/release_notes/rel_notes15.8.1.mdx +++ b/product_docs/docs/pge/15/release_notes/rel_notes15.8.1.mdx @@ -9,5 +9,5 @@ New features, enhancements, bug fixes, and other changes in EDB Postgres Extende | Type | Description | Ticket | |---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------| -| Bug fix | A previous release introduced a new `in_create` field to the LogicalDecodingContext structure and changed its memory layout. This means that any code that expects the old layout will no longer be compatible. The release fixed the issue to ensure that the layout of previous database server versions is still compatible with the creation of a logical replication slot. | | +| Bug fix | A previous release introduced a new `in_create` field to the LogicalDecodingContext structure and changed its memory layout. This means that any code that expects the old layout will no longer be compatible. The release fixed the issue to ensure that the layout of previous database server versions is still compatible with the creation of a logical replication slot. | | diff --git a/product_docs/docs/pge/16/release_notes/rel_notes16.4.1.mdx b/product_docs/docs/pge/16/release_notes/rel_notes16.4.1.mdx index c0c767ab31f..6f28196ee7c 100644 --- a/product_docs/docs/pge/16/release_notes/rel_notes16.4.1.mdx +++ b/product_docs/docs/pge/16/release_notes/rel_notes16.4.1.mdx @@ -9,5 +9,5 @@ New features, enhancements, bug fixes, and other changes in EDB Postgres Extende | Type | Description | Ticket | |---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------| -| Bug fix | A previous release introduced a new `in_create` field to the LogicalDecodingContext structure and changed its memory layout. This means that any code that expects the old layout will no longer be compatible. The release fixed the issue to ensure that the layout of previous database server versions is still compatible with the creation of a logical replication slot. | | +| Bug fix | A previous release introduced a new `in_create` field to the LogicalDecodingContext structure and changed its memory layout. This means that any code that expects the old layout will no longer be compatible. The release fixed the issue to ensure that the layout of previous database server versions is still compatible with the creation of a logical replication slot. 
| | diff --git a/product_docs/docs/tde/15/troubleshooting.mdx b/product_docs/docs/tde/15/troubleshooting.mdx index e4de188a388..0b5e48adfa7 100644 --- a/product_docs/docs/tde/15/troubleshooting.mdx +++ b/product_docs/docs/tde/15/troubleshooting.mdx @@ -1,39 +1,61 @@ --- title: "Troubleshooting with encrypted WAL files" navTitle: WAL files +deepToC: true --- -You can encrypt WAL files. When troubleshooting with encrypted WAL falls, you can use WAL command options. +When TDE is enabled, WAL files are encrypted. If you want to perform operations on the encrypted WAL files, you need to allow the operations to decrypt the file. + +When troubleshooting with encrypted WAL files, you can use WAL command options. ## Dumping a TDE-encrypted WAL file -To work with an encrypted WAL file, the [pg_waldump](https://www.postgresql.org/docs/15/pgwaldump.html) needs to be aware of the unwrap key. You can either pass the key for the unwrap command using the following options to the `pg_waldump` command or depend on the fallback environment variable: +To work with an encrypted WAL file, you need to ensure the [pg_waldump](https://www.postgresql.org/docs/current/pgwaldump.html) utility can access the unwrap key and decrypt it. For this purpose, the utility requires awareness of three values. + +Pass these values using the following options to the `pg_waldump` command. Be sure to use the same values you used when initializing the TDE-enabled cluster. ### `--data-encryption` -Consider the WAL files to encrypt, and decrypt them before processing them. You must specify this option if the WAL files were encrypted by transparent data encryption. `pg_waldump` can't automatically detect whether WAL files are encrypted. Optionally, specify an AES key length. Valid values are 128 and 256. The default is 128. +Specify this option if the WAL files were encrypted by transparent data encryption. + +The `--data-encryption` or `-y` option ensures the command is aware of the encryption. Otherwise, `pg_waldump` can't detect whether WAL files are encrypted. + +Provide the same encryption configuration you used when initializing the TDE-enabled database cluster. For example, if you specified an AES key length during the cluster creation, you must specify it here as well. Otherwise, run the flag with no values. See [Using initdb TDE options](enabling_tde/#using-initdb-tde-options) for more information. ### `--key-file-name=` -Load the data encryption key from the given location. +Use the `--key-file-name=` option to reference the file that contains the data encryption key required to decrypt the WAL file. Provide the location of the `pg_encryption/key.bin` file. This file is generated when you initialize a cluster with encryption enabled. + +The command can then load the data encryption key from the provided location. ### `--key-unwrap-command=` -Specifies a command to unwrap (decrypt) the data encryption key. The command must include a placeholder `%p` that specifies the file to read the wrapped key from. The command needs to write the unwrapped key to its standard output. If you don't specify this option, the environment variable `PGDATAKEYUNWRAPCMD` is used. - -Use the special value `-` if you don't want to apply any key unwrapping command. - -You must specify this option or the environment variable fallback if you're using data encryption. See [Securing the data encryption key](./key_stores/) for more information. 
+For the `--key-unwrap-command=` option, provide the decryption command you specified to unwrap (decrypt) the data encryption key when initializing the TDE cluster. See [Using initdb TDE options](enabling_tde/#using-initdb-tde-options) for more information. + +Alternatively, you can set the `PGDATAKEYUNWRAPCMD` environment variable before running the `pg_waldump` command. If the `--key-unwrap-command=` option isn't specified,`pg_waldump` falls back on `PGDATAKEYUNWRAPCMD`. This [cluster initialization example](enabling_tde/#example) shows how to export an environment variable. + +### Example + +This example uses `pg_waldump` to display the WAL log of an encrypted cluster that uses `openssl` to wrap the data encryption key: + +``` +pg_waldump --data-encryption --key-file-name=pg_encryption/key.bin --key-unwrap-command='openssl enc -d -aes-128-cbc -pass pass: -in %p' +``` ## Resetting a corrupt TDE-encrypted WAL file -To reset a corrupt encrypted WAL file, the [pg_resetwal](https://www.postgresql.org/docs/15/app-pgresetwal.html) command needs to be aware of the unwrap key. You can either pass the key for the unwrap command using the following option to the `pg_resetwal` command or depend on the fallback environment variable: +To reset a corrupt encrypted WAL file, you must ensure the [pg_resetwal](https://www.postgresql.org/docs/current/app-pgresetwal.html) command can access the unwrap key and decrypt it. You can either pass the key for the unwrap command using the following option to the `pg_resetwal` command or depend on the fallback environment variable. ### `--key-unwrap-command=` -Specifies a command to unwrap (decrypt) the data encryption key. The command must include a placeholder `%p` that specifies the file to read the wrapped key from. The command needs to write the unwrapped key to its standard output. If you don't specify this option, the environment variable `PGDATAKEYUNWRAPCMD` is used. - -Use the special value `-` if you don't want to apply any key unwrapping command. - -You must specify this option or the environment variable fallback if you're using data encryption. See [Securing the data encryption key](./key_stores/) for more information. +For the `--key-unwrap-command=` option, provide the decryption command you specified to unwrap (decrypt) the data encryption key when initializing the TDE cluster. See [Using initdb TDE options](enabling_tde/#using-initdb-tde-options) for more information. + +Alternatively, you can set the `PGDATAKEYUNWRAPCMD` environment variable before running the `pg_resetwal` command. If the `--key-unwrap-command=` option isn't specified, `pg_resetwal` falls back on `PGDATAKEYUNWRAPCMD`. This [cluster initialization example](enabling_tde/#example) shows how to export an environment variable. + +### Example + +This example uses `pg_resetwal` to reset a corrupt encrypted WAL log of an encrypted cluster that uses `openssl` to wrap the data encryption key: +``` +pg_resetwal --key-unwrap-command='openssl enc -d -aes-128-cbc -pass pass: -in %p' +``` \ No newline at end of file