Merge pull request #4130 from esl/update-cets-docs

📖 Document CETS as an alternative to Mnesia
esl · Sep 27, 2023 · c5d0b24
2 parents b0237ed + 4b8a0ca

Showing 7 changed files with 253 additions and 70 deletions.
diff --git a/doc/configuration/database-backends-configuration.md b/doc/configuration/database-backends-configuration.md
@@ -21,14 +21,16 @@ Subsequent sections go into more depth on each database: what they are suitable
 
 Transient data:
 
-* Mnesia - we highly recommend Mnesia (a highly available and distributed database) over Redis for storing **transient** data.
- Being an Erlang-based database, it's the default persistence option for most modules in MongooseIM.
-
-    !!! Warning
-        We **strongly recommend** keeping **persistent** data in an external DB (RDBMS) for production.
-        Mnesia is not suitable for the volumes of **persistent** data which some modules may require.
-        Sooner or later a migration will be needed which may be painful.
-        It is possible to store all data in Mnesia, but only for testing purposes, not for any serious deployments.
+* CETS - a library that synchronises ETS tables between nodes.
+  It is a new option for sharing live data across the MongooseIM cluster.
+  We recommend using this backend for transient data.
+  This backend requires a configured RDBMS database, because an external database is used to discover the nodes in the cluster.
+  For a CETS config example, see [tutorials](../tutorials/CETS-configure.md).
+
+* Mnesia - a built-in Erlang database.
+  Mnesia works well for a cluster of fixed size with reliable networking between nodes and rarely restarted nodes.
+  There are some issues when nodes restart or new ones join the cluster. In such cases, we recommend using CETS instead.
+  Mnesia is still the default backend for some modules, for compatibility with older config files.
 
 * Redis - A fantastic choice for storing live data.
  It's highly scalable and it can be easily shared by multiple MongooseIM nodes.
@@ -47,12 +49,19 @@ Persistent Data:
 
 * ElasticSearch - Only for MAM (Message Archive Management).
 
+* Mnesia - some modules support Mnesia for storing data, but it is not recommended.
+  It is still the default for many modules when no backend is specified, so be careful.
+
+    !!! Warning
+        We **strongly recommend** keeping **persistent** data in an external DB (RDBMS) for production.
+        Mnesia is not suitable for the volumes of **persistent** data which some modules may require.
+        Sooner or later a migration will be needed which may be painful.
+        It is possible to store all data in Mnesia, but only for testing purposes, not for any serious deployments.
 
 User Data:
 
 * LDAP -  Used for: users, shared rosters, vCards
 
-
 ## RDBMS
 
 ### MySQL

diff --git a/doc/configuration/general.md b/doc/configuration/general.md
@@ -147,7 +147,8 @@ These options can be used to configure the way MongooseIM manages user sessions.
 * **Example:** `sm_backend = "redis"`
 
 Backend for storing user session data. All nodes in a cluster must have access to a complete session database.
-Mnesia is sufficient in most cases, use Redis only in large deployments when you notice issues with the mnesia backend. Requires a redis pool with the `default` tag defined in the `outgoing_pools` section.
+CETS is a new backend that requires a configured RDBMS to work properly.
+Mnesia is a legacy backend that is sufficient in most cases. Use Redis only in large deployments when you notice issues with the Mnesia backend. The Redis backend requires a Redis pool with the `default` tag defined in the `outgoing_pools` section.
 See the section about [redis connection setup](./outgoing-connections.md#redis-specific-options) for more information.
 
 !!! Warning

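The RDBMS requirement for CETS can be checked before starting a node. The following is a minimal sketch, not part of MongooseIM: it assumes the TOML config has already been parsed into a Python dict, that RDBMS pools live under `outgoing_pools.rdbms`, and that the `file` discovery backend (configured under `internal_databases.cets`) is the only alternative to RDBMS discovery. The function name `cets_discovery_problems` is hypothetical.

```python
# Keys in the [general] section that may select the CETS backend.
GENERAL_BACKEND_KEYS = ("sm_backend", "component_backend", "s2s_backend")


def cets_discovery_problems(config):
    """Return a list of problems with CETS node discovery settings.

    `config` is the parsed TOML configuration as a nested dict.
    An empty list means no problem was detected by this sketch.
    """
    general = config.get("general", {})
    if not any(general.get(key) == "cets" for key in GENERAL_BACKEND_KEYS):
        return []  # CETS is not selected in [general]; nothing to check.

    cets = config.get("internal_databases", {}).get("cets", {})
    if cets.get("backend") == "file":
        # File discovery reads the node list from a file instead of RDBMS.
        if not cets.get("node_list_file"):
            return ["file discovery backend needs node_list_file"]
        return []

    # Assumption: RDBMS pools are defined under outgoing_pools.rdbms.
    if not config.get("outgoing_pools", {}).get("rdbms"):
        return ["a CETS backend is selected but no RDBMS pool is configured"]
    return []


example = {"general": {"sm_backend": "cets"}, "internal_databases": {"cets": {}}}
print(cets_discovery_problems(example))
```

The check only covers the `[general]` backends; modules that select CETS through their own `backend` option are not inspected.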
diff --git a/doc/operation-and-maintenance/Cluster-configuration-and-node-management.md b/doc/operation-and-maintenance/Cluster-configuration-and-node-management.md
@@ -64,98 +64,186 @@ Checklist:
 - the same cookie across all nodes (`vm.args` `-setcookie` parameter)
 - each node should be able to ping other nodes using its sname
    (ex. `net_adm:ping('mongoose@localhost')`)
+- an RDBMS backend is configured, so that CETS can discover nodes
 
 ### Initial node
 
-There is no action required on the initial node.
+=== "CETS"
 
-Just start MongooseIM using `mongooseim start` or `mongooseim live`.
+    Clustering is automatic. There is no difference between nodes.
+
+=== "Mnesia"
+
+    There is no action required on the initial node.
+
+    Just start MongooseIM using `mongooseim start` or `mongooseim live`.
 
 ### New node - joining cluster
 
+=== "CETS"
 
-```bash
-mongooseimctl start
-mongooseimctl started #waits until MongooseIM starts
-mongooseimctl join_cluster ClusterMember
-```
+    Clustering is automatic.
 
-`ClusterMember` is the name of a running node set in `vm.args` file, for example `mongooseim@localhost`.
-This node has to be part of the cluster we'd like to join.
+=== "Mnesia"
 
-First, MongooseIM will display a warning and a question if the operation should proceed:
+    ```bash
+    mongooseimctl start
+    mongooseimctl started #waits until MongooseIM starts
+    mongooseimctl join_cluster ClusterMember
+    ```
 
-```text
-Warning. This will drop all current connections and will discard all persistent data from Mnesia. Do you want to continue? (yes/no)
-```
+    `ClusterMember` is the name of a running node set in `vm.args` file, for example `mongooseim@localhost`.
+    This node has to be part of the cluster we'd like to join.
 
-If you type `yes` MongooseIM will start joining the cluster.
-Successful output may look like the following:
+    First, MongooseIM will display a warning and a question if the operation should proceed:
 
-```text
-You have successfully joined the node mongooseim2@localhost to the cluster with node member mongooseim@localhost
-```
+    ```text
+    Warning. This will drop all current connections and will discard all persistent data from Mnesia. Do you want to continue? (yes/no)
+    ```
 
-In order to skip the question you can add option `-f` which will perform the action
-without displaying the warning and waiting for the confirmation.
+    If you type `yes` MongooseIM will start joining the cluster.
+    Successful output may look like the following:
+
+    ```text
+    You have successfully joined the node mongooseim2@localhost to the cluster with node member mongooseim@localhost
+    ```
+
+    In order to skip the question you can add option `-f` which will perform the action
+    without displaying the warning and waiting for the confirmation.
 
 ### Leaving cluster
 
-To leave a running node from the cluster, call:
+=== "CETS"
 
-```bash
-mongooseimctl leave_cluster
-```
+    Stopping the node is enough to leave the cluster.
+    To prevent the node from joining the cluster again, specify a different `cluster_name`
+    option in the CETS backend configuration. Using a different Erlang cookie is a good idea too.
 
-It only makes sense to use it if the node is the part of a cluster, e.g `join_cluster` was called from that node before.
+=== "Mnesia"
 
-Similarly to `join_cluster` a warning and a question will be displayed unless the option `-f` is added to the command.
+    To make a running node leave the cluster, call:
 
-The successful output from the above command may look like the following:
+    ```bash
+    mongooseimctl leave_cluster
+    ```
 
-```text
-The node mongooseim2@localhost has successfully left the cluster
-```
+    It only makes sense to use it if the node is part of a cluster, e.g. `join_cluster` was called on that node before.
 
-### Removing a node from the cluster
+    Similarly to `join_cluster`, a warning and a question will be displayed unless the option `-f` is added to the command.
 
-To remove another node from the cluster, call the following command from one of the cluster members:
+    The successful output from the above command may look like the following:
 
-```bash
-mongooseimctl remove_from_cluster RemoteNodeName
-```
+    ```text
+    The node mongooseim2@localhost has successfully left the cluster
+    ```
 
-where `RemoteNodeName` is a name of the node that we'd like to remove from our cluster.
-This command could be useful when the node is dead and not responding and we'd like to remove it remotely.
-The successful output from the above command may look like the following:
+### Removing a node from the cluster
 
-```text
-The node mongooseim2@localhost has been removed from the cluster
-```
+=== "CETS"
 
-### Cluster status
+    A stopped node is automatically removed from the node discovery table in the RDBMS database after some time.
+    This is needed so that other nodes do not try to connect to the stopped node.
 
-You can use the following commands on any of the running nodes to examine the cluster
-or to see if a newly added node is properly clustered:
+=== "Mnesia"
 
-```bash
-mongooseimctl mnesia info | grep "running db nodes"
-```
+    To remove another node from the cluster, call the following command from one of the cluster members:
 
-This command shows all running nodes.
-A healthy cluster should contain all nodes here.
-For example:
-```bash
-running db nodes = [mongooseim@node1, mongooseim@node2]
-```
-To see stopped or misbehaving nodes following command can be useful:
+    ```bash
+    mongooseimctl remove_from_cluster RemoteNodeName
+    ```
 
-```bash
-mongooseimctl mnesia info | grep "stopped db nodes"
-```
+    where `RemoteNodeName` is the name of the node that we'd like to remove from our cluster.
+    This command could be useful when the node is dead and not responding and we'd like to remove it remotely.
+    The successful output from the above command may look like the following:
+
+    ```text
+    The node mongooseim2@localhost has been removed from the cluster
+    ```
+
+### Cluster status
 
-This command shows which nodes are considered stopped.
-This does not necessarily indicate that they are down but might be a symptom of a communication problem.
+=== "CETS"
+
+    Run the command:
+
+    ```bash
+    mongooseimctl cets systemInfo
+    ```
+
+    `joinedNodes` should contain a list of properly joined nodes:
+
+    ```json
+    "joinedNodes" : [
+      "mongooseim@node1",
+      "mongooseim@node2"
+    ]
+    ```
+
+    It should generally be equal to the list of `discoveredNodes`.
+
+    If the two lists differ, there may be configuration or networking issues.
+    You can check the `unavailableNodes`, `remoteNodesWithUnknownTables`,
+    and `remoteNodesWithMissingTables` lists for more information (generally, these lists should be empty).
+    You can read the description for other fields of `systemInfo` in the
+    [GraphQL API reference](../graphql-api/admin-graphql-doc.html#definition-CETSSystemInfo).
+
+    For a properly configured two-node cluster, the metrics look similar to the following:
+
+    ```json
+    mongooseimctl metric getMetrics --name '["global", "cets", "system"]'
+    {
+      "data" : {
+        "metric" : {
+          "getMetrics" : [
+            {
+              "unavailable_nodes" : 0,
+              "type" : "cets_system",
+              "remote_unknown_tables" : 0,
+              "remote_nodes_without_disco" : 0,
+              "remote_nodes_with_unknown_tables" : 0,
+              "remote_nodes_with_missing_tables" : 0,
+              "remote_missing_tables" : 0,
+              "name" : [
+                "global",
+                "cets",
+                "system"
+              ],
+              "joined_nodes" : 2,
+              "discovery_works" : 1,
+              "discovered_nodes" : 2,
+              "conflict_tables" : 0,
+              "conflict_nodes" : 0,
+              "available_nodes" : 2
+            }
+          ]
+        }
+      }
+    }
+    ```
+
+=== "Mnesia"
+
+    You can use the following commands on any of the running nodes to examine the cluster
+    or to see if a newly added node is properly clustered:
+
+    ```bash
+    mongooseimctl mnesia info | grep "running db nodes"
+    ```
+
+    This command shows all running nodes.
+    A healthy cluster should contain all nodes here.
+    For example:
+    ```bash
+    running db nodes = [mongooseim@node1, mongooseim@node2]
+    ```
+    To see stopped or misbehaving nodes the following command can be useful:
+
+    ```bash
+    mongooseimctl mnesia info | grep "stopped db nodes"
+    ```
+
+    This command shows which nodes are considered stopped.
+    This does not necessarily indicate that they are down but might be a symptom of a communication problem.
 
 ## Load Balancing
 

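The CETS status check described above (comparing `joinedNodes` with `discoveredNodes` and inspecting the problem lists) can be automated. This is a minimal sketch under stated assumptions: `system_info` is a dict holding the `systemInfo` fields named in the text, already extracted from the command's JSON output (the exact nesting of that output is not shown here, so the extraction step is left out), and the function name `cluster_mismatch` is hypothetical.

```python
# Lists that the documentation says should generally be empty.
PROBLEM_LISTS = (
    "unavailableNodes",
    "remoteNodesWithUnknownTables",
    "remoteNodesWithMissingTables",
)


def cluster_mismatch(system_info):
    """Return only the non-empty findings; an empty dict means no mismatch."""
    joined = set(system_info.get("joinedNodes", []))
    discovered = set(system_info.get("discoveredNodes", []))
    report = {
        # Discovered through the discovery backend but not joined yet.
        "not_joined": sorted(discovered - joined),
        # Joined but absent from the discovery backend.
        "joined_but_not_discovered": sorted(joined - discovered),
    }
    for field in PROBLEM_LISTS:
        report[field] = sorted(system_info.get(field, []))
    return {name: nodes for name, nodes in report.items() if nodes}


info = {
    "joinedNodes": ["mongooseim@node1"],
    "discoveredNodes": ["mongooseim@node1", "mongooseim@node2"],
    "unavailableNodes": ["mongooseim@node2"],
}
print(cluster_mismatch(info))
```

Returning only the non-empty entries keeps the result easy to use in a monitoring script: any truthy result is worth investigating.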
diff --git a/doc/operation-and-maintenance/MongooseIM-metrics.md b/doc/operation-and-maintenance/MongooseIM-metrics.md
@@ -179,6 +179,27 @@ Metrics specific to an extension, e.g. Message Archive Management, are described
 | `[global, data, dist]` | proplist | Network stats for an Erlang distributed communication. A proplist with values: `recv_oct`, `recv_cnt`, `recv_max`, `send_oct`, `send_max`, `send_cnt`, `send_pend`, `connections`. |
 | `[global, data, rdbms, PoolName]` | proplist | For every RDBMS pool defined, an instance of this metric is available. It is a proplist with values `workers`, `recv_oct`, `recv_cnt`, `recv_max`, `send_oct`, `send_max`, `send_cnt`, `send_pend`. |
 
+### CETS system metrics
+
+| Metric name | Type | Description |
+| ----------- | ---- | ----------- |
+| `[global, cets, system]` | proplist | A proplist of stats, described in the table below. |
+
+| Stat Name | Description |
+| ----------- | ----------- |
+| `available_nodes` | Available nodes (nodes that are connected to us and have the CETS disco process started). |
+| `unavailable_nodes` | Unavailable nodes (nodes that do not respond to our pings). |
+| `joined_nodes` | Joined nodes (nodes that have our local tables running). |
+| `discovered_nodes` | Discovered nodes (nodes that are extracted from the discovery backend). |
+| `remote_nodes_without_disco` | Remote nodes that do not have the CETS disco process started. |
+| `remote_nodes_with_unknown_tables` | Nodes with unknown tables. |
+| `remote_unknown_tables` | Unknown remote tables. |
+| `remote_nodes_with_missing_tables` | Nodes that are available, but do not host some of our local tables. |
+| `remote_missing_tables` | Local tables that are missing on some of the available remote nodes. |
+| `conflict_nodes` | Nodes that replicate at least one of our local tables to a different list of nodes. |
+| `conflict_tables` | Tables that have conflicting replication destinations. |
+| `discovery_works` | Returns 1 if the last discovery attempt was successful (otherwise returns 0). |
+
 ### VM metrics
 
 | Metric name | Type | Description |

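The `[global, cets, system]` stats can be turned into simple alerts. The sketch below assumes `stats` is one entry of the `getMetrics` list returned by `mongooseimctl metric getMetrics --name '["global", "cets", "system"]'`, parsed from JSON into a dict. Treating every counter in `EXPECTED_ZERO` as an alert when non-zero is an assumption based on the healthy two-node example, where they are all 0. The name `cets_metric_alerts` is hypothetical.

```python
# Counters that are 0 in the healthy example output.
EXPECTED_ZERO = (
    "unavailable_nodes",
    "remote_nodes_without_disco",
    "remote_nodes_with_unknown_tables",
    "remote_unknown_tables",
    "remote_nodes_with_missing_tables",
    "remote_missing_tables",
    "conflict_nodes",
    "conflict_tables",
)


def cets_metric_alerts(stats):
    """Return a list of alert strings for one cets_system metric entry."""
    alerts = []
    if stats.get("discovery_works") != 1:
        alerts.append("discovery_works is not 1")
    if stats.get("joined_nodes") != stats.get("discovered_nodes"):
        alerts.append("joined_nodes differs from discovered_nodes")
    for name in EXPECTED_ZERO:
        value = stats.get(name, 0)
        if value != 0:
            alerts.append("{} = {}".format(name, value))
    return alerts


healthy = {
    "unavailable_nodes": 0, "remote_unknown_tables": 0,
    "remote_nodes_without_disco": 0, "remote_nodes_with_unknown_tables": 0,
    "remote_nodes_with_missing_tables": 0, "remote_missing_tables": 0,
    "joined_nodes": 2, "discovery_works": 1, "discovered_nodes": 2,
    "conflict_tables": 0, "conflict_nodes": 0, "available_nodes": 2,
}
print(cets_metric_alerts(healthy))
```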
diff --git a/doc/tutorials/CETS-configure.md b/doc/tutorials/CETS-configure.md
@@ -0,0 +1,63 @@
+## CETS Config Example
+
+[CETS](https://github.com/esl/cets/) is a library that replicates in-memory data
+across the MongooseIM cluster. It can be used to store:
+
+- information about online XMPP sessions;
+- information about outgoing S2S connections;
+- stream management session IDs;
+- information about online MUC rooms.
+
+If you want to use CETS instead of Mnesia, ensure that these options are set:
+
+```toml
+[general]
+  sm_backend = "cets"
+  component_backend = "cets"
+  s2s_backend = "cets"
+
+[internal_databases.cets]
+
+# The list of modules that use CETS
+# You should enable only modules that you use
+[modules.mod_stream_management]
+  backend = "cets"
+
+[modules.mod_bosh]
+  backend = "cets"
+
+[modules.mod_muc]
+  online_backend = "cets"
+
+[modules.mod_jingle_sip]
+  backend = "cets"
+```
+
+Ensure that `outgoing_pools` includes an RDBMS pool, so that CETS can get the list of MongooseIM nodes that use the same
+relational database and cluster them together.
+
+A preferred way to install MongooseIM is [Helm Charts](https://github.com/esl/MongooseHelm/) on Kubernetes. The chart lets you
+set `volatileDatabase` to `cets`, and the values are then applied using Helm's templates.
+
+
+## CETS with the file discovery backend
+
+It is possible to read the list of nodes to cluster from a file. MongooseIM does not modify this file, so it is the operator's
+task to keep it up to date. MongooseIM rereads the file without requiring a restart:
+
+```toml
+[internal_databases.cets]
+    backend = "file"
+    node_list_file = "/etc/mongooseim/mongooseim_nodes.txt"
+```
+
+The `node_list_file` is a newline-separated list of nodes:
+
+```
+[email protected]
+[email protected]
+[email protected]
+```
+
+The file backend for CETS is only useful if you do not use an RDBMS database.
+You could use an external script to obtain the list of nodes, for example from an AWS CLI command.
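An external script that maintains the node list file could look like the following minimal sketch. It assumes the node names have already been obtained by some other means (the lookup itself is not shown), and it writes the newline-separated format described above. Because MongooseIM rereads the file while running, the sketch writes to a temporary file and renames it, so a reader never sees a half-written list. The function name `write_node_list` and the node names in the demo are hypothetical.

```python
import os
import tempfile


def write_node_list(path, nodes):
    """Atomically write a newline-separated, de-duplicated node list."""
    unique = sorted({node.strip() for node in nodes if node.strip()})
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so that
    # os.replace stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nodes-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("".join(node + "\n" for node in unique))
        # mkstemp creates the file as 0600; make it readable by the
        # MongooseIM user (assumption: world-readable is acceptable).
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Demo in a temporary directory rather than /etc/mongooseim.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "mongooseim_nodes.txt")
write_node_list(demo_path, ["mongooseim@node2", "mongooseim@node1"])
with open(demo_path) as handle:
    print(handle.read(), end="")
```

In a real deployment the `path` argument would be the value of `node_list_file` from the `[internal_databases.cets]` section.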