Merge pull request #4130 from esl/update-cets-docs
📖 Document CETS as an alternative to Mnesia
chrzaszcz authored Sep 27, 2023
2 parents b0237ed + 4b8a0ca commit c5d0b24
Showing 7 changed files with 253 additions and 70 deletions.
27 changes: 18 additions & 9 deletions doc/configuration/database-backends-configuration.md
@@ -21,14 +21,16 @@ Subsequent sections go into more depth on each database: what they are suitable

Transient data:

* CETS - a library to synchronise ETS tables between nodes.
A new choice for sharing live data across the MongooseIM cluster.
We recommend using this backend for transient data.
This backend requires an RDBMS database to be configured, because CETS uses an external database to discover the nodes in the cluster.
For a CETS config example, see [tutorials](../tutorials/CETS-configure.md).

* Mnesia - a built-in Erlang database.
Mnesia is fine for a fixed-size cluster with reliable networking between nodes and with nodes rarely restarted.
Issues may occur when nodes restart or new nodes join the cluster; in such cases we recommend using CETS instead.
Mnesia is still the default backend for some modules, for compatibility with older config files.

* Redis - A fantastic choice for storing live data.
It's highly scalable and it can be easily shared by multiple MongooseIM nodes.
@@ -47,12 +49,19 @@ Persistent Data:

* ElasticSearch - Only for MAM (Message Archive Management).

* Mnesia - some backends support Mnesia to store data, but it is not recommended.
It is still the default option for many modules when no backend is specified, so be careful.

!!! Warning
We **strongly recommend** keeping **persistent** data in an external DB (RDBMS) for production.
Mnesia is not suitable for the volumes of **persistent** data which some modules may require.
Sooner or later a migration will be needed which may be painful.
It is possible to store all data in Mnesia, but only for testing purposes, not for any serious deployments.

User Data:

* LDAP - Used for: users, shared rosters, vCards


## RDBMS

### MySQL
3 changes: 2 additions & 1 deletion doc/configuration/general.md
@@ -147,7 +147,8 @@ These options can be used to configure the way MongooseIM manages user sessions.
* **Example:** `sm_backend = "redis"`

Backend for storing user session data. All nodes in a cluster must have access to a complete session database.
CETS is a new backend; it requires RDBMS to be configured to work properly.
Mnesia is a legacy backend, sufficient in most cases; use Redis only in large deployments when you notice issues with the Mnesia backend. The Redis backend requires a Redis pool with the `default` tag defined in the `outgoing_pools` section.
See the section about [redis connection setup](./outgoing-connections.md#redis-specific-options) for more information.

!!! Warning
@@ -64,98 +64,186 @@ Checklist:
- the same cookie across all nodes (`vm.args` `-setcookie` parameter)
- each node should be able to ping other nodes using its sname
(ex. `net_adm:ping('mongoose@localhost')`)
- an RDBMS backend is configured, so that CETS can discover nodes
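The cookie item in the checklist above can be checked before the nodes are started. The following is a minimal Python sketch, assuming you have a copy of each node's `vm.args` file as text; it only looks at the `-sname`, `-name` and `-setcookie` flags and does not contact any node. `parse_vm_args` and `check_cookies` are hypothetical helper names, not part of MongooseIM.

```python
from typing import Dict, List, Optional

# vm.args flags relevant to the checklist above.
NODE_FLAGS = ("-sname", "-name", "-setcookie")


def parse_vm_args(text: str) -> Dict[str, str]:
    """Extract the node name and cookie flags from the text of a vm.args file."""
    flags: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in NODE_FLAGS:
            flags[parts[0]] = parts[1].strip()
    return flags


def check_cookies(vm_args_texts: List[str]) -> Optional[str]:
    """Return an error message if the files do not share one cookie, else None."""
    cookies = {parse_vm_args(text).get("-setcookie") for text in vm_args_texts}
    if None in cookies:
        return "at least one vm.args file has no -setcookie flag"
    if len(cookies) > 1:
        return f"nodes use different cookies: {sorted(cookies)}"
    return None
```

For example, `check_cookies([open(p).read() for p in paths])` returns `None` when every file sets the same cookie and an error message otherwise; `paths` stands for your own list of file locations.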

### Initial node

=== "CETS"

Just start MongooseIM using `mongooseim start` or `mongooseim live`.
Clustering is automatic. There is no difference between nodes.

=== "Mnesia"

There is no action required on the initial node.

Just start MongooseIM using `mongooseim start` or `mongooseim live`.

### New node - joining cluster

=== "CETS"

Clustering is automatic.

=== "Mnesia"

```bash
mongooseimctl start
mongooseimctl started #waits until MongooseIM starts
mongooseimctl join_cluster ClusterMember
```

`ClusterMember` is the name of a running node set in `vm.args` file, for example `mongooseim@localhost`.
This node has to be part of the cluster we'd like to join.

First, MongooseIM will display a warning and a question if the operation should proceed:

```text
Warning. This will drop all current connections and will discard all persistent data from Mnesia. Do you want to continue? (yes/no)
```

If you type `yes` MongooseIM will start joining the cluster.
Successful output may look like the following:

```text
You have successfully joined the node mongooseim2@localhost to the cluster with node member mongooseim@localhost
```

In order to skip the question you can add option `-f` which will perform the action
without displaying the warning and waiting for the confirmation.

### Leaving cluster

=== "CETS"

Stopping the node is enough to leave the cluster.
To prevent the node from joining the cluster again, specify a different `cluster_name`
option in the CETS backend configuration. Using a different Erlang cookie is a good idea too.

=== "Mnesia"

To leave a running node from the cluster, call:

```bash
mongooseimctl leave_cluster
```

It only makes sense to use it if the node is part of a cluster, e.g. if `join_cluster` was called on that node before.

Similarly to `join_cluster` a warning and a question will be displayed unless the option `-f` is added to the command.

The successful output from the above command may look like the following:

```text
The node mongooseim2@localhost has successfully left the cluster
```

### Removing a node from the cluster

=== "CETS"

A stopped node is automatically removed from the node discovery table in the RDBMS database after some time.
This prevents other nodes from trying to connect to the stopped node.

=== "Mnesia"

To remove another node from the cluster, call the following command from one of the cluster members:

```bash
mongooseimctl remove_from_cluster RemoteNodeName
```

where `RemoteNodeName` is the name of the node that we'd like to remove from our cluster.
This command could be useful when the node is dead and not responding and we'd like to remove it remotely.
The successful output from the above command may look like the following:

```text
The node mongooseim2@localhost has been removed from the cluster
```

### Cluster status

=== "CETS"

Run the command:

```bash
mongooseimctl cets systemInfo
```

`joinedNodes` should contain a list of properly joined nodes:

```json
"joinedNodes" : [
"mongooseim@node1",
"mongooseim@node2"
]
```

It should generally be equal to the list of `discoveredNodes`.

If it is not equal, you could have some configuration or networking issues.
You can check the `unavailableNodes`, `remoteNodesWithUnknownTables`,
and `remoteNodesWithMissingTables` lists for more information (generally, these lists should be empty).
You can read the description for other fields of `systemInfo` in the
[GraphQL API reference](../graphql-api/admin-graphql-doc.html#definition-CETSSystemInfo).
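The comparison described above can be scripted. The Python sketch below assumes that `mongooseimctl cets systemInfo` prints JSON in which these fields sit inside an object under a `systemInfo` key; the exact nesting is an assumption, so the helper searches for that key at any depth. `find_system_info` and `cets_problems` are hypothetical names.

```python
import json
from typing import Any, Dict, List, Optional

# Lists that should generally be empty in a healthy cluster.
PROBLEM_LISTS = (
    "unavailableNodes",
    "remoteNodesWithUnknownTables",
    "remoteNodesWithMissingTables",
)


def find_system_info(doc: Any) -> Optional[Dict[str, Any]]:
    """Recursively look for the object stored under a "systemInfo" key."""
    if isinstance(doc, dict):
        if isinstance(doc.get("systemInfo"), dict):
            return doc["systemInfo"]
        children = list(doc.values())
    elif isinstance(doc, list):
        children = doc
    else:
        return None
    for child in children:
        found = find_system_info(child)
        if found is not None:
            return found
    return None


def cets_problems(output: str) -> List[str]:
    """Return human-readable problems found in the systemInfo JSON output."""
    info = find_system_info(json.loads(output))
    if info is None:
        return ["no systemInfo object found in the output"]
    problems = []
    joined = set(info.get("joinedNodes", []))
    discovered = set(info.get("discoveredNodes", []))
    if joined != discovered:
        problems.append(
            f"joinedNodes {sorted(joined)} != discoveredNodes {sorted(discovered)}"
        )
    for field in PROBLEM_LISTS:
        if info.get(field):
            problems.append(f"{field} is not empty: {info[field]}")
    return problems
```

Pass it the captured output of the command; an empty list means that none of these checks found a problem.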

For a properly configured two-node cluster, the output of `mongooseimctl metric getMetrics --name '["global", "cets", "system"]'` would look similar to this:

```json
{
"data" : {
"metric" : {
"getMetrics" : [
{
"unavailable_nodes" : 0,
"type" : "cets_system",
"remote_unknown_tables" : 0,
"remote_nodes_without_disco" : 0,
"remote_nodes_with_unknown_tables" : 0,
"remote_nodes_with_missing_tables" : 0,
"remote_missing_tables" : 0,
"name" : [
"global",
"cets",
"system"
],
"joined_nodes" : 2,
"discovery_works" : 1,
"discovered_nodes" : 2,
"conflict_tables" : 0,
"conflict_nodes" : 0,
"available_nodes" : 2
}
]
}
}
}
```
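The metric can be checked in the same way, for example from a monitoring job. A minimal Python sketch, assuming the JSON shape of the sample above (`data.metric.getMetrics` holding a list of metric objects) and a known expected cluster size. It treats every counter that is 0 in the sample as one that should stay 0, which is a simplifying assumption. `check_cets_metrics` is a hypothetical name.

```python
import json
from typing import Any, Dict, List, Optional

METRIC_NAME = ["global", "cets", "system"]

# Counters that are 0 in the sample output of a healthy two-node cluster.
MUST_BE_ZERO = (
    "unavailable_nodes",
    "remote_unknown_tables",
    "remote_nodes_without_disco",
    "remote_nodes_with_unknown_tables",
    "remote_nodes_with_missing_tables",
    "remote_missing_tables",
    "conflict_tables",
    "conflict_nodes",
)


def check_cets_metrics(output: str, expected_nodes: int) -> List[str]:
    """Compare the CETS system metric with the expected number of nodes."""
    entries = json.loads(output)["data"]["metric"]["getMetrics"]
    stats: Optional[Dict[str, Any]] = next(
        (e for e in entries if e.get("name") == METRIC_NAME), None
    )
    if stats is None:
        return ["metric [global, cets, system] not found"]
    problems = []
    # Every node should be discovered, available and joined.
    for key in ("discovered_nodes", "available_nodes", "joined_nodes"):
        if stats.get(key) != expected_nodes:
            problems.append(f"{key} = {stats.get(key)}, expected {expected_nodes}")
    if stats.get("discovery_works") != 1:
        problems.append("discovery_works is not 1")
    for key in MUST_BE_ZERO:
        if stats.get(key, 0) != 0:
            problems.append(f"{key} = {stats[key]}, expected 0")
    return problems
```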

=== "Mnesia"

You can use the following commands on any of the running nodes to examine the cluster
or to see if a newly added node is properly clustered:

```bash
mongooseimctl mnesia info | grep "running db nodes"
```

This command shows all running nodes.
A healthy cluster should contain all nodes here.
For example:
```text
running db nodes = [mongooseim@node1, mongooseim@node2]
```
To see stopped or misbehaving nodes the following command can be useful:

```bash
mongooseimctl mnesia info | grep "stopped db nodes"
```

This command shows which nodes are considered stopped.
This does not necessarily indicate that they are down but might be a symptom of a communication problem.
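To compare this output against the nodes you expect, the `running db nodes` line can be parsed. A minimal Python sketch, assuming the `running db nodes = [...]` format shown above; the regular expression also tolerates extra spaces around `=` and single-quoted node names. `running_db_nodes` and `missing_nodes` are hypothetical helper names.

```python
import re
from typing import List, Set

# Matches e.g. "running db nodes = [mongooseim@node1, mongooseim@node2]",
# tolerating extra spaces around "=".
RUNNING_RE = re.compile(r"running db nodes\s*=\s*\[(?P<nodes>[^\]]*)\]")


def running_db_nodes(mnesia_info: str) -> List[str]:
    """Return the node names listed on the 'running db nodes' line."""
    match = RUNNING_RE.search(mnesia_info)
    if match is None:
        return []
    # Erlang prints some node names as quoted atoms ('mim@host-1'), so strip quotes.
    return [
        node.strip().strip("'")
        for node in match.group("nodes").split(",")
        if node.strip()
    ]


def missing_nodes(mnesia_info: str, expected: Set[str]) -> Set[str]:
    """Expected nodes that Mnesia does not list as running."""
    return expected - set(running_db_nodes(mnesia_info))
```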

## Load Balancing

21 changes: 21 additions & 0 deletions doc/operation-and-maintenance/MongooseIM-metrics.md
@@ -179,6 +179,27 @@ Metrics specific to an extension, e.g. Message Archive Management, are described
| `[global, data, dist]` | proplist | Network stats for an Erlang distributed communication. A proplist with values: `recv_oct`, `recv_cnt`, `recv_max`, `send_oct`, `send_max`, `send_cnt`, `send_pend`, `connections`. |
| `[global, data, rdbms, PoolName]` | proplist | For every RDBMS pool defined, an instance of this metric is available. It is a proplist with values `workers`, `recv_oct`, `recv_cnt`, `recv_max`, `send_oct`, `send_max`, `send_cnt`, `send_pend`. |

### CETS system metrics

| Metric name | Type | Description |
| ----------- | ---- | ----------- |
| `[global, cets, system]` | proplist | A proplist of stats, described in the table below. |

| Stat Name | Description |
| ----------- | ----------- |
| `available_nodes` | Available nodes (nodes that are connected to us and have the CETS disco process started). |
| `unavailable_nodes` | Unavailable nodes (nodes that do not respond to our pings). |
| `joined_nodes` | Joined nodes (nodes that have our local tables running). |
| `discovered_nodes` | Discovered nodes (nodes that are extracted from the discovery backend). |
| `remote_nodes_without_disco` | Nodes that do not have the CETS discovery process running. |
| `remote_nodes_with_unknown_tables` | Nodes that have more tables registered than the local node. |
| `remote_unknown_tables` | Remote tables that are not registered on the local node. |
| `remote_nodes_with_missing_tables` | Nodes that are available, but do not host some of our local tables. |
| `remote_missing_tables` | Local tables that are not hosted by some of the available remote nodes. |
| `conflict_nodes` | Nodes that replicate at least one of our local tables to a different list of nodes. |
| `conflict_tables` | Tables that have conflicting replication destinations. |
| `discovery_works` | Returns 1 if the last discovery attempt is successful (otherwise returns 0). |

### VM metrics

| Metric name | Type | Description |
63 changes: 63 additions & 0 deletions doc/tutorials/CETS-configure.md
@@ -0,0 +1,63 @@
## CETS Config Example

[CETS](https://github.com/esl/cets/) is a library that replicates in-memory data
across the MongooseIM cluster. It can be used to store:

- information about online XMPP sessions;
- information about outgoing S2S connections;
- stream management session IDs;
- information about online MUC rooms.

If you want to use CETS instead of Mnesia, ensure that these options are set:

```toml
[general]
sm_backend = "cets"
component_backend = "cets"
s2s_backend = "cets"

[internal_databases.cets]

# The list of modules that use CETS
# You should enable only modules that you use
[modules.mod_stream_management]
backend = "cets"

[modules.mod_bosh]
backend = "cets"

[modules.mod_muc]
online_backend = "cets"

[modules.mod_jingle_sip]
backend = "cets"
```

Ensure that `outgoing_pools` contains an RDBMS pool, so that CETS can get the list of MongooseIM nodes that use the same
relational database and cluster them together.
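Before deploying, it may help to confirm that a configuration file switches every option from the example above to CETS. The Python sketch below works on the parsed TOML document; the `check_config_file` helper uses the standard `tomllib` module, available since Python 3.11. It assumes RDBMS pools live under an `outgoing_pools.rdbms` table, and it skips that check when the file discovery backend described below is selected. The function names are hypothetical.

```python
from typing import Any, Dict, List

# Options that the example above sets to "cets".
GENERAL_OPTIONS = ("sm_backend", "component_backend", "s2s_backend")
MODULE_OPTIONS = {
    "mod_stream_management": "backend",
    "mod_bosh": "backend",
    "mod_muc": "online_backend",
    "mod_jingle_sip": "backend",
}


def cets_config_problems(config: Dict[str, Any]) -> List[str]:
    """List the options in a parsed config that do not match the CETS example."""
    problems = []
    general = config.get("general", {})
    for option in GENERAL_OPTIONS:
        if general.get(option) != "cets":
            problems.append(f'general.{option} should be "cets"')
    cets_db = config.get("internal_databases", {}).get("cets")
    if cets_db is None:
        problems.append("missing [internal_databases.cets] section")
    modules = config.get("modules", {})
    # Only enabled modules (those present in the config) are checked.
    for module, option in MODULE_OPTIONS.items():
        if module in modules and modules[module].get(option) != "cets":
            problems.append(f'modules.{module}.{option} should be "cets"')
    # Node discovery needs an RDBMS pool unless the file backend is used.
    uses_file = cets_db is not None and cets_db.get("backend") == "file"
    if not uses_file and "rdbms" not in config.get("outgoing_pools", {}):
        problems.append("no RDBMS pool in outgoing_pools")
    return problems


def check_config_file(path: str) -> List[str]:
    """Parse a TOML config file and check it (tomllib needs Python 3.11+)."""
    import tomllib

    with open(path, "rb") as f:
        return cets_config_problems(tomllib.load(f))
```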

A preferred way to install MongooseIM is with [Helm Charts](https://github.com/esl/MongooseHelm/) on Kubernetes, which lets you
set `volatileDatabase` to `cets`; the corresponding values are then applied through Helm's templates.


## CETS with the file discovery backend

It is possible to read the list of nodes to cluster from a file. MongooseIM does not modify this file, so keeping it up to date
is the operator's task. MongooseIM rereads the file without needing a restart:

```toml
[internal_databases.cets]
backend = "file"
node_list_file = "/etc/mongooseim/mongooseim_nodes.txt"
```

The `node_list_file` contains a newline-separated list of node names:

```
[email protected]
[email protected]
[email protected]
```

The file backend for CETS is only useful if you do not use an RDBMS database.
You could, for example, use an external script to build the list of nodes from an AWS CLI command or some other source.
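Since MongooseIM only reads `node_list_file`, an external script that maintains it should avoid leaving a half-written file in place. A minimal Python sketch, assuming the node names have already been obtained from some source (how you query AWS or another inventory is not shown): it checks that each entry looks like `name@host`, writes the list to a temporary file in the same directory, and swaps it in with `os.replace`, which is atomic on POSIX file systems. `validate_nodes` and `write_node_list` are hypothetical helper names.

```python
import os
import tempfile
from typing import Iterable, List


def validate_nodes(nodes: Iterable[str]) -> List[str]:
    """Trim entries, drop duplicates, and reject anything not shaped like name@host."""
    result: List[str] = []
    for raw in nodes:
        node = raw.strip()
        name, sep, host = node.partition("@")
        if not (name and sep and host):
            raise ValueError(f"not a valid node name: {raw!r}")
        if node not in result:
            result.append(node)
    return result


def write_node_list(path: str, nodes: Iterable[str]) -> None:
    """Replace the file at `path` with a newline-separated node list in one step."""
    lines = validate_nodes(nodes)
    directory = os.path.dirname(os.path.abspath(path))
    # A temporary file in the same directory lets os.replace swap it in atomically.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nodes-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
        # mkstemp creates the file as 0600; make it readable by other users,
        # such as the one running MongooseIM.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

For example, call `write_node_list("/etc/mongooseim/mongooseim_nodes.txt", nodes)` with the path from the configuration above.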