Increase DRBD Net/ping-timeout #45

Open
wants to merge 133 commits into
base: 2.30.8-8.2-linstor-fixes-staging

Conversation

benjamreis

This would avoid falsely assuming that a node is dead.

MarkSymsCtx and others added 30 commits October 13, 2023 14:43
…probe calls

Signed-off-by: Mark Syms <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
This was a patch added to the sm RPM git repo before we had this
forked git repo for sm in the xcp-ng GitHub organisation.
This was a patch added to the sm RPM git repo before we had this
forked git repo for sm in the xcp-ng GitHub organisation.
The driver is needed to transition to the ext driver.
Users who upgrade from XCP-ng <= 8.0 need a working driver so that they
can move the VMs out of the ext4 SR and delete the SR.

Not keeping that driver would force such users to upgrade to 8.1 first,
convert their SR, then upgrade to a higher version.

However, as in XCP-ng 8.1, the driver refuses any new ext4 SR
creation.
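A minimal sketch of what refusing new ext4 SR creation can look like; the exception type and the bare function are hypothetical illustrations, not the actual driver code:

```python
# Hypothetical illustration: existing ext4 SRs can still be attached and
# scanned, but creating a new one is refused so users migrate to the ext
# driver.
class Ext4CreationRefusedError(Exception):
    pass

def create(sr_uuid, size):
    raise Ext4CreationRefusedError(
        'ext4 SR creation is no longer supported; create an ext SR instead'
    )
```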
Some important points:

- linstor.KV must use an identifier name that starts with a letter (so it uses a "sr-" prefix).

- Encrypted VDIs are supported via the key_hash attribute (not tested, experimental).

- When a new LINSTOR volume is created on a host (via snapshot or create), the remaining diskless
devices are not necessarily created on the other hosts. So if a resource definition exists without
a local device path, we ask LINSTOR for it. We wait 5s for symlink creation when a new volume
is created; the 5s value is purely arbitrary, but it guarantees that we do not try to access the
volume before the udev rule has created the symlink (see the sketch after this list).

- Can change the provisioning using the device config 'provisioning' param.

- We can only increase volume size (see LINBIT/linstor-server#66);
it would be great if we could shrink volumes to limit the space used by the snapshots.

- Inflate/Deflate can only be executed on the master host, a linstor-manager plugin is present
to do this from slaves. The same plugin is used to open LINSTOR ports + start controller.

- Use a `total_allocated_volume_size` method to get a good idea of the reserved space.
Why? Because `physical_free_size` is computed using the LVM used size. In the case of thick provisioning that's fine,
but when thin provisioning is chosen, LVM only returns the allocated size based on the used block count. This method
solves the problem: it uses the fixed virtual volume size of each node to compute the size required to store the
volume data.

- Call vhd-util on remote hosts through the linstor-manager plugin when necessary, i.e. when vhd-util is called to get VHD info,
the DRBD device can be in use (and unusable by external processes), so we must use the local LVM device that
contains the DRBD data, or a remote disk if the DRBD device is diskless.

- If a DRBD device is in use when vhdutil.getVHDInfo is called, we must not get
errors. So a LinstorVhdUtil wrapper is now used to bypass the DRBD layer when
VDIs are loaded.

- Refresh the PhyLink when unpause is called on DRBD devices:
We must always recreate the symlink to ensure we have
the right info. Why? Because if the volume UUID is changed in
LINSTOR, the symlink is not immediately updated. When live leaf
coalesce is executed we have these steps:
"A" -> "OLD_A"
"B" -> "A"
Without the symlink update, the previous "A" path is reused instead of
the "B" path. Note: "A", "B" and "OLD_A" are UUIDs.

- Since the linstor python modules are not present on every XCP-ng host,
module imports are protected by try... except blocks.

- Provide a linstor-monitor daemon to check master changes
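To make the symlink point above concrete, here is a minimal sketch of the waiting loop, assuming a hypothetical helper name, polling interval and return convention (the real driver may implement this differently):

```python
import errno
import os
import time

# Poll for the udev-created symlink of a new LINSTOR volume, giving up after
# `timeout` seconds (5s in the commit message, an arbitrary but safe budget).
def wait_for_device_path(path, timeout=5.0, interval=0.1):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            os.stat(path)
            return True  # Symlink (and target) exists, safe to use.
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise
        time.sleep(interval)
    return False  # Symlink never appeared within the time budget.
```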
- Check if "create" doesn't succeed without zfs packages
- Check if "scan" failed if the path is not mounted (not a ZFS mountpoint)
Some QNAP devices do not provide an ACL when fetching NFS mounts.
In this case the assumed ACL should be: "*".

This commit fixes the crash that occurred when attempting to access the non-existing ACL.
Relevant issues:
- xapi-project#511
- xcp-ng/xcp#113
Co-authored-by: Piotr Robert Konopelko <[email protected]>
Signed-off-by: Aleksander Wieliczko <[email protected]>
Signed-off-by: Ronan Abhamon <[email protected]>
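A minimal sketch of the fallback, assuming an export line as printed by `showmount -e`; the helper name is hypothetical, not the NFS helper's actual API:

```python
# Return the ACL of an NFS export line, defaulting to "*" (everyone) when
# the device (e.g. some QNAP NAS) omits the ACL column entirely.
def export_acl(export_line):
    fields = export_line.split()
    return fields[1] if len(fields) > 1 else '*'
```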
`umount` should not be called when `legacy_mode` is enabled, otherwise a mounted dir
used during SR creation is unmounted by the `detach` block at the end of the `create` call (and also
when a PBD is unplugged).

Signed-off-by: Ronan Abhamon <[email protected]>
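A minimal sketch of the guard, with hypothetical class, attribute and helper names (not the actual FileSR code):

```python
import subprocess

class FileSRSketch(object):
    def __init__(self, path, legacy_mode):
        self.path = path
        self.legacy_mode = legacy_mode

    def detach(self, sr_uuid):
        if self.legacy_mode:
            # The directory was mounted outside of the driver's control
            # (e.g. provided at SR creation), so leave it mounted.
            return
        subprocess.check_call(['umount', self.path])
```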
An sm-config boolean param `subdir` is available to configure where to store the VHDs:
- In a subdirectory named after the SR UUID (the new behavior)
- In the root directory of the MooseFS SR

By default, new SRs are created with `subdir` = True.
Existing SRs are not modified and continue to use the folder that was given at
SR creation directly, without looking for a subdirectory.

Signed-off-by: Ronan Abhamon <[email protected]>
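A minimal sketch of the path selection under these assumptions (hypothetical helper, not the actual MooseFS driver code); `subdir` is written into sm-config at SR creation, so older SRs without the key keep using the root directory:

```python
import os

# Pick the directory where VHDs are stored for a MooseFS SR.
def vhd_dir(remote_root, sr_uuid, sm_config):
    if sm_config.get('subdir', 'false').lower() == 'true':
        # New behavior: one subdirectory per SR, named after its UUID.
        return os.path.join(remote_root, sr_uuid)
    # Legacy behavior: VHDs live directly in the SR root directory.
    return remote_root
```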
Ensure all shared drivers are imported in the `_is_open` definition to register
them in the driver list. Otherwise this function always fails with an SRUnknownType exception.

Also, we must add two fake mandatory parameters to make MooseFS happy: `masterhost` and `rootpath`.
Same for CephFS with `serverpath`. (The NFS driver is patched directly to ensure the
`serverpath` param is not used, because its value is None.)

The `location` param is required to use ZFS, or more precisely, by the parent class `FileSR`.

Signed-off-by: Ronan Abhamon <[email protected]>
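A minimal sketch of the "fake mandatory parameters" idea, with a hypothetical helper name and dummy values (the real `_is_open` plumbing differs):

```python
# Fill in dummy values for parameters the shared drivers insist on, so they
# can be instantiated just to check whether a VDI is open.
def fake_device_config(sr_type, device_config):
    dconf = dict(device_config)
    if sr_type == 'moosefs':
        dconf.setdefault('masterhost', 'dummy')
        dconf.setdefault('rootpath', '/dummy')
    elif sr_type == 'cephfs':
        dconf.setdefault('serverpath', '/dummy')
    return dconf
```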
SR_CACHING offers the capability to use IntelliCache, but this
feature is only available with NFS SRs.

For more details, see the implementation of `_setup_cache` in blktap2.py,
which only uses an instance of NFSFileVDI for the shared target.

Signed-off-by: Ronan Abhamon <[email protected]>
The probe method is not implemented, so we
shouldn't advertise it.

Signed-off-by: BenjiReis <[email protected]>
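As an illustration, "not advertising" probe simply means leaving SR_PROBE out of the driver's capability list; the list below is an example, not the driver's actual one:

```python
# Example capability list: SR_PROBE is intentionally absent because
# probe() is not implemented by this driver.
CAPABILITIES = [
    'SR_ATTACH',
    'SR_DETACH',
    'SR_SCAN',
    'VDI_CREATE',
    'VDI_DELETE',
    'VDI_ATTACH',
    'VDI_DETACH',
]
```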
When static VDIs are used there are no snapshots, and we don't want to
call XAPI methods.

Signed-off-by: Guillaume <[email protected]>
This file is meant to remain unchanged and regularly updated along with
the SM component. Users can create a custom configuration file in
/etc/multipath/conf.d/ instead.

Signed-off-by: Samuel Verschelde <[email protected]>
(cherry picked from commit b44d3f5)
Meant to be installed as /etc/multipath/conf.d/custom.conf for users
to have an easy entry point for editing, as well as information on what
will happen to this file through future system updates and upgrades.

Signed-off-by: Samuel Verschelde <[email protected]>
(cherry picked from commit 18b79a5)
Update Makefile so that the file is installed along with sm.

Signed-off-by: Samuel Verschelde <[email protected]>
Otherwise the SIGALRM signal can be emitted after the execution
of the given user function.

Signed-off-by: Ronan Abhamon <[email protected]>
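A minimal sketch of the alarm-cancellation pattern; the helper name and signature are assumptions, not the sm utility itself:

```python
import signal

# Run fn with a SIGALRM-based timeout, making sure the pending alarm is
# cancelled before returning so it cannot fire after fn has finished.
def call_with_timeout(seconds, fn, *args, **kwargs):
    def _on_alarm(signum, frame):
        raise Exception('timeout reached')

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return fn(*args, **kwargs)
    finally:
        # Cancel any pending alarm and restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```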
Details:
- vdi_attach and vdi_detach are now exclusive
- lock volumes on slaves (when vdi_xxx command is used) and avoid release if a timeout is reached
- load all VDIs only when necessary, i.e. only if at least one journal entry exists or if sr_scan/sr_attach is executed
- use a __slots__ attr in LinstorVolumeManager to increase performance (see the sketch after this list)
- use a cache directly in LinstorVolumeManager to reduce the network request count with LINSTOR
- try to always use the same LINSTOR KV object to limit network usage
- use a cache to avoid a new JSON parsing when all VDIs are loaded in LinstorSR
- limit the request count when LINSTOR storage pool info is fetched, using a fetch interval
- avoid a race condition in cleanup: check whether a volume is locked on a slave before modifying it
- ...

Signed-off-by: Ronan Abhamon <[email protected]>
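A minimal illustration of the `__slots__` and cache points from the list above; the attribute names are invented, the real LinstorVolumeManager defines many more:

```python
# __slots__ avoids a per-instance __dict__, which lowers memory usage and
# slightly speeds up attribute access; cached data avoids repeated requests.
class SlottedVolumeManager(object):
    __slots__ = ('_linstor', '_kv', '_volume_info_cache', '_cache_timestamp')

    def __init__(self, linstor, kv):
        self._linstor = linstor
        self._kv = kv
        self._volume_info_cache = None
        self._cache_timestamp = 0
```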
…_from_config is executed

Signed-off-by: Ronan Abhamon <[email protected]>
Now, we can:
- Start a controller on any node
- Share the LINSTOR volume list using a specific volume "xcp-persistent-database"
- Use HA with "xcp-persistent-ha-statefile" and "xcp-persistent-redo-log" volumes
- Create the nodes automatically during SR creation

Signed-off-by: Ronan Abhamon <[email protected]>
…mes when master satellite is down

Steps to reproduce:

- Ensure the linstor satellite is not running on the master host, otherwise stop it
- Then restart the controller on the right host where the LINSTOR database is mounted
- Run the sr_attach command => all volumes will be forgotten

To avoid this, it's possible to restart the satellite on the master before the sr_attach command.
Also, funnily enough, you can start and stop the satellite just before the sr_attach, and the volumes will not be removed.

Explanations:

In theory this bug is impossible because during the sr_attach execution, an exception is thrown
(so sr_scan should not be executed) BUT there is a piece of code that is executed
in SRCommand.py when sr_attach is called:

```python
try:
    return sr.attach(sr_uuid)
finally:
    if is_master:
        sr.after_master_attach(sr_uuid)
```

The exception is not immediately forwarded because the finally block must be executed first.
And what is the implementation of after_master_attach?

```python
def after_master_attach(self, uuid):
    """Perform actions required after attaching on the pool master
    Return:
    None
    """
    self.scan(uuid)
```

Oh! Of course, a scan is always executed after an attach... What's the purpose of a scan if we can't
correctly execute an attach command first? I don't know, but it's probably error-prone, as this context shows.
When scan is called, we assume the SR is attached and all VDIs are loaded, but that's not the case here
because an exception has been thrown.

To solve this problem, we forbid the execution of the scan if the attach failed.
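A minimal sketch of such a guard, using a hypothetical `attach_succeeded` flag (the actual SRCommand.py fix may track this differently):

```python
attach_succeeded = False
try:
    ret = sr.attach(sr_uuid)
    attach_succeeded = True
    return ret
finally:
    # Only run the post-attach scan if the attach actually completed.
    if is_master and attach_succeeded:
        sr.after_master_attach(sr_uuid)
```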

Signed-off-by: Ronan Abhamon <[email protected]>
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 44f2ee3 to bf38210 on December 20, 2023 14:01
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 2 times, most recently from 4222231 to 3499398 on January 23, 2024 13:17
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 3 times, most recently from 150d510 to f87c3eb on February 12, 2024 19:56
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 6 times, most recently from 7238799 to f36a7a2 on April 29, 2024 15:22
@Nambrok force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from a217ee4 to 89f927e on May 7, 2024 13:22
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 89f927e to 2b01dd1 on May 31, 2024 13:41
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 3 times, most recently from 76209bf to 8249dcc on June 13, 2024 11:25
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 2 times, most recently from f9e6a8e to e7ffbab on June 28, 2024 13:09
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 7 times, most recently from 51d4f89 to 3f63f6a on July 26, 2024 12:49
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch 2 times, most recently from 028c295 to 31d150b on August 6, 2024 15:17
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 04c2c93 to 0722952 on September 24, 2024 08:29
@Wescoeur force-pushed the 2.30.8-8.2-linstor-fixes-staging branch from 0722952 to 119dc63 on October 3, 2024 15:34