Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add switchdev-online service to hold off on network init #56

Merged
merged 4 commits into from
Mar 9, 2021

Conversation

rothcar
Copy link
Contributor

@rothcar rothcar commented Mar 9, 2021

Add switchdev-online service to hold off on network init

This addresses a race condition between the switchdev driver, udev
and ifupdown2. In some cases, ifupdown2 attempts to configure
switchdev interfaces (1) before they are ready, or (2) before any
udev rules have had a chance to rename them per udev configuration.
When this happens, not all interfaces are configured correctly.

The solution is to

  1. create a dummy systemd service for each switchdev interface
    (platform specific) that depends on the well-known systemd
    device path for that netdev (currently
    'sys-subsystem-net-devices-FOO.device').
  2. Make the system-wide network-pre.service depend on all
    switchdev devices coming online
  3. make ifupdown2 (via networking.service) depend on network-pre.service

See
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

The addresses a race condition between the switchdev driver, udev
and ifupdown2. In some cases, ifupdown2 attempts to configure
switchdev interfaces (1) before they are ready, or (2) before any
udev rules have had a chance to rename them per udev configuration.
When this happens, not all interfaces are configured correctly.

The solution is to

1. create a dummy systemd service for each switchdev interface
   (platform specific) that depends on the well-known systemd
   device path for that netdev (currently
   'sys-subsystem-net-devices-FOO.device'.
2. Make the system-wide network-pre.service depend on *all*
   switchdev devices coming online
3. make ifupdown2 (via networking.service) depend on network-pre.service

See
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Signed-off-by: Carl Roth <[email protected]>
- based on fixes for non-DN systems
- please check my work, I don't have physical access to these
  systems to verify port count

Signed-off-by: Carl Roth <[email protected]>
@paulmenzel
Copy link
Contributor

Two typos:

  1. This addresses
  2. Missing closing ) in item 2.

I wonder why no other ONL instances had these problems. Does this delay the startup somehow noticeably (systemd-analyze critical-chain)?

I guess it’s time to update to Debian 11 (bullseye) and switch to systemd-networkd soon.

@rothcar
Copy link
Contributor Author

rothcar commented Mar 9, 2021

I wonder why no other ONL instances had these problems. Does this delay the startup somehow noticeably (systemd-analyze critical-chain)?

Not sure why no one noticed this yet. The general issue is that (1) the switchdev driver initializes front panel ports much more slowly that e.g. the management ethernet on the SoC and (2) ifupdown2 does not explicitly wait for devices specified in the ENI file.

I guess it’s time to update to Debian 11 (bullseye) and switch to systemd-networkd soon.

I can't tell from looking at the online discussions how good the support is for mstpd in systemd-networkd.

# See e.g.
# https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Wants=network-pre.target
Before=network-pre.target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about this line, as the drop-in above has

After=network-pre.target

Isn’t that a contradiction?

/cc @tjjh89017

Copy link

@tjjh89017 tjjh89017 Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulmenzel
Suppose you mean After=network-pre.target in networking.service.d/switchdev-online.conf conflicts with Before=network-pre.target in [email protected] (these are two different service description file)

In Rothcar's patch, he want to guarantee that networking.service will always be executed after ONLP insert ASIC driver.
he made [email protected] will always be before network-pre.target.
and networking.service (ifupdown2) will always be after network-pre.target

So you can see this order below in Rothcar's patch:

  1. execute [email protected] (load driver, let Linux can see those port)
  2. reach network-pre.target
  3. execute networking.service (load config. if ports isn't existing, it will fail)

and in my PR #65
try to guarantee this order below (from No. 3 to 5)

  1. execute [email protected] (load driver, let Linux can see those port)
  2. reach network-pre.target
  3. execute networking.service (load config. if ports and config are mismatch, it will fail)
  4. reach network.target
  5. execute other service which need network (e.g. isc-dhcp-server. if ports are not configured correctly, it will fail)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I totally screwed up, and in networking.service.d/switchdev-online.conf only read switchdev-online and associated it with the switchdev-online unit. But it belongs to to the unit networking.service. Sorry about that.

ideaship added a commit to bisdn/meta-open-network-linux that referenced this pull request Apr 25, 2022
This is probably _not_ something we want to keep, it seems to be a
workaround that may be needed only on Debian 10 (see the original
PR for dentOS):

Files from:
https://github.com/dentproject/dentOS/tree/main/builds/any/rootfs/stretch/common/overlay/etc/systemd/system

Enable swp (switch port) services based on information from:
https://github.com/dentproject/dentOS/blob/main/packages/platforms/delta/arm64/tn48m-dn/tn48m-dn/platform-config/r0/src/python/arm64_delta_tn48m_dn_r0/__init__.py

Original pull request in dentOS:
  Add switchdev-online service to hold off on network init
  dentproject/dentOS#56

Signed-off-by: Roger Luethi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants