LDM should reorder UDP packets before ingesting them #74

Open
childofthewired opened this issue Sep 23, 2020 · 11 comments

@childofthewired

LDM Version 6.13.12.69

Environment:

Red Hat Enterprise Linux 7
RHV Hypervisor
Juniper Switch/Cisco Switch
Satellite Receiver

NOAAPORT UDP packets that do not arrive in order are rejected, and readnoaaport.c reports an error.

While this appears to be by design, it impacts the use of LDM in load-balanced or virtual-machine environments where bonded NICs are used in either the hypervisor or a physical host.

We have worked around this by disabling bonding on the interface that the NOAAPORT data arrives on, but this greatly reduces the reliability of the hardware.

LDM cannot reorder the UDP packets, so it drops the product even though the entire product is present in the data.

The RFC guidelines for UDP usage state: "Applications that require ordered delivery MUST reestablish datagram ordering themselves."

https://tools.ietf.org/html/rfc8085#section-3.3

@semmerson
Collaborator

Hi @childofthewired,

The LDM was designed assuming that the DVB-S receiver would have a hard-wired connection (with some layer-2 switches, possibly) to the computer running the noaaportIngester(1) program. That's the case here, at our client universities, and at all of NOAA's WFOs.

We run multiple instances of noaaportIngester(1) on separate computers for redundancy and have a reliability rate of at least 99.999%. AFAIK, the AWIPS system does the same.

WAN applications that require UDP packets to be delivered in order should, indeed, have a mechanism for re-ordering packets. There is a piece of software from the University of Wisconsin that performs this. We've successfully used this software between the NOAAPort receiver and the noaaportIngester(1) program over a WAN. Perhaps that could solve your problem.

@johnsimcall

Thanks, @semmerson, and sorry for resurrecting an old thread. I tried to search for the University of Wisconsin software you mentioned to re-order UDP packets, but couldn't find anything that looked right. Can you point me in the right direction, please?

@semmerson
Collaborator

@johnsimcall Hang on. We're talking amongst ourselves about the best solution for you.

semmerson reopened this Dec 21, 2022
@semmerson
Collaborator

@johnsimcall A Novra can't have a bonded NIC. Would you please explain how one enters the picture, and how it increases reliability, when the Novra can't use one?

@semmerson
Collaborator

@johnsimcall Would it be possible to use active-backup mode in the bonded interface? That would ensure redundancy and, with a sufficiently large receive-buffer setting in noaaportIngester(1), should allow a VM to easily keep up with the maximum NOAAPort bit rate of 60 Mbps.
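For reference, a minimal sketch of raising the kernel's socket receive-buffer ceiling so that a large buffer can actually be granted; the 8 MiB value is an illustrative assumption, not a tuned recommendation:

# Allow applications (e.g. noaaportIngester) to request a large SO_RCVBUF
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.rmem_default=8388608

# Persist the settings across reboots
printf 'net.core.rmem_max = 8388608\nnet.core.rmem_default = 8388608\n' > /etc/sysctl.d/99-noaaport.conf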

@johnsimcall

Thanks, @semmerson, you're right: the Novra has a single network connection. I'll attempt to better describe the environment where @childofthewired and I are seeing out-of-order packets.

The Novra is connected to a switch (switch1) which in turn connects to a pair of Juniper switches (switch2 & switch3) that are configured as a single logical unit / virtual chassis. The hypervisor server (Dell) is connected, via LACP/802.3ad bonding, to switch2 & switch3. A Virtual Machine on the hypervisor server, with a single virtual NIC, runs the LDM software and sees out-of-order packet delivery.

                   / -- switch2 -- \
Novra -- switch1 <        ||         > == Dell ~~ VM(LDM)
                   \ -- switch3 -- /

We have also tried to connect switch1 directly to the Dell server, but we still see out-of-order issues.

Thank you for suggesting a large receive-buffer setting in noaaportIngester; we'll take a look at that.
I'm also going to see if the Juniper equipment being used supports the "strict-packet-order" configuration. The documentation says:

strict-packet-order | You can use this command to maintain multicast traffic order and resolve packet drop issues

@semmerson
Collaborator

semmerson commented Dec 22, 2022 via email

@johnsimcall

> If you would like to Google Meet to discuss this reordering issue, we're available.

Thank you @semmerson ! I'll reach out after the New Year to see if we can chat for a few minutes. Happy holidays!

@johnsimcall

Oops, I forgot to post the resolution to this, which was discovered by Sean Webb in Jan 2023. Sean discovered that having two NICs up/online resulted in duplicated, dropped, and out-of-order packet delivery. Shutting down the second NIC resolved the issue. However, the procedure for shutting down the NIC changed between RHEL7 (ifdown eth0) and RHEL8 (ip link set eth0 down). Please note that the "nmcli con down eth0" command in RHEL8 is not sufficient, because it removes the IP configuration from the NIC but doesn't set the link status to down. A custom NetworkManager dispatcher script can be created to set the link status to down whenever the second/backup NIC is not in use, as sketched below.
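A minimal sketch of such a dispatcher script, assuming hypothetical interface names eth0 (primary) and eth1 (backup) and a hypothetical file name:

#!/bin/bash
# /etc/NetworkManager/dispatcher.d/90-sbn-backup-down
# (hypothetical path; the script must be root-owned and executable)
# Force the backup SBN NIC's link hard-down whenever the primary comes up,
# so only one interface delivers the multicast stream.

PRIMARY="eth0"   # illustrative name
BACKUP="eth1"    # illustrative name

IFACE="$1"   # device NetworkManager acted on
ACTION="$2"  # e.g. up, down

if [ "$IFACE" = "$PRIMARY" ] && [ "$ACTION" = "up" ]; then
    # "nmcli con down" alone leaves the link UP; this drops the carrier too
    ip link set "$BACKUP" down
fi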

OK, when I was taking another look at this to get some RHEL7 vs RHEL8 packet captures, I found the issue.
One thing we didn't show is that we actually have 2 Novra DVB receivers. I think the two paths look like this:

Novra1 ---> | switch1, port 20 (vlan.101) |
            | switch1, port 21 (vlan.101) | ---> Dell Server1 eno3/sbn1
            | switch1, port 22 (vlan.101) | ---> Dell Server2 eno3/sbn1
            | switch1, port 23 (vlan.101) | ---> Dell Server3 eno3/sbn1 == linux-rhv-bridge == VM (rhel8-vm1 eth0)

Novra2 ---> | switch2, port 20 (vlan.201) |
            | switch2, port 21 (vlan.201) | ---> Dell Server1 eno4/sbn2
            | switch2, port 22 (vlan.201) | ---> Dell Server2 eno4/sbn2
            | switch2, port 23 (vlan.201) | ---> Dell Server3 eno4/sbn2 == linux-rhv-bridge == VM (rhel8-vm1 eth1)

If I change one of the VM's NIC link states to DOWN, then the GAPS GO AWAY! So the issue hasn't been that the data arrives out of order on a single interface; the issue is that BOTH interfaces deliver their multicast data simultaneously, even after we run "nmcli con down eth0". LDM was therefore receiving the multicast data from BOTH interfaces, seeing the packets out of order, and discarding most of the data.
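One way to confirm this, with placeholder interface names, is to watch each interface separately and list its multicast memberships:

# Duplicate streams show up on both interfaces; once the backup link is
# down, its capture should go silent.
tcpdump -n -i eth0 -c 20 udp
tcpdump -n -i eth1 -c 20 udp

# List the multicast groups each interface has joined
ip maddr show dev eth0
ip maddr show dev eth1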

The difference is in how we manage the interface between RHEL7 and RHEL8. In RHEL7 we were using ifdown eth0 to shut down the inactive SBN interface, in which case the interface looked like this:
4: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 56:6f:5d:e2:00:26 brd ff:ff:ff:ff:ff:ff

In RHEL8 we were using "nmcli con down eth0", but the interface was still UP; it just didn't have an IP assigned:
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 56:6f:5d:e2:00:26 brd ff:ff:ff:ff:ff:ff
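A sketch of the RHEL8-safe sequence, assuming the connection profile and device are both named eth0:

nmcli con down eth0      # removes the IP configuration only
ip link set eth0 down    # actually drops the link

# Verify: state should read DOWN, with no UP/LOWER_UP flags
ip -br link show eth0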

@stonecooper
Contributor

stonecooper commented Jan 3, 2024 via email

@stonecooper
Contributor

stonecooper commented Jan 3, 2024 via email
