LDM should reorder UDP packets before ingesting them #74
Hi @childofthewired, The LDM was designed assuming that the DVB-S receiver would have a hard-wired connection (with some layer 2 switches, possibly) to the computer running the noaaportIngester(1) program. That's the case here, at our client universities, and at all of NOAA's WFOs. We run multiple instances of noaaportIngester(1) on separate computers for redundancy and have a reliability rate that's at least 99.999%. AFAIK, the AWIPS system does the same. WAN applications that require UDP packets to be delivered in order should, indeed, have a mechanism for re-ordering packets. There is a piece of software from the University of Wisconsin that performs this. We've successfully used this software between the NOAAPort receiver and the noaaportIngester(1) program over a WAN. Perhaps that could solve your problem.
Thanks @semmerson, and sorry for resurrecting an old thread. I tried to search for the University of Wisconsin software you mentioned to re-order UDP packets, but couldn't find anything that looked right. Can you point me in the right direction, please?
@johnsimcall Hang on. We're talking amongst ourselves about the best solution for you.
@johnsimcall A Novra can't have a bonded NIC. Would you please explain how one comes about and how it increases reliability when the Novra can't use one?
@johnsimcall Would it be possible to use active-backup mode in the bonded interface? This would ensure redundancy and, with a sufficiently large receive buffer setting in noaaportIngester(1), should allow a VM to easily keep up with the maximum NOAAPort bit rate of 60 Mbps.
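For reference, a minimal sketch of that suggestion on the receiving host, assuming NetworkManager-managed interfaces (the connection names, interface names, and buffer size are illustrative assumptions, not a tested configuration):

# create an active-backup bond from two physical NICs
nmcli con add type bond con-name bond0 ifname bond0 bond.options "mode=active-backup,miimon=100"
nmcli con add type ethernet con-name bond0-port1 ifname eno3 master bond0
nmcli con add type ethernet con-name bond0-port2 ifname eno4 master bond0
# raise the kernel cap on UDP receive buffers so noaaportIngester(1)
# can request a large one (16 MiB here is an arbitrary example)
sysctl -w net.core.rmem_max=16777216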
Thanks @semmerson, you're right, the Novra has a single network connection. I'll attempt to better describe the environment where @childofthewired and I are seeing out-of-order packets.
The Novra is connected to a switch (switch1) which in turn connects to a pair of Juniper switches (switch2 & switch3) that are configured as a single logical unit / virtual chassis <https://www.juniper.net/documentation/us/en/software/junos/virtual-chassis-qfx/index.html>. The hypervisor server (Dell) is connected, via LACP/802.3ad bonding, to switch2 & switch3. One of us here thinks that 802.3ad bonding should act similar to "active-backup" mode -- which doesn't reorder packets -- but isn't certain.
We have also tried to connect switch1 directly to the Dell server, but we still see out-of-order issues.
Thank you for suggesting to create a large receive buffer setting in noaaportIngester, we'll take a look at that.
I'm also going to see if the Juniper equipment being used supports the "strict-packet-order" configuration. The documentation <https://www.juniper.net/documentation/us/en/software/junos/flow-packet-processing/topics/ref/statement/security-edit-flow-security-flow.html> says:
strict-packet-order | You can use this command to maintain multicast traffic order and resolve packet drop issue
Hi John,

> We have also tried to connect switch1 directly to the Dell server, but we still see out-of-order issues.

That's consistent with the Dell hypervisor reordering the packets.

> Thank you for suggesting to create a large receive buffer setting in noaaportIngester, we'll take a look at that.

If the hypervisor is reordering the packets, then that won't work. Please let us know.

> I'm also going to see if the Juniper equipment being used supports the "strict-packet-order" configuration.

Same issue if the problem lies with the hypervisor.

We do use a utility here that sits between the NOAAPort stream and noaaportIngester(1) and ensures that the NOAAPort frames are in strictly monotonic order. It might need modification to fit your situation.

If you would like to Google Meet to discuss this reordering issue, we're available.

--Steve
Thank you @semmerson! I'll reach out after the New Year to see if we can chat for a few minutes. Happy holidays!
Oops, I forgot to post the resolution to this, which was discovered by Sean Webb in Jan 2023. Sean discovered that having two NICs up/online resulted in duplicated, dropped, and out-of-order packet delivery. Shutting down the second NIC resolved the issue -- however, the procedure for shutting down the NIC changed between RHEL7 (ifdown eth0) and RHEL8 (ip link set eth0 down). Please note that the nmcli con down eth0 command in RHEL8 is not sufficient, because that command removes the IP configuration from the NIC but doesn't set the link status to down. A custom NetworkManager dispatcher script <https://man.archlinux.org/man/NetworkManager-dispatcher.8.en> can be created to set the link status to down when the second/backup NIC is not being used.

Ok, when I was taking another look at this to get some RHEL7 vs RHEL8 packet captures, I found the issue. So one thing we didn't show is that we actually have 2 Novra DVB receivers. I think the two paths look like this:

Novra1 ---> | switch1, port 20 (vlan.101) |
            | switch1, port 21 (vlan.101) | ---> Dell Server1 eno3/sbn1
            | switch1, port 22 (vlan.101) | ---> Dell Server2 eno3/sbn1
            | switch1, port 23 (vlan.101) | ---> Dell Server3 eno3/sbn1 == linux-rhv-bridge == VM (rhel8-vm1 eth0)

Novra2 ---> | switch2, port 20 (vlan.201) |
            | switch2, port 21 (vlan.201) | ---> Dell Server1 eno4/sbn2
            | switch2, port 22 (vlan.201) | ---> Dell Server2 eno4/sbn2
            | switch2, port 23 (vlan.201) | ---> Dell Server3 eno4/sbn2 == linux-rhv-bridge == VM (rhel8-vm1 eth1)

If I change one of the VM's NIC link state to DOWN then the GAPS GO AWAY! So this issue hasn't been that the data is coming in out of order on the interface; the issue is that BOTH interfaces are simultaneously broadcasting their multicast data even when we run "nmcli con down eth0". So LDM was receiving the multicast data from BOTH interfaces, seeing it out of order, and discarding most of the data.

The difference is how we are managing the interface between RHEL7 and RHEL8. In RHEL7 we were using ifdown eth0 to shut down the inactive SBN interface, in which case the interface looked like this:

4: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 56:6f:5d:e2:00:26 brd ff:ff:ff:ff:ff:ff

In RHEL8, we were using "nmcli con down eth0" but the interface was still UP, it just didn't have an IP assigned:

4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 56:6f:5d:e2:00:26 brd ff:ff:ff:ff:ff:ff
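To make the dispatcher-script approach concrete, here is a minimal sketch, assuming the backup SBN NIC is eth0 (the script path and interface name are placeholders):

#!/bin/sh
# /etc/NetworkManager/dispatcher.d/90-sbn-link-down (hypothetical path)
# NetworkManager runs each dispatcher script as: <script> <interface> <action>
IFACE="$1"
ACTION="$2"
# When NetworkManager deactivates the backup SBN connection, force the
# link itself down so the NIC stops delivering duplicate multicast.
if [ "$IFACE" = "eth0" ] && [ "$ACTION" = "down" ]; then
    ip link set "$IFACE" down
fi

The script must be executable and owned by root; the resulting link state can then be verified with ip -br link show eth0.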
You can actively feed from two (or more) Novra modems at the same time to
create data feed redundancy by at least a couple of mechanisms.
The longest-utilized method takes advantage of the NAT'ing capability of the Novra modem itself. If using the Linux cmcs command, the first step is to save off the working configuration of the modem using the following:

cmcs -ip <ip address> -pw <appropriate password> -save working_configuration.xml
The resulting file can be hand-edited to create a NAT'ing of the multicast
IP address, such that the block that looks like this:
<CONTENT>
<TRANSPORT_STREAM PIDS="Selected">
<PID Number="101" Processing="MPE" />
<PID Number="102" Processing="MPE" />
<PID Number="103" Processing="MPE" />
<PID Number="104" Processing="MPE" />
<PID Number="105" Processing="MPE" />
<PID Number="106" Processing="MPE" />
<PID Number="107" Processing="MPE" />
<PID Number="108" Processing="MPE" />
<PID Number="150" Processing="MPE" />
<PID Number="151" Processing="MPE" />
<PID Number="NULL" Processing="RAW" />
</TRANSPORT_STREAM>
<IP_REMAP_TABLE Enabled="false" RemapSourceIP="false" />
</CONTENT>
would instead look like this:
<CONTENT>
<TRANSPORT_STREAM PIDS="Selected">
<PID Number="101" Processing="MPE" />
<PID Number="102" Processing="MPE" />
<PID Number="103" Processing="MPE" />
<PID Number="104" Processing="MPE" />
<PID Number="105" Processing="MPE" />
<PID Number="106" Processing="MPE" />
<PID Number="107" Processing="MPE" />
<PID Number="108" Processing="MPE" />
<PID Number="150" Processing="MPE" />
<PID Number="151" Processing="MPE" />
<PID Number="NULL" Processing="RAW" />
</TRANSPORT_STREAM>
<IP_REMAP_TABLE Enabled="true" RemapSourceIP="false">
<IP_Remap_Rule Original_IP="224.0.1.1" New_IP="224.3.2.1"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.2" New_IP="224.3.2.2"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.3" New_IP="224.3.2.3"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.4" New_IP="224.3.2.4"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.5" New_IP="224.3.2.5"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.6" New_IP="224.3.2.6"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.7" New_IP="224.3.2.7"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.8" New_IP="224.3.2.8"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.9" New_IP="224.3.2.9"
Mask="255.255.255.255" TTL="0" Action="Forward" />
<IP_Remap_Rule Original_IP="224.0.1.10" New_IP="224.3.2.10"
Mask="255.255.255.255" TTL="0" Action="Forward" />
</IP_REMAP_TABLE>
</CONTENT>
And this would be for only one of the Novras, leaving the other the same. This will shift the traffic from the altered Novra for the NMC channel, for instance, from 224.0.1.1:1201 to 224.3.2.1:1201. You could even have both modems on the same NIC via a hub or switch, and use the "-m" flag for noaaportIngester to differentiate the shifted multicast address. Two instances of noaaportIngester would be receiving the same PID, but from two different modems, and deduplication would occur on the LDM queue.
Another method is more complicated and uses the newly released "blender" that is now part of the LDM source. It requires a lot more setup, but basically you would stream the multicast data directly into a "fanout" service using socat to translate from the UDP multicast to a TCP point-to-point stream, and then the "blender" would accept streams from multiple channels and merge them at the frame level, with the purpose of producing a near-perfect stream.
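For illustration, the socat leg of that fanout might look something like the following single-client sketch (the TCP port and interface name are assumptions, and a real fanout service would serve multiple subscribers):

# join the NMC multicast group on eth0 and relay its UDP datagrams,
# one-way, to a single TCP subscriber such as the blender
socat -u UDP4-RECV:1201,ip-add-membership=224.0.1.1:eth0,reuseaddr \
      TCP4-LISTEN:9001,reuseaddr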
*Stonie Cooper*, PhD
Software Engineer III
NSF Unidata Program Center
University Corporation for Atmospheric Research
*I acknowledge that the land I live and work on is the traditional home of The Chahiksichahiks (Pawnee), The Umoⁿhoⁿ (Omaha), and The Jiwere (Otoe).*
Additionally, the edited xml file will need to be loaded using the cmcs
-load edited_configuration.xml command.
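Presumably that mirrors the -save invocation shown earlier, i.e. something like:

cmcs -ip <ip address> -pw <appropriate password> -load edited_configuration.xml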
LDM Version 6.13.12.69
Environment:
Red Hat Enterprise Linux 7
RHV Hypervisor
Juniper Switch/Cisco Switch
Satellite Receiver
NOAAPORT UDP packets that do not arrive in order are rejected, and readnoaaport.c throws an error.
While this appears to be by design, it impacts the use of LDM in a load-balanced or virtual-machine environment where bonded NICs are used in either the hypervisor or a physical host.
We have worked around this by disabling bonding for the interface that the NOAAPORT data arrives on, but this greatly reduces the reliability of the hardware.
LDM cannot reorder the UDP packets, and so it drops the product, even though the entire product exists in the data.
The RFC guidelines for UDP usage state: "Applications that require ordered delivery MUST reestablish datagram ordering themselves."
https://tools.ietf.org/html/rfc8085#section-3.3