You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
igmpproxy sometimes removes the mroute if "Membership Report" message is received close to "Max Resp Time" value specified in the "Membership Query" message
#95
Open
tonusoo opened this issue
May 4, 2023
· 2 comments
I have noticed that occasionally igmpproxy removes the multicast route if "Membership Report" message is received close to "Max Resp Time" value specified in the "Membership Query" message. For example, please see the packet capture below:
root@r1:~# tshark -Y 'frame.number in {25..27}' -r br0_igmp.pcap
Running as user "root" and group "root". This could be dangerous.
25 190.655057 192.168.0.1 → 224.0.0.1 IGMPv2 46 Membership Query, general
26 196.659055 192.168.0.1 → 224.0.0.1 IGMPv2 46 Membership Query, general
27 200.665434 192.168.0.12 → 239.3.1.106 IGMPv2 60 Membership Report group 239.3.1.106
root@r1:~#
This packet capture was made on a LAN-facing interface named br0 in a router running igmpproxy. As seen above, the "Membership Report" message was received 10.010377 seconds after the initial "Membership Query" message while the "Max Resp Time" value in the "Membership Query" messages was 10.0 seconds. This caused igmpproxy to remove the multicast route:
Sent membership query from 192.168.0.1 ifIndex 11 to 224.0.0.1. Delay: 10
Created timeout 11 (#0) - delay 10 secs
(Id:11, Time:10)
Created timeout 12 (#1) - delay 115 secs
(Id:11, Time:10)
(Id:12, Time:115)
About to call timeout 11 (#0)
Aging routes in table.
Removing group 239.3.1.106. Died of old age.
Removed route entry for 239.3.1.106 from table.
Vif bits : 0x00000001
Setting TTL for Vif 0 to 1
Removing MFC: 10.0.2.28 -> 239.3.1.106, InpVIf: 1
One might think that this is perhaps an expected behavior as "Membership Report" message was received later than the "Max Resp Time" of 10.0 seconds, but according to rfc2236 Section 3 the host receiving the "General Membership Query" is allowed to answer at 10.0th second and when adding network/processing delays, then it is IMHO expected that occasionally the "Membership Report" messages will arrive tiny bit later than 10.0 seconds.
In order to troubleshoot this better, I built a small lab with following topology:
h3[ens3] <-> [ens3]gw1[ens4] <-> [ens3]mr1
h3 is a virtual machine in multicast receiver role running socat -u UDP4-RECV:10,ip-add-membership=239.3.5.3:ens3 -. gw1 is a multicast router running igmpproxy version 0.4 with configuration below:
mr1 is in the multicast sender role and running send(IP(src='10.0.2.28', dst='239.3.5.3')/UDP(sport=10, dport=10), iface='ens3', inter=1./2, loop=1) in scapy which sends UDP datagrams addressed to 239.3.5.3 twice a second via ens3 interface in an endless loop. All three virtual machines are running Debian 10.
I have observed the similar issue on this lab setup. One occurrence can be seen on the screenshot below:
Upper tmux panes are h3 running tcpdump and socat, respectively. Middle panes are src/igmpproxy -d -vv /etc/igmpproxy.conf 2>&1 piped through timestamp input utility and custom packet capture utility. Lower right pane is running the scapy command mentioned above in mr1 and lower left pane is printing a timestamp in gw1 if the multicast route is missing:
while true; do ip mroute | grep -q '(10.0.2.28,239.3.5.3) *Iif: ens4 *Oifs: ens3 State: resolved' || date +'%T.%3N'; done
As highlighted on the screenshot, the multicast route was removed 0.9 seconds before the "Membership Report" message was received. The "Membership Report" message was received 10.153 seconds after the initial "Memberhip Query". However, I have seen this happen even when the delay between the query and the report has been 9.6 seconds. In addition, the multicast route is not removed each time when the delay is >10.0 seconds.
In order to avoid the removal of the multicast route, I relaxed the pselect() timeout:
root@gw1:~/igmpproxy# diff -u src/igmpproxy.c{~,}
--- src/igmpproxy.c~ 2023-05-05 01:09:08.551415847 +0300
+++ src/igmpproxy.c 2023-05-05 01:09:27.418989595 +0300
@@ -296,7 +296,7 @@
int MaxFD, Rt, secs;
fd_set ReadFDS;
socklen_t dummy = 0;
- struct timespec curtime, lasttime, difftime, tv;
+ struct timespec curtime, lasttime, difftime, tv, pselect_timeout;
// The timeout is a pointer in order to set it to NULL if nessecary.
struct timespec *timeout = &tv;
@@ -339,8 +339,12 @@
FD_ZERO( &ReadFDS );
FD_SET( MRouterFD, &ReadFDS );
+ memcpy(&pselect_timeout, timeout, sizeof(struct timespec));
+ pselect_timeout.tv_sec *= 2;
+ my_log(LOG_DEBUG, 0, "Calling pselect() with %ld.%09ld seconds timeout", pselect_timeout.tv_sec, pselect_timeout.tv_nsec);
+
// wait for input
- Rt = pselect( MaxFD +1, &ReadFDS, NULL, NULL, timeout, NULL );
+ Rt = pselect( MaxFD +1, &ReadFDS, NULL, NULL, &pselect_timeout, NULL );
// log and ignore failures
if( Rt < 0 ) {
root@gw1:~/igmpproxy#
Alternatively, at least in a lab environment, one could send frequent "Membership Report" messages from the multicast receiver host, e.g send(IP(dst="239.3.5.3")/IGMP(type=0x16, gaddr="239.3.5.3"), iface="ens3", inter=1./1, loop=1) in scapy.
Has anyone else observed this behavior? Please let me know if I can provide additional details.
The text was updated successfully, but these errors were encountered:
This is an interesting issue. It looks like that you are right that we should wait more than 10s for reply.
But your approach with fixup is wrong. pselect timeout is always maximally 3s and it is max sleep timeout for waiting for some packet. So if some other host send igmp packet after 10th second but before the host for which we are waiting then we still age route.
What is needed is to increase 10s wait timeout for reply to some higher value, not the sleep timeout when we waiting for any packet.
So if some other host send igmp packet after 10th second but before the host for which we are waiting then we still age route.
I overlooked the fact that there can be more than one multicast receiver at the same time. However, out of curiosity, I tried to create a situation where some other host sent an IGMP "Membership Report" message after the 10th second, but before the host for which we are waiting for. I ran a small script in h3(multicast receiver host), which started to listen for IGMP "Membership Report" messages addressed to 239.3.5.3 when IGMPv2 "General Membership Query" message from gw1(multicast router) was received. If the OS network stack had not sent the IGMP "Membership Report" message to 239.3.5.3 within 10 seconds, then the script sent IGMP "Membership Report" message to 239.3.5.1 address instead acting as some other host.
Custom packet capture script mentioned in my previous post was also updated in a way that it is able to show IGMPv2 "Membership Report" messages to 239.3.5.1 address.
Now during a 22 hours long time period there were 42 occurrences where igmpproxy with relaxed pselect() timeout received IGMP "Membership Report" message from another host . Example:
21:45:26.753845 - 0.000 - IGMPv2 General Memberhip Query: 192.168.0.1 > 224.0.0.1: Max Resp Time: 10.00 s
21:45:36.782432 - NA - IGMPv2 Memberhip Report: 192.168.0.50 > 239.3.5.1: Group Address: 239.3.5.1
21:45:36.853614 - 10.100 - IGMPv2 Memberhip Report: 192.168.0.100 > 239.3.5.3: Group Address: 239.3.5.3
As seen above, the "Membership Report" message to 239.3.5.1 from 192.168.0.50 sent by the script was received 10.028587 seconds after the "General Membership Query" message was sent. "Membership Report" message to 239.3.5.3 from 192.168.0.100 was received 0.071182 seconds later.
None of those 42 occurrences removed the multicast route. Screenshot with another such occurrence can be seen below:
I'm not arguing that it's a good idea to keep using the relaxed pselect() timeout patch. However, at least based on my tests, the multicast route never aged out.
What is needed is to increase 10s wait timeout for reply to some higher value, not the sleep timeout when we waiting for any packet.
I think it's better if you create a patch yourself as you understand the source code much better than I do. I'm happy to test it out.
I have noticed that occasionally
igmpproxy
removes the multicast route if "Membership Report" message is received close to "Max Resp Time" value specified in the "Membership Query" message. For example, please see the packet capture below:This packet capture was made on a LAN-facing interface named
br0
in a router runningigmpproxy
. As seen above, the "Membership Report" message was received 10.010377 seconds after the initial "Membership Query" message while the "Max Resp Time" value in the "Membership Query" messages was 10.0 seconds. This causedigmpproxy
to remove the multicast route:One might think that this is perhaps an expected behavior as "Membership Report" message was received later than the "Max Resp Time" of 10.0 seconds, but according to rfc2236 Section 3 the host receiving the "General Membership Query" is allowed to answer at 10.0th second and when adding network/processing delays, then it is IMHO expected that occasionally the "Membership Report" messages will arrive tiny bit later than 10.0 seconds.
In order to troubleshoot this better, I built a small lab with following topology:
h3
is a virtual machine in multicast receiver role runningsocat -u UDP4-RECV:10,ip-add-membership=239.3.5.3:ens3 -
.gw1
is a multicast router runningigmpproxy
version 0.4 with configuration below:mr1
is in the multicast sender role and runningsend(IP(src='10.0.2.28', dst='239.3.5.3')/UDP(sport=10, dport=10), iface='ens3', inter=1./2, loop=1)
in scapy which sends UDP datagrams addressed to239.3.5.3
twice a second viaens3
interface in an endless loop. All three virtual machines are running Debian 10.I have observed the similar issue on this lab setup. One occurrence can be seen on the screenshot below:
Upper
tmux
panes areh3
runningtcpdump
andsocat
, respectively. Middle panes aresrc/igmpproxy -d -vv /etc/igmpproxy.conf 2>&1
piped through timestamp input utility and custom packet capture utility. Lower right pane is running thescapy
command mentioned above inmr1
and lower left pane is printing a timestamp ingw1
if the multicast route is missing:As highlighted on the screenshot, the multicast route was removed 0.9 seconds before the "Membership Report" message was received. The "Membership Report" message was received 10.153 seconds after the initial "Memberhip Query". However, I have seen this happen even when the delay between the query and the report has been 9.6 seconds. In addition, the multicast route is not removed each time when the delay is >10.0 seconds.
In order to avoid the removal of the multicast route, I relaxed the
pselect()
timeout:Alternatively, at least in a lab environment, one could send frequent "Membership Report" messages from the multicast receiver host, e.g
send(IP(dst="239.3.5.3")/IGMP(type=0x16, gaddr="239.3.5.3"), iface="ens3", inter=1./1, loop=1)
in scapy.Has anyone else observed this behavior? Please let me know if I can provide additional details.
The text was updated successfully, but these errors were encountered: