TCP (not TLS) RTT time report not measuring the RTT. #505
Comments
I would expect that most servers, even when "saturated", will still return ping packets and probably also "SYN+ACK" packets almost immediately. Sure, there is interrupt latency and that sort of thing, but I expect that to be negligible compared to common networking delays. The TCP test case should measure how long it takes for the app to return from the "connect" call. That's what it is meant to measure. If that requires multiple round trips, then so be it, and that's what should be measured. For a non-TLS connection, I'd expect SEND: SYN RCV: SYN+ACK SEND: ACK ... so you would see the second packet going from your mtr-running machine to the other machine, but that time would not be counted. I studied networking in detail when TLS didn't quite exist yet, and never had the need to investigate it. So I know it exists, but I don't know the details of how it works. To securely establish a transaction key, more than one RTT is indeed required, and I think that should count. Use ping or non-TLS connections to test the RTT. To verify that it makes sense to measure the time the server takes to prepare the SYN+ACK packet, I'd like you to do a manual measurement of that metric. You have mtr, normal ping packets and Wireshark as tools. |
Appreciate the fast response. The title has a mistake: it should be TCP, not TLS. Please ignore anything about TLS. Running the command But looking at what mtr is measuring when
TCP handshake needs 3 packets, mtr produces about 20. I do not understand what mtr is actually measuring. Is this a bug or by design? |
I think you'll find in your Wireshark trace that each connection port has a differing TTL.
In some ways it would be nicer if it operated at a lower level, inspecting the TCP flags itself, though that would make the code much more involved and surely a higher maintenance burden. Instead, the code was implemented in a simpler way: the kernel's TCP implementation is used, and the measurement is the time it takes for an asynchronous connect system call to complete. |
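To make the "time an asynchronous connect takes to complete" idea concrete, here is a minimal Python sketch. It is not mtr's code; the function name `connect_time_ms` and the 3 second default timeout are made up for illustration, and it assumes a POSIX-style system where a non-blocking socket becomes writable once the handshake succeeds or fails.

```python
import errno
import select
import socket
import time
from typing import Optional


def connect_time_ms(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Time a non-blocking TCP connect, in milliseconds.

    Hypothetical helper for illustration only. Returns None if the
    connection fails or does not complete within `timeout` seconds.
    """
    # Resolve first so that name lookup is not part of the measurement.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    sock = socket.socket(family, socktype, proto)
    sock.setblocking(False)
    try:
        start = time.perf_counter()
        err = sock.connect_ex(addr)  # the kernel sends the SYN here
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            return None
        # The socket becomes writable when the kernel has finished the
        # handshake (or given up on it).
        _, writable, _ = select.select([], [sock], [], timeout)
        elapsed = time.perf_counter() - start
        if not writable:
            return None  # timed out
        if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
            return None  # connect failed (e.g. refused)
        return elapsed * 1000.0
    finally:
        sock.close()
```

The value measured this way includes whatever the kernel does before it reports the socket as connected, so it can differ slightly from the SYN to SYN+ACK gap seen in a packet capture.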
Put it that way... Certainly sounds like a bug. :-( One of the things that is "difficult" is knowing when to stop listening for replies. So it is very possible that mtr runs for 3 seconds (waiting for possible replies) even though it got a quick reply on the first probe. Hmm. OK. I just tested... When using TCP, mtr will issue a connect to the destination host, and this triggers the kernel to do "its thing". It seems that it is possible to set options like TTL on the socket so that the kernel uses them. But the kernel will then not get a valid response (SYN+ACK) from the destination and will issue retries. It is debatable whether the kernel should be doing that if it has seen the "host unreachable, TTL exceeded". Shouldn't the whole connection be dropped if that happens? On the other hand, with mtr specifically listening for these ICMP messages, maybe they do not get processed normally by the kernel. Anyway..... I tested mtr with TCP probes to 8.8.8.8 and what I see is that the reported times are in line with the expected network delays: (So far I was testing to 8.8.8.8: Google's public DNS server. But this test is done with 142.251.36.36 as the target: one of the addresses that www.google.com resolves to for me. Anyway, there should be something running on port 80 on 142.251.36.36...)
TLDR: Cannot reproduce, works as intended here. |
If @cveto really only cares about the RTT to his intended destination to compare results with ICMP vs TCP, perhaps
That won't send a bunch of extra packets - it won't check every hop, only the final destination round trip time. Note that |
You should be able to get such behaviour from the commandline by setting min-ttl and max-ttl to values that are very close (not sure if they need to differ by 0 or 1) and sufficiently large. OK. Tested. Difference needs to be 0. So it'd be: |
I see. I will have to update the mtr to 0.95 (running on 0.92 at the moment) to get the @matt-kimball , when you say "destination round trip time", I will have to check what time is reported back. Below the Wireshark capture of the
|
I would expect the time measured to be close to the time between packet 1 and packet 3, since this is when the initial TCP three-packet handshake is complete. And, indeed, when I measure locally, my Are you seeing something different in your tests? |
Nope. Just some "sufficiently large" number. As long as it is the actual number of hops or larger. 32 was considered "should be high enough in all cases" until a few years ago it wasn't. Most OSes upgraded to 64; Google is not taking any chances and is using 128. (*) (That's what I learned from this thread. :-) ) I'd say the time between packet 1 and 2. That's when the kernel should signal to mtr-packet: "Your socket opened successfully", and also "prepare for sending out packet 3". Those two things could run in parallel, but probably don't. The difference is only 20 microseconds, so not much. Anyway. Matt and I are expecting a measurement of about 0.32 milliseconds (time from 1 to 2 or 3) for the time and not "1.1" milliseconds (time from 1-6). (*) 128 is a bad choice. Sure, it pays to be sure, but suppose a routing loop crops up, say, "10 hops out". The intent was that a packet would then go back and forth about 20 times. This has worsened to 50 times because everybody uses 60, but Google packets end up going back and forth over the loop 110 times! |
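The routing-loop arithmetic above can be checked with a small Python sketch. It assumes the simple model that each router decrements the TTL by one, so a packet that needs `hops_to_loop` hops to reach the loop has roughly `initial_ttl - hops_to_loop` forwards left to spend inside it. The function name `loop_hops` is made up for illustration.

```python
def loop_hops(initial_ttl: int, hops_to_loop: int) -> int:
    """Approximate number of hops a packet spends inside a routing loop.

    Hypothetical helper: assumes every router decrements TTL by exactly one
    and the packet is dropped when the TTL reaches zero.
    """
    return max(initial_ttl - hops_to_loop, 0)


# Loop located 10 hops out, for the three initial TTL values mentioned above.
for ttl in (32, 64, 128):
    print(ttl, loop_hops(ttl, 10))
# prints:
# 32 22
# 64 54
# 128 118
```

These figures line up with the rough "about 20", "50" and "110" quoted in the comment.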
Thank you both for your time. The environment where I was testing this with mtr 0.92 was returning me number 201, which corresponded to the time between about 9 packets. Wireshark showed that the ACK packet actually took about 0.2ms. Doing the test with mtr 0.95 right now in my own lab, the mtr result matches what Wireshark is telling me nicely, even if not using the I am putting this to a test in the original environment with mtr 0.95 to see if I can replicate it. Will respond when done. |
I cannot reproduce my previous observations with mtr. I will continue to use it, but I have bundled it with tshark in a script in order to compare it with the TLS timestamps. Results with using:
Results with using
The results differ, but to about the same relative difference. This was run from a VM, I will run it from a physical server next. |
It's a feature connected with wait time calculation between packet sending (like |
Using command:
mtr --tcp --report -c 10 --port <port> <server>
I was expecting to get the time between the initial SYN packet and the time when the "SYN+ACK" returns.
But looking with Wireshark, I get a time that includes a few extra packets going back and forth.
Any way to get what I want? I am trying to establish the difference between the RTT for TLS and the RTT for non-TLS connections, to see why the servers are taking this extra time to process.
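For comparing against a capture, the SYN to SYN+ACK gap asked about here can be computed from packet records with a few lines of Python. This is only a sketch under assumptions: the `Packet` record, its field names and the function `syn_to_synack_ms` are hypothetical, the records are assumed to come from a single TCP connection already extracted from a capture, and retransmitted SYNs are ignored in favour of the first one.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional


class Packet(NamedTuple):
    """Hypothetical minimal record for one captured TCP packet."""
    timestamp: float        # capture time in seconds
    src: str                # source address
    flags: FrozenSet[str]   # e.g. frozenset({"SYN", "ACK"})


def syn_to_synack_ms(packets: Iterable[Packet], client: str) -> Optional[float]:
    """Milliseconds between the client's first SYN and the server's SYN+ACK."""
    syn_time = None
    for pkt in packets:
        if pkt.src == client and pkt.flags == {"SYN"}:
            if syn_time is None:
                syn_time = pkt.timestamp  # keep the first SYN only
        elif pkt.src != client and pkt.flags == {"SYN", "ACK"}:
            if syn_time is not None:
                return (pkt.timestamp - syn_time) * 1000.0
    return None  # handshake not seen in these records
```

Feeding it the three handshake packets of one connection gives the figure to compare with what mtr reports for that probe.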