Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing host process on socket timeout #1690

Closed
kenkendk opened this issue Dec 9, 2024 · 9 comments
Closed

Crashing host process on socket timeout #1690

kenkendk opened this issue Dec 9, 2024 · 9 comments

Comments

@kenkendk
Copy link

kenkendk commented Dec 9, 2024

FTP Server OS: Linux

FTP Server Type: Unknown

Client Computer OS: Linux

FluentFTP Version: 50.1.0

Framework: .NET8

During a timeout it looks like the shutdown process causes an unhandled exception.

I think the cause is this line:
https://github.com/robinrodricks/FluentFTP/blob/master/FluentFTP/Streams/FtpSocketStream.cs#L560

In here it will attempt to close the connection due to a timeout, but if the socket is already faulty, the close call will fail as well.
Since this happens in the token callback, the uncaught exception has nowhere to go.

Logs :

2024-11-28T10:01:04.428515378Z Unhandled exception. System.TimeoutException: Timed out trying to read data from the socket stream!
2024-11-28T10:01:04.428549934Z    at FluentFTP.FtpSocketStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken token)
2024-11-28T10:01:04.428554434Z    at FluentFTP.FtpSocketStream.ReadLineAsync(Encoding encoding, CancellationToken token)
2024-11-28T10:01:04.428565138Z    at FluentFTP.Client.BaseClient.BaseFtpClient.FluentFTP.IInternalFtpClient.GetReplyInternal(CancellationToken token, String command, Boolean exhaustNoop, Int32 timeOut, Boolean useSema)
2024-11-28T10:01:04.428569601Z    at FluentFTP.Client.BaseClient.BaseFtpClient.FluentFTP.IInternalFtpClient.GetReplyInternal(CancellationToken token, String command)
2024-11-28T10:01:04.428572767Z    at FluentFTP.Client.BaseClient.BaseFtpClient.FluentFTP.IInternalFtpClient.CloseDataStreamInternal(FtpDataStream stream, CancellationToken token)
2024-11-28T10:01:04.428576045Z    at FluentFTP.Client.BaseClient.BaseFtpClient.FluentFTP.IInternalFtpClient.CloseDataStreamInternal(FtpDataStream stream, CancellationToken token)
2024-11-28T10:01:04.428579008Z    at FluentFTP.FtpDataStream.CloseAsync(CancellationToken token)
2024-11-28T10:01:04.428581767Z    at FluentFTP.FtpSocketStream.<>c__DisplayClass74_0.<<ReadAsync>b__0>d.MoveNext()
2024-11-28T10:01:04.428585082Z --- End of stack trace from previous location ---
2024-11-28T10:01:04.428587693Z    at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
2024-11-28T10:01:04.428590767Z    at System.Threading.QueueUserWorkItemCallback.Execute()
2024-11-28T10:01:04.428593527Z    at System.Threading.ThreadPoolWorkQueue.Dispatch()
2024-11-28T10:01:04.428596693Z    at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
@kenkendk
Copy link
Author

kenkendk commented Dec 9, 2024

I will see if the upgrade to 52.0.0 fares better. I can see that the close sequence was updated in the new version.
Note that the ReadAsync method that takes the Span<byte> input is still using the previous logic with the close happening in the CancellationToken.Register callback, where the byte[] method is updated.

I think there is still some logic related to handling the close, because a cancelled (outer) token will cause a different cancellation error than what is intended (will throw from within the close call, not the if statement later).
And in the case the socket is faulty, it will also throw a different exception, hiding the original error.

But at least the call chain is now preserved, and the exception should bubble to the caller.

@FanDjango
Copy link
Collaborator

Yes, please check this with V52.

As we should be ending up here:

https://github.com/FanDjango/FluentFTP/blob/0087db909a7e1ebac4b43e30cc8bbbb1f334ed1c/FluentFTP/Streams/FtpSocketStream.cs?plain=1#L1822C1-L1840C1

and I would be interested in how that "fares".

@kenkendk
Copy link
Author

It appears to still happen with 52.0.0, but I have not collected enough details yet:
duplicati/duplicati#5730 (comment)

@FanDjango
Copy link
Collaborator

FanDjango commented Dec 22, 2024

Ok, this one will be difficult. Stack trace might be helpful as a starter. I wonder about the exact order of events and methods called on this timeout. Oh, and we do now have some good logging in there also, so perhaps if you can get logs?

@kenkendk
Copy link
Author

@FanDjango , could it be something with multithreading? Duplicati will have multiple instances of AsyncFtpClient created assuming they are separate instances. If there is some cross-talk between the instances, that could explain perhaps one connection closing another instance?

There is a full stack trace in the issue I linked above, and I cannot see any retry happening inside FluentFTP.
But the error message would make a lot of sense if the connection was somehow closed prior to making the call.

I will try to see if the current "LogToConsole" is sufficient to get deeper into the problem, and add some additional logging options in case it is not.

@FanDjango
Copy link
Collaborator

FanDjango commented Dec 31, 2024

You have reported a number of isssues. I need to be sure that we can keep them separate.

#1686 This is classified as a bug, it is pending someone working on it, there is a workaround

#1694 This is classified as a bug, has been fixed in master, pending release, there is a workaround

#1690 (This one) This is not classified yet. Deity only knows what is going on here.


This one:

could it be something with multithreading?

I don't really think we still have holes in thread safety. We did have some in the FluentFTP.GnuTLS component, but only in the beginning and they were discovered right away. And without any logs I cannot even discern if you are using SslStream or GnuTLSStream.

...and I cannot see any retry happening inside FluentFTP.

#1694 was the issue which relates to "retry" activity.

But this issue, for me to try to help: I would need MORE information, especially logging. This would show up the sequence of events, across all client sessions. Note that FluentFTP logs of your clients should have a logger that prefixes the client-identificator in front of the log lines, so that in a multi threaded FTP environment, one can correlate log entries to their client.

So I must reiterate:

#1690 (comment)

During a timeout it looks like the shutdown process causes an unhandled exception.

I will try to recreate this somehow in my own environment.

My analysis of the stack trace you provided (at the top of this) and looking at the code:

(Presumably) a transfer ends. Upload or Download (it is important) is not known to me. A log would tell.
FtpDataStream is closed, higher logic has decided to do this. Why? When we decide a transfer has ended, or it needs to end, we close the data stream.
This data stream close ENTAILS a GetReply read on the socket of the control connection.
And you are right: If this control connection is also closed/broken, we get a failure on that read.

That part of the code you could call "faulty" and this needs to be fixed/addressed (I will do so, actually, but not this year :-) ).

But it would also be very very nice to find out what is going wrong in the first place - unless it is a total eclipse of the net - both connections breaking down simultaneously, maybe.

And that will be the first scenario I will test in order to reproduce this. Not this year, though.

Until then, I hope you can find out more, but I am sure we can make progress in this in the end.

@FanDjango
Copy link
Collaborator

Specifically, I can see the problem in CloseDataStream.cs:

 			try {
				if (IsConnected) {
					// Because the data connection was closed, if the command that required the data
					// connection was not successful then there will be no reply from the server,
					// however if the command was successful the server will send a reply .
					if (stream.CommandStatus.Type == FtpResponseType.PositivePreliminary) {
						if (!(reply = ((IInternalFtpClient)this).GetReplyInternal(LastCommandExecuted)).Success) {
							throw new FtpCommandException(reply);
						}
					}
				}
			}
			finally {
				// if this is a clone of the original control connection we should Dispose() the entire client
				if (IsClone) {
					((IInternalFtpClient)this).DisconnectInternal();
					((IInternalFtpClient)this).DisposeInternal();
				}
			}

The call to IsConnected needs to be enhanced or probably better, the exception needs to be handled.

Lemme look at this some more in 2025, ok?

@kenkendk
Copy link
Author

kenkendk commented Jan 1, 2025

Thanks for taking the time to sort out the mixed report threads.
For this one, my workaround is to simply handle unobserved exceptions, but good if it can get fixed.

@FanDjango
Copy link
Collaborator

Fixed in master CloseDataStream.cs. Will be released eventually.

@FanDjango FanDjango added the fixed label Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants