Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SocketWrapper MbedClient debugging readSocket #912

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JAndrassy
Copy link
Contributor

@JAndrassy JAndrassy commented Jun 29, 2024

  1. In connect we have to delete the socket object, because all other functions test it for null and then use mutex which is null, because configureSocket was not invoked. And there are too few sockets in total.

  2. In readSocket the condition of the inner loop was always true and the inner loop never exited to wait for event or timeout on event. The inner loop only paused for yield(). With the PR if no data are available the inner loop does break to outer loop where the thread waits for event or timeout.

3) I think we have to do mutex in write too

@andreagilardoni
Copy link
Contributor

Hi @JAndrassy, I agree with the changes you made for point 1.
For point 2 are you trying to fix a particular issue that is affecting you? Apart from missing mutex locks and unlocks and the nice code reordering, I don't see any huge changes in code that may impact functionality on MbedClient.
For point 3, I expect it to work without the need of mutex, since I think tx and rx buffer are separated, I will take a look at the lower level implementation in order to be sure

@JAndrassy
Copy link
Contributor Author

JAndrassy commented Jul 1, 2024

@andreagilardoni so should we add everywhere to if (sock == NULL), && mutex == NULL?

https://os.mbed.com/docs/mbed-os/v6.16/mbed-os-api-doxy/class_socket.html#abc36eff670a3bee145f1c629a7eb7ee3
has

If connect() fails it is recommended to close the Socket and create a new one before attempting to reconnect.

I didn't look in the source but I think the unconnected socket still counts as used one from socket-max

@andreagilardoni
Copy link
Contributor

I think the only method missing the check for sock==nullptr is read(), I think it is enough to return -1 there. I think it is just enough to check for sock, since mutex should be != nullptr when the sock is != nullptr after the changes you proposed here.

The changes on lines 123:124 are necessary.
The changes on line 220, 224 I don't think that may affect anything except performances.

I only need to understand why you performed changes on readSocket(), since to me everything seems to be a rearrangement on if statements (with some additional actions on mutex). Did you experience any kind of issue that I can try to replicate?

@JAndrassy
Copy link
Contributor Author

JAndrassy commented Jul 1, 2024

changes on readSocket()

continue change to brreak if no data are available to read as I write in the description of the PR

@JAndrassy JAndrassy force-pushed the mbedclient_readsocket_fixes branch 2 times, most recently from 00b5add to a5fc13f Compare July 2, 2024 04:20
@JAndrassy
Copy link
Contributor Author

JAndrassy commented Jul 2, 2024

changes:

  • for unsuccessful connect socket->close (it was delete sock in the first version of the PR)
  • using mutex == null as a check for a not configured client

@JAndrassy JAndrassy force-pushed the mbedclient_readsocket_fixes branch 2 times, most recently from 444339a to 00bc011 Compare July 2, 2024 05:07
@andreagilardoni
Copy link
Contributor

@JAndrassy after looking at that everything seems ok, I need to try to run some examples and then I think we can merge this PR. I like how you reorganized the thread function, thanks for your contribution!

@JAndrassy JAndrassy force-pushed the mbedclient_readsocket_fixes branch from 00bc011 to a40c63f Compare July 2, 2024 10:37
@JAndrassy
Copy link
Contributor Author

There is one more problem with the readSocket thread, but if I patch the solution in, it screams for a rewrite of the whole inner loop.

the problem is that on err < 0 the thread ends. it can be just that the peer closed the connection. on a new connect without stop(), the connection is established, but the readSocket thread is not started. It is not possible to restart a thread, so it should not end until sock is null.

andreagilardoni
andreagilardoni previously approved these changes Jul 3, 2024
@andreagilardoni
Copy link
Contributor

I quickly tested this PR and I don't see any issues with this. About the other issue you are describing, what err < 0 are you talking about?

@JAndrassy
Copy link
Contributor Author

sorry ret not err

      if (ret < 0 && ret != NSAPI_ERROR_WOULD_BLOCK) {
        mutex->unlock();
        goto cleanup;
      }

@andreagilardoni
Copy link
Contributor

If the peer closed the connection than, I think, it is expected that the socket has to be closed and needs to be restarted. In connect I can see that this check is performed

if (sock && reader_th) {
// trying to reuse a connection, let's call stop() to cleanup the state
char c;
if (sock->recv(&c, 1) < 0) {
stop();
}

Did I get your point?

@JAndrassy
Copy link
Contributor Author

then it is ok

Copy link
Contributor

@maidnl maidnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to analyse and test the proposed changes and I have a few comments (I hope you can find them meaningful).

One thing I found really confusing is the use of the _status flag which is set / reset in a lot of different places (and this make difficult to understand the underling logic).
In the readSocket() thread is appears that is not really necessary to set _status=true at the end of each innermost do-while cicle.
There is no need to set _status=true after the call of the configureSocket() function since is the configureSocket() function itself sets _status=true;. This happens twice: in the connect() and the connectSSL() function.
As additional simplification I would reset _status to false at the beginning of those 2 functions (if all goes well the configureSocket() function will set it to true and in case of problem it will remain set to false removing the need of setting if to false in case of errors). This removes the need to set _status to false in case of problem and ensure it is false if something wrong happened.

About the mutex lock/unlock in the write() function, my understanding is that the mutex is used to prevent access to the rxBuffer from different threads, so it appears to be not necessary here. One point that is certainly wrong is that the changes removed the check about the sock pointer: it is necessary to reintroduce here the check

if (sock == nullptr) {
   return 0;
}

this is very important because a user can call write() with an invalid sock and this would crash the program.

Since the mutex is used to prevent "common" access to rxBuffer I noticed that the peek() function that uses rxBuffer is unprotected: I would add the mutex lock/unlock mechanism to the peek() function.

A possible improvement is related to the setSocket() function: this function is called when client is create by a server. However the server allocates a Socket only if there is an incoming request, in case no request is made the server will always call setSocket(nullptr) (please check the EthernetServer::available() function in the EthernetServer.cpp file and verify if my understanding is correct).
In case like this it is probably pointless to call configureSocket() so I would add a check and execute the body of the function only if _sock is different from nullptr.

My last remark is more a doubt: in the connect() and connectSSL() function almost at the end of the function in case ret is not 1 it has been added the statement sock->close(), but this only closes the Socket. Would not be better to call directly the MbedClient stop() function? This would reset all the variables held by the client and not only "close the socket".

libraries/SocketWrapper/src/MbedClient.cpp Show resolved Hide resolved
@JAndrassy
Copy link
Contributor Author

_status only exists for status(). there is no other simple way. It has no internal use in MbedClient. This PR is not about _status.

I avoid doing unnecessary changes in my PR. Where would I stop if I begin to cleanup the code. So I don't even start. So the superfluous _status=true stay there for this PR.

The readSocket() function runs in a separate thread so it can't switch _status to false if it isn't definitive.

I can remove the lock from write. (btw if socket is null, then mutex is null too)

yes. peek() needs the lock. I add it.

My last remark is more a doubt: in the connect() and connectSSL() function almost at the end of the function in case ret is not 1 it has been added the statement sock->close(), but this only closes the Socket. Would not be better to call directly the MbedClient stop() function? This would reset all the variables held by the client and not only "close the socket".

as I understand it, the idea is that the socket can be reused for next try to connect, that is why it is not deleted. calling stop() would delete it. all other fields are not initialized because configureSocket didn't run

@JAndrassy JAndrassy force-pushed the mbedclient_readsocket_fixes branch from a40c63f to dbe0b07 Compare July 24, 2024 17:59
@JAndrassy JAndrassy force-pushed the mbedclient_readsocket_fixes branch from dbe0b07 to 3428d75 Compare July 24, 2024 18:04
@JAndrassy
Copy link
Contributor Author

I can remove the lock from write. (btw if socket is null, then mutex is null too)
yes. peek() needs the lock. I add it.

I made these changes ^^^

@facchinm
Copy link
Member

facchinm commented Sep 5, 2024

Maybe this patch could help fixing #937 ?

@@ -22,28 +22,30 @@ void arduino::MbedClient::readSocket() {
int ret = NSAPI_ERROR_WOULD_BLOCK;
do {
mutex->lock();
if (sock != nullptr && rxBuffer.availableForStore() == 0) {
if (sock == nullptr) {
Copy link

@schnoberts1 schnoberts1 Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this subject to the same issues as "Another racy example" here: https://queue.acm.org/detail.cfm?id=2088916? sock is loop invariant but changed in another thread. mutex is the same isn't it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants