You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are experiencing an issue where occasionally, messages are taking just over 30 seconds to send on a simple request socket. This seems to occur when the system has many messages sent out at the same time. When we eventually DO send out the message, the response is received almost immediately.
We are running on a non-Oracle JVM that is compatible with jdk 1.8.
We previously were using JMQ v.3.4 and upgraded to v5.2 to be compatible with the rest of the system.
We are using 1 Zmq.Context in our application
When we want to send a message, we create a ClientSocketThread that gets a REQ socket from the context on instantiation and invokes the send after we call start() on the thread.
I've put in logging to try and determine if the send() or receive() call was taking a long time, but, due to the fact that the send() is non-blocking, I was able to determine the process running at the endpoint didn't receive our message for almost 30 due to their logging and when the send back their response.
Here is some code to give you an idea of what we're doing.
Lastly, the send request function looks like this (with some debug logging added)
void sendRequestAndDispatchResponse(byte[] msg) {
long timeBeforeSend = System.currentTimeMillis();
long timeAfterSend;
long timeAfterReceive;
if (requester != null) {
try {
// requester.setReceiveTimeout(ZMQ_TIMEOUT);
requester.sendMessage(msg);
timeAfterSend = System.currentTimeMillis();
checkSendTime(timeBeforeSend, timeAfterSend);
byte[] responseMessage = requester.receiveMessage();
timeAfterReceive = System.currentTimeMillis();
checkReceiveTime(timeAfterSend, timeAfterReceive);
notifyHandler(responseMessage);
} catch (ZMQException e) {
throw new IllegalStateException("ZMQ Error when sending message. \n" + e);
}
}
}
The checkTimeout functions just check if the time difference is >30 seconds and log a message. When we were setting the ReceiveTimeout(30000), the socket would bail and return a null message to the handler but by letting the socket wait just a little longer, we were getting the response in 30150 milliseconds or so. Sometimes it takes up to 40 seconds, but, most instances of this issue show a time just a few hundred milliseconds longer than 30 seconds. The handler, in this case, calls back into the rest of our code to process the message response.
Some questions:
Are we doing things horribly wrong by putting our asynchronous messages on a thread we create ourselves? My understanding was that the receive() call DOES block and so we don't want to tie down our main application code on that blocking call.
Should our system have only 1 zmq context? We have it configured to use 2 IO threads, but, given the specs, I can't imagine it requires more.
If the above are 'ok' but not ideal, what could possibly be causing the message to sit in the ZMQ code for 30+ seconds? This is where I'm stuck and can't seem to figure out why our messages take so long to get to their destination.
Thanks for any help you can provide and I'll happily share more information if possible.
The text was updated successfully, but these errors were encountered:
Further investigation and paying attention to when this issue occurs revealed the following info:
The issue happens when our system goes to 'sleep'
We currently run our java code on two different platforms, both of which go to 'sleep' differently.
In one case, the CPU is completely stopped, but in the other case, a sig stop (I believe) is sent to the Java process and its
children processes.
I believe that if the Jeromq code used System.nanoTime, we would not encounter this issue on the first platform.
However, a simple java test showed that if a sig stop is sent, nanoTime and currentTimeMillis stay in sync.
Bottom line, is there a way to have a reasonable timeout set on a socket in JeroMQ that can account for the entire CPU going to sleep or an OS Sig stop sent to the java process?
We are experiencing an issue where occasionally, messages are taking just over 30 seconds to send on a simple request socket. This seems to occur when the system has many messages sent out at the same time. When we eventually DO send out the message, the response is received almost immediately.
Here is some code to give you an idea of what we're doing.
createClientSocketThread(serverPort, context, message, this).start();
Create the clientThread and start the thread. The instantiation looks like this:
Once the thread starts, we do something like this:
Lastly, the send request function looks like this (with some debug logging added)
The checkTimeout functions just check if the time difference is >30 seconds and log a message. When we were setting the ReceiveTimeout(30000), the socket would bail and return a null message to the handler but by letting the socket wait just a little longer, we were getting the response in 30150 milliseconds or so. Sometimes it takes up to 40 seconds, but, most instances of this issue show a time just a few hundred milliseconds longer than 30 seconds. The handler, in this case, calls back into the rest of our code to process the message response.
Some questions:
Thanks for any help you can provide and I'll happily share more information if possible.
The text was updated successfully, but these errors were encountered: