I wrote a TLS server, but occasionally the program gets stuck after a few days of operation.
The TCP listening port is 50011.
I found that Recv-Q is always 1025.
$ ss -ltnp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp     1025      0 0.0.0.0:50011           0.0.0.0:*               LISTEN      -
tcp6       0      0 :::111                  :::*                    LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 ::1:631                 :::*                    LISTEN      -
tcp6       0      0 :::5900                 :::*                    LISTEN      -
$ netstat -anp |grep 50011 |grep CLOSE_WAIT
878
The server code:
impl DiscoveryServer {
    pub async fn new(s: &Settings, n: &Arc<RwLock<HashMap<String, NodeDescription>>>) -> Self {
        log::info!(
            "create discovery server listener on {:?}",
            format!("{}:{}", "0.0.0.0", s.server.listen_port)
        );
        DiscoveryServer {
            tcp_socket: new_listener(format!("{}:{}", "0.0.0.0", s.server.listen_port), false)
                .await
                .unwrap(),
            settings: s.clone(),
            nodes: n.clone(),
        }
    }

    pub async fn start(self) -> ResultType<()> {
        log::info!("start discovery server");
        let tls_acceptor = new_tls_acceptor();
        tokio::spawn(async move {
            loop {
                match self.tcp_socket.accept().await {
                    Ok((stream, addr)) => {
                        let acceptor = tls_acceptor.clone();
                        let res_servers = self.nodes.clone();
                        let res_cities = self.settings.config_item.city_list.clone();
                        tokio::spawn(async move {
                            match TlsFrameStream::from(stream, acceptor).await {
                                Ok(mut tls_stream) => {
                                    if let Some(Ok(bytes)) = tls_stream.next_timeout(MESSAGE_TIMEOUT).await {
                                        if let Ok(msg_in) = DiscoveryMessage::parse_from_bytes(&bytes) {
                                            match msg_in.union {
                                                Some(discovery_message::Union::request(req)) => {
                                                    log::info!("msg from client:{}, request:{}", addr, req);
                                                    handle_request(&res_servers, res_cities, tls_stream, req).await;
                                                }
                                                _ => {
                                                    log::warn!("unknown union type from msg_in, type:{:?}", msg_in.union);
                                                }
                                            }
                                        }
                                    }
                                }
                                Err(e) => log::error!("error accept client, err: {}", e),
                            }
                        });
                    }
                    Err(err) => {
                        log::error!("error accept tcp socket, err: {}", err);
                    }
                }
            }
        });
        Ok(())
    }
}
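You might be missing a timeout on the TLS handshake (which seems to happen in TlsFrameStream::from). Without one, a client that disconnects in the middle of the handshake (for example via a TCP reset) can leave the handling task waiting forever while it still holds its socket. If that happens for a while, the process runs out of file descriptors and no further connections can be accepted. This matches your output: a Recv-Q of 1025 on a listening socket means the accept queue is full (a backlog of 1024 plus one), and 878 sockets in CLOSE_WAIT means the peers have closed their side but your process never closed its own.
There might also be other places where this happens (I didn't check further).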
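In the tokio code the usual fix would be to wrap the handshake future, e.g. `tokio::time::timeout(HANDSHAKE_TIMEOUT, TlsFrameStream::from(stream, acceptor)).await`, which returns `Err(Elapsed)` once the deadline passes and drops the inner future, and with it the socket it owns (`HANDSHAKE_TIMEOUT` is a hypothetical constant). Because `TlsFrameStream` is the poster's own wrapper and its code isn't shown, the sketch below illustrates the same idea with std blocking sockets only: a hypothetical `fake_handshake` that must receive a 5-byte header (the size of a TLS record header) before a total deadline, instead of waiting indefinitely.

```rust
use std::io::{self, ErrorKind, Read};
use std::net::TcpStream;
use std::time::{Duration, Instant};

/// Reads exactly `buf.len()` bytes, but gives up once `deadline` has passed.
/// Unlike a plain per-read timeout, the deadline bounds the *total* time, so a
/// peer that trickles one byte at a time cannot hold the socket forever.
pub fn read_exact_before(stream: &mut TcpStream, buf: &mut [u8], deadline: Instant) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        let now = Instant::now();
        if now >= deadline {
            return Err(io::Error::new(ErrorKind::TimedOut, "handshake deadline exceeded"));
        }
        // Each blocking read may wait at most until the overall deadline.
        stream.set_read_timeout(Some(deadline - now))?;
        match stream.read(&mut buf[filled..]) {
            // The peer closed its side (FIN) before sending everything.
            Ok(0) => return Err(io::Error::new(ErrorKind::UnexpectedEof, "peer closed during handshake")),
            Ok(n) => filled += n,
            // Unix reports an expired read timeout as WouldBlock, Windows as TimedOut.
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                return Err(io::Error::new(ErrorKind::TimedOut, "handshake deadline exceeded"));
            }
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Hypothetical stand-in for a TLS handshake: expects a 5-byte header from the
/// client within `limit`. On error the caller should simply drop the stream,
/// which closes the file descriptor.
pub fn fake_handshake(stream: &mut TcpStream, limit: Duration) -> io::Result<[u8; 5]> {
    let mut header = [0u8; 5];
    read_exact_before(stream, &mut header, Instant::now() + limit)?;
    Ok(header)
}
```

Whichever way the handshake fails (silent peer, early close, I/O error), the function returns promptly, so the task ends and the socket is released instead of lingering in CLOSE_WAIT.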
You might want to add metrics to your code that report the number of active connections and of the tasks handling them.
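A minimal sketch of the metrics suggestion, using only std atomics. `ConnMetrics` and `ConnGuard` are hypothetical names, not part of tokio or any library. The idea would be to create a guard right after `accept()` returns and move it into the `tokio::spawn(async move { ... })` block, so it is dropped when that task ends. If `active()` keeps growing while traffic stays roughly constant, some tasks are stuck forever (for example in the handshake).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical connection counters shared (via `Arc`) by the accept loop
/// and every connection task.
#[derive(Debug, Default)]
pub struct ConnMetrics {
    active: AtomicUsize,
    accepted_total: AtomicUsize,
}

impl ConnMetrics {
    /// Connections whose handler has not finished yet.
    pub fn active(&self) -> usize {
        self.active.load(Ordering::Relaxed)
    }

    /// Connections accepted since start-up.
    pub fn accepted_total(&self) -> usize {
        self.accepted_total.load(Ordering::Relaxed)
    }
}

/// RAII guard: counts one connection as active from creation until drop.
/// Drop runs on every exit path of the task that owns it (normal return,
/// early return, `?`, panic unwinding, or the task being cancelled).
pub struct ConnGuard {
    metrics: Arc<ConnMetrics>,
}

impl ConnGuard {
    pub fn new(metrics: Arc<ConnMetrics>) -> Self {
        metrics.active.fetch_add(1, Ordering::Relaxed);
        metrics.accepted_total.fetch_add(1, Ordering::Relaxed);
        ConnGuard { metrics }
    }
}

impl Drop for ConnGuard {
    fn drop(&mut self) {
        self.metrics.active.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Logging `active()` and `accepted_total()` periodically (or exporting them to whatever monitoring you use) would show the leak days before the accept queue fills up.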
Thank you for your reply. I will add a timeout around TlsFrameStream::from and check the other blocking calls. But wouldn't it be better if the TLS handshake implementation had a timeout internally?
Is there something wrong with my code?