Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"closing bad idle connection: unexpected read from socket" errors on MySQL 8.0.24 #1392

Open
owbone opened this issue Feb 10, 2023 · 13 comments · May be fixed by #1456
Open

"closing bad idle connection: unexpected read from socket" errors on MySQL 8.0.24 #1392

owbone opened this issue Feb 10, 2023 · 13 comments · May be fixed by #1456
Milestone

Comments

@owbone
Copy link
Contributor

owbone commented Feb 10, 2023

Issue description

MySQL 8.0.24 changed the server-side behaviour when encountering a wait_timeout. The client now receives an error packet with ER_CLIENT_INTERACTION_TIMEOUT rather than just finding an EOF.

Another GitHubber previously implemented a connection liveness check in #934. This was implemented with MySQL 5 in mind, and prints a "closing bad idle connection: unexpected read from socket" error when it reads anything from the socket of a connection that it expects to be idle.

With MySQL 8.0.24, this message is now really confusing, because it's no longer totally unexpected for there to be data to read if the idle connection has reached the wait_timeout. It fails to expose the underlying error that's been sent to the client, and makes it difficult to diagnose.

It would be nice if the driver could check if the data that it's read is an error packet and, if so, log the error message instead of "unexpected read from socket".

Example code

docker run -e MYSQL_ROOT_PASSWORD=test -p 3306:3306 mysql:8.0.24
package main

import (
	"database/sql"
	"fmt"
	"time"

	"github.com/go-sql-driver/mysql"
)

func main() {
	mysql.SetLogger(logger{})

	db, err := sql.Open("mysql", "root:test@tcp(127.0.0.1:3306)/mysql")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if _, err := db.Exec("SET GLOBAL wait_timeout = 5"); err != nil {
		panic(err)
	}

	time.Sleep(10 * time.Second)

	rows, err := db.Query("SHOW TABLES")
	if err != nil {
		panic(err)
	}
	defer rows.Close()
}

type logger struct{}

func (logger) Print(v ...interface{}) {
	fmt.Printf("mysql: ")
	fmt.Println(v...)
}

Error log

closing bad idle connection:  unexpected read from socket

Configuration

Driver version (or git SHA): 1.7.0

Go version: 1.19.5

Server version: MySQL 8.0.24+

Server OS: E.g. Debian 8.1 (Jessie), Windows 10

@methane
Copy link
Member

methane commented Jun 21, 2023

I now regret about we merged #934. It is too difficult, unmaintainable to fix this issue.

We supports some connections. We support TLS. And I want to add compression support too.
Since Go doesn't support nonblocking readable test, current conncheck uses 1-byte syscall read from raw fd. To read an error packet from here, we need to pass that 1-byte into TLS, compression layer. It makes our code very complicated.

In general, user should avoid connection closed from server by using SetConnMaxLifetime (and SetConnMaxIdleTime in some situation). User shouldn't rely on CheckConnLiveness, because it means their application is very likely to have timing issue.

Now I conclude that CheckConnLiveness have very bad cost/benefit ratio. At least until Go supports nonblocking readable testing.

@methane
Copy link
Member

methane commented Jun 21, 2023

I created #1451 and want to hear opinion from Githubbers.

@dchaofei
Copy link

dchaofei commented Jul 3, 2023

I call SetConnMaxLifetime and set it to 600s, and the wait_timeout of the mysql server is set to 900s. I have avoided the server closing the connection, but I still see "closing bad idle connection: unexpected read from socket" in the log occasionally, my mysql service The client version is 8.0.28.

@methane
Copy link
Member

methane commented Jul 3, 2023

@dchaofei Your issue is not relating to this issue. If SetConnMaxLifetime(600s) didn't solve it, you should try 300s or 150s.
Or consult with your network engineer. We can not/don't want to investigate your environment.

@dchaofei
Copy link

dchaofei commented Jul 5, 2023

@dchaofei Your issue is not relating to this issue. If SetConnMaxLifetime(600s) didn't solve it, you should try 300s or 150s. Or consult with your network engineer. We can not/don't want to investigate your environment.

@methane
First of all thank you for your suggestion.

I've searched all issues and only this one encountered "unexpected read from socket", so commented.

I don't quite understand why it is recommended to set 300s or 150s, the current 600s is much smaller than the mysql server, and it achieves the purpose of the client actively closing the connection.

	err = rawConn.Read(func(fd uintptr) bool {
		var buf [1]byte
		n, err := syscall.Read(int(fd), buf[:])
		switch {
		case n == 0 && err == nil:
			sysErr = io.EOF
		case n > 0:
			sysErr = errUnexpectedRead
		case err == syscall.EAGAIN || err == syscall.EWOULDBLOCK:
			sysErr = nil
		default:
			sysErr = err
		}
		return true
	})

From the above code, I get errUnexpectedRead only after I read the data of n>0. Under what circumstances will I read the data of n>0? At present, all I can think of is that the mysql server actively closed the connection and sent ER_CLIENT_INTERACTION_TIMEOUT to the client. Is there any other situation that will cause data to exist in the socket?

@methane
Copy link
Member

methane commented Jul 5, 2023

I've searched all issues and only this one encountered "unexpected read from socket", so commented.

Error is not a bug. Same error doesn't directly mean same issue.

I don't quite understand why it is recommended to set 300s or 150s, the current 600s is much smaller than the mysql server, and it achieves the purpose of the client actively closing the connection.

the current 600s is much smaller than the mysql server, and it achieves the purpose of the client actively closing the connection.

At present, all I can think of is that the mysql server actively closed the connection and sent ER_CLIENT_INTERACTION_TIMEOUT to the client. Is there any other situation that will cause data to exist in the socket?

If 600s is much smaller than mysql server, why server send ER_CLIENT_INTERACTION_TIMEOUT?

@methane
Copy link
Member

methane commented Jul 5, 2023

I don't quite understand why it is recommended to set 300s or 150s, the current 600s is much smaller than the mysql server, and it achieves the purpose of the client actively closing the connection.

The MySQL server is not only source of closing connection. OS, router, middlewares may close your connection. I don't know about your enviornment, and I can not investigate it.

300sec is the smallest timeout I have seen. It was Docker network, not MySQL server.
That is why I recommend 300sec or lower. If you want to use longer timeout, you must fix your network issue by yourself, or your infra engineer. I won't help you.

@dchaofei
Copy link

dchaofei commented Jul 6, 2023

I understand that what you said may be that other network middleware has closed the connection, but as far as I know, the connCheck() func will only check after the connection is obtained from the connection pool and before the first write operation. Normally, the connection at this time is There is no data, because this is an idle connection, if this connection has been closed by any other network middleware, the connCheck method should be able to get io.EOF error instead of errUnexpectedRead, is my above understanding correct? If not, thanks for pointing it out.

@methane
Copy link
Member

methane commented Jul 6, 2023

@dchaofei You are correct. If you want to investigate it, you should:

  • packet capture (Wireshark understand MySQL protocol. I use it often to maintain MySQL drivers).
  • Disable connCheck.

@dchaofei
Copy link

dchaofei commented Jul 6, 2023

I'm trying to find a way to reproduce this problem, because I only see a log like this every few days on my server, I will update if there is progress.

@methane methane linked a pull request Jul 6, 2023 that will close this issue
5 tasks
@Bisstocuz
Copy link

Bisstocuz commented Oct 9, 2023

I created #1451 and want to hear opinion from Githubbers.

I support your viewpoint that it is not possible for everything to be perfect and flawless, with strong robustness. It is evidently more costly to rely on go-sql-driver to solve this issue, and users can ensure the absence of such problems by simply making some other configurations.

@owbone
Copy link
Contributor Author

owbone commented Oct 9, 2023

I call SetConnMaxLifetime and set it to 600s, and the wait_timeout of the mysql server is set to 900s. I have avoided the server closing the connection, but I still see "closing bad idle connection: unexpected read from socket" in the log occasionally, my mysql service The client version is 8.0.28.

ER_CLIENT_INTERACTION_TIMEOUT is just one example of the possible errors that the server can send. The other one that I've seen this happen with is, I believe, ER_DISCONNECTING_REMAINING_CLIENTS when a replica is taken down.

I do consider this to be the same issue. The only reason I included ER_CLIENT_INTERACTION_TIMEOUT in the reproduction is because it was the easiest error to reproduce.

I support your viewpoint that it is not possible for everything to be perfect and flawless, with strong robustness. It is evidently more costly to rely on go-sql-driver to solve this issue, and users can ensure the absence of such problems by simply making some other configurations.

I think @dchaofei has already shown that timeouts don't prevent it, so it's not clear to me what other simple configurations would ensure the absence of all other server errors.

The problem stated in this issue is not that the server sometimes disconnects the client with an error. The problem is they are much, much more difficult to debug because the driver hides the error.

@mjonss
Copy link

mjonss commented Oct 9, 2023

Not sure it helps, but there is a change in MySQL 8.0 that was contributed by me, https://bugs.mysql.com/bug.php?id=93240 / mysql/mysql-server@14508bb where, instead of just closing the connection, it tries to send an error and then close the connection, so clients have a possibility to distinguish from a server closing a connection due to wait_timeout or server crashed/went away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants