Query hanging using the async api and different buffer size #588

Open
mfmmq opened this issue Sep 5, 2024 · 4 comments

Comments

@mfmmq

mfmmq commented Sep 5, 2024

I think I have found a bug related to #325. I am using the pg gem in a Rails app running against YugabyteDB, a distributed database. Specific queries hang when executed against Yugabyte, which has a different buffer size than vanilla PostgreSQL.

If either the async API is disabled or the query is changed slightly (e.g. the select or the limit is changed), the query returns. This issue also seems to occur only with specific response sizes (81830 bytes < sent bytes < 81920 bytes) over SSL/TLS.

I've pushed a repro here. Let me know if it works on your end:

git clone git@github.com:mfmmq/ruby-pg-async.git && cd ruby-pg-async && make

@larskanis
Collaborator

Thank you @mfmmq, that is a great help! I can reproduce the issue with your docker composition. A smaller count in generate_series succeeds, but a count of 1279 blocks indefinitely. I'll investigate the issue, but it will probably take some days.

Do you know if it blocks also without TLS?

@mfmmq
Author

mfmmq commented Sep 6, 2024

Do you know if it blocks also without TLS?

Thanks for getting back to me! No, it doesn't block without SSL. I have pushed a version with SSL disabled to the branch no-ssl if you want to test it:

git pull && git checkout no-ssl && docker compose rm && docker compose up --build

Our vendor has pushed the following change to a fork, which stops the query from hanging. I thought I would mention it in case it's helpful:

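/* Vendor workaround: with TLS in use, call PQconsumeInput() a fixed 15 times. */
if (PQsslInUse(conn)) {
	for (int i = 0; i < 15; i++) {
		/* PQconsumeInput() returns 0 on failure. */
		if ( PQconsumeInput(conn) == 0 ) {
			pgconn_close_socket_io(self);
			pg_raise_conn_error(rb_eConnectionBad, self, "PQconsumeInput() #%d - %s", i, PQerrorMessage(conn));
		}
	}
}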

@larskanis
Collaborator

I can reproduce this issue easily and I came to the same conclusion:

  1. It doesn't happen without SSL/TLS.
  2. Multiple calls to PQconsumeInput fix the issue, but there is no indication of how many calls are required.

Calling PQconsumeInput in a loop until is_readable/PQisBusy changes would amount to busy waiting, so that is not a good option.
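
To illustrate why a fixed number of extra calls is fragile, here is a toy model in C. It is only a sketch under assumptions: it supposes that the TLS layer takes all pending bytes off the socket at once while each consume call hands over at most one fixed-size chunk, so the socket stops looking readable while data is still buffered. Whether that matches libpq's actual internals is not established in this thread. Every name (toy_conn, toy_consume, toy_run, toy_calls_needed) and the 8192-byte chunk size are hypothetical and exist neither in libpq nor in ruby-pg.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in libpq or ruby-pg. */
enum { TOY_CHUNK = 8192 };  /* assumed bytes handed over per consume call */

typedef struct {
    size_t socket_bytes;  /* bytes the kernel socket still reports as readable */
    size_t tls_buffered;  /* bytes already taken off the socket, held by the TLS layer */
    size_t app_received;  /* bytes delivered to the application so far */
} toy_conn;

/* What select()/poll() would report: it can only see the kernel socket. */
static int toy_socket_readable(const toy_conn *c)
{
    return c->socket_bytes > 0;
}

/* One consume call: the TLS layer drains the socket, the caller gets one chunk. */
static void toy_consume(toy_conn *c)
{
    assert(c != NULL);
    c->tls_buffered += c->socket_bytes;
    c->socket_bytes = 0;
    size_t n = c->tls_buffered < TOY_CHUNK ? c->tls_buffered : TOY_CHUNK;
    c->tls_buffered -= n;
    c->app_received += n;
}

/* Event loop: after each readiness event, call consume 1 + extra_calls times.
 * Returns the bytes delivered at the point where the loop would go back to
 * waiting on the socket. */
static size_t toy_run(toy_conn *c, int extra_calls)
{
    while (toy_socket_readable(c)) {
        for (int i = 0; i < 1 + extra_calls; i++)
            toy_consume(c);
    }
    return c->app_received;
}

/* Number of consume calls needed to deliver `total` bytes in this model. */
static size_t toy_calls_needed(size_t total)
{
    return (total + TOY_CHUNK - 1) / TOY_CHUNK;
}
```

In this model, a response of 81900 bytes with one call per readiness event (toy_run(&conn, 0)) delivers only 8192 bytes before the loop waits on a socket that never becomes readable again. With 15 extra calls the same response is delivered completely, but a 200000-byte response still stalls at 131072 bytes, because the number of calls needed depends on how much is buffered.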

To me, it looks like a bug or design flaw in libpq. I'll get in touch with the Postgres people.

@larskanis
Collaborator

I wrote a simple patch for libpq that fixes this issue and proposed it to the Postgres hackers. They confirmed that this is a bug in libpq (and not in ruby-pg), but the patch is probably incomplete and needs more investigation and discussion. So I added the patch to the next Postgres commitfest as a placeholder:

https://commitfest.postgresql.org/50/5251/
