No nodes available to run query #13388

spiremak · 2022-07-05T15:54:48Z

spiremak
Jul 5, 2022

Hey

Every so often our queries fail with an intermittent Presto error: io.prestosql.spi.PrestoException: No nodes available to run query. On average we get this error once every 2 days.

When looking at the coordinator's server.log at around that time, we can see the following:
node-state-poller-0 io.prestosql.metadata.DiscoveryNodeManager Previously active node is missing

When running this query: "SELECT node_id,state,coordinator FROM system.runtime.nodes"

One minute we can see our coordinator and all our worker nodes. The next minute we can only see the coordinator node, and the minute after that we see the coordinator and all the worker nodes again.
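
A minimal sketch of how the minute-by-minute results of that query could be compared to pin down exactly when workers drop out. It assumes the node IDs from each run have already been collected into (timestamp, node_ids) pairs; the function name, node names, and timestamps are illustrative and not taken from the cluster in question.

```python
from datetime import datetime


def find_worker_dropouts(snapshots):
    """Return (timestamp, missing_ids) for every snapshot that lacks a node seen earlier.

    snapshots: iterable of (timestamp, iterable_of_node_ids), oldest first.
    """
    known = set()      # every node ID seen in any earlier snapshot
    dropouts = []
    for taken_at, node_ids in snapshots:
        current = set(node_ids)
        missing = known - current
        if missing:
            dropouts.append((taken_at, sorted(missing)))
        known |= current
    return dropouts


# Hypothetical snapshots mirroring the pattern described above.
example_snapshots = [
    (datetime(2022, 7, 5, 15, 50), {"coordinator", "worker-1", "worker-2"}),
    (datetime(2022, 7, 5, 15, 51), {"coordinator"}),
    (datetime(2022, 7, 5, 15, 52), {"coordinator", "worker-1", "worker-2"}),
]
example_dropouts = find_worker_dropouts(example_snapshots)
```

The reported timestamps can then be matched against server.log on the coordinator and the logs on the workers.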

Our Presto version is 336.

Anyone have any suggestions on what could be going on or where to look?

Thanks

PApostol · 2022-07-13T13:46:43Z

PApostol
Jul 13, 2022

Also seeing the same issue on version 336.

hashhar · 2022-07-18T12:30:20Z

hashhar
Jul 18, 2022
Collaborator

This generally suggests that workers are not sending heartbeats to the coordinator frequently enough. This can happen due to GC pauses or network connectivity issues.

Try to see if your workers are often at 100% CPU and going through GC pauses (via the GC logs or JMX metrics).
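
One way to check the GC logs for long pauses, as a rough sketch. It assumes the workers write JDK unified-logging GC output (for example with -Xlog:gc) where each pause line ends in a duration in milliseconds. The regex, the function name, the 1000 ms threshold, and the sample lines are all illustrative assumptions, so adjust them to the actual log format.

```python
import re

# Assumed line shape (JDK unified logging), e.g.:
# [12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 42.123ms
PAUSE_RE = re.compile(r"GC\((\d+)\)\s+(Pause .*?)\s+(\d+(?:\.\d+)?)ms\s*$")


def long_pauses(lines, threshold_ms=1000.0):
    """Return (gc_id, description, pause_ms) for pauses at or above threshold_ms."""
    found = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match is None:
            continue  # not a pause line (e.g. concurrent phases)
        pause_ms = float(match.group(3))
        if pause_ms >= threshold_ms:
            found.append((int(match.group(1)), match.group(2), pause_ms))
    return found


# Made-up sample lines for illustration only.
sample_lines = [
    "[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 42.123ms",
    "[90.000s][info][gc] GC(8) Concurrent Mark 12.300ms",
    "[100.000s][info][gc] GC(9) Pause Full (Allocation Failure) 1000M->900M(1024M) 2500.000ms",
]
sample_long = long_pauses(sample_lines)
```

Pauses that line up with the "Previously active node is missing" messages on the coordinator would point towards GC rather than the network.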

spiremak · 2022-07-19T12:16:01Z

spiremak
Jul 19, 2022
Author

The CPU usage doesn't look out of the ordinary. What would be the best way to check if it's a network issue? Is there something specific I should be looking for in the http-request.log?

spiremak · 2022-08-01T17:40:21Z

spiremak
Aug 1, 2022
Author

When looking at the http-request.log on the worker nodes around the time of the error, we see that the following request is missing from the coordinator node's IP:
GET /v1/info/state

However, the logs still show this request from the IPs of all the other worker nodes.
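
A sketch of how gaps like this could be found automatically. It assumes the http-request.log entries have already been parsed into (timestamp, client_ip, request_path) tuples; the parsing is left out because the log format is not shown here. The function name, the 30-second max_gap default, and the sample IPs are assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_poll_gaps(records, coordinator_ip, path="/v1/info/state",
                   max_gap=timedelta(seconds=30)):
    """Return (previous_request, next_request) pairs separated by more than max_gap.

    Only requests for `path` coming from `coordinator_ip` are considered.
    """
    times = sorted(
        ts for ts, client_ip, request_path in records
        if client_ip == coordinator_ip and request_path == path
    )
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


# Hypothetical records: the coordinator (10.0.0.1) skips about a minute of polls.
start = datetime(2022, 8, 1, 17, 0, 0)
sample_records = [
    (start, "10.0.0.1", "/v1/info/state"),
    (start + timedelta(seconds=5), "10.0.0.1", "/v1/info/state"),
    (start + timedelta(seconds=20), "10.0.0.2", "/v1/info/state"),
    (start + timedelta(seconds=65), "10.0.0.1", "/v1/info/state"),
]
sample_gaps = find_poll_gaps(sample_records, "10.0.0.1")
```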

tooptoop4 · 2022-08-25T00:13:15Z

tooptoop4
Aug 25, 2022

@hashhar any chance of making https://github.com/trinodb/trino/blob/393/core/trino-main/src/main/java/io/trino/metadata/DiscoveryNodeManager.java#L151-L158 and https://github.com/trinodb/trino/blob/393/core/trino-main/src/main/java/io/trino/metadata/DiscoveryNodeManager.java#L265-L267 only mark a worker as 'missing' if 2 consecutive polls have failed? This would reduce query failures when there is a one-off temporary network blip.
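
To illustrate the idea, here is a rough sketch of the "N consecutive failed polls" rule. This is not how DiscoveryNodeManager is implemented; the class name, method name, and parameter are hypothetical and only show the intended behaviour.

```python
class MissingNodeTracker:
    """Treats a node as missing only after N consecutive failed polls (hypothetical helper)."""

    def __init__(self, failures_before_missing=2):
        if failures_before_missing < 1:
            raise ValueError("failures_before_missing must be at least 1")
        self.failures_before_missing = failures_before_missing
        self._consecutive_failures = {}  # node_id -> current failure streak

    def record_poll(self, node_id, succeeded):
        """Record one poll result; return True if the node should now count as missing."""
        if succeeded:
            self._consecutive_failures[node_id] = 0  # any success resets the streak
        else:
            streak = self._consecutive_failures.get(node_id, 0) + 1
            self._consecutive_failures[node_id] = streak
        return self._consecutive_failures[node_id] >= self.failures_before_missing


tracker = MissingNodeTracker(failures_before_missing=2)
poll_results = [
    tracker.record_poll("worker-1", False),  # single blip: not missing yet
    tracker.record_poll("worker-1", True),   # recovered, streak reset
    tracker.record_poll("worker-1", False),
    tracker.record_poll("worker-1", False),  # second failure in a row: missing
]
```

With a threshold of 2, a single failed poll between two successful ones would no longer take the worker out of the active set. The trade-off is that a worker that has really gone away is detected one poll interval later.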

2 replies

hashhar Aug 25, 2022
Collaborator

cc: @dain

tooptoop4 Sep 5, 2022

prestodb/presto#15791 (comment) might be a fix to copy into Trino
