Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns_resolver: new IPs aren't added or no DNS refresh #2847

Open
cageyv opened this issue Nov 12, 2024 · 2 comments
Open

dns_resolver: new IPs aren't added or no DNS refresh #2847

cageyv opened this issue Nov 12, 2024 · 2 comments

Comments

@cageyv
Copy link

cageyv commented Nov 12, 2024

Problem description

  • dns_resolver doesn't refresh the DNS over time, or we don't know how to make this configuration.
  • We are using round_robin lb_policy_name. In the case of TRANSIENT_FAILURE DNS, get refreshed after 1.11.3 bug fix.
  • If we remove 1 server instance and add it back with a new IP, then it will not be used because dns_resolver will not refresh the DNS
  • This blog post: https://arpittech.medium.com/grpc-and-connection-pooling-49a4137095e7 describes that problem with scaling. Only "The problem" part.

Reproduction steps

const grpc = require("@grpc/grpc-js");
const { GrpcTransport } = require("@protobuf-ts/grpc-transport");
tc = require("./proto-gen/demo/demoapp/v1/demoapp.client")

// Define a function to create a gRPC client with round_robin load balancing
function createGrpcClient(url) {
    const rpcTransport = new GrpcTransport({
        host: url,
        channelCredentials: grpc.credentials.createInsecure(),
        clientOptions: {
            'grpc.lb_policy_name': 'round_robin',
            'grpc.service_config': JSON.stringify({ loadBalancingConfig: [{ round_robin: {} }] })
        }
      });

  return new tc.TaxpnlgraphServiceClient(rpcTransport)
}

For the server-side backends, it is DNS-based load balancing.
In our case, this is golang gRPC server. 5 instances, different IP and multivalve DNS.
Locally, we could use docker-compose aliases

version: '3'
services:
  server:
    image: grpc/java-example-hostname:1.68.1
    restart: always
    networks:
      default:
        aliases:
          - grpc-server.local
  server:
    image: grpc/java-example-hostname:1.68.1
    restart: always
    networks:
      default:
        aliases:
          - grpc-server.local

Environment

  • OS name, version and architecture: public.ecr.aws/docker/library/node:v16-alpine3.17 Apline x64
  • Node version: ^16
  • Node installation method: npm ci
  • Package name and versio: "@grpc/grpc-js": "1.11.3" , "@protobuf-ts/grpc-transport": "2.9.4"

Additional context

What we did:

  • First it was 8 instances backend and we ran the script. Script reach all of them.
  • We redeployed the service and deployment, replaced all 8 backends and script able to reconnect (it was fixed in 1.11.3)
  • We stop/remove 1 instance. Script still connected to 7 of them
  • We run/add 1 instance. Script still connected to 7 of them.
  • Here we expect the DNS refresh and +1 connection
  • Wait additional 5 min. No results.
  • More notes. New tasks: 10.0.172.74, 10.0.131.158; -1 task was 10.0.159.49; +1 task was 10.0.185.163

log.txt

@murgatroid99
Copy link
Member

Our general recommendation is that servers should drop connections periodically to signal to clients that they should update name resolution information. In grpc-js, you can do this by using the grpc.max_connection_age_ms and grpc.max_connection_age_grace_ms options on the server. The grpc.max_connection_age_ms should be tuned based on how frequently you expect clients to need to get new DNS resolution information. The grpc.max_connection_age_grace_ms controls how long a server will spend processing requests on a connection after telling the client to stop using that connection, so you should set that based on the longest it generally takes the server to process a request.

As a side note, the grpc.lb_policy_name is obsolete, and the grpc.service_config option is its replacement. There is no reason to specify both.

@cageyv
Copy link
Author

cageyv commented Nov 12, 2024

Thanks for the recommendation. Sounds good.
I will try it. As soon as I have any news I will update that issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants