Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ready event not emitted if shard reconnects after final shard initialization #10485

Open
pkdev08 opened this issue Sep 5, 2024 · 13 comments
Open

Comments

@pkdev08
Copy link
Contributor

pkdev08 commented Sep 5, 2024

Which package is this bug report for?

discord.js

Issue description

If a shard attempts to reconnect but doesn't fully do so before the final shard becomes ready, the ready event won't be emitted. For example, with 10 shards, if shard 6 reconnects and becomes ready after shard 10 is ready, the ready event won’t fire, although all shards are ready. The only way to resolve this issue is to reboot the bot and hope the same situation doesn't occur.

Code sample

No response

Versions

djs 14.16.1
node LTS

Issue priority

Medium (should be fixed soon)

Which partials do you have configured?

Not applicable

Which gateway intents are you subscribing to?

Not applicable

I have tested this issue on a development release

yes

@Qjuh
Copy link
Contributor

Qjuh commented Sep 5, 2024

Can you elaborate on what exactly happens when you encounter that issue? Especially why the shard needs to reconnect. Because regular reconnects should emit the WebSocketShardEvents.Closed in @discordjs/ws which in turn should cause mainlib discord.jsto handle the waiting for ready event on reconnect gracefully.

If you by any chance have debug logs from listening to the debug event during such a botched reconnect or a way to consistently reproduce the issue I'll look into it.

@pkdev08
Copy link
Contributor Author

pkdev08 commented Sep 5, 2024

This happens on the initial boot up. If a shard closes and attempts to resume, but becomes marked as ready after the final shard (shard 70), then the ready event is never emitted as the check to emit the ready event happens when shard 70 is marked as ready.

2024-09-05 11:49:18 [info]: [WS => Shard 68] The gateway closed with an unexpected code 1006, attempting to resume. 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Destroying shard
	Reason: none
	Code: 1006
	Recover: Resume 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Connection status during destroy
	Needs closing: false
	Ready state: 3 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Was waiting for an identify, but the shard closed in the meantime 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Connecting to wss://gateway.discord.gg?v=10&encoding=json 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Waiting for event hello for 60000ms 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Preparing first heartbeat of the connection with a jitter of 0.31467293125223994; waiting 12980ms 
2024-09-05 11:49:31 [info]: [WS => Shard 70] Identifying
	shard id: 70
	shard count: 71
	intents: 38531
	compression: none 
2024-09-05 11:49:31 [info]: [WS => Shard 70] Waiting for event ready for 15000ms 
2024-09-05 11:49:36 [info]: [WS => Shard 70] Shard received all its guilds. Marking as fully ready. 
2024-09-05 11:49:42 [info]: [WS => Shard 68] Shard received all its guilds. Marking as fully ready. 

@Qjuh
Copy link
Contributor

Qjuh commented Sep 6, 2024

I would be interested to see all logs for that shard 68 that happened before those you showed too. Because it’s rather weird to have a resume happening that early and I‘d need to know what the shard did before that to trace the status changes in code.

@pkdev08
Copy link
Contributor Author

pkdev08 commented Sep 6, 2024

I would be interested to see all logs for that shard 68 that happened before those you showed too. Because it’s rather weird to have a resume happening that early and I‘d need to know what the shard did before that to trace the status changes in code.

I tend to get a few resumes during each shard initialization process, but it doesn't seem to cause any issues since the ready event still emits successfully.

Nothing happened before with Shard 68 except just the normal: [WS => Shard 68] First heartbeat sent, starting to beat every 41250ms

All the shards were ready, it just didn't emit ready, and there were no errors.

image

@Qjuh
Copy link
Contributor

Qjuh commented Sep 6, 2024

Where does that readyShards number and the other two values come from? Because it‘s not a djs internal. How did you define their values? So I can trace what I‘m looking at here.

@pkdev08
Copy link
Contributor Author

pkdev08 commented Sep 6, 2024

Where does that readyShards number and the other two values come from? Because it‘s not a djs internal. How did you define their values? So I can trace what I‘m looking at here.

 readyShards: client.ws.shards.filter(
      (shard) => shard.status === Status.Ready
    ).size,
    clientStatus: client.ws.status,
    clientReady: client.isReady(),

@Qjuh
Copy link
Contributor

Qjuh commented Sep 6, 2024

[...] the ready event is never emitted as the check to emit the ready event happens when shard 70 is marked as ready.

2024-09-05 11:49:18 [info]: [WS => Shard 68] The gateway closed with an unexpected code 1006, attempting to resume. 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Destroying shard
	Reason: none
	Code: 1006
	Recover: Resume 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Connection status during destroy
	Needs closing: false
	Ready state: 3 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Was waiting for an identify, but the shard closed in the meantime 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Connecting to wss://gateway.discord.gg?v=10&encoding=json 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Waiting for event hello for 60000ms 
2024-09-05 11:49:18 [info]: [WS => Shard 68] Preparing first heartbeat of the connection with a jitter of 0.31467293125223994; waiting 12980ms 
2024-09-05 11:49:31 [info]: [WS => Shard 70] Identifying
	shard id: 70
	shard count: 71
	intents: 38531
	compression: none 
2024-09-05 11:49:31 [info]: [WS => Shard 70] Waiting for event ready for 15000ms 
2024-09-05 11:49:36 [info]: [WS => Shard 70] Shard received all its guilds. Marking as fully ready. 
2024-09-05 11:49:42 [info]: [WS => Shard 68] Shard received all its guilds. Marking as fully ready. 

That last line there comes from

this.debug(['Shard received all its guilds. Marking as fully ready.']);
this.status = Status.Ready;
/**
* Emitted when the shard is fully ready.
* This event is emitted if:
* * all guilds were received by this shard
* * the ready timeout expired, and some guilds are unavailable
* @event WebSocketShard#allReady
* @param {?Set<string>} unavailableGuilds Set of unavailable guilds, if any
*/
this.emit(WebSocketShardEvents.AllReady);
which, as you can see, also emits the internal AllReady event which gets handled here

shard.on(WebSocketShardEvents.AllReady, unavailableGuilds => {
/**
* Emitted when a shard turns ready.
* @event Client#shardReady
* @param {number} id The shard id that turned ready
* @param {?Set<Snowflake>} unavailableGuilds Set of unavailable guild ids, if any
*/
this.client.emit(Events.ShardReady, shard.id, unavailableGuilds);
this.checkShardsReady();

running the check if all shards are ready. That check is run independent of if the id of the shard is the one of the last shard.

There's something missing in the information you provide us with here. Would help if your log there also included client.ws.shards.size and client.ws.totalShards or maybe even client.ws.shards.map(s => s.status).

@pkdev08
Copy link
Contributor Author

pkdev08 commented Sep 6, 2024

There's something missing in the information you provide us with here. Would help if your log there also included client.ws.shards.size and client.ws.totalShards or maybe even client.ws.shards.map(s => s.status).

I'm not really sure what happened, but I do have a little more data if this helps. The guildCount, shardCount and shardStatus all seem to be accurate. Just not sure why the the ready event was never emitted. I waited an additional 10 minutes or so as well.

image

shardCount: client.options.shardCount.toLocaleString()
shardStatus: client.ws.shards.map((shard) => shard.status)

@Qjuh
Copy link
Contributor

Qjuh commented Sep 6, 2024

You have 81 shards in that shardStatus array there. Way more than the 71 your shardCount says. That will indeed cause the check in

if (this.shards.size !== this.totalShards || this.shards.some(shard => shard.status !== Status.Ready)) {
to fail. Now we'll have to find out where those additional shards come from... nevermind, apparently I can‘t count… it‘s 71 shards in that array…

@vladfrangu
Copy link
Member

Can you send your client options here too?

@pkdev08
Copy link
Contributor Author

pkdev08 commented Sep 6, 2024

Can you send your client options here too?

const client = new Client({
  shards: 'auto',
  rest: {
    retries: 0,
    rejectOnRateLimit: (rateLimitData) => {
      return rateLimitData.route.endsWith('/emojis');
    }
  },
  makeCache: Options.cacheWithLimits({
    MessageManager: {
      maxSize: 10
    },
    GuildInviteManager: { maxSize: 0 },
    GuildEmojiManager: { maxSize: 0 },
    GuildStickerManager: { maxSize: 0 }
  }),
  sweepers: {
    ...Options.DefaultSweeperSettings,
    messages: {
      interval: 120,
      lifetime: 120
    }
  },
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMembers,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.GuildMessageReactions,
    GatewayIntentBits.DirectMessages,
    GatewayIntentBits.GuildVoiceStates,
    GatewayIntentBits.MessageContent
  ],
  partials: [
    Partials.Message,
    Partials.Reaction,
    Partials.Channel,
    Partials.GuildMember
  ],
  presence: {
    activities: [
      {
        name: '/help',
        type: ActivityType.Listening
      }
    ]
  }
});

@Qjuh
Copy link
Contributor

Qjuh commented Sep 6, 2024

Do you have any other code that uses Client#ws apart from the logs you showed here already?

@pkdev08
Copy link
Contributor Author

pkdev08 commented Sep 6, 2024

Do you have any other code that uses Client#ws apart from the logs you showed here already?

Nope. I did reboot the bot several times within a short window of about 15-20 minutes because I forgot to make an important change, so I'm not sure if that had something to do with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants