Hey everyone, I'm asking for help investigating an apphash mismatch we have seen in one of our internal testnets.

Our testnet was composed of two nodes sharing the same Horcrux remote signer (i.e. there was only one validator).
One of the two nodes halted due to an apphash mismatch; the chain itself didn't halt, because the other node continued acting as the validator. By the time we noticed, the pruning settings had already wiped the relevant block from the node that kept running, so I wasn't able to investigate it properly.
The interesting thing I noticed in the `application.db` of the node that halted is that an entire module's storage was completely wiped. At the previous block height the keys were there; at the next, there were none:
```
❯ go run . data ~/Downloads/chiado-1-dump.db "s/k:act/" 17698
Got version: 19520
Printing all keys with hashed values (to detect diff)
000000000000000001
1F878D20753082B2905FCFE18F98D3B7D4E8C866EF7A1781D7BC9D455767A11E
... [omitted for brevity]
action/count
7A42E3892368F826928202014A6CA95A3D8D846DF25088DA80018663EDF96B1C
Hash: 389087F05766BD6552DF3FC40AACD8A1B5A9FC16927CBCFFAA5EC73D79128A35
Size: 34

❯ go run . data ~/Downloads/chiado-1-dump.db "s/k:act/" 17699
Got version: 19520
Printing all keys with hashed values (to detect diff)
Hash: E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
Size: 0
```
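Incidentally, the hash printed for height 17699 is easy to recognize: it is the SHA-256 digest of empty input, which confirms the dump tool hashed literally zero key/value bytes under that prefix (assuming, as the output suggests, that it renders plain uppercase-hex SHA-256 digests). A quick stdlib check:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// emptyStoreHash returns the SHA-256 of zero bytes, uppercase-hex encoded,
// the same rendering the dump tool appears to use for its digests.
func emptyStoreHash() string {
	sum := sha256.Sum256(nil)
	return fmt.Sprintf("%X", sum[:])
}

func main() {
	fmt.Println(emptyStoreHash())
	// → E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
}
```

This matches the `Hash` line for 17699 exactly, so the empty result is not a rendering glitch: the store really had no keys at that height.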
Both nodes had an interesting line in the logs just before this error occurred:
```
12:09PM ERR iavl set error error="Value missing for key [0 0 0 0 0 0 0 1 0 0 0 1] corresponding to nodeKey 73000000000000000100000001" module=server
```
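For what it's worth, that nodeKey seems to decode to version 1, nonce 1 — one of the very first nodes ever written for the store, which would be exactly the kind of old node pruning is supposed to be deleting. A small sketch of the decoding (the 1-byte prefix + 8-byte big-endian version + 4-byte nonce layout is my assumption based on iavl v1's node key format; worth double-checking against the exact iavl version):

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// decodeNodeKey splits the hex nodeKey from the error log into its parts,
// assuming a 1-byte prefix + 8-byte big-endian version + 4-byte nonce layout.
func decodeNodeKey(s string) (prefix byte, version uint64, nonce uint32) {
	raw, err := hex.DecodeString(s)
	if err != nil || len(raw) != 13 {
		panic("unexpected nodeKey length")
	}
	return raw[0], binary.BigEndian.Uint64(raw[1:9]), binary.BigEndian.Uint32(raw[9:13])
}

func main() {
	p, v, n := decodeNodeKey("73000000000000000100000001")
	fmt.Printf("prefix=%c version=%d nonce=%d\n", p, v, n)
	// → prefix=s version=1 nonce=1
}
```

The `[0 0 0 0 0 0 0 1 0 0 0 1]` byte slice in the error message is the same version=1/nonce=1 pair without the prefix byte.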
At this point, one thing I noticed is that the node wasn't shutting down properly on SIGTERM/SIGINT: it was panicking right before closing the databases. So my initial thought was that the db had been corrupted by that.
We fixed the bug causing the panic and restarted a new chain from scratch using the fixed version, with the same setup.
The issue happened again, and the same module's data was gone, just like the first time (weird coincidence?). An identical log entry appeared on both nodes before the apphash mismatch:
```
2024-10-07T01:09:22.79390127Z stdout F 1:09AM ERR iavl set error error="Value missing for key [0 0 0 0 0 0 0 1 0 0 0 1] corresponding to nodeKey 73000000000000000100000001" module=server
```
We disabled pruning and ran yet another chain from scratch; the issue hasn't happened again, and it's been a week now. Could the issue be related to pruning?
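For reference, these are the `app.toml` pruning knobs involved (the values below are illustrative, not our exact settings); the third chain ran with pruning fully disabled:

```toml
# app.toml — illustrative values, not our exact config
pruning = "custom"
pruning-keep-recent = "100"
pruning-interval = "10"

# third chain: pruning disabled entirely
# pruning = "nothing"
```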
FWIW we've been using the same IAVL version for months in our public testnet, with the exact same pruning parameters, without any issues.
If anyone has any idea on how to further investigate the root cause, I'd be more than happy to hear it. Thanks!