Multiple peers rebroadcast messages simultaneously during Pubsub Flood #9748
Labels
- kind/bug: A bug in existing code (including security flaws)
- need/analysis: Needs further analysis before proceeding
Checklist
Installation method: ipfs-update or dist.ipfs.tech
Version
Config
Description
Related to the Pubsub Flood issue https://github.com/ipfs/kubo/issues/9665: this is specifically to note that when the flood begins, multiple peers become involved simultaneously, though with different seqnos, different messages, and different originating peers.
The result of the flood is near-maximum CPU usage on our critical IPFS node for the Ceramic Anchor Service.
Is there some network condition that would simultaneously trigger upwards of 20 different nodes to rebroadcast different messages? Is there a setting that would help tune this back?
We greatly appreciate the nonce validator added by @vyzo in https://github.com/libp2p/go-libp2p-pubsub/releases/tag/v0.9.2
We note that it does not yet appear to be used by the latest kubo (https://github.com/ipfs/kubo/blob/master/go.mod#L74). Is it safe to simply include this module version in a source build?
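For experimentation, pinning the pubsub dependency in a source build is typically just a `go get` plus a rebuild. A sketch, assuming a local checkout of the kubo repository (whether kubo actually compiles and behaves correctly against v0.9.2 is exactly the compatibility question above):

```shell
# From a checkout of the kubo repository.
cd kubo

# Pin go-libp2p-pubsub to the release containing the nonce validator.
go get github.com/libp2p/go-libp2p-pubsub@v0.9.2
go mod tidy

# Rebuild ipfs from source.
make build
```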
Is there perhaps a backoff setting that should also apply when this activity is happening across the network? If a node detects that it is receiving an identical message from more than 3 peers, should it be pruning its peer list?
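As a sketch of the detection half of that idea (purely illustrative; this is not an existing kubo or go-libp2p-pubsub API, and the message/peer IDs are plain strings): count how many distinct peers have delivered the same message ID, and flag the message once a threshold is crossed.

```go
package main

import "fmt"

// dupTracker counts how many distinct peers have delivered each message ID.
// Hypothetical sketch; a real implementation would also need to expire old
// entries so the map does not grow without bound during a flood.
type dupTracker struct {
	threshold int
	seen      map[string]map[string]struct{} // msgID -> set of delivering peers
}

func newDupTracker(threshold int) *dupTracker {
	return &dupTracker{
		threshold: threshold,
		seen:      make(map[string]map[string]struct{}),
	}
}

// record notes a delivery of msgID from peer and reports whether the message
// has now arrived from more than `threshold` distinct peers.
func (t *dupTracker) record(msgID, peer string) bool {
	peers, ok := t.seen[msgID]
	if !ok {
		peers = make(map[string]struct{})
		t.seen[msgID] = peers
	}
	peers[peer] = struct{}{}
	return len(peers) > t.threshold
}

func main() {
	t := newDupTracker(3)
	for _, p := range []string{"peerA", "peerB", "peerC"} {
		t.record("msg-1", p)
	}
	// Fourth distinct peer delivering the same message crosses the threshold.
	fmt.Println(t.record("msg-1", "peerD")) // → true
}
```

What to do once a message is flagged (prune the peer, apply a backoff, or just drop the duplicate) is the policy question raised above.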
Any advice or suggestions on how to stop the pubsub flood are very welcome. We can experiment with individual nodes, and if a solution is found we can communicate it to our user base to get it deployed across much of the network.