Raise an error in scatter when broadcast and AMM are incompatible #8796
I spent a long time debugging a hang in XGBoost before I noticed this doc-string note about disabling AMM. It turns out that `xgboost.dask.predict(...)` uses `client.scatter(..., broadcast=True)` to replicate the `Booster` object on all workers. In some cases, the replication process seems to conflict with the active memory manager's `ReduceReplicas` policy, resulting in a hang.

This PR proposes that a clear error be raised by `Client.scatter` when `broadcast=True` and AMM is enabled in the config. It also seems fine to produce a warning instead. However, I definitely think it makes sense to be "loud" when the user is likely to run into a problem like this.
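To illustrate the proposed behavior, here is a minimal, self-contained sketch of the kind of guard described above; it is not the PR's actual implementation. The helper name `check_broadcast_vs_amm` is hypothetical, and the config is modeled as a plain dict keyed by the AMM start setting rather than going through `dask.config`:

```python
def check_broadcast_vs_amm(config: dict, broadcast: bool) -> None:
    """Raise if scatter(..., broadcast=True) would conflict with AMM.

    Hypothetical sketch of the check proposed in this PR: the real change
    lives inside Client.scatter and reads the dask config directly.
    """
    amm_on = config.get("distributed.scheduler.active-memory-manager.start", False)
    if broadcast and amm_on:
        raise RuntimeError(
            "scatter(broadcast=True) is incompatible with the Active Memory "
            "Manager's ReduceReplicas policy; disable AMM or avoid broadcast."
        )


# AMM disabled: broadcast replication is allowed.
check_broadcast_vs_amm(
    {"distributed.scheduler.active-memory-manager.start": False}, broadcast=True
)

# AMM enabled: the guard raises loudly instead of letting the cluster hang.
try:
    check_broadcast_vs_amm(
        {"distributed.scheduler.active-memory-manager.start": True}, broadcast=True
    )
except RuntimeError as exc:
    print("raised:", exc)
```

Swapping the `raise` for `warnings.warn(...)` would give the softer variant mentioned above.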