Background tree replacement in AdaptiveRandomForestClassifier using a feature-based drift detection algorithm (such as ADWIN) #1234
-
Hi, first of all, I'd like to thank you for this package and for the great support you're putting into it! I have a question about the drift detection part in the ARFClassifier. I read the article referenced for the algorithm, and on paper the idea seems straightforward.

However, in the specific case where we have a purely feature-based drift detection algorithm such as ADWIN (i.e. where we monitor changes in the distribution of X and not Y|X), wouldn't this mean that the warning and/or drift thresholds will always be hit at the same time for all Hoeffding trees simultaneously? In other words, since all trees see the same observation x when learn_one() is called, and with the same weight (since the Poisson sampling is done at the forest level and not at the tree level, if I understand correctly), wouldn't this mean that whenever a drift threshold is hit, it is in fact hit for all trees, and thus the whole forest is reset (by replacing every tree with its respective background tree) rather than "selectively replacing trees" as claimed by the paper?

If that's not the case, could you kindly elaborate on why? My intuition leads me to believe that it has something to do with the fact that each tree only uses a randomly subsampled set of features to build its splits, but I'm not sure whether the drift detection happens once these features have been defined, or at a "higher level", and I wasn't able to find the answer in the package's code after a modest search. I hope the wording of the question is not too confusing, and I obviously remain at your disposal to clear up any doubt. Thanks!
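To make the concern concrete: if every tree's detector were fed the exact same input series, identical detectors would indeed all fire at the same step. Here is a toy sketch of that scenario, using a hypothetical mean-shift detector as a stand-in for ADWIN (this is *not* river's implementation; the class and its parameters are made up for illustration):

```python
# Toy stand-in for ADWIN (hypothetical, not river's implementation): flags a
# drift when the mean of the recent window drifts away from the older history.
class ToyDetector:
    def __init__(self, window=20, threshold=0.3):
        self.window = window
        self.threshold = threshold
        self.history = []
        self.drift_detected = False

    def update(self, x):
        self.history.append(x)
        self.drift_detected = False
        if len(self.history) > 2 * self.window:
            recent = self.history[-self.window:]
            older = self.history[:-self.window]
            recent_mean = sum(recent) / len(recent)
            older_mean = sum(older) / len(older)
            self.drift_detected = abs(recent_mean - older_mean) > self.threshold

# Two detectors fed the *same* feature stream, with an abrupt shift at t=100.
stream = [0.0] * 100 + [1.0] * 100
det_a, det_b = ToyDetector(), ToyDetector()
first_drift = {}
for t, x in enumerate(stream):
    for name, det in (("a", det_a), ("b", det_b)):
        det.update(x)
        if det.drift_detected and name not in first_drift:
            first_drift[name] = t

# Both detectors flag the drift at exactly the same step, which is the
# behavior the question anticipates for a purely feature-based setup.
print(first_drift)
```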
-
@smastelini can you take this when you have time?
-
Hi @Wael-BHY-BNP, you are not far from the final answer.

In fact, ARF uses sampling with replacement for each tree, by relying on Poisson sampling. Each tree ends up monitoring a different sample, so even the background trees monitor a different sample compared to the foreground ones. There is also feature subsampling per leaf (as you mention).

However, I disagree that ADWIN is a feature-based detector in this context. It is, of course, univariate, but in ARF and other algorithms ADWIN is applied in a supervised fashion. Each instance of ADWIN (two per foreground tree) monitors the tree's error (0-1 error in the classification case), which is updated each time a new instance arrives at `learn_one()`. Historically, ADWIN was broadly used to monitor errors in classifiers; for that reason I've seen papers referring to it as a supervised drift detector. I personally believe there is still a need for better terminology: ADWIN is univariate, and that's it. It is up to the user to choose which input series to feed it.

By monitoring the tree error, not every tree will be reset at the same time, unless, by chance, all trees happen to be affected equally by a drift even after online bagging and feature subsampling. Finally, the warning and drift detectors work at different confidence levels in the case of ADWIN. Therefore, the warning-level detector is prone to trigger earlier, although that could also be a false detection.

I hope I was able to clarify your question a bit. Please do not hesitate to send a follow-up question if my answer was not clear, or if other new questions pop up :)
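To illustrate the first point, here is a minimal sketch of ARF-style per-tree online bagging (assumptions: numpy is available; the tree and instance counts are arbitrary; λ=6 is the value used by ARF). Each tree draws its *own* Poisson weight for every incoming instance, so each tree trains on a different effective sample:

```python
import numpy as np

# Per-tree Poisson(6) resampling weights, as in ARF's online bagging.
# weights[i, t] = how many times tree t trains on instance i (0 = skipped).
rng = np.random.default_rng(42)
n_instances, n_trees = 10, 5
weights = rng.poisson(lam=6, size=(n_instances, n_trees))
print(weights)

# Because the weight sequences differ across trees, the per-tree error
# streams fed to each tree's pair of ADWIN detectors differ too - so the
# detectors do not all fire at once, and trees are replaced selectively.
distinct = len({tuple(col) for col in weights.T})
print(distinct)  # number of distinct per-tree weight sequences
```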