Background tree replacement in AdaptiveRandomForestClassifier using a feature-based drift detection algorithm (such as ADWIN) #1234
-
Hi, first of all, I'd like to thank you for this package and for the great support you're putting into it! I have a question about the drift detection part in the ARFClassifier. I read the article referenced for the algorithm, and on paper the idea seems straightforward.

However, in the specific case where we have a purely feature-based drift detection algorithm such as ADWIN (i.e. where we monitor changes in the distribution of X and not Y|X), wouldn't this mean that the warning and/or drift thresholds will always be hit at the same time for all Hoeffding trees simultaneously? In other words, since all trees see the same observation x when learn_one() is called, and with the same weight (since the Poisson sampling is done at the forest level and not at the tree level, if I understand correctly), wouldn't this mean that whenever a drift threshold is hit, it is in fact hit for all trees, and thus the whole forest is reset (by replacing every tree with its respective background tree) rather than "selectively replacing trees" as claimed by the paper?

If that's not the case, could you kindly elaborate on why? My intuition leads me to believe that it has something to do with the fact that each tree only uses a randomly subsampled set of features to build its splits, but I'm not sure whether the drift detection happens once these features have been defined, or at a "higher level", and I wasn't able to find the answer in the package's code after a modest search. I hope the wording of the question is not too confusing, and I obviously remain at your disposal to clear up any doubt. Thanks!
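To make the concern concrete: if every tree's detector were fed the exact same input series, identical detectors would indeed all fire at the same step. Here is a toy sketch of that scenario, using a hypothetical mean-shift detector as a stand-in for ADWIN (this is *not* river's implementation; the class and its parameters are made up for illustration):

```python
# Toy stand-in for ADWIN (hypothetical, not river's implementation): flags a
# drift when the mean of the recent window drifts away from the older history.
class ToyDetector:
    def __init__(self, window=20, threshold=0.3):
        self.window = window
        self.threshold = threshold
        self.history = []
        self.drift_detected = False

    def update(self, x):
        self.history.append(x)
        self.drift_detected = False
        if len(self.history) > 2 * self.window:
            recent = self.history[-self.window:]
            older = self.history[:-self.window]
            recent_mean = sum(recent) / len(recent)
            older_mean = sum(older) / len(older)
            self.drift_detected = abs(recent_mean - older_mean) > self.threshold

# Two detectors fed the *same* feature stream, with an abrupt shift at t=100.
stream = [0.0] * 100 + [1.0] * 100
det_a, det_b = ToyDetector(), ToyDetector()
first_drift = {}
for t, x in enumerate(stream):
    for name, det in (("a", det_a), ("b", det_b)):
        det.update(x)
        if det.drift_detected and name not in first_drift:
            first_drift[name] = t

# Both detectors flag the drift at exactly the same step, which is the
# behavior the question anticipates for a purely feature-based setup.
print(first_drift)
```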
-
@smastelini can you take this when you have time?
-
Hi @Wael-BHY-BNP, you are not far from the final answer.

In fact, ARF uses sampling with replacement for each tree, by relying on Poisson sampling. Each tree ends up monitoring a different sample, so even the background trees monitor a different sample compared to the foreground ones. There is also feature subsampling per leaf (as you mention).

However, I disagree that ADWIN is a feature-based detector in this context. It is, of course, univariate, but in ARF and other algorithms ADWIN is applied in a supervised fashion. Each instance of ADWIN (two per foreground tree) monitors the tree's error (0-1 error in the classification case), which is updated each time a new instance arrives at `learn_one()`. Historically, ADWIN was broadly used to monitor errors in classifiers; for that reason I've seen papers referring to it as a supervised drift detector. I personally believe there is still a need for better terminology: ADWIN is univariate, and that's it. It is up to the user to choose which input series to feed it.

By monitoring the tree error, not every tree will be reset at the same time, unless, by chance, all trees happen to be affected equally by a drift even after online bagging and feature subsampling. Finally, the warning and drift detectors work at different confidence levels in the case of ADWIN. Therefore, the warning-level detector is prone to trigger earlier, although that could also be a false detection.

I hope I was able to clarify your question a bit. Please do not hesitate to send a follow-up question if my answer was not clear, or if other new questions pop up :)
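To illustrate the first point, here is a minimal sketch of ARF-style per-tree online bagging (assumptions: numpy is available; the tree and instance counts are arbitrary; λ=6 is the value used by ARF). Each tree draws its *own* Poisson weight for every incoming instance, so each tree trains on a different effective sample:

```python
import numpy as np

# Per-tree Poisson(6) resampling weights, as in ARF's online bagging.
# weights[i, t] = how many times tree t trains on instance i (0 = skipped).
rng = np.random.default_rng(42)
n_instances, n_trees = 10, 5
weights = rng.poisson(lam=6, size=(n_instances, n_trees))
print(weights)

# Because the weight sequences differ across trees, the per-tree error
# streams fed to each tree's pair of ADWIN detectors differ too - so the
# detectors do not all fire at once, and trees are replaced selectively.
distinct = len({tuple(col) for col in weights.T})
print(distinct)  # number of distinct per-tree weight sequences
```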