
Minor improvements to consistency model interface #256

Open
paul-buerkner opened this issue Nov 21, 2024 · 4 comments
Labels
user interface Changes to the user interface and improvements in usability
Milestone
BayesFlow 2.0

Comments

@paul-buerkner
Contributor

paul-buerkner commented Nov 21, 2024

I have just played around a bit with consistency models and gathered some minor observations here. I may add more points later on.

  • ConsistencyModel has an argument total_steps without a default, which the doc suggests setting to epochs * num_batches (see the sketch after this list). Can we set this argument automatically by default internally? This would make user code a bit cleaner. It should also be clarified what happens if we set it to a value other than epochs * num_batches.
  • ContinuousConsistencyModel is a dangerous name for an architecture that is different from ConsistencyModel. Since we also have a ContinuousApproximator, users may think they need to combine it with ContinuousConsistencyModel, or that ConsistencyModel is not for continuous variables. Can we perhaps rename ContinuousConsistencyModel, or even make it part of the ConsistencyModel class, activated via an argument like continuous_time=True?
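
For context, the usage I have in mind looks roughly like the sketch below. Only total_steps is taken from the doc; the remaining variable names and wiring are illustrative and may not match the actual API exactly.

```python
import bayesflow as bf

epochs = 100
num_batches = 32  # batches per epoch, known from the dataset and batch size

# total_steps currently has to be supplied by the user; the doc suggests
# setting it to epochs * num_batches
inference_network = bf.networks.ConsistencyModel(total_steps=epochs * num_batches)

# illustrative wiring; `adapter` and `dataset` are assumed to be set up elsewhere
approximator = bf.ContinuousApproximator(
    inference_network=inference_network,
    adapter=adapter,
)
approximator.fit(dataset, epochs=epochs)
```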
@paul-buerkner paul-buerkner added this to the BayesFlow 2.0 milestone Nov 21, 2024
@paul-buerkner paul-buerkner added the user interface Changes to the user interface and improvements in usability label Nov 21, 2024
@vpratz
Collaborator

vpratz commented Nov 22, 2024

Thanks for taking a look at this.

  • Regarding total_steps: This controls the discretization schedule during training, so we have to know where the scheduling ends. The last step of training is the natural choice, but you could also end earlier. I don't know whether we want to suggest this, but we can certainly document it. Determining this automatically would have to happen in fit (see the sketch after this list), which would add some complexity to the approximators. I'm a bit hesitant to do this, because it would affect all our networks and training methods, and communicating what happens implicitly is a lot harder than explaining how to specify it explicitly. As a side note: we do not need this parameter for the continuous-time CMs. If they perform sufficiently well (which we still have to test), we could propose them as the default, and only power users would encounter this difficulty.

  • Regarding the naming: Yes, I share that concern. The hard part is that the distinction mainly concerns how the model is trained (discrete-time vs. continuous-time), and not all the other things "discrete" and "continuous" could refer to (not even the sampling, which is discrete-time for both of them). We could also use a different axis of distinction (like ConsistencyModel and SimplifiedConsistencyModel/ImprovedConsistencyModel), but I have not yet come up with one that is intuitive, so any ideas are welcome :) Combining them in one class might work, but it might only move the problem one level lower: we would still have to explain what it means to activate continuous_time, and in addition make explicit in the docstring which arguments are used for which method. Code-wise, having two classes instead of if-clauses in most functions improves readability a lot.
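
To make the trade-off concrete, determining total_steps automatically would look roughly like the hypothetical sketch below, somewhere inside an approximator's fit. All names here are made up for illustration (including the assumption that total_steps could default to None) and do not correspond to the actual implementation.

```python
# Hypothetical sketch only; none of these names are actual BayesFlow code.
import keras


class AutoStepsApproximator(keras.Model):
    """Illustrates inferring total_steps inside fit()."""

    def __init__(self, inference_network, **kwargs):
        super().__init__(**kwargs)
        self.inference_network = inference_network

    def fit(self, dataset, epochs=1, **kwargs):
        num_batches = len(dataset)  # assumes a sized, batched dataset
        # If the user did not set total_steps, default the discretization
        # schedule to end exactly at the last training step.
        if getattr(self.inference_network, "total_steps", None) is None:
            self.inference_network.total_steps = epochs * num_batches
        return super().fit(dataset, epochs=epochs, **kwargs)
```

The snippet itself is small, but it silently changes the network's configuration based on fit arguments, which is exactly the kind of implicit behaviour that is hard to communicate.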

@paul-buerkner
Contributor Author

paul-buerkner commented Nov 22, 2024

Your arguments make sense, thank you! Let's leave total_steps as is for now. I do see the issues it would create internally.

We might also think about renaming ContinuousConsistencyModel to just ConsistencyModel if it turns out to work well, and then renaming the "old" consistency model to something else(?). We are in a dev version after all, so we have the freedom to play around with it until we are happy :-D Just hypothetically, if we were to rename the old consistency model, do you have an idea of what we might want to call it?

@vpratz
Collaborator

vpratz commented Nov 22, 2024

If we do focus on scheduled vs. not scheduled during training, we could use ScheduledConsistencyModel. Again, this is not a name that someone unfamiliar with the internals will be able to make much sense of. On the other hand, we will probably have a hard time finding one that is, since the difference lies in the training internals.

@paul-buerkner
Contributor Author

paul-buerkner commented Nov 22, 2024

It does sound to me as if we really might want to have just a single class from a user-interface perspective. That way, we can "hide" the difference better and give people a single entry point to consistency models. We could also change the default of continuous_time (and other future arguments) at some point as we improve the underlying architectures.

I understand that this makes the internal code somewhat less clean, and I won't push too hard for the change. I just want to spark the discussion, with the aim of hiding more technicalities from the user.
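
Just to make the idea concrete, the single entry point could look roughly like the hypothetical sketch below. Every name is made up purely to illustrate the interface, not to propose an implementation.

```python
# Hypothetical sketch; all names below are illustrative.

class _DiscreteTimeBackend:
    """Stand-in for the current (discrete-time) ConsistencyModel."""

    def __init__(self, total_steps=None, **kwargs):
        self.total_steps = total_steps


class _ContinuousTimeBackend:
    """Stand-in for the current ContinuousConsistencyModel."""

    def __init__(self, **kwargs):
        pass


class ConsistencyModel:
    """Single user-facing class that dispatches on the training formulation."""

    def __init__(self, continuous_time=False, total_steps=None, **kwargs):
        if continuous_time:
            # continuous-time training needs no discretization schedule,
            # so total_steps would be ignored here (to be documented clearly)
            self._impl = _ContinuousTimeBackend(**kwargs)
        else:
            self._impl = _DiscreteTimeBackend(total_steps=total_steps, **kwargs)

    def __getattr__(self, name):
        # delegate everything else to the selected backend
        return getattr(self._impl, name)
```

The docstring would then have to spell out which arguments apply to which mode, as you pointed out, but from the outside users would only ever see one class.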
