
Minor improvements to consistency model interface #256

Open
paul-buerkner opened this issue Nov 21, 2024 · 4 comments
Labels
user interface Changes to the user interface and improvements in usability
Milestone
BayesFlow 2.0

Comments

@paul-buerkner
Contributor

paul-buerkner commented Nov 21, 2024

I have just played around a bit with consistency models and gathered some minor observations here. I may add more points later on.

  • ConsistencyModel has an argument total_steps without a default, which the doc suggests setting to epochs * num_batches (see the sketch after this list). Can we set this argument automatically by default internally? This would make user code a bit cleaner. It should also be clarified what happens if we set it to a value other than epochs * num_batches.
  • ContinuousConsistencyModel is a dangerous name for an architecture that is different from ConsistencyModel. Since we also have a ContinuousApproximator, users may think they need to combine it with ContinuousConsistencyModel, or that ConsistencyModel is not for continuous variables. Can we perhaps rename ContinuousConsistencyModel, or even make it part of the ConsistencyModel class, activated via an argument like continuous_time=True?
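
For context, the usage I have in mind looks roughly like the sketch below. Only total_steps is taken from the doc; the remaining variable names and wiring are illustrative and may not match the actual API exactly.

```python
import bayesflow as bf

epochs = 100
num_batches = 32  # batches per epoch, known from the dataset and batch size

# total_steps currently has to be supplied by the user; the doc suggests
# setting it to epochs * num_batches
inference_network = bf.networks.ConsistencyModel(total_steps=epochs * num_batches)

# illustrative wiring; `adapter` and `dataset` are assumed to be set up elsewhere
approximator = bf.ContinuousApproximator(
    inference_network=inference_network,
    adapter=adapter,
)
approximator.fit(dataset, epochs=epochs)
```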
@paul-buerkner paul-buerkner added this to the BayesFlow 2.0 milestone Nov 21, 2024
@paul-buerkner paul-buerkner added the user interface Changes to the user interface and improvements in usability label Nov 21, 2024
@vpratz
Collaborator

vpratz commented Nov 22, 2024

Thanks for taking a look at this.

  • Regarding total_steps: This controls the discretization schedule during training, so we have to know where the scheduling ends. The last step of training is the natural choice, but you could also end earlier. I don't know whether we want to suggest this, but we can certainly document it. Determining this automatically would have to happen in fit (see the sketch after this list), which would add some complexity to the approximators. I'm a bit hesitant to do this, because it would affect all our networks and training methods, and communicating what happens implicitly is a lot harder than explaining how to specify it explicitly. As a side note: we do not need this parameter for the continuous-time CMs. If they perform sufficiently well (which we still have to test), we could propose them as the default, and only power users would encounter this difficulty.

  • Regarding the naming: Yes, I share that concern. The hard part is that the distinction mainly concerns how the model is trained (discrete-time vs. continuous-time), and not all the other things "discrete" and "continuous" could refer to (not even the sampling, which is discrete-time for both of them). We could also use a different axis of distinction (like ConsistencyModel and SimplifiedConsistencyModel/ImprovedConsistencyModel), but I have not yet come up with one that is intuitive, so any ideas are welcome :) Combining them in one class might work, but it might only move the problem one level lower: we would still have to explain what it means to activate continuous_time, and in addition make explicit in the docstring which arguments are used for which method. Code-wise, having two classes instead of if-clauses in most functions improves readability a lot.
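
To make the trade-off concrete, determining total_steps automatically would look roughly like the hypothetical sketch below, somewhere inside an approximator's fit. All names here are made up for illustration (including the assumption that total_steps could default to None) and do not correspond to the actual implementation.

```python
# Hypothetical sketch only; none of these names are actual BayesFlow code.
import keras


class AutoStepsApproximator(keras.Model):
    """Illustrates inferring total_steps inside fit()."""

    def __init__(self, inference_network, **kwargs):
        super().__init__(**kwargs)
        self.inference_network = inference_network

    def fit(self, dataset, epochs=1, **kwargs):
        num_batches = len(dataset)  # assumes a sized, batched dataset
        # If the user did not set total_steps, default the discretization
        # schedule to end exactly at the last training step.
        if getattr(self.inference_network, "total_steps", None) is None:
            self.inference_network.total_steps = epochs * num_batches
        return super().fit(dataset, epochs=epochs, **kwargs)
```

The snippet itself is small, but it silently changes the network's configuration based on fit arguments, which is exactly the kind of implicit behaviour that is hard to communicate.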

@paul-buerkner
Contributor Author

paul-buerkner commented Nov 22, 2024

Your arguments make sense, thank you! Let's leave total_steps as is for now. I do see the issues it would create internally.

We might also think about renaming ContinuousConsistencyModel to just ConsistencyModel if it turns out to work well, and then renaming the "old" consistency model to something else(?). We are in a dev version after all, so we have the freedom to play around with it until we are happy :-D Just hypothetically, if we were to rename the old consistency model, do you have an idea of what we might want to call it?

@vpratz
Collaborator

vpratz commented Nov 22, 2024

If we do focus on scheduled vs. not scheduled during training, we could use ScheduledConsistencyModel. Again, this is not a name that someone unfamiliar with the internals will be able to make much sense of. On the other hand, we will probably have a hard time finding one that is, since the difference lies in the training internals.

@paul-buerkner
Contributor Author

paul-buerkner commented Nov 22, 2024

It does sound to me as if we really might want to have just a single class from a user-interface perspective. That way, we can "hide" the difference better and give people a single entry point to consistency models. We could also change the default of continuous_time (and other future arguments) at some point as we improve the underlying architectures.

I understand that this makes the internal code somewhat less clean, and I won't push too hard for the change. I just want to spark the discussion, with the aim of hiding more technicalities from the user.
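
Just to make the idea concrete, the single entry point could look roughly like the hypothetical sketch below. Every name is made up purely to illustrate the interface, not to propose an implementation.

```python
# Hypothetical sketch; all names below are illustrative.

class _DiscreteTimeBackend:
    """Stand-in for the current (discrete-time) ConsistencyModel."""

    def __init__(self, total_steps=None, **kwargs):
        self.total_steps = total_steps


class _ContinuousTimeBackend:
    """Stand-in for the current ContinuousConsistencyModel."""

    def __init__(self, **kwargs):
        pass


class ConsistencyModel:
    """Single user-facing class that dispatches on the training formulation."""

    def __init__(self, continuous_time=False, total_steps=None, **kwargs):
        if continuous_time:
            # continuous-time training needs no discretization schedule,
            # so total_steps would be ignored here (to be documented clearly)
            self._impl = _ContinuousTimeBackend(**kwargs)
        else:
            self._impl = _DiscreteTimeBackend(total_steps=total_steps, **kwargs)

    def __getattr__(self, name):
        # delegate everything else to the selected backend
        return getattr(self._impl, name)
```

The docstring would then have to spell out which arguments apply to which mode, as you pointed out, but from the outside users would only ever see one class.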
