-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce stop_at_shortest in sample and round_robin #76
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
If I had to nit, I'd says that I feed it weird to have the logic spread out in 6 files.
sampled_data_source and round_robin_data_source, may not need to be their own class anymore.
But it's also consistent with the rest of the codebase, where we have two files per operator.
@@ -76,6 +77,28 @@ def test_op_works_when_pipelines_have_different_lengths(self) -> None: | |||
|
|||
pipeline.reset() | |||
|
|||
@pytest.mark.skipif( | |||
python_devel_only(), | |||
reason="New fairseq2n API in Python-only installation. Skipping till v0.2.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR:
I'm not fond of the behavior of this python devel only
First it seems to be about not disturbing Python only developer that didn't build fairseq2n locally. So fairseq2n_dev_only
would be more clear.
Second I'd prefer if it was annotation @fairseq2n_dev_only
with a standard reason.
Third when are we supposed to removed those annotations ? Will we have to do this manually ?
I don't really see who would want to run our unit tests from github main branch but can't build fairseq2n.
Can someone explain me the use case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First it seems to be about not disturbing Python only developer that didn't build fairseq2n locally. So fairseq2n_dev_only would be more clear.
I think if you convert it to a standalone annotation, a name like @fairseq2n_dev_only
makes sense, but as a parameter to skip_if
, fairseq2n_dev_only
does not sound right, because it is just the opposite: you want to skip the test if this is a python-only installation. I am open to having a standalone annotation though.
Third when are we supposed to removed those annotations ? Will we have to do this manually ?
There are two ways to resolve this issue. We can either have a version range check, like @skipif(fairseq2n.__version__< 0.2)
which would not require us to remove the annotation in the future. The disadvantage is, those checks would not be relevant after a stable release and will start to pile up with time for no good reason. The other alternative is what we have right now. Just checking whether the versions do not match @skipif(fairseq2.__version__ != fairseq2n.__version__)
. The advantage is that this forces you to remove the annotations before a stable release (otherwise the tests won't run), the disadvantage is that it is a manual work that needs to be done as part of the version bump. Having said that I think we won't need this annotation that often in the future once the native library becomes more mature. The manual removal is not ideal, but we might add some extra measures like forcibly fail any test that has this annotation and if the latest version is already past the one in the annotation.
Can someone explain me the use case ?
Pretty much anyone who wants to contribute to the Python code base of fairseq2, but does not want to deal (or have experience) with C++. The whole reason why we ended up with fairseq2n in last minute was because people had quite a bit of trouble with building the library (e.g. while working on seq-gen and UnitY upstreaming). The original feedback was to completely separate the C++ parts to a separate repo, but we then decided to keep it here and offer pre-built binaries so that people who want to contribute to fairseq2 won't have to build the native parts if they don't want to. By the way, I got this approach pretty much from jax where they similarly separate the library into two parts: jax (python only) and jaxlib (pre-built).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, just gave it a try in #83. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, let's follow on that PR, and merge this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid work! Just left a few nit comments. There is only one line where we should leverage move-semantics for performance, but otherwise most of the my comments are optional suggestions. Let me know once you at least address that performance related issue, and I can stamp it right away.
@gwenzek I agree that there is too many files now but we must have a class for round_robin and sample since we have different logic for the |
@cbalioglu Thanks for the detailed review! I did all the changes requested and the PR should be ready by now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two very nit comments. Thanks again for this work! The implementation has become much cleaner! Feel free to merge it whenever you want.
What does this PR do? Please describe:
Following this PR review we decided to do the following:
sample
andround_robin
into separate classsample
operator useful for up_samplingstop_at_shortest
flag to both operator to give user more flexibility on how the operator works.Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: