This PR contains several important extensions and improvements of `Batch`:

1. A method for setting a new sequence. This is a first step towards code that relies less on setting attributes directly. The method also permits setting a subsequence and filling the remaining entries with default values. This will help ensure that all entries in a batch have the same length, something we should start enforcing soon.
2. A new method for applying arbitrary transformations to the "leaves" of a `Batch`. This simplifies existing code and is generally very handy for users.
3. The arbitrary transformations made it possible to implement `isnull`, `hasnull` and `dropnull`, which help find errors early.
4. The arbitrary transformations also allow extracting a `schema` from a batch. This is used in `Batch.cat_` for additional input validation: we now make sure that all batches being concatenated have the same structure. This input validation is a **breaking change**! Some tests that concatenated incompatible batches were removed. Eventually, we can add a `get_schema` method to the batch that retrieves meta-information such as shapes and datatypes. For now, this is delegated to the user, who can use the new `apply_values_transform`.
5. New feature: slicing of torch distributions. Previously, several batches contained instances of `Distribution` that were not properly sliced, inviting bugs and errors in user code. This is now fixed, albeit not in a pretty way (torch doesn't allow slicing these objects natively).
6. Stricter typing and some further minor extensions and simplifications.

The new code was extensively tested and documented.
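To illustrate the idea behind point 1 (setting a subsequence and padding the rest with defaults), here is a minimal pure-Python sketch. It does not use the actual `Batch` API: the function `set_subsequence` and its parameters are hypothetical, and a plain `dict` of lists stands in for a batch.

```python
from typing import Any, Sequence


def set_subsequence(
    length: int, values: Sequence[Any], start: int = 0, default: Any = 0.0
) -> list:
    """Hypothetical helper: return a list of `length` entries with `values`
    written at positions start .. start + len(values) - 1 and `default` elsewhere."""
    if start < 0 or start + len(values) > length:
        raise ValueError("subsequence does not fit into a sequence of the given length")
    # Pad everything with the default first (assumes an immutable default such as a float).
    seq = [default] * length
    # Then overwrite the requested slice with the given values.
    seq[start:start + len(values)] = list(values)
    return seq


# Every entry of the dict-based "batch" ends up with the same length.
batch_len = 5
batch = {
    "obs": set_subsequence(batch_len, [1.0, 2.0], start=1),
    "rew": set_subsequence(batch_len, [0.5, 0.5, 0.5]),
}
print(batch)  # {'obs': [0.0, 1.0, 2.0, 0.0, 0.0], 'rew': [0.5, 0.5, 0.5, 0.0, 0.0]}
```

Padding up front is what guarantees equal lengths across entries, which is the invariant the PR wants to start enforcing.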
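Points 2 and 3 can be sketched without the real `Batch` class by treating a batch as a nested `dict` whose leaves are equal-length lists. The names `apply_to_leaves`, `iter_leaves`, `isnull`, `hasnull` and `dropnull` below are illustrative only, not the signatures added in this PR. The sketch also assumes that "null" means `None` or a float NaN and that `dropnull` removes every index at which any leaf is null; the actual semantics may differ.

```python
import math
from typing import Any, Callable, Iterator


def apply_to_leaves(tree: dict, fn: Callable[[Any], Any]) -> dict:
    """Return a new nested dict with `fn` applied to every leaf (non-dict value)."""
    return {
        k: apply_to_leaves(v, fn) if isinstance(v, dict) else fn(v)
        for k, v in tree.items()
    }


def iter_leaves(tree: dict) -> Iterator[Any]:
    """Yield all leaves in depth-first order."""
    for v in tree.values():
        if isinstance(v, dict):
            yield from iter_leaves(v)
        else:
            yield v


def _is_null(x: Any) -> bool:
    # Assumption: "null" means None or a float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def isnull(tree: dict) -> dict:
    """Same structure as `tree`, with each leaf replaced by an element-wise null mask."""
    return apply_to_leaves(tree, lambda seq: [_is_null(x) for x in seq])


def hasnull(tree: dict) -> bool:
    """True if any element of any leaf is null."""
    return any(any(mask) for mask in iter_leaves(isnull(tree)))


def dropnull(tree: dict) -> dict:
    """Drop every index at which at least one leaf is null (leaves assumed equal length)."""
    masks = list(iter_leaves(isnull(tree)))
    length = len(masks[0]) if masks else 0
    keep = [i for i in range(length) if not any(m[i] for m in masks)]
    return apply_to_leaves(tree, lambda seq: [seq[i] for i in keep])


batch = {"obs": [1.0, float("nan"), 3.0], "info": {"rew": [0.5, 0.1, None]}}
print(hasnull(batch))   # True
print(dropnull(batch))  # {'obs': [1.0], 'info': {'rew': [0.5]}}
```

Once a single leaf-wise transform exists, each null helper is only a few lines, which mirrors how the PR describes building `isnull`, `hasnull` and `dropnull` on top of the generic transformation.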
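The structural check from point 4 can be sketched in the same dict-based model: reduce each batch to its key layout, refuse to concatenate if the layouts differ, then concatenate leaf by leaf. `structure` and `cat` are hypothetical, out-of-place stand-ins. The real `Batch.cat_` works in place, and its schema may carry more than the key layout.

```python
from typing import List


def structure(tree: dict) -> dict:
    """Replace every leaf with None, keeping only the nested key layout."""
    return {k: structure(v) if isinstance(v, dict) else None for k, v in tree.items()}


def cat(batches: List[dict]) -> dict:
    """Concatenate dict-based batches leaf by leaf after checking that all
    of them share the structure of the first one."""
    if not batches:
        return {}
    expected = structure(batches[0])
    for i, b in enumerate(batches[1:], start=1):
        if structure(b) != expected:
            raise ValueError(f"batch {i} has structure {structure(b)}, expected {expected}")

    def _cat(trees: List[dict]) -> dict:
        first = trees[0]
        return {
            k: _cat([t[k] for t in trees])
            if isinstance(first[k], dict)
            else [x for t in trees for x in t[k]]
            for k in first
        }

    return _cat(batches)


a = {"obs": [1, 2], "info": {"rew": [0.1, 0.2]}}
b = {"obs": [3], "info": {"rew": [0.3]}}
print(cat([a, b]))  # {'obs': [1, 2, 3], 'info': {'rew': [0.1, 0.2, 0.3]}}
```

Calling `cat([a, {"obs": [4]}])` raises a `ValueError` instead of silently producing a batch with a missing `info` entry. This is the kind of input that now fails, and it is why the validation is a breaking change.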