Release 0.4.0 - Encoder rewrite, variable sequence collate support, reduced memory usage, doctests, removed SRU
Major updates
- Rewrote encoders to better support more generic encoders like a `LabelEncoder`. Furthermore, added broad support for `batch_encode`, `batch_decode` and `enforce_reversible` (a `LabelEncoder` sketch follows this list).
- Rearchitected default reserved tokens to ensure configurability while still providing the convenience of good defaults.
- Added support to collate sequences with `torch.utils.data.dataloader.DataLoader`. For example (a fuller, self-contained sketch follows this list):

  ```python
  from functools import partial
  from torchnlp.utils import collate_tensors
  from torchnlp.encoders.text import stack_and_pad_tensors

  collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
  torch.utils.data.dataloader.DataLoader(*args, collate_fn=collate_fn, **kwargs)
  ```
- Added doctest support, ensuring the documented examples are tested.
- Removed SRU support; it's too heavy of a module to maintain. Please use https://github.com/taolei87/sru instead. Happy to accept a PR with a better tested and documented SRU module!
- Updated version requirements to support Python 3.6 and 3.7, dropping support for Python 3.5.
- Updated version requirements to support PyTorch 1.0+.
- Merged #66, reducing the memory requirements for pre-trained word vectors by 2x.
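As referenced in the encoder item above, here is a minimal sketch of the more generic encoder API, assuming `LabelEncoder` is importable from `torchnlp.encoders`; the sample labels are hypothetical and the exact return types of the batch methods are an assumption:

```python
from torchnlp.encoders import LabelEncoder

# Hypothetical label data; the encoder builds its vocabulary from this sample.
encoder = LabelEncoder(['positive', 'negative', 'neutral'])

# Encode a single label to a tensor index and decode it back.
encoded = encoder.encode('positive')
decoded = encoder.decode(encoded)       # 'positive'

# Encode and decode a whole batch in one call.
batch = encoder.batch_encode(['negative', 'neutral', 'positive'])
labels = encoder.batch_decode(batch)    # ['negative', 'neutral', 'positive']
```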
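And a self-contained sketch of the collate support above, using a hypothetical toy dataset of dictionaries holding variable-length tensors; that `collate_tensors` recurses into dictionaries and that `stack_and_pad_tensors` returns the padded tensors together with their lengths are assumptions here:

```python
from functools import partial

import torch
from torch.utils.data import DataLoader

from torchnlp.encoders.text import stack_and_pad_tensors
from torchnlp.utils import collate_tensors

# Hypothetical toy dataset: each sample holds a variable-length token tensor.
dataset = [{'tokens': torch.arange(n)} for n in [3, 5, 2, 4]]

# Pad and stack the variable-length tensors within each batch.
collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_fn)

for batch in loader:
    # Expected to contain the padded 'tokens' tensors together with the
    # original sequence lengths (assumption about the collated structure).
    print(batch['tokens'])
```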
Minor updates
- Formatted the code base with YAPF.
- Fixed `pandas` and `collections` warnings.
- Added an invariant assertion to `Encoder` via `enforce_reversible`. For example, `encoder = Encoder().enforce_reversible()` ensures that `Encoder.decode(Encoder.encode(object)) == object` (a sketch follows this list).
- Fixed the accuracy metric for PyTorch 1.0.
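As referenced above, a minimal sketch of the `enforce_reversible` invariant, assuming a `WhitespaceEncoder` from `torchnlp.encoders.text`; that a non-reversible round trip raises an exception (rather than, say, logging a warning) is an assumption:

```python
from torchnlp.encoders.text import WhitespaceEncoder

# Build a small vocabulary from hypothetical sample text.
encoder = WhitespaceEncoder(['hello world', 'hello there']).enforce_reversible()

# In-vocabulary text round-trips cleanly through encode/decode.
tokens = encoder.encode('hello world')
assert encoder.decode(tokens) == 'hello world'

# Out-of-vocabulary words decode to unknown tokens, so the round trip is not
# reversible and the wrapped encoder is expected to flag it (assumed behavior).
try:
    encoder.decode(encoder.encode('hello moon'))
except Exception as error:
    print(f'non-reversible round trip flagged: {error}')
```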