
Release 0.4.0 - Encoder rewrite, variable sequence collate support, reduced memory usage, doctests, removed SRU

@PetrochukM released this 03 Apr 02:08

Major updates

  • Rewrote the encoders to better support more generic encoders, such as a LabelEncoder. Furthermore, added broad support for batch_encode, batch_decode, and enforce_reversible (see the sketch after this list).
  • Rearchitected default reserved tokens to ensure configurability while still providing the convenience of good defaults.
  • Added support for collating variable-length sequences with torch.utils.data.DataLoader. For example:
import torch
from functools import partial
from torchnlp.utils import collate_tensors
from torchnlp.encoders.text import stack_and_pad_tensors

# Pad the tensors in each batch to the same length before stacking them.
collate_fn = partial(collate_tensors, stack_tensors=stack_and_pad_tensors)
torch.utils.data.DataLoader(*args, collate_fn=collate_fn, **kwargs)
  • Added doctest support, ensuring that the documented examples are tested.
  • Removed SRU support; it is too heavy a module to maintain. Please use https://github.com/taolei87/sru instead. We are happy to accept a PR with a better tested and documented SRU module!
  • Updated version requirements to support Python 3.6 and 3.7, dropping support for Python 3.5.
  • Updated version requirements to support PyTorch 1.0+.
  • Merged #66, reducing the memory requirements for pre-trained word vectors by 2x (see the usage sketch after this list).
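
To illustrate the rewritten encoder API, here is a minimal sketch using a LabelEncoder; the constructor arguments, default reserved labels, and exact return values shown are assumptions and may differ in this release.

from torchnlp.encoders import LabelEncoder

# Build a label vocabulary from a sample of labels.
encoder = LabelEncoder(['label_a', 'label_b'])

# batch_encode encodes many labels at once; batch_decode reverses it.
encoded = encoder.batch_encode(['label_a', 'label_b'])  # e.g. tensor([1, 2])
decoded = encoder.batch_decode(encoded)                 # ['label_a', 'label_b']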
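
The 2x memory reduction applies to the pre-trained word vector classes such as GloVe; here is a quick usage sketch (the name and dim arguments are illustrative):

from torchnlp.word_to_vector import GloVe

# Download and cache pre-trained GloVe vectors, then look one up.
vectors = GloVe(name='6B', dim=100)
vector = vectors['hello']  # a 100-dimensional float tensor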

Minor updates

  • Formatted the code base with YAPF.
  • Fixed pandas and collections warnings.
  • Added an invariant assertion to Encoder via enforce_reversible, ensuring that Encoder.decode(Encoder.encode(object)) == object (a concrete sketch follows this list). For example:
    encoder = Encoder().enforce_reversible()
  • Fixed the accuracy metric for PyTorch 1.0.
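
As a concrete sketch of enforce_reversible, assuming it returns the encoder as in the snippet above and reusing the illustrative LabelEncoder from the Major updates section:

from torchnlp.encoders import LabelEncoder

# With enforce_reversible, encoding or decoding raises an error if a
# round trip would not return the original input.
encoder = LabelEncoder(['label_a', 'label_b']).enforce_reversible()
assert encoder.decode(encoder.encode('label_a')) == 'label_a'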