Releases: webis-de/small-text

v2.0.0.dev1

24 Nov 19:15

This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.

Added

  • General
    • Raised the minimum Python version to 3.8, since Python 3.7 reached end of life on 2023-06-27.
    • Dropped torchtext as an integration dependency. For individual use cases it can of course still be used.
    • Added environment variables SMALL_TEXT_PROGRESS_BARS and SMALL_TEXT_OFFLINE to control the default behavior for progress bars and model downloading.
  • PoolBasedActiveLearner:
    • initialize_data() has been replaced by initialize() which can now also be used to provide an initial model in cold start scenarios. (#10)
  • Classification:
    • All PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support torch.compile(), which can be enabled on demand. (Requires PyTorch >= 2.0.0.)
    • All PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support Automatic Mixed Precision.
    • SetFitClassification.__init__() now has a verbosity parameter (similar to TransformerBasedClassification) through which you can control the progress bar output of SetFitClassification.fit().
    • TransformerBasedClassification:
      • Removed unnecessary token_type_ids keyword argument in model call.
      • Additional keyword args for config, tokenizer, and model can now be configured.
  • Embeddings:
    • Prevented unnecessary gradient computations for some embedding types and unified code structure.
  • PyTorch:
    • Added an inference_mode() context manager that applies torch.inference_mode, falling back to torch.no_grad on older PyTorch versions.
  • Query Strategies:
  • Vector Index Functionality:
    • A new vector index API provides a unified interface over different k-nearest neighbor search implementations.
    • Existing strategies that used a hard-coded vector search (ContrastiveActiveLearning, SEALS, AnchorSubsampling) have been adapted and can now be used with different vector index implementations.
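
To illustrate the idea behind a unified vector index interface, here is a minimal sketch in plain Python. The class and function names (VectorIndex, BruteForceIndex, nearest_neighbors) are hypothetical and are not small-text's actual API; the point is only that calling code depends on the interface, so the search implementation can be swapped.

```python
import math
from abc import ABC, abstractmethod


class VectorIndex(ABC):
    """Hypothetical minimal interface (assumption, not small-text's class)."""

    @abstractmethod
    def build(self, vectors):
        """Store or index the given vectors and return the index itself."""

    @abstractmethod
    def search(self, query, k):
        """Return the positions of the k stored vectors nearest to query."""


class BruteForceIndex(VectorIndex):
    """Exact search: compares the query against every stored vector."""

    def build(self, vectors):
        self._vectors = [tuple(v) for v in vectors]
        return self

    def search(self, query, k):
        # Pair each Euclidean distance with its position, then sort by distance.
        distances = [(math.dist(query, v), i) for i, v in enumerate(self._vectors)]
        return [i for _, i in sorted(distances)[:k]]


def nearest_neighbors(index, vectors, query, k=2):
    # Only the VectorIndex interface is used here, so any implementation fits.
    return index.build(vectors).search(query, k)


vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
neighbors = nearest_neighbors(BruteForceIndex(), vectors, (0.9, 0.9))
```

An approximate index could implement the same two methods and be passed to nearest_neighbors() without changing the caller.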

Fixed

  • Fixed a bug where the clone() operation wrapped the labels, which then raised an error. This affected the single-label scenario for PytorchTextClassificationDataset and TransformersDataset. (#35)
  • Fixed a bug where the batching in greedy_coreset() and lightweight_coreset() resulted in incorrect batch sizes. (#50)
  • Fixed a bug where lightweight_coreset() failed when computing the norm of the elementwise mean vector.

Changed

  • General
    • Moved split_data() method from small_text.data.datasets to small_text.data.splits.
  • Dependencies
    • Raised setfit version to 1.1.0.
  • Classification:
    • The initialize() methods of all PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) are now more unified. (#57)
    • KimCNNClassifier / TransformerBasedClassification: model selection is now disabled by default. Also, it no longer saves models when disabled, thereby greatly reducing the runtime.
  • Utils
    • init_kmeans_plusplus_safe() now supports weighted kmeans++ initialization for scikit-learn>=1.3.0.
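
Code that needs to run against both 1.x and this release can try the new location of split_data() first and fall back to the old one. The following is a minimal sketch; the helper name resolve_split_data is hypothetical, and it returns None when small-text is not installed.

```python
import importlib


def resolve_split_data():
    """Return split_data from its new or old module, or None if unavailable."""
    # New location first (v2.0.0.dev1), then the pre-2.0 location.
    for module_name in ("small_text.data.splits", "small_text.data.datasets"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        func = getattr(module, "split_data", None)
        if func is not None:
            return func
    return None


split_data = resolve_split_data()
```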

Removed

  • Deprecated functionality
    • Removed default_tensor_type() method.
    • Removed small_text.utils.labels.get_flattened_unique_labels().
    • Removed small_text.integrations.pytorch.utils.labels.get_flattened_unique_labels().
    • Classification
      • Removed early stopping legacy arguments in __init__() for KimCNN and TransformerBasedClassification. (Use fit() keyword arguments instead.)
      • Removed model selection legacy argument in TransformerBasedClassification.__init__().
  • The explicit installation instruction for conda was removed, but the small-text conda-forge package will remain.

v1.4.1

18 Aug 16:02

Bugfix release.

Fixed

  • Fixed an out of bounds error that occurred when DiscriminativeActiveLearning queries all remaining unlabeled data.
  • Fixed typos/wording in PoolBasedActiveLearner docstrings.
  • Pinned SetFit version in notebook example. (#64)
  • Fixed an out of bounds error that could occur in SetFitClassification on both 32-bit systems and Windows. (#66)
  • Fixed errors in notebook examples that occurred with more recent seaborn / matplotlib versions.

Changed

  • Documentation: added links to bibliography. (#65)

v1.4.0

09 Jun 12:14

Fixes SetFit seed control and adds the AnchorSubsampling query strategy.

Added

Fixed

  • Changed how the seed is controlled in SetFitClassification, since the seed was previously fixed unless explicitly set via the respective trainer keyword argument.

Changed

  • Documentation: Added a section where compatible transformer models are listed.
  • Documentation: Updated showcase section.

v1.3.3

29 Dec 21:23

Bugfix release.

Changed

  • An errata section was added to the documentation.

Fixed

  • Fixed a deviation from the paper, where DeltaFScore also considered negative label predictions for the agreement. (#51)
  • Fixed a bug in KappaAverage that affected the stopping behavior. (#52)

Contributors

@zakih2 @vmanc

v1.3.2

19 Aug 18:16

Bugfix release.

Fixed

  • Fixed a bug in TransformerBasedClassification where validations_per_epoch>=2 left the model in eval mode. (#40)

v1.3.1

22 Jul 19:55

Bugfix release.

Fixed

  • Fixed a bug where parameter groups were omitted when using TransformerBasedClassification's layer-specific fine-tuning functionality. (#36, #38)
  • Fixed a bug where class weighting resulted in nan values. (#39)

Contributors

@JP-SystemsX

v1.3.0

21 Feb 21:15

SetFitClassification now also supports dropout sampling (like KimCNNClassifier and TransformerBasedClassification).

Added

Fixed

  • Fixed broken link in README.md.
  • Fixed typo in README.md. (#26)

Changed

Stopping Criteria

Documentation

  • Updated the active learning setup figure.
  • The documentation of integrations has been reorganized.

Contributors

@rmitsch

v1.2.0

04 Feb 21:44

This release adds a SetFit classifier, the BALD query strategy, and two new example notebooks.

Added

Active Learning

Classification

Examples

  • Revised both existing notebook examples.
  • Added a notebook example for active learning with SetFit classifiers.
  • Added a notebook example for cold start initialization with SetFit classifiers.

Documentation

  • A showcase section has been added to the documentation.

Fixed

  • Distances in lightweight_coreset were not correctly projected onto the [0, 1] interval (but ranking was unaffected).
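
The release note does not say how the projection is done. As an illustration only, min-max scaling is one common way to map values onto [0, 1], and because it is order-preserving it also shows why a scaling error can leave the ranking untouched. The function names below are hypothetical.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ranking(values):
    """Return positions ordered from smallest to largest value."""
    return sorted(range(len(values)), key=values.__getitem__)


distances = [4.0, 1.0, 9.0, 2.5]
scaled = min_max_scale(distances)
# A positive linear rescaling changes magnitudes but not the order.
same_order = ranking(distances) == ranking(scaled)
```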

Changed

v1.1.1

14 Oct 20:42

Minor bug fix release.

Fixed

  • Fixed model selection, which could raise an error under certain circumstances. (#21)

v1.1.0

01 Oct 10:50

This release adds a conda package, more convenient imports, and improves many aspects of the classification functionality. Moreover, one new query strategy and three stopping criteria have been added.

Added

General

  • Small-Text package is now available via conda-forge.
  • Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text):
    from small_text import PoolBasedActiveLearner
    

Classification

  • All classifiers now support weighting of training samples.
  • Early stopping has been reworked, improved, and documented (#18).
  • Model selection has been reworked and documented.
  • [!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 in order to match TransformerBasedClassification.
  • [!] Removed weight renormalization after gradient clipping.

Datasets

  • The target_labels keyword argument in __init__() will now raise a warning if not passed.
  • Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.

Query Strategies

Stopping Criteria

Deprecated

  • small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
  • TransformerBasedClassification and KimCNNClassifier:
    The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping keyword argument in the fit() method instead (#18).

Fixed

Classification

  • KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly
    process the scheduler keyword argument (#16).

Removed

  • Removed the strict check that every target label has to occur in the training data.
    (This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.)