diff --git a/CHANGELOG.md b/CHANGELOG.md
index a56251d4..e4efe09f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,39 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
+## 1.1.0 - 2023-10-10
+
+
+### Features
+
+- A couple of easy, non-behaviour changing performance improvements that on their own sped up Kazu around 10% (but other changes in this release will affect this too, and speedup will be workload dependent)
+- Added new OpsinStep which maps IUPAC drug strings to canonical SMILES - see the API docs for details.
+  This functionality is currently experimental and may be changed without making a new major release.
+  Please [open a GitHub issue](https://github.com/AstraZeneca/KAZU/issues/new) if you wish to use this functionality.
+- Ensembl Gene IDs are now grouped by HGNC approved symbols, eliminating disambiguation problems for gene IDs belonging to the same gene.
+- Entity produced by TransformersModelForTokenClassificationNerStep but without Mappings will be dropped by default now, in the same way as for other NER steps.
+  This was an exception to handle an AstraZeneca internal use case that wanted this different, but it could cause issues with MergeOverlappingEntsStep in some cases,
+  so it is safer to have this off by default.
+- New SpacyPipelines abstraction, which allows using the same spacy pipeline in different places, but only load it once and prevent uncontrolled memory growth.
+  On the uncontrolled memory growth, see https://github.com/explosion/spaCy/discussions/10015 for why this was happening - the 'fix' is to reload a spacy pipeline after a certain number of calls.
+- Slimmed down base dependencies by removing dependencies for steps not in the base pipeline.
+  These can be added back in manually in user projects, or use the new `kazu[all_steps]` dependency
+  group to install dependencies for all steps as before. The docs reflect this, and informative errors
+  are raised when trying to use these steps when dependencies aren't installed.
+- Very large memory savings from an overhaul of the string matching process.
+  The new version should also be faster in general, but the priority was memory rather than speed (since previously, this step accounted for the majority of kazu's memory usage but only a fraction of its runtime)
+
+### Bugfixes
+
+- Curated terms that drop the same normalised version of the term no longer report erroneous warnings.
+
+### Deprecations and Removals
+
+- The API for building custom model packs has changed to be more flexible, and more simple.
+  This is a backwards-incompatible change, but we don't currently expect/know of any non-AstraZeneca users of this script, so won't do a major version bump for it.
+  Please let us know (in a [GitHub issue](https://github.com/AstraZeneca/KAZU/issues/new)) if you are using this and this change was problematic for you.
+
+
 ## 1.0.3 - 2023-08-15
diff --git a/docs/_changelog.d/+allstepsdeps.feature.md b/docs/_changelog.d/+allstepsdeps.feature.md
deleted file mode 100644
index f4d62b7b..00000000
--- a/docs/_changelog.d/+allstepsdeps.feature.md
+++ /dev/null
@@ -1,4 +0,0 @@
-Slimmed down base dependencies by removing dependencies for steps not in the base pipeline.
-These can be added back in manually in user projects, or use the new `kazu[all_steps]` dependency
-group to install dependencies for all steps as before. The docs reflect this, and informative errors
-are raised when trying to use these steps when dependencies aren't installed.
diff --git a/docs/_changelog.d/+improvedgenegrouping.feature.md b/docs/_changelog.d/+improvedgenegrouping.feature.md
deleted file mode 100644
index d45d0e0d..00000000
--- a/docs/_changelog.d/+improvedgenegrouping.feature.md
+++ /dev/null
@@ -1 +0,0 @@
-Ensembl Gene IDs are now grouped by HGNC approved symbols, eliminating disambiguation problems for gene IDs belonging to the same gene.
diff --git a/docs/_changelog.d/+modelpackbuilding.removal.md b/docs/_changelog.d/+modelpackbuilding.removal.md
deleted file mode 100644
index 3c299ebe..00000000
--- a/docs/_changelog.d/+modelpackbuilding.removal.md
+++ /dev/null
@@ -1,3 +0,0 @@
-The API for building custom model packs has changed to be more flexible, and more simple.
-This is a backwards-incompatible change, but we don't currently expect/know of any non-AstraZeneca users of this script, so won't do a major version bump for it.
-Please let us know (in a [GitHub issue](https://github.com/AstraZeneca/KAZU/issues/new)) if you are using this and this change was problematic for you.
diff --git a/docs/_changelog.d/+opsinstep.feature.md b/docs/_changelog.d/+opsinstep.feature.md
deleted file mode 100644
index 3ce72bb7..00000000
--- a/docs/_changelog.d/+opsinstep.feature.md
+++ /dev/null
@@ -1,3 +0,0 @@
-Added new OpsinStep which maps IUPAC drug strings to canonical SMILES - see the API docs for details.
-This functionality is currently experimental and may be changed without making a new major release.
-Please [open a GitHub issue](https://github.com/AstraZeneca/KAZU/issues/new) if you wish to use this functionality.
diff --git a/docs/_changelog.d/+perfimprovements.feature.md b/docs/_changelog.d/+perfimprovements.feature.md
deleted file mode 100644
index 5bdde2d1..00000000
--- a/docs/_changelog.d/+perfimprovements.feature.md
+++ /dev/null
@@ -1 +0,0 @@
-A couple of easy, non-behaviour changing performance improvements that on their own sped up Kazu around 10% (but other changes in this release will affect this too, and speedup will be workload dependent)
diff --git a/docs/_changelog.d/+removeerroneouswarnings.bugfix.md b/docs/_changelog.d/+removeerroneouswarnings.bugfix.md
deleted file mode 100644
index f86924fa..00000000
--- a/docs/_changelog.d/+removeerroneouswarnings.bugfix.md
+++ /dev/null
@@ -1 +0,0 @@
-Curated terms that drop the same normalised version of the term no longer report erroneous warnings.
diff --git a/docs/_changelog.d/+spacyvocabmanagement.feature.md b/docs/_changelog.d/+spacyvocabmanagement.feature.md
deleted file mode 100644
index b28b83c5..00000000
--- a/docs/_changelog.d/+spacyvocabmanagement.feature.md
+++ /dev/null
@@ -1,2 +0,0 @@
-New SpacyPipelines abstraction, which allows using the same spacy pipeline in different places, but only load it once and prevent uncontrolled memory growth.
-On the uncontrolled memory growth, see https://github.com/explosion/spaCy/discussions/10015 for why this was happening - the 'fix' is to reload a spacy pipeline after a certain number of calls.
diff --git a/docs/_changelog.d/+stringmatchingv2.feature.md b/docs/_changelog.d/+stringmatchingv2.feature.md
deleted file mode 100644
index 4af2e3fc..00000000
--- a/docs/_changelog.d/+stringmatchingv2.feature.md
+++ /dev/null
@@ -1,2 +0,0 @@
-Very large memory savings from an overhaul of the string matching process.
-The new version should also be faster in general, but the priority was memory rather than speed (since previously, this step accounted for the majority of kazu's memory usage but only a fraction of its runtime)
diff --git a/docs/_changelog.d/+unmappedtinybernentsdropped.feature.md b/docs/_changelog.d/+unmappedtinybernentsdropped.feature.md
deleted file mode 100644
index 97cae185..00000000
--- a/docs/_changelog.d/+unmappedtinybernentsdropped.feature.md
+++ /dev/null
@@ -1,3 +0,0 @@
-Entity produced by TransformersModelForTokenClassificationNerStep but without Mappings will be dropped by default now, in the same way as for other NER steps.
-This was an exception to handle an AstraZeneca internal use case that wanted this different, but it could cause issues with MergeOverlappingEntsStep in some cases,
-so it is safer to have this off by default.
diff --git a/kazu/__init__.py b/kazu/__init__.py
index 976498ab..6849410a 100644
--- a/kazu/__init__.py
+++ b/kazu/__init__.py
@@ -1 +1 @@
-__version__ = "1.0.3"
+__version__ = "1.1.0"