diff --git a/CHANGELOG.md b/CHANGELOG.md
index 11e9b49f..48c76b43 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
+## 1.4.0 - 2023-12-01
+
+
+### Features
+
+- Added new curation_report.py to assist in upgrading ontologies between versions
+- New disambiguation strategy to prefer mappings that have a default label that matches an entity.
+- The OpenTargetsDiseaseOntologyParser has been heavily reworked, so that it uses the therapeutic_area concept to decide what records should be included. This has in turn yielded the subsets: measurement, medical_procedure, biological_process and phenotype. The measurement configuration is currently disabled as it requires heavy curation of the underlying strings. In addition, the OpenTargetsDiseaseOntologyParser now supports a custom ID grouping method, to make use of cross references.
+
+### Bugfixes
+
+- MemoryEfficientStringMatchingStep now only produces a single entity per class where multiple curations exist with different cases.
+- Previously, the `tested_dependencies.txt` file in the model packs included an editable install of kazu, which wasn't intended.
+ We now exclude kazu from that output.
+- Speed up model pack builds for model packs using `ExplosionStringMatchingStep`, by fixing a bug that caused the parsers to be populated twice in this case.
+
+### Deprecations and Removals
+
+- Removed pytorch-lightning as a dependency. The signatures of SapbertStringSimilarityScorer and TransformersModelForTokenClassificationNerStep have changed
+- Renamed `create_phrasematchers_using_curations` method of `OntologyMatcher` to `create_phrasematchers`. The old name will continue to work until kazu 1.6, but using it will produce a `DeprecationWarning`.
+- `MetadataDatabase.add_parser` now requires an `entity_class`.
+ This enables correct string normalisation in the `MappingStep` for the new disambiguation strategy.
+
+
 ## 1.3.2 - 2023-11-21
diff --git a/docs/_changelog.d/+create_phrasematchers_rename.removal.md b/docs/_changelog.d/+create_phrasematchers_rename.removal.md
deleted file mode 100644
index 33806053..00000000
--- a/docs/_changelog.d/+create_phrasematchers_rename.removal.md
+++ /dev/null
@@ -1 +0,0 @@
-Renamed `create_phrasematchers_using_curations` method of `OntologyMatcher` to `create_phrasematchers`. The old name will continue to work until kazu 1.6, but using it will produce a `DeprecationWarning`.
diff --git a/docs/_changelog.d/+curationreport.feature.rst b/docs/_changelog.d/+curationreport.feature.rst
deleted file mode 100644
index 9ddd57e1..00000000
--- a/docs/_changelog.d/+curationreport.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-Added new curation_report.py to assist in upgrading ontologies between versions
diff --git a/docs/_changelog.d/+doubleparserpopulate.bugfix.md b/docs/_changelog.d/+doubleparserpopulate.bugfix.md
deleted file mode 100644
index 34c29c97..00000000
--- a/docs/_changelog.d/+doubleparserpopulate.bugfix.md
+++ /dev/null
@@ -1 +0,0 @@
-Speed up model pack builds for model packs using `ExplosionStringMatchingStep`, by fixing a bug that caused the parsers to be populated twice in this case.
diff --git a/docs/_changelog.d/+excludekazutested_dependencies.bugfix.md b/docs/_changelog.d/+excludekazutested_dependencies.bugfix.md
deleted file mode 100644
index 03825782..00000000
--- a/docs/_changelog.d/+excludekazutested_dependencies.bugfix.md
+++ /dev/null
@@ -1,2 +0,0 @@
-Previously, the `tested_dependencies.txt` file in the model packs included an editable install of kazu, which wasn't intended.
-We now exclude kazu from that output.
diff --git a/docs/_changelog.d/+metadbaddparser.removal.rst b/docs/_changelog.d/+metadbaddparser.removal.rst
deleted file mode 100644
index 21ee352c..00000000
--- a/docs/_changelog.d/+metadbaddparser.removal.rst
+++ /dev/null
@@ -1,2 +0,0 @@
-`MetadataDatabase.add_parser` now requires an `entity_class`.
-This enables correct string normalisation in the `MappingStep` for the new disambiguation strategy.
diff --git a/docs/_changelog.d/+otdisease.feature.rst b/docs/_changelog.d/+otdisease.feature.rst
deleted file mode 100644
index 2fcbae21..00000000
--- a/docs/_changelog.d/+otdisease.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-The OpenTargetsDiseaseOntologyParser has been heavily reworked, so that it uses the therapeutic_area concept to decide what records should be included. This has in turn yielded the subsets: measurement, medical_procedure, biological_process and phenotype. The measurement configuration is currently disabled as it requires heavy curation of the underlying strings. In addition, the OpenTargetsDiseaseOntologyParser now supports a custom ID grouping method, to make use of cross references.
diff --git a/docs/_changelog.d/+pl.removal.rst b/docs/_changelog.d/+pl.removal.rst
deleted file mode 100644
index 5cc26643..00000000
--- a/docs/_changelog.d/+pl.removal.rst
+++ /dev/null
@@ -1 +0,0 @@
-Removed pytorch-lightning as a dependency. The signatures of SapbertStringSimilarityScorer and TransformersModelForTokenClassificationNerStep have changed
diff --git a/docs/_changelog.d/+prefdefaultlabel.feature.rst b/docs/_changelog.d/+prefdefaultlabel.feature.rst
deleted file mode 100644
index dafac030..00000000
--- a/docs/_changelog.d/+prefdefaultlabel.feature.rst
+++ /dev/null
@@ -1 +0,0 @@
-New disambiguation strategy to prefer mappings that have a default label that matches an entity.
diff --git a/docs/_changelog.d/+stringmapper.bugfix.rst b/docs/_changelog.d/+stringmapper.bugfix.rst
deleted file mode 100644
index 621520b1..00000000
--- a/docs/_changelog.d/+stringmapper.bugfix.rst
+++ /dev/null
@@ -1 +0,0 @@
-MemoryEfficientStringMatchingStep now only produces a single entity per class where multiple curations exist with different cases.
diff --git a/kazu/__init__.py b/kazu/__init__.py
index f708a9b2..3e8d9f94 100644
--- a/kazu/__init__.py
+++ b/kazu/__init__.py
@@ -1 +1 @@
-__version__ = "1.3.2"
+__version__ = "1.4.0"
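
Note on the `create_phrasematchers` rename recorded in the changelog above: the entry says the old method name keeps working until kazu 1.6 but emits a `DeprecationWarning`. The sketch below shows one common way such an alias can be written in Python. It is an illustration only and is not taken from kazu's source: the class `OntologyMatcherSketch`, its `parsers` argument, and the placeholder return value are all hypothetical; only the two method names and the warning category come from the changelog.

```python
import warnings


class OntologyMatcherSketch:
    """Hypothetical stand-in for illustration; NOT kazu's OntologyMatcher."""

    def create_phrasematchers(self, parsers):
        # Placeholder body (assumption): the real method builds phrase
        # matchers; here we only return labels so the sketch is runnable.
        return [f"matcher:{name}" for name in parsers]

    def create_phrasematchers_using_curations(self, parsers):
        # Old name kept as a thin alias: warn, then delegate to the new name.
        warnings.warn(
            "create_phrasematchers_using_curations is deprecated; "
            "use create_phrasematchers instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.create_phrasematchers(parsers)
```

With this pattern, callers of the old name get the same result as callers of the new one, and the warning points at the calling code, which makes remaining uses easier to find before the alias is removed.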