Releases · AstraZeneca/KAZU

17 Dec 16:10

github-actions

v2.3.0

573f9e4

v2.3.0 Latest


2.3.0 - 2024-12-17

Features

Released a new multilabel biomedBERT model trained on NER data synthetically generated by an LLM (Gemini). The model was trained on over 7000 LLM-annotated documents with a total of 295822 samples.
The model was trained for 21 epochs and achieved an F1 score of 95.6% on a held-out test set. (multilabel_bert)
Added a multilabel NER training example and config.
Added docs and an example for scaling Kazu with Ray.

Bugfixes

Fix issue with TransformersModelForTokenClassificationNerStep when processing large numbers of documents. The fix offloads tensors onto the CPU before performing the torch.cat operation, which previously produced a zero tensor. (pytorch_memory_issue)


21 Oct 14:42

github-actions

v2.2.1

bad8132

v2.2.1

2.2.1 - 2024-10-21

Features

Update ontologies to later versions (ontology_updates)

Bugfixes

Fix synonym generator to only check if strings exist in original synonyms. Update tests (combinatorial_synonym_generator)
Remove save/reset button not belonging on page 1 (krt)


18 Sep 10:38

github-actions

v2.2.0

2532afb

v2.2.0

2.2.0 - 2024-09-18

Features

New LLMNERStep, for performing NER with LLMs

Bugfixes

Fix bug with Chromosome X being converted to Chromosome 10 raised in #42 (chromosomeX)
Fix pip install command in docs raised in #56 (docs_pip_command)
Added new multiword AutoCurationAction, and adjusted some curations as per #58.
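
The Chromosome X fix above concerns a single "X" being read as the Roman numeral ten. A minimal sketch of how such a guard could look follows. It is not Kazu's implementation: the function names, the `LITERAL_X_CONTEXTS` set and the restriction to the numerals I, V and X are all assumptions made for illustration.

```python
import re

# Hypothetical names throughout; this is not how Kazu implements the fix.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}
ROMAN_TOKEN = re.compile(r"\b[IVX]+\b")
# Assumed list of preceding words after which a lone "X" is a name, not a numeral.
LITERAL_X_CONTEXTS = {"chromosome"}


def roman_to_int(token: str) -> int:
    """Convert a small Roman numeral (I, V, X only) to an integer."""
    total = 0
    for current, following in zip(token, token[1:] + " "):
        value = ROMAN_VALUES[current]
        # Subtractive notation: a smaller value before a larger one (e.g. IV).
        if following in ROMAN_VALUES and ROMAN_VALUES[following] > value:
            total -= value
        else:
            total += value
    return total


def normalise_numerals(text: str) -> str:
    """Replace Roman numerals with digits, leaving 'chromosome X' untouched."""

    def replace(match):
        preceding = text[: match.start()].split()
        is_literal_x = (
            match.group() == "X"
            and bool(preceding)
            and preceding[-1].lower() in LITERAL_X_CONTEXTS
        )
        return match.group() if is_literal_x else str(roman_to_int(match.group()))

    return ROMAN_TOKEN.sub(replace, text)
```

In this sketch only the lone "X" directly after a listed context word is protected, so "Chromosome IX" is still converted while "Chromosome X" is left alone.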


08 Jul 16:43

github-actions

v2.1.1

9f9741f

v2.1.1

2.1.1 - 2024-07-08

Bug fixes

Fixed bug in model pack build where build config acceptance test param was not respected.
Fixed an issue where hydra configs were not properly converting builtins in the KRT.
Made some minor adjustments to curations.
build_and_test_model_packs.py will now throw an assertion error if multiple packs are built with debug mode on.


04 Jul 08:57

github-actions

v2.1.0

2aa260c

v2.1.0

2.1.0 - 2024-07-04

Features

Added new Kazu Resource Tool UI to ease the process of updating resources and resource configuration.
New OntologyDownloader abstraction to assist with resource updating.
Updated resources for June 2024.


04 Jun 13:01

EFord36

v2.0.0

902ebe4

v2.0.0

2.0.0 - 2024-06-04

Features

(De)serialization has been greatly improved, simplified, made correct, and given a slightly more compact serialized representation.
This does mean there are some small changes in (de)serialization behaviour since the previous release.
The curation process has been significantly improved and simplified for the end user, including the introduction of the AutoCurator concept to aid in this. This will enable us to build out better documentation and an interactive tool in future releases, both of which are currently in draft. Overall, this will greatly simplify upgrading ontology versions, adding curations for a new ontology etc.
Datamodel has been substantially revised in a backwards incompatible manner to clear up confusing concepts, fix longstanding issues etc.
New Zero shot NER model with GLiNER

Deprecations and Removals

Remove deprecated GildaUtils.replace_dashes. This was superseded by GildaUtils.split_on_dashes_or_space and was already deprecated pending removal.
Remove deprecated SpacyToKazuObjectMapper, as this was renamed to KazuToSpacyObjectMapper, and the old name already deprecated pending removal.
Remove deprecated create_phrasematchers_using_curations method of OntologyMatcher. This was renamed to create_phrasematchers and was already deprecated pending removal.
Rename Document.json to to_json, and remove optional arguments.
The previous name was inconsistent with naming on other classes, as the function signature was parallel to their to_json methods.
The argument drop_unmapped_ents had functionality that was duplicated with DropUnmappedEntityFilter within the CleanupStep,
and it made sense to add the drop_terms behaviour to a new LinkingCandidateRemovalCleanupAction to collect this behaviour together
and significantly simplify the Document serialization code.
Rename ParserActions.from_json and GlobalParserActions.from_json to from_dict.
The previous names were misleading, as the function signatures were parallel to the from_dict methods on other classes, not to their from_json methods.
Renamed SynonymDatabase.add to SynonymDatabase.add_parser, for consistency with MetadataDatabase.add_parser.


29 Jan 15:34

RichJackson

v1.5.1

fc5cef3

v1.5.1

1.5.1 - 2024-01-29

Bugfixes

Pinned scipy to <1.12.0 due to breaking API change.


29 Jan 11:03

RichJackson

v1.5.0

ccf0bdb

v1.5.0

1.5.0 - 2024-01-19

Features

Added new cleanup action: DropMappingsByParserNameRankAction
Added new disambiguation strategy: PreferNearestEmbeddingToDefaultLabelDisambiguationStrategy.
DefinedElsewhereInDocumentDisambiguationStrategy has changed slightly, so that it will only return mappings that were found elsewhere in the document, rather than the whole EquivalentIdSet in which those ids were contained.
New disambiguation methodology GildaTfIdfDisambiguationStrategy.
OpenTargetsTargetOntologyParser now has a biotype filter parameter.

Deprecations and Removals

Deprecated GildaUtils.replace_dashes in favour of GildaUtils.split_on_dashes_or_space, as the latter improves efficiency in Kazu.
GildaUtils.replace_dashes will continue to work until kazu 1.6, but using it will produce a DeprecationWarning.
Please open a GitHub issue if you wish this to remain.
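
The deprecation described above follows a common pattern: the old function keeps working but emits a DeprecationWarning and delegates to its replacement. A minimal sketch of that pattern using the standard `warnings` module is shown below. The names `tokenise` and `legacy_tokenise` and their behaviour are hypothetical stand-ins, not the GildaUtils functions themselves.

```python
import warnings
from typing import List


def tokenise(text: str) -> List[str]:
    """Hypothetical replacement function: split on dashes or whitespace."""
    return text.replace("-", " ").split()


def legacy_tokenise(text: str) -> List[str]:
    """Hypothetical deprecated function: warn, then delegate to the replacement."""
    warnings.warn(
        "legacy_tokenise is deprecated and will be removed in a future release; "
        "use tokenise instead",
        DeprecationWarning,
        # stacklevel=2 attributes the warning to the caller, not to this shim.
        stacklevel=2,
    )
    return tokenise(text)
```

Callers can surface such warnings during testing, for example by running Python with `-W error::DeprecationWarning`, which turns them into exceptions.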


05 Dec 11:50

EFord36

v1.4.0

ddca57f

v1.4.0

1.4.0 - 2023-12-01

Features

Added new curation_report.py to assist in upgrading ontologies between versions
New disambiguation strategy to prefer mappings that have a default label that matches an entity.
The OpenTargetsDiseaseOntologyParser has been heavily reworked, so that it uses the therapeutic_area concept to decide what records should be included. This has in turn yielded the subsets: measurement, medical_procedure, biological_process and phenotype. The measurement configuration is currently disabled as it requires heavy curation of the underlying strings. In addition, the OpenTargetsDiseaseOntologyParser now supports a custom ID grouping method, to make use of cross references.

Bugfixes

MemoryEfficientStringMatchingStep now only produces a single entity per class where multiple curations exist with different cases.
Previously, the tested_dependencies.txt file in the model packs included an editable install of kazu, which wasn't intended.
We now exclude kazu from that output.
Speed up model pack builds for model packs using ExplosionStringMatchingStep, by fixing a bug that caused the parsers to be populated twice in this case.
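
The first bugfix above describes collapsing duplicate entities that arise when several curations differ only by case. The sketch below illustrates one way to deduplicate by span and entity class. The `Match` type and `dedupe_matches` function are hypothetical and are not part of MemoryEfficientStringMatchingStep.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Match(NamedTuple):
    """Hypothetical record of one curation hit in a document."""

    start: int
    end: int
    entity_class: str
    curation: str


def dedupe_matches(matches: Iterable[Match]) -> List[Match]:
    """Keep the first match for each (start, end, entity_class) combination."""
    seen: Set[Tuple[int, int, str]] = set()
    unique: List[Match] = []
    for match in matches:
        # The curation string is deliberately excluded from the key, so that
        # "EGFR" and "egfr" hitting the same span collapse into one entity.
        key = (match.start, match.end, match.entity_class)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique
```

Matches with the same span but a different entity class are kept, which corresponds to "a single entity per class" in the note above.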

Deprecations and Removals

Removed pytorch-lightning as a dependency. As a result, the signatures of SapbertStringSimilarityScorer and TransformersModelForTokenClassificationNerStep have changed.
Renamed create_phrasematchers_using_curations method of OntologyMatcher to create_phrasematchers. The old name will continue to work until kazu 1.6, but using it will produce a DeprecationWarning.
MetadataDatabase.add_parser now requires an entity_class.
This enables correct string normalisation in the MappingStep for the new disambiguation strategy.


05 Dec 11:50

EFord36

v1.3.2

6cb7257

v1.3.2

1.3.2 - 2023-11-21

Bugfixes

Hits with scores of 0.0 are no longer returned by DictionaryIndex
Pin lightning-utilities dependency, a new version of which completely broke the model inference, despite lightning itself being pinned (they didn't pin lightning-utilities appropriately in the version we're using).

