Releases: vmenger/deduce

v2.3.1

01 Nov 13:33
426c08a

2.3.1 (2023-11-01)

Fixed

  • include data files recursively in package

v2.3.0

25 Oct 10:52
66f0e5e

2.3.0 (2023-10-25)

Added

  • lookup lists (and logic) for Dutch provinces, regions, municipalities and streets

Changed

  • name of residences annotator to placenames, now includes provinces, regions and municipalities
  • lookup lists (and logic) for residences
  • logic for streets, house numbers and house number letters

v2.2.0

28 Sep 09:47
3ccd61c

2.2.0 (2023-09-28)

Changed

  • tokenizer logic:
    • a token is now a sequence of alphanumeric characters, a single newline, or a single special character.
    • whitespaces are no longer considered tokens
  • moved token pattern logic to config, using a new TokenPatternAnnotator
  • moved context pattern logic to config, using a new ContextAnnotator
  • many updates to name detection logic
    • lookup list optimizations
    • added, removed and simplified patterns
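
The token definition above can be illustrated with a single regular expression. This is a minimal sketch only, not deduce's actual tokenizer: the pattern and the `tokenize` helper are assumptions made for illustration.

```python
import re

# Hypothetical pattern (not taken from deduce): a token is either
#   - a run of alphanumeric characters,
#   - a single newline, or
#   - a single special (non-alphanumeric, non-whitespace) character.
# Other whitespace (spaces, tabs) matches no alternative and is skipped.
TOKEN_PATTERN = re.compile(r"[^\W_]+|\n|[^\w\s]|_")


def tokenize(text: str) -> list[str]:
    """Split text into tokens according to the sketch pattern above."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Jan de Vries,\n06-12345678")
# tokens == ["Jan", "de", "Vries", ",", "\n", "06", "-", "12345678"]
```

Note that consecutive newlines each become their own token, while spaces and tabs never appear in the output.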

v2.1.0

07 Aug 12:46
02349b8

2.1.0 (2023-08-07)

Added

  • a component for deidentifying BSN numbers

Changed

  • updated dependencies
  • by default, deduce now recognizes and tags BSN numbers
  • by default, deduce now recognizes all other 7+ digit numbers as identifiers
  • improved regular expressions for e-mail address and url matching, with separate tags
  • logic for detecting phone numbers (improvements for hyphens, whitespaces, false positive identifiers)
  • improved regular expression for age matching
  • date detection logic:
    • now only recognizes combinations of day, month and year (day/month combinations caused many false positives)
    • detects year-month-day format in addition to day-month-year
  • loading a custom config now only replaces the config options that are explicitly set, using defaults for those not included in the custom config
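
The config behaviour described in the last item can be pictured as a merge of the custom config over the defaults. The sketch below is an assumption about how such a merge could work, not deduce's implementation: the `merge_config` helper and all config keys are hypothetical, and whether deduce merges nested options recursively is not stated in these notes.

```python
import copy


def merge_config(defaults: dict, custom: dict) -> dict:
    """Return defaults overridden by the options explicitly set in custom.

    Hypothetical helper: nested dicts are merged recursively, every other
    value in custom replaces the default outright.
    """
    merged = copy.deepcopy(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
defaults = {
    "redactor_open_char": "<",
    "redactor_close_char": ">",
    "annotators": {"phone": {"enabled": True}, "email": {"enabled": True}},
}
custom = {"redactor_open_char": "[", "annotators": {"phone": {"enabled": False}}}

config = merge_config(defaults, custom)
# Options missing from custom keep their default values.
```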

Fixed

  • annotations can no longer be counted as adjacent when separated by newline or tab (and will thus not be merged)
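
A minimal sketch of the adjacency rule this fix describes, assuming that only plain spaces may separate two annotations that get merged. The `are_adjacent` helper and the gap pattern are hypothetical and do not reflect deduce's internals.

```python
import re

# Assumption: the gap between two adjacent annotations may contain only
# plain spaces (or nothing). A newline or tab in the gap prevents merging.
GAP_PATTERN = re.compile(r" *")


def are_adjacent(text: str, left_end: int, right_start: int) -> bool:
    """Check whether the text between two annotation spans allows merging."""
    gap = text[left_end:right_start]
    return GAP_PATTERN.fullmatch(gap) is not None


text = "Jan Jansen\nPiet"
same_line = are_adjacent(text, 3, 4)      # gap is a single space
across_lines = are_adjacent(text, 10, 11)  # gap is a newline
```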

Removed

  • a separate patient identifier tag, now superseded by a generic tag
  • detection of day/month combinations for dates, as this caused many false positives (e.g. lab values, numeric scores)

Deprecated

  • backwards compatibility, which was temporarily added to transition from v1 to v2

v2.0.3

06 Apr 08:47
53616e7

2.0.3 (2023-04-06)

Fixed

  • removed 'decibutus' from list of institutions as it caused many false positives

v2.0.2

28 Mar 14:56
1d2d37c

2.0.2 (2023-03-28)

Changed

  • upgraded dependencies, including markdown-it-py which had a vulnerability

v2.0.1

09 Dec 11:22
4c70a5f

2.0.1 (2022-12-09)

Changed

  • updated dependencies

v2.0.0

05 Dec 09:38
b78eb9b

2.0.0 (2022-12-05)

Changed

  • major refactor that touches pretty much every line of code
  • use docdeid package for logic
  • speedups: now 973% faster
  • use lookup sets instead of lookup lists
  • refactor tokenizer
  • refactor annotators into separate classes, using structured annotations
  • guidelines for contributing

Added

  • introduced new interface for deidentification, using Deduce() class
  • a separate documentation page, with tutorial and migration guide
  • support for python 3.10 and 3.11

Removed

  • the annotate_text and deidentify_annotations functions
  • all in-text annotation (under the hood) and associated functions
  • support for given names; a given name can instead be added as another first name in the Person class
  • support for python 3.7 and 3.8

Fixed

  • < and > are no longer replaced by ( and ) respectively
  • deduce does not strip text (whitespaces, tabs at beginning/end of text) anymore

Release 1.0.8

23 Dec 13:42
ded1cf3

1.0.8 (2021-11-29)

Fixed

  • various modifications related to adding or subtracting spaces in annotated texts
  • remove the lowercasing of institutions' names
  • therefore, all structured annotations have texts matching the original text in the same span

Added

  • warn if there are any structured annotations whose annotated text does not match the original text in the span denoted by the structured annotation

Version 1.0.7

03 Nov 11:33
a8d5894

1.0.7 (2021-11-03)

Changed

  • Internal code formatting improvements

Added

  • Contributing guidelines