Releases: vmenger/deduce
v2.3.1
v2.3.0
2.3.0 (2023-10-25)
Added
- lookup lists (and logic) for Dutch provinces, regions, municipalities and streets
Changed
- name of `residences` annotator to `placenames`, now includes provinces, regions and municipalities
- lookup lists (and logic) for residences
- logic for streets, house numbers and house number letters
v2.2.0
2.2.0 (2023-09-28)
Changed
- tokenizer logic:
    - a token is now a sequence of alphanumeric characters, a single newline, or a single special character
    - whitespaces are no longer considered tokens
- moved token pattern logic to config, using a new `TokenPatternAnnotator`
- moved context pattern logic to config, using a new `ContextAnnotator`
- many updates to name detection logic
- lookup list optimizations
- added, removed and simplified patterns
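
The tokenizer rules listed above can be sketched with a single regular expression. This is a minimal illustration under stated assumptions, not deduce's actual tokenizer: the pattern and the `tokenize` helper are hypothetical, and "alphanumeric" is assumed to mean Unicode letters and digits without the underscore.

```python
import re

# Assumed pattern (not taken from deduce's source):
#   [^\W_]+  a run of letters and/or digits
#   \n       a single newline
#   \S       any other single non-whitespace character
TOKEN_PATTERN = re.compile(r"[^\W_]+|\n|\S")


def tokenize(text: str) -> list[str]:
    """Split text into tokens; spaces and tabs yield no tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Jan Jansen, 12-03\nArts")
# tokens == ['Jan', 'Jansen', ',', '12', '-', '03', '\n', 'Arts']
```

Because whitespace other than a newline never matches, it simply drops out of the token stream, which mirrors the second rule above.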
v2.1.0
2.1.0 (2023-08-07)
Added
- a component for deidentifying BSN numbers (Dutch citizen service numbers)
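
For context, a BSN is a nine-digit number that satisfies the so-called eleven test ("elfproef"). The sketch below shows that check in isolation; these notes do not state whether the new component applies exactly this validation, and the function name `passes_eleven_test` is hypothetical.

```python
def passes_eleven_test(candidate: str) -> bool:
    """Return True if a 9-digit string satisfies the BSN eleven test."""
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    # Weights 9 down to 2 for the first eight digits, -1 for the last digit.
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(w * int(d) for w, d in zip(weights, candidate))
    return total % 11 == 0


passes_eleven_test("111222333")  # weighted sum is 66, divisible by 11
passes_eleven_test("123456789")  # weighted sum is 147, not divisible by 11
```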
Changed
- updated dependencies
- by default, deduce now recognizes and tags BSN numbers
- by default, deduce now recognizes all other 7+ digit numbers as identifiers
- improved regular expressions for e-mail address and url matching, with separate tags
- logic for detecting phone numbers (improvements for hyphens, whitespaces, false positive identifiers)
- improved regular expression for age matching
- date detection logic:
    - now only recognizes combinations of day, month and year (day/month combinations caused many false positives)
    - detects year-month-day format in addition to day-month-year
- loading a custom config now only replaces the config options that are explicitly set, using defaults for those not included in the custom config
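
The config behaviour described in the last item above amounts to overlaying a custom config on the defaults. A minimal sketch follows, assuming nested options are merged recursively (the notes do not say whether merging is recursive or top-level only); `merge_config` and the example keys are hypothetical.

```python
from copy import deepcopy


def merge_config(defaults: dict, custom: dict) -> dict:
    """Return the defaults with only the explicitly set custom options replaced."""
    merged = deepcopy(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested sections are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical option names, for illustration only.
defaults = {"use_lookup": True, "patterns": {"age": "a", "date": "d"}}
custom = {"patterns": {"age": "x"}}
config = merge_config(defaults, custom)
# config == {'use_lookup': True, 'patterns': {'age': 'x', 'date': 'd'}}
```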
Fixed
- annotations can no longer be counted as adjacent when separated by newline or tab (and will thus not be merged)
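
The adjacency rule in this fix can be illustrated as follows. This is a sketch, not deduce's merging code: the `Annotation` type and `merge_adjacent` are hypothetical, and the gap rule (only ordinary spaces, or nothing, may separate two annotations with the same tag) is an assumption.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int
    end: int
    tag: str


# Assumption: a gap of zero or more ordinary spaces counts as adjacent;
# a newline or tab in the gap does not.
ADJACENT_GAP = re.compile(r" *")


def merge_adjacent(text: str, annotations: list[Annotation]) -> list[Annotation]:
    """Merge same-tag annotations separated only by spaces; overlaps are left as-is."""
    merged: list[Annotation] = []
    for ann in sorted(annotations):
        if merged:
            prev = merged[-1]
            gap = text[prev.end:ann.start]
            if (
                prev.tag == ann.tag
                and ann.start >= prev.end
                and ADJACENT_GAP.fullmatch(gap)
            ):
                merged[-1] = Annotation(prev.start, ann.end, prev.tag)
                continue
        merged.append(ann)
    return merged


text = "Jan Jansen\nPiet"
anns = [Annotation(0, 3, "persoon"), Annotation(4, 10, "persoon"), Annotation(11, 15, "persoon")]
merged = merge_adjacent(text, anns)
# "Jan" and "Jansen" merge; "Piet" stays separate because of the newline.
```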
Removed
- a separate patient identifier tag, now superseded by a generic tag
- detection of day/month combinations for dates, as this caused many false positives (e.g. lab values, numeric scores)
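
To illustrate why requiring a full day, month and year removes false positives such as lab values, here is a minimal sketch limited to numeric dates. The pattern and the `find_dates` helper are assumptions for illustration and are not deduce's date detection logic.

```python
import re
from datetime import date

# Assumed pattern: day-month-year or year-month-day, with -, / or . as separators.
DATE_PATTERN = re.compile(
    r"\b(?:"
    r"(?P<d1>\d{1,2})[-/.](?P<m1>\d{1,2})[-/.](?P<y1>\d{4})"
    r"|"
    r"(?P<y2>\d{4})[-/.](?P<m2>\d{1,2})[-/.](?P<d2>\d{1,2})"
    r")\b"
)


def find_dates(text: str) -> list[str]:
    """Return substrings that form a valid calendar date with day, month and year."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        g = match.groupdict()
        if g["y1"] is not None:
            year, month, day = g["y1"], g["m1"], g["d1"]
        else:
            year, month, day = g["y2"], g["m2"], g["d2"]
        try:
            date(int(year), int(month), int(day))  # rejects e.g. 31-02-2023
        except ValueError:
            continue
        found.append(match.group())
    return found


dates = find_dates("Opname 12-03-2023, Hb 7/8, ontslag 2023-03-15")
# dates == ['12-03-2023', '2023-03-15']; the two-part value "7/8" is ignored.
```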
Deprecated
- backwards compatibility, which was temporarily added to transition from v1 to v2
v2.0.3
v2.0.2
v2.0.1
v2.0.0
2.0.0 (2022-12-05)
Changed
- major refactor that touches pretty much every line of code
- use `docdeid` package for logic
- speedups: now 973% faster
- use lookup sets instead of lookup lists
- refactor tokenizer
- refactor annotators into separate classes, using structured annotations
- guidelines for contributing
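
As a small illustration of the move from lookup lists to lookup sets noted above: membership tests against a set take roughly constant time on average, whereas a list is scanned element by element. The helper names below are hypothetical and this is not code from deduce or `docdeid`; the notes also do not attribute the reported speedup to this change alone.

```python
def build_lookup(values: list[str]) -> frozenset[str]:
    """Build the lookup structure once, up front."""
    return frozenset(values)


def match_positions(tokens: list[str], lookup: frozenset[str]) -> list[int]:
    """Return the indices of tokens that occur in the lookup set."""
    return [i for i, token in enumerate(tokens) if token in lookup]


placenames = build_lookup(["Amsterdam", "Utrecht"])
positions = match_positions(["Patiënt", "woont", "in", "Utrecht"], placenames)
# positions == [3]
```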
Added
- introduced new interface for deidentification, using `Deduce()` class
- a separate documentation page, with tutorial and migration guide
- support for python 3.10 and 3.11
Removed
- the `annotate_text` and `deidentify_annotations` functions
- all in-text annotation (under the hood) and associated functions
- support for given names. given names can be added as another first name in the `Person` class.
- support for python 3.7 and 3.8
Fixed
- `<` and `>` are no longer replaced by `(` and `)` respectively
- deduce does not strip text (whitespaces, tabs at beginning/end of text) anymore
Release 1.0.8
1.0.8 (2021-11-29)
Fixed
- various modifications related to adding or subtracting spaces in annotated texts
- remove the lowercasing of institutions' names
- therefore, all structured annotations have texts matching the original text in the same span
Added
- warn if there are any structured annotations whose annotated text does not match the original text in the span denoted by the structured annotation
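
The consistency check described above can be sketched as a comparison between the annotated text and the slice of the original text at the same span. The function name and signature are hypothetical; this is not the code shipped in 1.0.8.

```python
import warnings


def check_annotation_text(text: str, start: int, end: int, annotated_text: str) -> bool:
    """Warn and return False if the annotation text differs from text[start:end]."""
    actual = text[start:end]
    if actual != annotated_text:
        warnings.warn(
            f"Annotation text {annotated_text!r} does not match "
            f"original text {actual!r} at span ({start}, {end})",
            stacklevel=2,
        )
        return False
    return True


ok = check_annotation_text("Opname in UMC Utrecht", 10, 21, "UMC Utrecht")
# ok is True; a lowercased "umc utrecht" would trigger the warning instead.
```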
Version 1.0.7
1.0.7 (2021-11-03)
Changed
- Internal code formatting improvements
Added
- Contributing guidelines