Releases: vmenger/deduce
v3.0.3
3.0.3 (2024-07-16)
Added
- A `cache_path` option, to define the path for saving/loading the lookup structure cache. You should use this if your install directory is not writable.
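As a rough illustration of when this option matters, the sketch below picks a cache directory by checking whether the install directory is writable and falling back to another location otherwise. The helper `pick_cache_path` and both directory paths are hypothetical, and the commented-out `Deduce(cache_path=...)` call is an assumption based on the option name above, so check the deduce documentation for the exact signature.

```python
import os
import tempfile
from pathlib import Path


def pick_cache_path(install_dir: Path, fallback: Path) -> Path:
    """Return install_dir if it exists and is writable, else create and return fallback."""
    if install_dir.is_dir() and os.access(install_dir, os.W_OK):
        return install_dir
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback


# Hypothetical locations, for illustration only.
install_dir = Path("/nonexistent/site-packages/deduce")
fallback = Path(tempfile.gettempdir()) / "deduce_cache"

cache_path = pick_cache_path(install_dir, fallback)

# Assumed usage, not executed here:
# from deduce import Deduce
# deduce = Deduce(cache_path=cache_path)
```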
Removed
- the `config_file` keyword, now replaced by `config`, which accepts both filenames and dicts
- old lookup list names, e.g. `prefixes`, now replaced by `prefix`
- annotator types `custom`, `regexp`, `token_pattern`, `dd_token_pattern` and `annotation_context`, all replaced by setting the class directly as `annotator_type`
- everything in `deduce.pattern`; patient patterns are now replaced by `PatientNameAnnotator`
v3.0.2
v3.0.1
v3.0.0
3.0.0 (2023-12-20)
Added
- speed optimizations, ~250%
- pseudo-annotating eponymous diseases (e.g. Creutzfeldt-Jakob)
- `PatientNameAnnotator`, which replaces `deduce.pattern`
- a structured way for loading and building lookup structures (lists and tries), including caching
- `pre_match_words` for some regexp annotators, speeding up the annotating
- option to present a user config as dict (using the `config` keyword)
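To make the "lookup structures (lists and tries), including caching" item more concrete, here is a minimal sketch of a token-level trie that is built once and then reloaded from a cache file. It is not deduce's implementation: the functions `build_trie`, `longest_match` and `load_or_build`, the JSON cache format, and the example entries are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path

END = "$end"  # marker key for a complete entry; assumes no token equals this string


def build_trie(entries):
    """Build a nested-dict trie keyed on whitespace-separated tokens."""
    root = {}
    for entry in entries:
        node = root
        for token in entry.split():
            node = node.setdefault(token, {})
        node[END] = True
    return root


def longest_match(trie, tokens, start=0):
    """Return the longest entry starting at tokens[start], or None."""
    node, best = trie, None
    for i in range(start, len(tokens)):
        node = node.get(tokens[i])
        if node is None:
            break
        if END in node:
            best = tokens[start:i + 1]
    return best


def load_or_build(cache_file, entries):
    """Load the trie from cache_file if present, otherwise build and cache it."""
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    trie = build_trie(entries)
    cache_file.write_text(json.dumps(trie), encoding="utf-8")
    return trie


cache_file = Path(tempfile.mkdtemp()) / "placenames.json"
trie = load_or_build(cache_file, ["Den Haag", "Den Helder", "Utrecht"])  # builds and caches
trie_cached = load_or_build(cache_file, [])                              # loads from cache
match = trie_cached.longest_match if False else longest_match(
    trie_cached, "naar Den Haag verhuisd".split(), start=1
)
```

A trie lets multi-word entries be matched token by token without scanning the whole list, and caching avoids rebuilding it on every start-up.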
Changed
- speedup for `TokenPatternAnnotator`
- some internals of `ContextPatternAnnotator`
- initials now detected by lookup list, rather than pattern
- redactor open and close chars from `<` `>` to `[` `]`, as previous chars caused issues in html (so deidentified text now shows `[PATIENT]`, `[LOCATIE]`, etc.)
- names of lookup structures to singular (`prefix`, rather than `prefixes`)
- `INSTELLING` tag to `ZIEKENHUIS` and `ZORGINSTELLING`
- refactored and simplified annotator loading, specifically the `annotator_type` config keyword now accepts references to classes (e.g. `deduce.annotator.TokenPatternAnnotator`)
- renamed `interfix_with_capital` annotator to `interfix_with_name`
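The `annotator_type` change means a config can name a class by its dotted path. The sketch below shows one common way such a reference can be resolved with the standard library. It is an assumption about the general technique, not deduce's actual loader: `resolve_class` and the `args` key are hypothetical, and `collections.OrderedDict` stands in for a real annotator class so the example runs without deduce installed.

```python
import importlib


def resolve_class(reference):
    """Resolve a dotted reference such as 'package.module.ClassName' to the class object."""
    module_name, _, class_name = reference.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {reference!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical config fragment; only the "annotator_type" key comes from the changelog.
annotator_config = {
    # Stand-in for e.g. "deduce.annotator.TokenPatternAnnotator".
    "annotator_type": "collections.OrderedDict",
    "args": {},
}

cls = resolve_class(annotator_config["annotator_type"])
instance = cls(**annotator_config["args"])
```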
Deprecated
- the `config_file` keyword, now replaced by `config`, which accepts both filenames and dicts
- old lookup list names, e.g. `prefixes`, now replaced by `prefix`
- annotator types `custom`, `regexp`, `token_pattern`, `dd_token_pattern` and `annotation_context`, all replaced by setting the class directly as `annotator_type`
Removed
- automated coverage reporting on coveralls.io
- options `lowercase_lookup`, `lowercase_neg_lookup` for token patterns
- everything in `deduce.pattern`; patient patterns are now replaced by `PatientNameAnnotator`
- `utils.any_in_text`
Fixed
- some small additions/removals for specific lookup lists
- smaller bugs related to overlapping matches
v2.5.0
2.5.0 (2023-11-28)
Added
- the `RegexpPseudoAnnotator` component for filtering regexp matches based on preceding/following words
- a `prefix_with_interfix` pattern for names, detecting e.g. `Dr. van Loon`
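The idea of filtering regexp matches on the words around them can be sketched as follows. This is not the `RegexpPseudoAnnotator` code itself: the function `find_with_pseudo_filter`, its parameters, and the example pattern and word lists are hypothetical, and the real component may tokenize and compare words differently.

```python
import re


def find_with_pseudo_filter(text, pattern, pre_pseudo=(), post_pseudo=()):
    """Return regexp matches, skipping those whose neighbouring word is a pseudo word."""
    pre = {w.lower() for w in pre_pseudo}
    post = {w.lower() for w in post_pseudo}
    results = []
    for m in re.finditer(pattern, text):
        # Words directly before and after the match.
        before = re.findall(r"\w+", text[:m.start()])
        after = re.findall(r"\w+", text[m.end():])
        prev_word = before[-1].lower() if before else None
        next_word = after[0].lower() if after else None
        if prev_word in pre or next_word in post:
            continue  # looks like a match, but context says otherwise
        results.append(m.group())
    return results


text = "Patient is 64 jaar, weegt 80 kg en slikt 20 mg."
ages = find_with_pseudo_filter(text, r"\b\d{1,3}\b", post_pseudo={"kg", "mg"})
```

Here the numbers followed by `kg` or `mg` are discarded, so only `64` remains as an age candidate.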
Fixed
- a bug with `BsnAnnotator` with non-digit characters in regexp
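The changelog does not describe the bug in detail. As background, a BSN candidate is commonly validated with the Dutch eleven test (elfproef), and separators in a matched string need to be removed before that check. The sketch below shows one way to do this; `is_valid_bsn` is a hypothetical helper and not the `BsnAnnotator` implementation.

```python
import re


def is_valid_bsn(candidate):
    """Strip non-digit characters, then apply the eleven test for a 9-digit BSN."""
    digits = re.sub(r"\D", "", candidate)
    if len(digits) != 9:
        return False
    # Weights for the BSN variant of the eleven test; the last digit counts as -1.
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(w * int(d) for w, d in zip(weights, digits))
    return total % 11 == 0


plain = is_valid_bsn("111222333")
dotted = is_valid_bsn("111.222.333")
invalid = is_valid_bsn("123456789")
```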
Changed
- the age detection component, with improved logic and pseudo patterns
- annotations are no longer counted adjacent when separated by a comma
- streets are prioritized over names when merging overlapping annotations
- removed some false positives for postal codes ending in `gr` or `ie`
- extended the postbus pattern for `xx.xxx` format (old notation)
- some smaller optimizations and exceptions for institution, hospital, placename, residence, medical term, first name, and last name lookup lists