XML extractor: `external_file` only works when `secondary_tag` is also used #17
Labels: bug
The only way in which the `external_file` option for the XML extractor is used in I-analyzer is in something like this:

https://github.com/UUDigitalHumanitieslab/I-analyzer/blob/932bbc4fe33caa8b46754f18e4ad3a7caebcf4b8/backend/corpora/periodicals/periodicals.py#L166-L175
Here, the extractor also contains a `secondary_tag` argument, where `'match'` specifies the name of another field in the reader. We have no instances of using `XML` without a secondary tag. That would be something like: […]

Since this is never used, it went unnoticed that such an extractor would raise a `TypeError` if you tried to use it. This is how the XMLReader implements external files:

https://github.com/UUDigitalHumanitieslab/ianalyzer-readers/blob/0372a6ea4a9b91a0666c1d113839fb5a26ce02a7/ianalyzer_readers/readers/xml.py#L185-L197
In human terms: the way it looks for the right tag is:

- […] `secondary_tag` is provided, search for it, select its parent, and provide that as the "entry tag" to the XML extractor for the field.

The bug itself is easily fixable, but it does reflect the oddness of this construction. For example:
- […] `external_file=False`. The behaviour when `tag` is a list or `recursive=True` is also radically different.
- […] `external_file` argument, but we don't have any case where these values are not the same for the entire reader. Simply `external_file=True` would be fine (provided the top-level tag is specified somewhere else).
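
To make the lookup described above concrete, here is a minimal standalone sketch using only the standard library. It is not the `XMLReader` code: `find_entry_tag`, the sample XML, the `'tag'` key of the secondary tag dict, and the `metadata` dict standing in for other field values are all assumptions for illustration. The fallback to the root when no secondary tag is given is one possible behaviour, not necessarily the right fix.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical external file: each <issue> has an <id> and the data we want.
EXTERNAL_XML = """
<root>
  <issue><id>a1</id><title>First</title></issue>
  <issue><id>b2</id><title>Second</title></issue>
</root>
"""


def find_entry_tag(root: ET.Element,
                   secondary_tag: Optional[dict],
                   metadata: dict) -> Optional[ET.Element]:
    """Return the element to hand to the field's extractor as its entry tag."""
    if secondary_tag is None:
        # Assumed fallback for the case in this issue: without a secondary
        # tag, start from the top of the external file instead of failing.
        return root

    # ElementTree elements have no parent pointer, so build a lookup first.
    parents = {child: parent for parent in root.iter() for child in parent}

    # 'match' names another field; its value must equal the tag's text.
    wanted = metadata[secondary_tag['match']]
    for element in root.iter(secondary_tag['tag']):
        if element.text == wanted:
            # Select the parent of the matching secondary tag.
            return parents.get(element)
    return None


root = ET.fromstring(EXTERNAL_XML)

# With a secondary tag: the entry tag is the parent of the matching <id>.
entry = find_entry_tag(root, {'tag': 'id', 'match': 'id'}, {'id': 'b2'})
print(entry.find('title').text)  # Second

# Without a secondary tag: the sketch falls back to the root element.
print(find_entry_tag(root, None, {}).tag)  # root
```

In this sketch the secondary tag only narrows down where extraction starts; the field's own `tag` would then be resolved relative to the returned element.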