Better Exception handling #25

RichJackson · 2024-06-10T08:21:31Z

Document.metadata[PROCESSING_EXCEPTION] gets overwritten if a document fails in more than one step, so only the last failure is kept.

We probably want a dictionary from the step namespace to the exception instead. Note that this changes the metadata format, so we need to tell BIKG which release it will change in.
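A minimal sketch of the per-step dictionary, assuming the metadata is a plain dict and the key is the string "PROCESSING_EXCEPTION". The helper name record_step_failure and the step namespaces "StepA" and "StepB" are hypothetical and only for illustration; this is not the library's actual API.

```python
import traceback
from typing import Any, Dict

# Assumed key name, taken from the issue text; the real constant may differ.
PROCESSING_EXCEPTION = "PROCESSING_EXCEPTION"


def record_step_failure(metadata: Dict[str, Any], step_namespace: str) -> None:
    """Store the current traceback under the failing step's namespace.

    Must be called from inside an ``except`` block so that
    traceback.format_exc() has an active exception to format.
    """
    per_step = metadata.setdefault(PROCESSING_EXCEPTION, {})
    per_step[step_namespace] = traceback.format_exc()


# Two different (hypothetical) steps fail on the same document's metadata.
metadata: Dict[str, Any] = {}
for namespace, exc_type in [("StepA", ValueError), ("StepB", KeyError)]:
    try:
        raise exc_type(f"failure in {namespace}")
    except Exception:
        record_step_failure(metadata, namespace)
```

With this shape, the second failure no longer overwrites the first: both "StepA" and "StepB" have their own entry.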

It would also be nice if we could choose at the pipeline level whether exceptions actually get (re-)raised. It seems like we could do this in one of two ways:

1. Always (re-)raise in the step, but have an 'except' clause at the pipeline level that may just log exceptions rather than (re-)raising, depending on how the Pipeline object is configured.
2. Put the 'actual exception' in Document.metadata[PROCESSING_EXCEPTION] rather than the result of traceback.format_exc().
   - Then, in the Pipeline, iterate over the documents and raise any exceptions.
   - Make the Document serialization format the exception into a string.
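The first approach (steps always raise, the pipeline decides what to do) could look roughly like the sketch below. Doc and SketchPipeline are hypothetical stand-ins, not the real Document and Pipeline classes, and the raise_exceptions flag and failed_steps list are assumed names for illustration.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class Doc:
    """Hypothetical stand-in for Document."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchPipeline:
    """Hypothetical stand-in for Pipeline; steps are plain callables that raise on failure."""
    steps: List[Callable[[List[Doc]], None]]
    raise_exceptions: bool = True  # assumed configuration flag
    failed_steps: List[str] = field(default_factory=list)

    def __call__(self, docs: List[Doc]) -> List[Doc]:
        for step in self.steps:
            try:
                step(docs)
            except Exception:
                if self.raise_exceptions:
                    raise  # fail fast: surface the problem immediately
                # Lenient mode: log the traceback and carry on with the next step.
                logger.exception("step %s failed", step.__name__)
                self.failed_steps.append(step.__name__)
        return docs


def ok_step(docs: List[Doc]) -> None:
    for doc in docs:
        doc.metadata["seen"] = True


def bad_step(docs: List[Doc]) -> None:
    raise RuntimeError("boom")


lenient = SketchPipeline([bad_step, ok_step], raise_exceptions=False)
lenient_docs = lenient([Doc("a")])
```

In lenient mode the failure of bad_step is logged and ok_step still runs; with raise_exceptions=True the same pipeline would propagate the RuntimeError from the first failing step.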

Maybe a combination of 1 and 2 above? I think raising in the Step might be better, because if a problem affects all documents, we'll get an exception right away rather than only once the whole run has finished. At the same time, I think having the 'real exceptions' available would be useful.
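To illustrate what keeping the 'real exceptions' might involve, here is a rough sketch that stores exception objects in the metadata, formats them to strings only at serialization time, and lets a caller re-raise them afterwards. It assumes the metadata is a plain dict serialized as JSON; the helper names format_exc_obj, serialise_metadata and raise_first_failure are hypothetical, as is the per-step dictionary layout.

```python
import json
import traceback
from typing import Any, Dict, Iterable

# Assumed key name, taken from the issue text; the real constant may differ.
PROCESSING_EXCEPTION = "PROCESSING_EXCEPTION"


def format_exc_obj(exc: BaseException) -> str:
    """Format a stored exception object the way traceback.format_exc() would."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def serialise_metadata(metadata: Dict[str, Any]) -> str:
    """Serialise metadata to JSON, turning exception objects into strings."""
    def default(obj: Any) -> str:
        if isinstance(obj, BaseException):
            return format_exc_obj(obj)
        raise TypeError(f"cannot serialise {type(obj).__name__}")
    return json.dumps(metadata, default=default)


def raise_first_failure(all_metadata: Iterable[Dict[str, Any]]) -> None:
    """Re-raise the first stored exception found, if any."""
    for metadata in all_metadata:
        for exc in metadata.get(PROCESSING_EXCEPTION, {}).values():
            raise exc


# A (hypothetical) step stores the real exception object instead of a string.
metadata: Dict[str, Any] = {}
try:
    int("not a number")
except ValueError as e:
    metadata.setdefault(PROCESSING_EXCEPTION, {})["StepA"] = e

serialised = serialise_metadata(metadata)
```

The serialized form still contains the formatted traceback as a string, so downstream consumers of the serialized documents would see text, while in-process code can inspect or re-raise the original exception.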
