A permissive MARCReader #109

edsu · 2017-10-17T14:59:01Z

Over in #89 there has been a long discussion about working with invalid MARC data in the wild. I must admit I don't work with MARC much these days, so I had no idea people were running into so many problems processing large batches of MARC records.

Currently when pymarc runs into a structural problem in a MARC record it will throw an exception (RecordLeaderInvalid, BaseAddressNotFound, BaseAddressInvalid, RecordDirectoryInvalid, NoFieldsFound) which will also cause record iteration to stop.

@anarchivist offered up a PermissiveMARCReader which he has used to process large amounts of MARC data. PermissiveMARCReader catches all the exceptions thrown by structural problems with the MARC record and moves on to the next record.

Rather than introducing a new class I suggest that a new parameter named strict be added to the pymarc.MARCReader constructor. When set to True it will continue to throw these exceptions. When set to False it will catch the exceptions, log them, and move on to the next record. It may be that some of these exceptions need to be relaxed, and the invalid data interpreted in some way. But let's open new issue tickets for those situations as they come up.
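To make the proposal concrete, here is a minimal sketch of the strict switch described above. It is not pymarc's code: `SketchReader`, `RecordInvalid`, and `make_record` are hypothetical names, and the "record" format is a toy one (a 5-digit ASCII length prefix and a trailing record terminator byte) standing in for real MARC parsing.

```python
import io
import logging

logger = logging.getLogger("marc_sketch")

END_OF_RECORD = b"\x1d"  # MARC record terminator byte


class RecordInvalid(Exception):
    """Hypothetical stand-in for pymarc's structural exceptions."""


def make_record(payload: bytes) -> bytes:
    """Build a toy record: 5-digit total length, payload, terminator."""
    total = 5 + len(payload) + 1
    return f"{total:05d}".encode("ascii") + payload + END_OF_RECORD


class SketchReader:
    """Toy reader illustrating the proposed strict flag (an assumption)."""

    def __init__(self, fh, strict=False):
        self.fh = fh
        self.strict = strict

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            prefix = self.fh.read(5)
            if not prefix:
                raise StopIteration
            if len(prefix) < 5 or not prefix.isdigit() or int(prefix) <= 5:
                # Without a usable length we cannot find the next record,
                # so this sketch gives up regardless of the strict flag.
                raise RecordInvalid("unreadable record length")
            chunk = prefix + self.fh.read(int(prefix) - 5)
            try:
                return self._parse(chunk)
            except RecordInvalid as exc:
                if self.strict:
                    raise  # strict=True: surface the error to the caller
                # strict=False: log the problem and try the next record
                logger.warning("skipping invalid record: %s", exc)

    @staticmethod
    def _parse(chunk: bytes) -> bytes:
        if len(chunk) != int(chunk[:5]) or not chunk.endswith(END_OF_RECORD):
            raise RecordInvalid("record is truncated or lacks a terminator")
        return chunk[5:-1]


# One structurally bad record sandwiched between two good ones.
data = make_record(b"good1") + b"00011broken" + make_record(b"good2")
records = list(SketchReader(io.BytesIO(data), strict=False))
```

With `strict=False` the bad record is logged and skipped, so `records` holds only the two good payloads. With `strict=True` the same input raises `RecordInvalid` when iteration reaches the second record. Note that a record whose length prefix is unreadable still halts the sketch, since there is no way to tell where the next record begins.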

Based on the conversation we've been seeing lately I think the default for strict should be set to False. The MARCReader API will be backwards compatible (code that uses pymarc won't need to change). However this will be a significant change in behavior so I think a new minor version release will be needed, v3.2.

What do folks think?

timClicks · 2018-11-17T20:02:49Z

Strong +1 from me. I deal with MARC21 files from repositories, so it's basically impossible for me to fix the problem at source. It's rare that I would like to stop processing completely if I'm generating an export.

petrus-v · 2019-11-28T17:35:53Z

Hi! I'd like to help on this issue. I'll create a PR in a few days. Do you have any contribution guidelines I should know about?

edsu · 2019-12-01T14:17:46Z

@petrus-v PEP-8 and including unit tests (with test data if necessary) will definitely help move PRs along.

This allows reading a large amount of data that may contain structurally invalid records. Those records are ignored, so that as many records as possible can be read from a file even if one of them is incorrect.

#109 permissive marcreader

Wooble · 2019-12-06T14:22:04Z

I don't know if there's anything that could be done to be even more permissive in more places, but I think #144 pretty much covered this, so closing...

petrus-v mentioned this issue Dec 3, 2019

#109 permissive marcreader #144

Merged

edsu added a commit that referenced this issue Dec 5, 2019

Merge pull request #144 from anybox/109-permissive-marcreader

180f8da

#109 permissive marcreader

Wooble closed this as completed Dec 6, 2019
