as_marc() should throw exception on fields that are too big #42
Comments
Can you share some code that exhibits the problem? |
You can find the generated record here: |
Yes, please share the code so we can replicate the problem. |
One part of the code is: I can't really share the data. Would a pickled Record object be useful to you? |
Please see if you can write a piece of standalone code that demonstrates the problem. Then we can help, hopefully :-) |
Here is an example that reproduces the problem: |
When I try to read the resulting record in with pymarc I see this error:
Do you see the same thing? |
Yes, with pymarc I get this exception, and when I dump the record to a file and read it with yaz-marcdump I get this: |
I played around with your example a bit and simplified it to this https://gist.github.com/edsu/f8f0e33afcbbcaf7d194 do you see the same error with that? |
I have this error: RecordDirectoryInvalid: Invalid directory |
Now change the 9995 to 9994, and it works? No error? |
Well, and then? I know that the notice length is too big, but pymarc should not raise an exception… |
You are right, a better exception should be thrown by pymarc when you call as_marc(). But the structure of a MARC21 directory does not support fields of that size. |
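To make the constraint concrete: a MARC21 directory entry is 12 characters wide (3 for the tag, 4 for the field length, 5 for the starting position), so a length of 10,000 simply has no room. The following is a minimal sketch of the kind of check that could produce a clearer error. `FieldTooLong` and `directory_entry` are hypothetical names used for illustration and are not part of pymarc.

```python
# Layout of one MARC21 directory entry:
# 3-character tag + 4-digit field length + 5-digit starting position.
MAX_FIELD_LENGTH = 9999      # largest value that fits in 4 digits
MAX_START_POSITION = 99999   # largest value that fits in 5 digits


class FieldTooLong(ValueError):
    """Hypothetical exception for illustration; not part of pymarc."""


def directory_entry(tag, field_length, start):
    """Return the 12-character directory entry, or raise if it cannot fit."""
    if field_length > MAX_FIELD_LENGTH:
        raise FieldTooLong(
            f"field {tag} is {field_length} bytes; "
            f"the directory allows at most {MAX_FIELD_LENGTH}"
        )
    if start > MAX_START_POSITION:
        raise FieldTooLong(
            f"field {tag} starts at byte {start}; "
            f"the directory allows at most {MAX_START_POSITION}"
        )
    return f"{tag}{field_length:04d}{start:05d}"


# Without the check, zero-padded formatting silently widens the entry:
# a width of 4 is only a minimum, so 10000 becomes five characters.
widened = f"{10000:04d}"
```

Once a single entry is 13 characters instead of 12, every later entry is misaligned, which would explain why readers report an invalid directory rather than an oversized field.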
Well, it's absurd. It's 2013 and this format has existed since the 1960s; we should simply ignore this directory, which is strictly useless for reading and writing MARC. |
I completely agree. Do you need to write MARC21 or can you use something else like MARCXML? |
We write MARC21 and UNIMARC to provide ISO 2709 records to libraries that subscribe to our services. Thanks, |
I want to leave this open to get a better exception being thrown. It wasn't at all clear what the problem was. I apologize for the suckitude of the MARC21 format. The sooner it can be a thing of the past the better. pymarc was largely written to be an escape mechanism, not a means to perpetuate the format. |
The maximum length of data in a variable field in UNIMARC -- as well as MARC21 -- is 9,999 bytes (see the bottom of this page, in the "Directory Map" section). You cannot serialize a MARC record into MARC21 or UNIMARC and have a field over this length because the data format cannot handle that. The field length includes the indicators as well as all of the subfields. To serialize this data, your approach in this case would probably be to break each translation of the description text into a separate tag by language. |
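As a rough illustration of how that length is counted, the sketch below adds up the bytes a data field occupies once serialized: the indicators, each subfield delimiter and code, the subfield data, and the field terminator. The function names are assumptions made for this example, and the representation of a field as a list of `(code, value)` pairs is a simplification rather than pymarc's own data model.

```python
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
MAX_FIELD_LENGTH = 9999  # 4-digit length slot in the directory


def serialized_field_length(indicators, subfields, encoding="utf-8"):
    """Count the bytes of a data field: indicators, subfields, terminator."""
    data = "".join(indicators).encode(encoding)
    for code, value in subfields:
        data += SUBFIELD_DELIMITER + code.encode(encoding) + value.encode(encoding)
    return len(data + FIELD_TERMINATOR)


def fits_in_directory(indicators, subfields):
    """True if the field length can be written in the 4-digit slot."""
    return serialized_field_length(indicators, subfields) <= MAX_FIELD_LENGTH
```

With two indicators and a single subfield of ASCII text, the overhead is 5 bytes, so 9,994 characters of data is the most that fits. The count is in bytes, so text with multi-byte characters hits the limit sooner.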
@anarchivist I thought we already covered that. |
@anarchivist That is not really true in 2013: we do not need the length of the record, we can just use separators to delimit each field and subfield. The record and field lengths were used in VERY OLD systems that are no longer in use today. |
Look at the code here: It doesn't use the length of the record to parse iso2709 and create a struct from it. |
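For readers who want to see what separator-based parsing looks like, here is a minimal sketch. It is not the code linked above; `parse_by_separators` is a hypothetical helper. It ignores the lengths and offsets in the directory and splits the record body on the field terminator, but it still reads the tags from the directory, so it assumes every directory entry is exactly 12 characters wide and every field ends with a terminator.

```python
LEADER_LENGTH = 24
ENTRY_LENGTH = 12
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def parse_by_separators(record):
    """Return (tag, field_bytes) pairs, ignoring directory lengths and offsets."""
    # The directory runs from the end of the leader to the first field terminator.
    directory_end = record.index(FIELD_TERMINATOR, LEADER_LENGTH)
    directory = record[LEADER_LENGTH:directory_end]
    # Only the 3-character tag of each 12-character entry is used.
    tags = [
        directory[i:i + 3].decode("ascii")
        for i in range(0, len(directory), ENTRY_LENGTH)
    ]
    body = record[directory_end + 1:].rstrip(RECORD_TERMINATOR)
    # Every field ends with a terminator, so the last split item is empty.
    fields = body.split(FIELD_TERMINATOR)[:-1]
    return list(zip(tags, fields))
```

Note the caveat this exposes: the tags live only in the directory, so a writer that emits an entry wider than 12 characters still breaks this kind of reader, even though the lengths themselves are never consulted.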
@edsu In terms of considering the exception, it looks like pymarc should probably compare leader byte 20 to the length of a given pymarc.Field's
@nahuelange Like @edsu, I'm sorry that this is frustrating, but this is a limitation of the format. If you really need to have values longer than what UNIMARC serialized in ISO 2709 will allow, please consider using MARCXML. |
@anarchivist It's not a format limitation, because that information is useless today. To be interoperable we CAN'T use MARCXML: we provide our records to partners that use an ILS. |
@nahuelange Please don't get frustrated with me, I'm trying to help you by explaining the limitations. If you'd like to develop a workaround using pymarc, by all means do so. |
@nahuelange as @anarchivist suggested you could consider shortening the fields so they fit within the constraints. Unfortunately MARC interchange format is frozen in time. If you have to work with legacy systems that use it, you will have to work within the constraints of the format. If you are designing a new system I would strongly advise you not to perpetuate its use. |
The problem is we can't predict the size of the record. |
Can you describe your use case a bit more? |
I use pymarc to generate ISO 2709 records from internal data.
When I read the records generated by pymarc (even with pymarc!) I get directory offset problems.
With yaz-marcdump the error is:
(Directory offset 204: Bad value for data length and/or length starting (394\x1E##\x1Fa9782))
(Base address not at end of directory, base 194, end 205)
(Directory offset 132: Data out of bounds 51665 >= 15063)
What's wrong?