This repository has been archived by the owner on Feb 4, 2020. It is now read-only.

as_marc() should throw exception on fields that are too big #42

Open
nahuelange opened this issue Nov 15, 2013 · 27 comments
Comments

@nahuelange

I use pymarc to generate ISO 2709 records from internal data.
When I read the records generated by pymarc (even with pymarc itself!), I get directory offset problems.
With yaz-marcdump the error is:
(Directory offset 204: Bad value for data length and/or length starting (394\x1E##\x1Fa9782))
(Base address not at end of directory, base 194, end 205)
(Directory offset 132: Data out of bounds 51665 >= 15063)

What's wrong?

@edsu
Owner

edsu commented Nov 15, 2013

Can you share some code that exhibits the problem?

@nahuelange
Author

You can find the generated record here:
https://filez.ahtna.org/ukh5
As for the code, I can share part of it, but note that I tested with Record(force_utf8=True) and with False, and the result is approximately the same.

@edsu
Owner

edsu commented Nov 15, 2013

Yes, please share the code so we can replicate the problem.

@nahuelange
Author

One part of the code is:
https://gist.github.com/nahuelange/c10a28d62145389d3e35

I can't really share the data; would a pickled Record object be useful for you?

@edsu
Owner

edsu commented Nov 15, 2013

Please see if you can write a piece of standalone code that demonstrates the problem. Then we can help, hopefully :-)

@nahuelange
Author

Here is an example that reproduces the problem:
https://gist.github.com/nahuelange/d36c15d57e82c6e006b4

@edsu
Owner

edsu commented Nov 15, 2013

When I try to read the resulting record in with pymarc I see this error:

pymarc.exceptions.RecordDirectoryInvalid: Invalid directory

Do you see the same thing?

@nahuelange
Author

Yes, with pymarc I get this exception, and after dumping to a file and reading it with yaz-marcdump I get this:
38934 2200038 4500
(Directory offset 36: Bad value for data length and/or length starting (0\x1E##\x1Fa012345))
(Base address not at end of directory, base 38, end 37)
(Directory offset 24: Data out of bounds 53926 >= 38934)

@edsu
Owner

edsu commented Nov 15, 2013

I played around with your example a bit and simplified it to this: https://gist.github.com/edsu/f8f0e33afcbbcaf7d194. Do you see the same error with that?
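
For reference, that simplified example boils down to roughly the following (a reconstruction for illustration, not the exact gist contents; it assumes the flat list form of subfields that pymarc used at the time):

```python
from io import BytesIO

from pymarc import Record, Field, MARCReader

# One oversized field: 2 indicators + subfield delimiter + code + 9995
# characters of data + the field terminator = 10000 bytes, one byte more
# than the 4-digit length a directory entry can hold.
record = Record()
record.add_field(
    Field(
        tag="500",
        indicators=[" ", " "],
        subfields=["a", "x" * 9995],  # with 9994 the record round-trips fine
    )
)

data = record.as_marc()

# Reading the bytes back fails because the oversized length corrupts the
# directory. Depending on the pymarc version this either raises
# pymarc.exceptions.RecordDirectoryInvalid or yields no usable record.
for rec in MARCReader(BytesIO(data)):
    print(rec)
```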

@nahuelange
Author

I have this error: RecordDirectoryInvalid: Invalid directory

@edsu
Owner

edsu commented Nov 15, 2013

Now change the 9995 to 9994, and it works? No error?

@nahuelange
Author

Well, and then? I know that the record length is too long, but pymarc should not raise an exception…

@edsu
Owner

edsu commented Nov 15, 2013

You are right, a better exception should be thrown by pymarc when you call as_marc(). But the structure of a MARC21 directory does not support fields of that size.
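
To make the constraint concrete: each directory entry is a fixed 12 characters (3-character tag, 4-digit field length, 5-digit starting character position), so no field of 10000 bytes or more can be described. A rough illustration (not pymarc's internal code):

```python
def make_directory_entry(tag, field_length, start_position):
    """Build one 12-character MARC21/UNIMARC directory entry."""
    if field_length > 9999:
        # This is exactly the case that breaks: the length no longer fits
        # in the 4 characters reserved for it.
        raise ValueError("field %s is %d bytes; the directory length is 4 digits"
                         % (tag, field_length))
    return "%3s%04d%05d" % (tag, field_length, start_position)


def parse_directory_entry(entry):
    """Split a 12-character entry back into (tag, length, start)."""
    return entry[0:3], int(entry[3:7]), int(entry[7:12])


print(make_directory_entry("245", 120, 0))  # -> '245012000000'
```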

@nahuelange
Author

Well, it's absurd. It's 2013 and this format has existed since the 1960s; we should just not care about this directory, which is strictly useless for reading and writing MARC.

@edsu
Owner

edsu commented Nov 15, 2013

I completely agree. Do you need to write MARC21 or can you use something else like MARCXML?

@nahuelange
Author

We write MARC21 and UNIMARC to provide ISO 2709 records to the libraries that subscribe to our services.
In any case, we have to truncate some big fields, because ILSs are not able to read this … format.

Thanks,

@edsu
Owner

edsu commented Nov 15, 2013

I want to leave this open so that a better exception gets thrown. It wasn't at all clear what the problem was. I apologize for the suckitude of the MARC21 format. The sooner it can be a thing of the past, the better. pymarc was largely written to be an escape mechanism, not a means to perpetuate the format.

@edsu edsu reopened this Nov 15, 2013
@anarchivist
Contributor

The maximum length of data in a variable field in UNIMARC -- as well as MARC21 -- is 9,999 bytes (see the bottom of this page, in the "Directory Map" section). You cannot serialize a MARC record into MARC21 or UNIMARC and have a field over this length because the data format cannot handle that. The field length includes the indicators as well as all of the subfields.

To serialize this data, your best approach in this case would probably be to break each translation of the description text into a separate field by language, as in the sketch below.
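
A minimal sketch of that workaround, assuming the descriptions are available per language up front (the tag and the use of $9 for a language code are illustrative choices, not a cataloguing rule):

```python
from pymarc import Record, Field

# Hypothetical input: one translation of the description per language.
# Packed into a single field these could blow past the 9999-byte limit;
# one repeatable field per language keeps each entry comfortably under it.
descriptions = {
    "fre": "Résumé en français ...",
    "eng": "Summary in English ...",
    "spa": "Resumen en español ...",
}

record = Record()
for lang, text in descriptions.items():
    record.add_field(
        Field(
            tag="330",
            indicators=[" ", " "],
            subfields=["a", text, "9", lang],
        )
    )
```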

@edsu
Owner

edsu commented Nov 15, 2013

@anarchivist I thought we already covered that.

@nahuelange
Author

@anarchivist That is not really justified in 2013; we do not need the length of the record, we can just use separators to delimit each field/subfield. The record and field lengths were used by VERY OLD systems that are no longer in use today.
This format is a PITA.

@nahuelange
Author

Look at the code here:
https://github.com/eiro/p5-marc-mir/blob/master/lib/MARC/MIR.pm#L157

It doesn't use the record length to parse ISO 2709 and build a structure from it.
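
The same idea in Python, as a rough sketch (not the MIR code itself): split the record on the field terminator, take the leader and directory from the first chunk, and pair the tags with the remaining chunks without ever consulting the stored lengths.

```python
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"


def fields_without_lengths(raw):
    """Split one ISO 2709 record into (tag, data) pairs using only the
    delimiters; the 4-digit lengths in the directory are never read."""
    body = raw.rstrip(RECORD_TERMINATOR)
    head, *field_data = body.split(FIELD_TERMINATOR)
    leader, directory = head[:24], head[24:]
    # Directory entries are 12 characters; only the 3-character tag is used.
    tags = [directory[i:i + 3] for i in range(0, len(directory), 12)]
    return leader, list(zip(tags, field_data))
```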

@anarchivist
Contributor

@edsu In terms of the exception, it looks like pymarc should probably check the length of a given pymarc.Field's as_marc() return value against what leader byte 20 (the length-of-field portion of the directory entries) allows.
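
A sketch of what that check could look like, assuming the flat subfields list of the pymarc versions of the time; FieldTooLong is a made-up exception name, and 9999 is the largest value the 4-digit length portion named by leader byte 20 can hold:

```python
MAX_FIELD_LENGTH = 9999  # largest value a 4-digit directory length can hold


class FieldTooLong(Exception):
    """Hypothetical exception for a field that cannot fit in the directory."""


def check_field_lengths(record, encoding="utf-8"):
    """Raise before serialization instead of writing an unreadable record.

    The length counted is the serialized field: two indicators, each
    subfield's delimiter, code and data, and the trailing field terminator,
    which is what the directory's 4-digit length has to describe.
    """
    for field in record.get_fields():
        if field.is_control_field():
            length = len(field.data.encode(encoding)) + 1  # + field terminator
        else:
            length = 2 + 1  # indicators + field terminator
            for value in field.subfields[1::2]:
                length += 2 + len(value.encode(encoding))  # delimiter + code + data
        if length > MAX_FIELD_LENGTH:
            raise FieldTooLong("field %s is %d bytes, maximum is %d"
                               % (field.tag, length, MAX_FIELD_LENGTH))
```

Calling check_field_lengths(record) right before record.as_marc() would surface the problem at write time instead of producing a record that no reader can open.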

@nahuelange Like @edsu, I'm sorry that this is frustrating, but this is a limitation of the format. If you really need to have values longer than what UNIMARC serialized in ISO 2709 will allow, please consider using MARCXML.

@nahuelange
Author

@anarchivist It's not a format limitation, because that information is useless today. To be interoperable we CAN'T use MARCXML; we provide our records to partners that use ILSs.

@anarchivist
Contributor

@nahuelange Please don't get frustrated with me, I'm trying to help you by explaining the limitations. If you'd like to develop a workaround using pymarc, by all means do so.

@edsu
Owner

edsu commented Nov 15, 2013

@nahuelange as @anarchivist suggested, you could consider shortening the fields so they fit within the constraints. Unfortunately the MARC interchange format is frozen in time. If you have to work with legacy systems that use it, you will have to work within the constraints of the format. If you are designing a new system, I would strongly advise you not to perpetuate its use.

@nahuelange
Author

The problem is that we can't predict the size of the record.

@edsu
Owner

edsu commented Nov 18, 2013

Can you describe your use case a bit more?
