This repository has been archived by the owner on Feb 4, 2020. It is now read-only.

as_marc() should throw exception on fields that are too big #42

Open
nahuelange opened this issue Nov 15, 2013 · 27 comments
Comments

@nahuelange

I use pymarc to generate ISO 2709 records from internal data.
When I read the records generated by pymarc (even with pymarc itself!), I get directory offset problems.
With yaz-marcdump the error is:
(Directory offset 204: Bad value for data length and/or length starting (394\x1E##\x1Fa9782))
(Base address not at end of directory, base 194, end 205)
(Directory offset 132: Data out of bounds 51665 >= 15063)

What's wrong?

@edsu
Owner

edsu commented Nov 15, 2013

Can you share some code that exhibits the problem?

@nahuelange
Author

You can find the generated record here:
https://filez.ahtna.org/ukh5
As for the code, I can share part of it, but note that I tested with Record(force_utf8=True) and with False, and the result is approximately the same.

@edsu
Owner

edsu commented Nov 15, 2013

Yes, please share the code so we can replicate the problem.

@nahuelange
Author

One part of the code is:
https://gist.github.com/nahuelange/c10a28d62145389d3e35

I can't really share the data; would a pickled Record object be useful for you?

@edsu
Owner

edsu commented Nov 15, 2013

Please see if you can write a piece of standalone code that demonstrates the problem. Then we can help, hopefully :-)

@nahuelange
Author

Here is an example that reproduces the problem:
https://gist.github.com/nahuelange/d36c15d57e82c6e006b4

@edsu
Owner

edsu commented Nov 15, 2013

When I try to read the resulting record in with pymarc I see this error:

pymarc.exceptions.RecordDirectoryInvalid: Invalid directory

Do you see the same thing?

@nahuelange
Author

Yes, with pymarc I get this exception, and after dumping to a file and reading it with yaz-marcdump I get this:
38934 2200038 4500
(Directory offset 36: Bad value for data length and/or length starting (0\x1E##\x1Fa012345))
(Base address not at end of directory, base 38, end 37)
(Directory offset 24: Data out of bounds 53926 >= 38934)

@edsu
Owner

edsu commented Nov 15, 2013

I played around with your example a bit and simplified it to this: https://gist.github.com/edsu/f8f0e33afcbbcaf7d194. Do you see the same error with that?
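
For reference, that simplified example boils down to roughly the following (a reconstruction for illustration, not the exact gist contents; it assumes the flat list form of subfields that pymarc used at the time):

```python
from io import BytesIO

from pymarc import Record, Field, MARCReader

# One oversized field: 2 indicators + subfield delimiter + code + 9995
# characters of data + the field terminator = 10000 bytes, one byte more
# than the 4-digit length a directory entry can hold.
record = Record()
record.add_field(
    Field(
        tag="500",
        indicators=[" ", " "],
        subfields=["a", "x" * 9995],  # with 9994 the record round-trips fine
    )
)

data = record.as_marc()

# Reading the bytes back fails because the oversized length corrupts the
# directory. Depending on the pymarc version this either raises
# pymarc.exceptions.RecordDirectoryInvalid or yields no usable record.
for rec in MARCReader(BytesIO(data)):
    print(rec)
```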

@nahuelange
Author

I have this error: RecordDirectoryInvalid: Invalid directory

@edsu
Owner

edsu commented Nov 15, 2013

Now change the 9995 to 9994, and it works? No error?

@nahuelange
Author

Well, and then? I know that the record length is too long, but pymarc should not raise an exception…

@edsu
Owner

edsu commented Nov 15, 2013

You are right, a better exception should be thrown by pymarc when you call as_marc(). But the structure of a MARC21 directory does not support fields of that size.
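
To make the constraint concrete: each directory entry is a fixed 12 characters (3-character tag, 4-digit field length, 5-digit starting character position), so no field of 10000 bytes or more can be described. A rough illustration (not pymarc's internal code):

```python
def make_directory_entry(tag, field_length, start_position):
    """Build one 12-character MARC21/UNIMARC directory entry."""
    if field_length > 9999:
        # This is exactly the case that breaks: the length no longer fits
        # in the 4 characters reserved for it.
        raise ValueError("field %s is %d bytes; the directory length is 4 digits"
                         % (tag, field_length))
    return "%3s%04d%05d" % (tag, field_length, start_position)


def parse_directory_entry(entry):
    """Split a 12-character entry back into (tag, length, start)."""
    return entry[0:3], int(entry[3:7]), int(entry[7:12])


print(make_directory_entry("245", 120, 0))  # -> '245012000000'
```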

@nahuelange
Author

Well, it's absurd. It's 2013 and this format has existed since the 1960s; we should just not care about this directory, which is strictly useless for reading and writing MARC.

@edsu
Owner

edsu commented Nov 15, 2013

I completely agree. Do you need to write MARC21 or can you use something else like MARCXML?

@nahuelange
Author

We write MARC21 and UNIMARC to provide ISO 2709 records to the libraries that subscribe to our services.
In any case, we have to truncate some big fields, because ILSs are not able to read this … format.

Thanks,

@edsu
Owner

edsu commented Nov 15, 2013

I want to leave this open so that a better exception gets thrown. It wasn't at all clear what the problem was. I apologize for the suckitude of the MARC21 format. The sooner it can be a thing of the past, the better. pymarc was largely written to be an escape mechanism, not a means to perpetuate the format.

@edsu edsu reopened this Nov 15, 2013
@anarchivist
Contributor

The maximum length of data in a variable field in UNIMARC -- as well as MARC21 -- is 9,999 bytes (see the bottom of this page, in the "Directory Map" section). You cannot serialize a MARC record into MARC21 or UNIMARC and have a field over this length because the data format cannot handle that. The field length includes the indicators as well as all of the subfields.

To serialize this data, your best approach in this case would probably be to break each translation of the description text into a separate field by language, as in the sketch below.
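
A minimal sketch of that workaround, assuming the descriptions are available per language up front (the tag and the use of $9 for a language code are illustrative choices, not a cataloguing rule):

```python
from pymarc import Record, Field

# Hypothetical input: one translation of the description per language.
# Packed into a single field these could blow past the 9999-byte limit;
# one repeatable field per language keeps each entry comfortably under it.
descriptions = {
    "fre": "Résumé en français ...",
    "eng": "Summary in English ...",
    "spa": "Resumen en español ...",
}

record = Record()
for lang, text in descriptions.items():
    record.add_field(
        Field(
            tag="330",
            indicators=[" ", " "],
            subfields=["a", text, "9", lang],
        )
    )
```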

@edsu
Owner

edsu commented Nov 15, 2013

@anarchivist I thought we already covered that.

@nahuelange
Author

@anarchivist That is not really justified in 2013; we do not need the length of the record, we can just use separators to delimit each field/subfield. The record and field lengths were used by VERY OLD systems that are no longer in use today.
This format is a PITA.

@nahuelange
Author

Look at the code here:
https://github.com/eiro/p5-marc-mir/blob/master/lib/MARC/MIR.pm#L157

It doesn't use the record length to parse ISO 2709 and build a structure from it.
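
The same idea in Python, as a rough sketch (not the MIR code itself): split the record on the field terminator, take the leader and directory from the first chunk, and pair the tags with the remaining chunks without ever consulting the stored lengths.

```python
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"


def fields_without_lengths(raw):
    """Split one ISO 2709 record into (tag, data) pairs using only the
    delimiters; the 4-digit lengths in the directory are never read."""
    body = raw.rstrip(RECORD_TERMINATOR)
    head, *field_data = body.split(FIELD_TERMINATOR)
    leader, directory = head[:24], head[24:]
    # Directory entries are 12 characters; only the 3-character tag is used.
    tags = [directory[i:i + 3] for i in range(0, len(directory), 12)]
    return leader, list(zip(tags, field_data))
```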

@anarchivist
Contributor

@edsu In terms of the exception, it looks like pymarc should probably check the length of a given pymarc.Field's as_marc() return value against what leader byte 20 (the length-of-field portion of the directory entries) allows.
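
A sketch of what that check could look like, assuming the flat subfields list of the pymarc versions of the time; FieldTooLong is a made-up exception name, and 9999 is the largest value the 4-digit length portion named by leader byte 20 can hold:

```python
MAX_FIELD_LENGTH = 9999  # largest value a 4-digit directory length can hold


class FieldTooLong(Exception):
    """Hypothetical exception for a field that cannot fit in the directory."""


def check_field_lengths(record, encoding="utf-8"):
    """Raise before serialization instead of writing an unreadable record.

    The length counted is the serialized field: two indicators, each
    subfield's delimiter, code and data, and the trailing field terminator,
    which is what the directory's 4-digit length has to describe.
    """
    for field in record.get_fields():
        if field.is_control_field():
            length = len(field.data.encode(encoding)) + 1  # + field terminator
        else:
            length = 2 + 1  # indicators + field terminator
            for value in field.subfields[1::2]:
                length += 2 + len(value.encode(encoding))  # delimiter + code + data
        if length > MAX_FIELD_LENGTH:
            raise FieldTooLong("field %s is %d bytes, maximum is %d"
                               % (field.tag, length, MAX_FIELD_LENGTH))
```

Calling check_field_lengths(record) right before record.as_marc() would surface the problem at write time instead of producing a record that no reader can open.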

@nahuelange Like @edsu, I'm sorry that this is frustrating, but this is a limitation of the format. If you really need to have values longer than what UNIMARC serialized in ISO 2709 will allow, please consider using MARCXML.

@nahuelange
Author

@anarchivist It's not a format limitation, because that information is useless today. To be interoperable we CAN'T use MARCXML; we provide our records to partners that use ILSs.

@anarchivist
Contributor

@nahuelange Please don't get frustrated with me, I'm trying to help you by explaining the limitations. If you'd like to develop a workaround using pymarc, by all means do so.

@edsu
Owner

edsu commented Nov 15, 2013

@nahuelange as @anarchivist suggested, you could consider shortening the fields so they fit within the constraints. Unfortunately the MARC interchange format is frozen in time. If you have to work with legacy systems that use it, you will have to work within the constraints of the format. If you are designing a new system, I would strongly advise you not to perpetuate its use.

@nahuelange
Author

The problem is that we can't predict the size of the record.

@edsu
Owner

edsu commented Nov 18, 2013

Can you describe your use case a bit more?
