Six METS records are failing in the transformer #2546

Open
paul-butcher opened this issue Feb 5, 2024 · 4 comments

Comments

@paul-butcher
Contributor

paul-butcher commented Feb 5, 2024

They appear to be failing because the transformer cannot find the corresponding data in storage. Each one yields a log message similar to this:

ERROR w.p.transformer.TransformerWorker - TransformerWorker: TransformerError on MetsFileWithImages(s3://wellcomecollection-storage/digitised/b20442117,v2/data/b20442117.xml,List(v2/data/b20442117_0001.xml, v1/data/b20442117_0002.xml, v1/data/b20442117_0003.xml),2023-02-09T12:58:38.913Z,2) with Version(b20442117,2) (software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist.

b20442117, b31360051, b24875831, b2170594x, b21705938, b24873342
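To check which object is actually missing for each record, the storage location can be pulled out of the log line above. The following is a minimal sketch, not pipeline code: it assumes each relative path is resolved by appending it to the `s3://` root shown in the log, and `MetsLocation`, `parse_mets_location`, `missing_keys` and the `key_exists` callback are hypothetical names introduced for illustration. The callback stands in for whatever S3 lookup is available.

```python
import re
from typing import Callable, List, NamedTuple


class MetsLocation(NamedTuple):
    root: str                   # e.g. s3://bucket/digitised/b20442117
    primary: str                # main METS file, relative to root
    manifestations: List[str]   # per-manifestation METS files, relative to root


# Matches the MetsFileWithImages(...) fragment as it appears in the log line.
_PATTERN = re.compile(
    r"MetsFileWithImages\("
    r"(?P<root>s3://[^,]+),"
    r"(?P<primary>[^,]+),"
    r"List\((?P<files>[^)]*)\)"
)


def parse_mets_location(log_line: str) -> MetsLocation:
    """Extract the root, primary file and manifestation files from a log line."""
    match = _PATTERN.search(log_line)
    if match is None:
        raise ValueError("no MetsFileWithImages(...) fragment found")
    files = [f.strip() for f in match.group("files").split(",") if f.strip()]
    return MetsLocation(match.group("root"), match.group("primary"), files)


def missing_keys(
    location: MetsLocation, key_exists: Callable[[str], bool]
) -> List[str]:
    """Return the full URIs for which key_exists reports False.

    Assumption: a full URI is the root joined to the relative path with "/".
    """
    paths = [location.primary, *location.manifestations]
    uris = [f"{location.root}/{path}" for path in paths]
    return [uri for uri in uris if not key_exists(uri)]
```

Passing a real existence check as `key_exists` would show whether it is the primary file or one of the manifestation files that triggers the `NoSuchKeyException`.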

They are all b-numbered (and therefore Goobi) METS files.

This has occurred in the new 02-01 pipeline, so it may be due to the recent changes to handle Archivematica METS. Either this case was previously filtered out or guarded against and those changes have somehow removed that protection, or it also happens in previous pipelines and has so far gone unnoticed.

@paul-butcher
Contributor Author

paul-butcher commented Feb 5, 2024

None of these values are present in logs in January, so this must be new to 02-01.

Actually, I've just searched again, and I have found them all, with the same error, on 2024-01-09. Evidently I set the date incorrectly when I first looked.

@paul-butcher
Contributor Author

None of these b-numbers return any results from a search on the website (currently pointing to the 01-09 pipeline).

@paul-butcher
Contributor Author

paul-butcher commented Feb 5, 2024

Apart from b31360051 (f5ndv2hu), these are all DELETED. I suspect the new pipeline is erroneously trying to create a full record for a DELETED one.
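If that suspicion is right, the missing protection would be a guard that short-circuits DELETED records before any storage read. The sketch below only illustrates that idea under assumed names: `SourceRecord`, `Work`, `transform` and the `fetch_mets` callback are hypothetical and do not reflect the transformer's real types.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SourceRecord:
    b_number: str
    deleted: bool                   # hypothetical flag for a DELETED record
    mets_uri: Optional[str] = None  # where the METS XML would be read from


@dataclass
class Work:
    b_number: str
    visible: bool
    mets_xml: Optional[str] = None


def transform(record: SourceRecord, fetch_mets: Callable[[str], str]) -> Work:
    """Build a Work, reading from storage only for records that are not deleted."""
    # Guard: a deleted record becomes an invisible stub without touching
    # storage, so a missing key cannot raise for it.
    if record.deleted:
        return Work(b_number=record.b_number, visible=False)
    if record.mets_uri is None:
        raise ValueError(f"{record.b_number}: live record has no METS location")
    return Work(
        b_number=record.b_number,
        visible=True,
        mets_xml=fetch_mets(record.mets_uri),
    )
```

With a guard of this shape, the five DELETED b-numbers would never reach the storage lookup. b31360051 is not DELETED, so it would still need a separate explanation.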

@paul-butcher
Contributor Author

This remains a problem, but I don't think it's panic-worthy, and certainly not a blocker for deploying the 02-01 pipeline to the API.
