Missing item records #2536
In the original message, two records ( (found by searching for the b number in the works-identified index, thus: |
szjj6gqy has a relatively large matcher graph, including b33013950 and L0005170, which are suppressed. Could this be related to #1561? |
The second one on the list - mkmgxzfz has nothing else in the graph. It's just L0025074, so it's not likely to be a merger-based problem. However, where has b12695567 gone? |
I have a suspicion |
Suspicion confirmed (for this one at least):
This causes the transformer to fail because it cannot work out which kind of Library of Congress Identifier D009524 is (because it isn't one). |
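The failure mode can be sketched as follows. This is a minimal illustration, not the transformer's actual code: the `LocScheme` type, the `LocIdentifier.classify` function, and the prefix rules (`sh` for subject headings, `n` for names) are assumptions made for the example. The point is that a MeSH-style value such as D009524 matches neither prefix, so there is no scheme to assign.

```scala
// Hypothetical sketch: the names and prefix rules here are assumptions,
// not the catalogue pipeline's real implementation.
sealed trait LocScheme
case object LcSubjects extends LocScheme // assumed prefix "sh"
case object LcNames extends LocScheme    // assumed prefix "n"

object LocIdentifier {
  // Decide which Library of Congress scheme an identifier belongs to.
  // Returns Left with a message when the value matches neither prefix.
  def classify(id: String): Either[String, LocScheme] =
    id.trim match {
      case s if s.startsWith("sh") => Right(LcSubjects)
      case s if s.startsWith("n")  => Right(LcNames)
      case other                   => Left(s"Not a Library of Congress identifier: $other")
    }
}
```

Returning an `Either` rather than throwing would let a caller skip the bad identifier and still transform the rest of the record, though whether that is desirable here is a design decision for the maintainers.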
This is only a problem for b12695567. The MARC records behind the other b numbers do not bear such an identifier/scheme mismatch.
All three of these appear to go through the transformer without a hitch. |
b12666579 corresponds to xwtagnu3, and is mentioned in three MIRO records out of the id minter: szjj6gqy(L0001320), hqj4zaem(L0005172), qjxnq9es(L0005171). These all redirect to https://wellcomecollection.org/works/jb4kp3td |
I wonder if it relates to #2460 |
Before merging, xwtagnu3 has two items
jb4kp3td has one
After the merge, jb4kp3td has none. |
The relevant merge rules all chain with orElse, finishing with just returning the target's item list. This means that one of the earlier rules is being applied, returning Nil. It's made of six Miros and four Sierras. |
Yes. It looks like the culprit is mergeDigitalIntoPhysicalSierraTarget. jb4kp3td has a single physical item, which means that it merges the list of digitised items with the target's list. However, that only happens if the source is a "physical/digitised Sierra work"; otherwise, the target's item list is emptied.
So, I think there are two things at play here:
|
There is only a test for the positive side of this (does it add Digital items to a Physical one?) and not for the rest of the possible scenarios: does it combine Physical ones, and what does it do if there are no digital ones? I suspect that's a bit of an oversight. See catalogue-pipeline/pipeline/merger/src/test/scala/weco/pipeline/merger/rules/ItemsRuleTest.scala, line 100 in 43a792f.
|
It looks like the same thing is happening for f55dwbyz and wpznxk63. wpznxk63 has one item, f55dwbyz has three. Everything gets merged into wpznxk63, emptying wpznxk63's item list because none of f55dwbyz's items are of the right kind. Both ps2phs5n and f55dwbyz have one item each, but neither of them is of the right kind to merge into the other. |
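The item-dropping behaviour described above can be modelled with a simplified sketch. Everything here is hypothetical: `Work`, `Item`, `Rule`, `buggyRule`, and `saferRule` are illustrative names, not the merger's real types, and `saferRule` is only one possible shape for a fix, not necessarily the one applied in the pipeline. The sketch shows how a rule that matches on the target alone can return an empty list, which then stops the fallback to the target's own items from ever being reached.

```scala
// Hypothetical, simplified model of the merge rules; illustrative only.
sealed trait Item
case class Physical(id: String) extends Item
case class Digital(id: String) extends Item

case class Work(id: String, items: List[Item], isDigitisedSierra: Boolean = false)

object ItemsMergeSketch {
  // A rule either applies (Some) or declines (None) for a target/source pair.
  type Rule = (Work, Work) => Option[List[Item]]

  // Problem shape: the rule applies whenever the target has a single physical
  // item, and yields Nil when the source is not the expected kind of work.
  val buggyRule: Rule = (target, source) =>
    target.items match {
      case List(_: Physical) =>
        Some(if (source.isDigitisedSierra) target.items ++ source.items else Nil)
      case _ => None
    }

  // One possible alternative: decline unless the source qualifies,
  // so that the fallback below can return the target's items untouched.
  val saferRule: Rule = (target, source) =>
    target.items match {
      case List(_: Physical) if source.isDigitisedSierra =>
        Some(target.items ++ source.items)
      case _ => None
    }

  // Stands in for the end of the orElse chain: if no rule applies,
  // return the target's item list.
  def merge(rule: Rule)(target: Work, source: Work): List[Item] =
    rule(target, source).getOrElse(target.items)
}
```

A negative test along these lines (a single-item physical target merged with a source that is not a digitised Sierra work) would cover the scenario that the existing positive-only test misses.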
|
I've got to the bottom of this, and there are a few different problems. The simplest one is this:

MeSH IDs marked as LCSH

mkmgxzfz comes from the Miro record only, so the items are missing. Sierra b12695567 is failing in the transform because it has a MeSH ID in a field that should hold a LoC ID.

The next two are related to the merging of records:

Dropping items

We had a specific scenario in which items were simply dropped. I'm fixing this now.

Suspicious merges

Finally (I don't know whether this is related to the boundwith problem mentioned in the other thread?), there are some merges happening via links to Miro images that seem suspicious. I'm not sure if this is something that needs to be fixed in the source data, or if it's something I can deal with in the merger.

wpznxk63 is the result of merging these two Sierra records with the corresponding Miro records.

Something similar is happening with jb4kp3td (b1180232, b12666579, b11802339, b11802315), where there is a mixture of 089 and 962 fields linking to Miro.

b11675676 and b32950974 show the same phenomenon: b11675676 links to M0011992 via an 089 field, and b32950974 via a 962 field. |
Thanks @paul-butcher. We've sorted out the subject headings IDs in mkmgxzfz and the relevant item records are now appearing online. It does look like there's a wider issue here connected to the automated subject authority control process back in August 2023, so Collections Information will pick this up and investigate further. Similarly the suspicious merges examples are probably best fixed at source rather than in the merger, but that decision is based on there only being 3 examples. Are you able to check whether there are only 3? If it turns out there are thousands of examples of this issue, we might want to take a different tack in resolving it... |
I don't think we have an efficient way to find matcher relationships by that criterion. If you can fix these three for now, we can have a think about how to report this in the future. |
There are six records where