Data in the reporting cluster does not match data elsewhere #2541
I recently fixed a problem in the adapters that could have caused reporting to miss some changes that the pipeline picked up. However, that's the wrong way round.
I have just run a window covering the last known change on that document. This is now the only … Perhaps they were all correct to begin with, and then a batch process on Sierra broke them all. This record was correct in 2021.
Right. Very odd. b1269556 was originally correct, but at 2023-08-10T18:08:07Z its ind2 changed from 2 to 0. So we have (had) two problems here.
This source data problem would have been easier to spot and deal with if we had a better way to report dodgy content to collections staff.
The same is true of b30489313. In August last year (2023-08-17 15:53:15Z), its ind2 changed from the correct value 2 to the incorrect value 0.
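Pinpointing when an indicator flipped, as with the two records above, amounts to walking a record's versions in time order and noting where ind2 differs from the previous version. A minimal sketch follows, assuming the history has already been extracted as hypothetical `(timestamp, ind2)` pairs for one 650 field; this is not a real Sierra or VHS API, and the 2021 entry is an illustrative placeholder because the exact earlier timestamp is not known.

```python
from datetime import datetime


def _parse(ts):
    # Accept both "2023-08-10T18:08:07Z" and "2023-08-17 15:53:15Z" styles;
    # older Pythons' fromisoformat does not understand a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_ind2_changes(history):
    """Return (timestamp, old_ind2, new_ind2) wherever ind2 differs from the previous version.

    `history` is a hypothetical list of (timestamp, ind2) pairs for a single field.
    """
    ordered = sorted(history, key=lambda version: _parse(version[0]))
    changes = []
    for (_, prev), (ts, cur) in zip(ordered, ordered[1:]):
        if prev != cur:
            changes.append((ts, prev, cur))
    return changes


history = [
    ("2023-08-10T18:08:07Z", "0"),
    ("2021-06-01T00:00:00Z", "2"),  # hypothetical earlier (correct) version
]
print(find_ind2_changes(history))  # [('2023-08-10T18:08:07Z', '2', '0')]
```

Sorting before comparing means the versions can arrive in any order, which matters if they are gathered from more than one store.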
When investigating #2536, I noticed that one of the problematic records has an invalid MARC 650 field. It declares that its heading comes from Library of Congress Subject Headings (ind2=0), but it contains a MeSH id (subfield 0 starts with D).
The record in question is b1269556, but the problem is not unique to that record. I have also seen this error with D009524Q000266, D010297 and plenty of other incorrectly marked MeSH ids.
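The rule being broken can be expressed as a simple predicate: a 6xx field whose second indicator says LCSH (0) but whose subfield 0 holds a MeSH-style id (a "D" followed by digits). The sketch below assumes a hypothetical varfield dict loosely shaped like Sierra's JSON `varFields` (`marcTag`, `ind2`, and `subfields` with `tag`/`content`); the real schema in the pipeline or reporting cluster may differ.

```python
import re

# Hypothetical varfield shape (an assumption, not a confirmed schema):
# {"marcTag": "650", "ind1": " ", "ind2": "0",
#  "subfields": [{"tag": "a", "content": "..."}, {"tag": "0", "content": "D009524"}]}

# MeSH ids seen in these records start with "D" followed by digits
# (optionally with a "Q..." qualifier appended, e.g. D009524Q000266).
MESH_ID = re.compile(r"^D\d")


def is_mislabelled_mesh(varfield: dict) -> bool:
    """True if a 6xx field claims LCSH (ind2=0) but its subfield 0 holds a MeSH id."""
    tag = varfield.get("marcTag") or ""
    if not tag.startswith("6") or varfield.get("ind2") != "0":
        return False
    return any(
        sf.get("tag") == "0" and MESH_ID.match(sf.get("content") or "")
        for sf in varfield.get("subfields", [])
    )


bad = {
    "marcTag": "650",
    "ind1": " ",
    "ind2": "0",
    "subfields": [{"tag": "a", "content": "example heading"}, {"tag": "0", "content": "D009524"}],
}
good = dict(bad, ind2="2")  # same field with the correct MeSH indicator
print(is_mislabelled_mesh(bad), is_mislabelled_mesh(good))  # True False
```

Running a predicate like this over every record in each store would give comparable counts, which is one way to see where the stores disagree.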
When this record last went through the pipeline, it logged an error:
When I looked it up in VHS, the field in question was incorrect, as expected (ind2=0, subfield 0 = D009524).
I decided to make a report on this, to see how widespread the issue is and to help get it resolved.
Imagine my surprise when the reporting cluster appeared to contain no 6xx varfields with ind2=0 and a MeSH id. I know this is wrong, because I am looking at one right now. I searched for a few other known offenders, and they all have the correct ind2 value (2).
I cannot work out where this is coming from. How does the reporting cluster end up with different content to everywhere else?
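One way to narrow this down is to pull the same record from the reporting cluster and from the pipeline (or VHS) and diff the 6xx indicators field by field. A minimal sketch, assuming both copies have been fetched as lists of hypothetical Sierra-style varfield dicts and that fields can be paired by `(marcTag, subfield 0)`; that pairing key is an assumption and would miss fields without a subfield 0.

```python
def subfield(varfield, code):
    """Return the content of the first subfield with the given code, or None."""
    for sf in varfield.get("subfields", []):
        if sf.get("tag") == code:
            return sf.get("content")
    return None


def index_6xx(varfields):
    """Map (marcTag, subfield 0) -> ind2 for every 6xx field that has a subfield 0."""
    out = {}
    for vf in varfields:
        tag = vf.get("marcTag") or ""
        sf0 = subfield(vf, "0")
        if tag.startswith("6") and sf0:
            out[(tag, sf0)] = vf.get("ind2")
    return out


def diff_indicators(reporting, pipeline):
    """Return (key, reporting_ind2, pipeline_ind2) for each field whose ind2 differs.

    A None on either side means the field is missing from that copy.
    """
    a, b = index_6xx(reporting), index_6xx(pipeline)
    return [(k, a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)]


# Illustrative copies of one record, mirroring what is described above.
reporting_copy = [{"marcTag": "650", "ind2": "2", "subfields": [{"tag": "0", "content": "D009524"}]}]
pipeline_copy = [{"marcTag": "650", "ind2": "0", "subfields": [{"tag": "0", "content": "D009524"}]}]
print(diff_indicators(reporting_copy, pipeline_copy))  # [(('650', 'D009524'), '2', '0')]
```

If a diff like this shows the reporting copy holding an older value, that points towards reporting not having received the later change.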