OAI: metadata cleanup in preparation for harvest #487

Open
3 of 7 tasks
anayram opened this issue Jul 13, 2022 · 1 comment

anayram commented Jul 13, 2022

Based on analysis by LAC (Andrew's email "Theses Canada - harvesting the University of Alberta repository", received March 10, 2022), here is a list of required and suggested cleanup items to complete in preparation for an upcoming harvest of ETDMS theses metadata via OAI.

[Text quoted from LAC email]

Dates

  • In the dates in five records we get a square bracket as one of the characters. Here are the dates and the first parts of the titles:
  • [196 - Man and landscape change in the Banff National Park area before 1911.
  • [197 - Origins of vagrancy law:
  • [198 - A study in soil ecology:
  • [200 - Theoretical and practical biography:
  • [200 - Use of euphemisms and taboo terms by young speakers of Russian and English
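
To locate records like these before the harvest, a check along the following lines might help. This is only a sketch: it assumes the metadata has been exported to a list of dictionaries, and the `date` and `title` keys are hypothetical names, not the repository's actual fields.

```python
def find_bracketed_dates(records):
    """Return (date, title) pairs whose date value contains a square bracket."""
    flagged = []
    for rec in records:
        date = rec.get("date") or ""  # treat a missing date as an empty string
        if "[" in date or "]" in date:
            flagged.append((date, rec.get("title", "")))
    return flagged


# Illustrative input: the first entry comes from the list above,
# the second is a made-up record with a clean date.
sample_records = [
    {"date": "[196", "title": "Man and landscape change in the Banff National Park area before 1911."},
    {"date": "1985", "title": "Hypothetical record with a clean date"},
]
flagged_dates = find_bracketed_dates(sample_records)
```

Each flagged pair would still need a manual check against the thesis itself to determine the correct year.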

264 (publisher):

  • 11 titles have “unknown” as publisher and also for degree name and degree grantor. Here are the first parts of the titles:
  • Theoretical Considerations For Biological Control: (I notice incidentally that this first one may be a duplicate)
  • Relationality, Reciprocity and the Nature of Self:
  • Union and Communion:
  • Posttraumatic Growth and Spirituality:
  • Modelling Future Impacts of Climate Change and Harvest on the Reproductive Success of Female Polar Bears (Ursus maritimus)
  • Wolf movement within and beyond the territory boundary
  • The arrival and establishment of non-indigenous species:
  • Linear features impact predator-prey encounters:
  • Modeling group formation and activity patterns in self-organizing communities of organisms
  • Edmonton Social Planning Council:
  • How Academic Librarians use Evidence in their Decision Making:
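
A similar sketch could list every record that still carries the "unknown" placeholder. The keys `publisher`, `degree_name`, and `degree_grantor` are assumed names for an exported record, not the repository's real schema.

```python
# Hypothetical field names for an exported record.
PLACEHOLDER_FIELDS = ("publisher", "degree_name", "degree_grantor")


def unknown_fields(record, fields=PLACEHOLDER_FIELDS):
    """Return the names of the fields whose value is the placeholder 'unknown'."""
    return [
        name
        for name in fields
        if (record.get(name) or "").strip().lower() == "unknown"
    ]


def records_with_unknowns(records):
    """Return (title, [field names]) for records with at least one placeholder."""
    report = []
    for rec in records:
        hits = unknown_fields(rec)
        if hits:
            report.append((rec.get("title", ""), hits))
    return report


# Illustrative input: the degree and publisher values of the second record are made up.
sample_records = [
    {"title": "Union and Communion:", "publisher": "unknown",
     "degree_name": "unknown", "degree_grantor": "unknown"},
    {"title": "Hypothetical complete record", "publisher": "University of Alberta",
     "degree_name": "Master of Science", "degree_grantor": "University of Alberta"},
]
unknown_report = records_with_unknowns(sample_records)
```

Reporting the individual field names makes it easier to spot records where only one or two of the three values need correcting.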

502 (degree info)

  • Degree information to be corrected:
  • One thesis has degree name “Sara Victoria Weselake” – the title is: The role of the Prader-Willi syndrome obesity protein, MAGEL2 in the proper functioning of circadian rhythm
  • 2 theses have a discipline as degree name:
  • Risk construction at a public hearing (has “Organizational Analysis” as degree name)
  • Women's gendered experiences of rapid resource development in the Canadian North (has “Rural Sociology” as degree name)

Language (change not required for harvest)

  • As I scan the data I notice a small number of theses that do not have the language of publication recorded. We can accept the data like this since they will likely be hard to fix. They will be loaded without a language of publication in the MARC record.

Character issues (change not required for harvest)

  • I see a very small number of character issues in the abstracts. We see character issues in the data for every university. The issues are hard to fix and so we overlook such problems. When I search “{dollar}” in the .mrc file I get 1171 occurrences in roughly 200 records. It is a very small number. When I search “superscript” I get 56 occurrences in roughly 33 records. It is a very small number. When I search “�” there are too many results to return. But scanning the data it does not seem to be a significant problem.
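
The counts above came from searching the .mrc file directly. A rough equivalent on our side, assuming the abstracts are available as plain Python strings (an assumption about the export, not a description of the LAC workflow), could look like this:

```python
# Markers mentioned in the email; "\ufffd" is the Unicode replacement character.
MARKERS = ("{dollar}", "superscript", "\ufffd")


def count_markers(abstracts, markers=MARKERS):
    """Map each marker to (total occurrences, number of abstracts containing it)."""
    stats = {}
    for marker in markers:
        per_abstract = [text.count(marker) for text in abstracts]
        total = sum(per_abstract)
        affected = sum(1 for n in per_abstract if n > 0)
        stats[marker] = (total, affected)
    return stats


# Made-up abstracts, for illustration only.
sample_abstracts = [
    "The cost rose from {dollar}5 to {dollar}10.",
    "Area is measured in m superscript 2.",
    "A clean abstract.",
    "A badly decoded na\ufffdve example.",
]
marker_stats = count_markers(sample_abstracts)
```

Separating total occurrences from affected records mirrors how the email reports the figures, so the two sets of numbers can be compared.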

Abstracts (change not required for harvest)

  • Only about 12,436 of the records have abstracts – but that is fine.

Duplicates (change not required for harvest)

  • De-duplicating on the title in MARC edit identifies 128 duplicates. This is a very small number. It does not have to be addressed. We normally ignore duplicates – they normally end up on the same MARC record in OCLC – and at that point we get error reports alerting us to those situations and we delete one of them.
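
If we want to review the duplicates ourselves, a title-based grouping can be sketched as below. The normalization rule (case-folding and collapsing whitespace) is an assumption and may not match what MarcEdit does, so the resulting count could differ from 128. The `id` and `title` keys are hypothetical.

```python
from collections import defaultdict


def normalize_title(title):
    """Case-fold and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.casefold().split())


def find_duplicate_titles(records):
    """Return {normalized title: [record ids]} for titles shared by two or more records."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_title(rec["title"])].append(rec["id"])
    return {title: ids for title, ids in groups.items() if len(ids) > 1}


# Made-up identifiers, for illustration only.
sample_records = [
    {"id": "a1", "title": "Theoretical Considerations For Biological Control:"},
    {"id": "b2", "title": "theoretical considerations  for biological control:"},
    {"id": "c3", "title": "Union and Communion:"},
]
duplicate_titles = find_duplicate_titles(sample_records)
```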

anayram commented Jul 13, 2022
