This repository has been archived by the owner on Jan 28, 2020. It is now read-only.

Identify duplicates / similar learning resources #882

Open
5 tasks
pdpinch opened this issue Jan 27, 2016 · 0 comments

Comments

pdpinch commented Jan 27, 2016

As a curator, I would like a way to easily identify and hide (or remove) duplicate learning resources from the repository.

Having imported several versions of the 8.01 physics course, we already have many duplicate learning objects cluttering the repository. It would be good to have a way to identify them (automatically or by user inspection) and hide (or remove) them in order to declutter the interface.

Some thoughts that have been discussed:

  • create a vocabulary for specifying the relationship between learning objects, e.g. duplicate or version; there is undoubtedly prior art on this
  • develop a heuristic for identifying duplicates on import, and tag them with said vocabulary
  • give users a way to manually tag related learning objects, for when the automation fails
  • Elasticsearch may help by giving similarity scores for documents
  • when duplicates or versions are identified, there should be a way to synchronize the metadata between the two, to avoid re-entry
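
A rough sketch of how the vocabulary and the import-time heuristic might fit together. Everything here is hypothetical: the `Relation` terms, the `LearningResource` fields, and the two thresholds are placeholders for illustration, and the standard library's `difflib` stands in for whatever similarity score Elasticsearch would provide.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Hypothetical vocabulary for relating two learning resources."""
    DUPLICATE = "duplicate"
    VERSION = "version"


@dataclass
class LearningResource:
    """Assumed minimal shape of a resource; the real model has more fields."""
    resource_id: int
    title: str
    content: str


def similarity(a: LearningResource, b: LearningResource) -> float:
    """Return a 0.0-1.0 similarity score for the two resources' content."""
    return SequenceMatcher(None, a.content, b.content).ratio()


def classify(
    a: LearningResource,
    b: LearningResource,
    duplicate_at: float = 1.0,  # assumed threshold, would need tuning
    version_at: float = 0.8,    # assumed threshold, would need tuning
) -> Optional[Relation]:
    """Suggest a relation tag for a pair, or None if they look unrelated."""
    score = similarity(a, b)
    if score >= duplicate_at:
        return Relation.DUPLICATE
    if score >= version_at:
        return Relation.VERSION
    return None
```

A pair tagged this way would still need a curator's confirmation, which is where the manual tagging in the list above comes in.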

One tricky aspect is that some minor differences between versions of a learning resource may be irrelevant, so the versions can be treated as duplicates, while other small changes, such as edits to problem text, may be significant.
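
One possible way to separate the two cases is to normalize away the differences deemed irrelevant before comparing. The sketch below assumes, purely for illustration, that whitespace and letter case are the irrelevant differences; the function names are hypothetical, and which differences actually count as irrelevant is a curatorial decision.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase the text.

    Assumption: whitespace and case are the only 'irrelevant' differences.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """Hash the normalized text so resources can be compared cheaply."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def is_trivial_variant(a: str, b: str) -> bool:
    """True when two texts differ only in ways normalize() discards."""
    return fingerprint(a) == fingerprint(b)
```

Under these assumptions, a reflowed copy of a problem would match its original, while any change to the wording of the problem text would produce a different fingerprint and be kept as a distinct version.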
