Process for Running the Document Type Classifier #380

Open
CarsonDavis opened this issue Sep 18, 2023 · 0 comments
Labels
PI 24.1 Oct, Nov, Dec 2023

Comments

@CarsonDavis
Collaborator

Description

The Document Type inference pipeline can currently be triggered only by a button press, and only for one collection at a time.

Ideally, we would like to run it once on the entire database, perhaps as a migration.

Then, for all future URL imports, the pipeline should run alongside the import. That way, by the time URLs reach the webapp, the rules that assign document types already exist and the assigned types can be shown to the curator.
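A minimal sketch of the two entry points described above, written in plain Python with no framework dependencies. Every name in it (`Url`, `Collection`, `infer_document_type`, `classify_collection`, `backfill_all`, `import_urls`) is hypothetical and stands in for whatever the webapp actually uses. The keyword rule is only a placeholder for the real inference pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Url:
    url: str
    document_type: Optional[str] = None  # None means "not yet classified"


@dataclass
class Collection:
    name: str
    urls: List[Url] = field(default_factory=list)


def infer_document_type(url: str) -> str:
    """Placeholder for the real inference pipeline (assumption, not the actual model)."""
    return "Data" if "data" in url.lower() else "Documentation"


def classify_collection(collection: Collection) -> int:
    """Classify every unclassified URL in one collection; return how many were typed."""
    changed = 0
    for item in collection.urls:
        if item.document_type is None:
            item.document_type = infer_document_type(item.url)
            changed += 1
    return changed


def backfill_all(collections: List[Collection]) -> int:
    """One-off pass over every existing collection (the 'migration' case)."""
    return sum(classify_collection(c) for c in collections)


def import_urls(collection: Collection, raw_urls: List[str]) -> None:
    """Import new URLs and classify them in the same step (the 'future imports' case)."""
    collection.urls.extend(Url(u) for u in raw_urls)
    classify_collection(collection)
```

Because `classify_collection` skips URLs that already have a type, rerunning the backfill is harmless. That may matter if the one-off run is interrupted or if curators have already set some types by hand.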

Implementation Considerations

  • How will we run this asynchronously? Does the current URL importer already use any async tooling that we could reuse?
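The issue does not say which async mechanism, if any, the importer uses, so the following is only one possible shape for the answer. It uses the standard library's `ThreadPoolExecutor` as a stand-in for a real background worker or task queue. The names `classify_one` and `classify_in_background` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def classify_one(collection_id: int) -> int:
    """Hypothetical stand-in for running the pipeline on a single collection."""
    return collection_id  # a real job might return the number of URLs it typed


def classify_in_background(collection_ids, job=classify_one, max_workers=4):
    """Run one classification job per collection; collect results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the collection it belongs to.
        futures = {pool.submit(job, cid): cid for cid in collection_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:  # one bad collection should not stop the rest
                failures[cid] = exc
    return results, failures
```

Recording failures per collection, rather than aborting on the first error, lets a database-wide run finish and report which collections need a retry.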

Deliverable

  • A migration that classifies documents for all existing collections
  • A process that handles future URL imports

Dependencies

depends on None

@code-geek added the PI 24.1 Oct, Nov, Dec 2023 label on Jan 9, 2024

3 participants