Process for Running the Document Type Classifier #380

CarsonDavis · 2023-09-18T16:34:05Z

Description

The Document Type inference pipeline currently runs only when triggered by a button press, and only on a single collection at a time.

Ideally, we would like to run it once on the entire database, perhaps as a migration.

Then, for all future URL imports, the pipeline should run alongside the import, so that by the time URLs reach the webapp, the rules that assign document types already exist and the assigned types can be shown to the curator.

Implementation Considerations

How will we run this asynchronously? Does the current URL importer already use any async tooling?
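
To make the async question concrete, here is a minimal sketch of scheduling classification in the background so the import itself is not blocked. It uses only the Python standard library. Everything named here is an assumption for illustration: `import_urls`, `classify_document_type`, and the document type labels are hypothetical, and the toy classifier is a stand-in for the real inference pipeline. If the importer already relies on a task queue, that would likely be a better home for this than a thread pool.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def classify_document_type(url: str) -> str:
    """Hypothetical stand-in for the real document type inference model."""
    lowered = url.lower()
    if lowered.endswith(".pdf"):
        return "Documentation"
    if "/data/" in lowered:
        return "Data"
    return "Unknown"


# One shared worker pool (assumption: a real deployment may prefer a task queue).
_executor = ThreadPoolExecutor(max_workers=4)


def import_urls(collection: str, urls: list[str]) -> Future:
    """Schedule classification for newly imported URLs and return immediately.

    Storing the URLs under `collection` is omitted; only the scheduling is shown.
    """
    def _classify_all() -> dict[str, str]:
        return {url: classify_document_type(url) for url in urls}

    # submit() returns a Future right away, so the caller does not wait on inference.
    return _executor.submit(_classify_all)
```

The caller can hold on to the returned `Future` and read `.result()` later, for example once the curator opens the collection.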

Deliverable

migration to classify documents for all existing collections
process to handle future url imports
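
As a rough illustration of the first deliverable, the one-time pass over existing collections could be written so that it is safe to rerun: it only touches URLs that do not have a document type yet. This is a sketch under stated assumptions, not the project's actual migration. The in-memory `Database` mapping and the `classify_all_collections` function are hypothetical, and the classifier is passed in as a plain callable.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the database:
# collection name -> {url: assigned document type, or None if unclassified}.
Database = dict[str, dict[str, Optional[str]]]


def classify_all_collections(db: Database, classify: Callable[[str], str]) -> int:
    """Assign a document type to every URL that lacks one.

    Returns the number of URLs classified in this run.
    """
    classified = 0
    for urls in db.values():
        for url, doc_type in urls.items():
            # Skip URLs that already have a type so reruns do not overwrite them.
            if doc_type is None:
                urls[url] = classify(url)
                classified += 1
    return classified
```

Because already-typed URLs are skipped, running the pass a second time classifies nothing, which also protects any types a curator has set by hand.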

Dependencies

depends on None

CarsonDavis assigned RajashreeDahal4 Sep 18, 2023

code-geek added the PI 24.1 Oct, Nov, Dec 2023 label Jan 9, 2024
