Enclave Wrangler

This documentation can be found here

Creating/Updating Concept Sets via CSV Upload

Example: Sample CSV.

CSV Schema

Non-required columns can be included in the CSV, but will be ignored during processing.

| CSV Column name | Data Type | Required | Description |
| --- | --- | --- | --- |
| multipassId | String | True | The user ID of the Enclave user on whose behalf the concept set container or version is being created (i.e. who owns the uploaded concept set or version). An Enclave user can find their ID by going to their account settings and copying the UserID on the top right. If you need someone else's User ID, use the Enclave Object Explorer to search for researchers and click on their name under "Results" on the right. Then, hover over the User ID field and click on the copy icon. |
| parent_version_codeset_id | Integer | True (for update operations) | If you are creating a new version of a concept set, add the codeset ID of the parent concept set version. If creating a new concept set container, leave this empty. |
| current_max_version | Double | True (for update operations) | If you are creating a new version of a concept set, set this to the maximum existing version number. So, if the maximum existing version of the concept set is v5, set this to 5.0. |
| concept_set_name | String | True | The name of the concept set container. |
| concept_id | Integer | True | This is the concept_id column in the OMOP concept table. In the condition_occurrence table, it appears as condition_concept_id. Likewise for other domain tables. |
| includeDescendants | Boolean | True | If this is set to TRUE, then this expression item will match the selected OMOP Concept and all of its descendants. |
| isExcluded | Boolean | True | This column is automatically generated by the Standard Operating Procedure. If this is set to TRUE, then the concepts matched by this expression will be removed from the final expansion of this concept set version after all other expressions have been processed. This is useful, for instance, if you want to include the descendants of some concept except for certain concepts or subtrees. |
| includeMapped | Boolean | True | This column is automatically generated by the Standard Operating Procedure. Do not use this unless you know what you're doing. If you want to include mapped concepts, you can set this to TRUE, but we recommend you test the expression in ATLAS or the Enclave Concept Set Editor first. |
| action | String | True | This column was previously intended to allow concepts/expressions to be added, changed, or removed from existing versions. For now, though, the new version will be created from scratch and include only the expressions listed in the file. Set the value to "add/replace" if you are creating or updating a concept set. |
| vocabulary_concept_code | String | True (if concept_id is empty) | Leave this column blank if the concept_id column is not empty. The vocabulary_id column should have a value if this column has a value. This is concept_code in the OMOP concept table. In the condition_occurrence table, it appears as condition_source_value. Likewise for other domain tables. |
| vocabulary_id | String | True (if concept_id is empty) | Leave this column blank if the concept_id column is not empty. The vocabulary_concept_code column should have a value if this column has a value. This is vocabulary_id in the OMOP concept table. |
| annotation | String | False | This column is not required by the Standard Operating Procedure. This is for any comments about the inclusion of this expression. |
| domain_team | String | False | This column is not used by the Standard Operating Procedure. |
| provenance | String | False | This column is not used by the Standard Operating Procedure. |
| limitations | String | False | This column is not used by the Standard Operating Procedure. |
| intention | String | False | This column is not used by the Standard Operating Procedure. |
| intended_research_project | String | False | This column is not used by the Standard Operating Procedure. |
| authority | String | False | This column is not used by the Standard Operating Procedure. |
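
Before uploading, it can help to check each row for blank required values. The following is only a minimal sketch using the standard library: the helper `missing_values` and the sample row (including the placeholder user ID) are hypothetical and not part of enclave_wrangler. It treats `concept_id` as conditionally required, since the vocabulary columns can stand in for it.

```python
import csv
import io

# Columns the schema marks as required on every row. concept_id is left out
# because vocabulary_concept_code + vocabulary_id may be given instead.
ALWAYS_REQUIRED = [
    "multipassId", "concept_set_name", "includeDescendants",
    "isExcluded", "includeMapped", "action",
]
# Columns the schema marks as required only for update operations.
UPDATE_REQUIRED = ["parent_version_codeset_id", "current_max_version"]


def missing_values(row: dict, is_update: bool) -> list:
    """Return the required column names that are absent or blank in one CSV row."""
    required = ALWAYS_REQUIRED + (UPDATE_REQUIRED if is_update else [])
    return [col for col in required if not (row.get(col) or "").strip()]


# Hypothetical sample row: an update that forgot current_max_version.
sample = io.StringIO(
    "multipassId,concept_set_name,parent_version_codeset_id,current_max_version,"
    "concept_id,includeDescendants,isExcluded,includeMapped,action\n"
    "hypothetical-user-id,Example set,794639872,,4034962,FALSE,FALSE,FALSE,add/replace\n"
)
for row in csv.DictReader(sample):
    print(missing_values(row, is_update=True))  # prints ['current_max_version']
```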

Container-only fields

Fields that only matter if you are making a new concept set container.

| CSV Column name | Data Type | Required | Description |
| --- | --- | --- | --- |
| container_intention | String | True | The intention of the concept set. |
| container_research_project | String | True | The name of the project, ideally the 'short name' (the part that appears in brackets before the longer name), e.g. the "RP-4A9E27" part in "[RP-4A9E27] DI&H - Data Quality". A list of research projects is available here: https://unite.nih.gov/workspace/compass/projects |
| container_assigned_sme | String | True | The concept set's assigned subject matter expert. |
| container_assigned_informatician | String | True | The concept set's assigned informatician. |
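
If you only have the full project name, the bracketed short name can be pulled out with a regular expression. This is an illustrative sketch; `project_short_name` is a hypothetical helper, not a function provided by enclave_wrangler.

```python
import re


def project_short_name(project: str) -> str:
    """Return the bracketed short name if present, otherwise the stripped input."""
    match = re.match(r"\s*\[([^\]]+)\]", project)
    return match.group(1).strip() if match else project.strip()


print(project_short_name("[RP-4A9E27] DI&H - Data Quality"))  # prints RP-4A9E27
```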

Not using

| CSV Column name | Data Type | Required | Description |
| --- | --- | --- | --- |
| domain | String | False | An ignored field. Feel free to include if it helps for readability / data management. |
| class_id | String | False | An ignored field. Feel free to include if it helps for readability / data management. |

Options for adding concepts

Concepts can be added in one of two ways: (a) by providing OMOP concept IDs in the concept_id field, or (b) by adding concepts directly from a source vocabulary, using the vocabulary_id and vocabulary_concept_code fields.
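
The two options can be told apart per row as sketched below. This assumes the column names from the CSV schema (`concept_id`, `vocabulary_concept_code`, `vocabulary_id`) and the schema's rule that the vocabulary columns stay blank when `concept_id` is set. The helper `concept_source` is hypothetical, and whether the uploader itself enforces these rules is not confirmed here.

```python
def concept_source(row: dict) -> str:
    """Classify a CSV row as option (a) 'concept_id' or option (b) 'vocabulary'."""
    concept_id = (row.get("concept_id") or "").strip()
    code = (row.get("vocabulary_concept_code") or "").strip()
    vocab = (row.get("vocabulary_id") or "").strip()

    if concept_id:
        # Option (a): the schema asks for the vocabulary columns to be left blank.
        if code or vocab:
            raise ValueError("leave vocabulary columns blank when concept_id is set")
        return "concept_id"
    if code and vocab:
        # Option (b): both vocabulary columns must be filled in together.
        return "vocabulary"
    raise ValueError("provide concept_id, or both vocabulary_concept_code and vocabulary_id")


print(concept_source({"concept_id": "4034962"}))  # prints concept_id
print(concept_source({"vocabulary_concept_code": "703136005",
                      "vocabulary_id": "SNOMED"}))  # prints vocabulary
```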

Examples

Creating a new concept set container

Given a CSV like the following... TODO

...run: TODO

Creating a new concept set version

Given a CSV like the following... TODO

...run: TODO

Adding/updating concepts in a concept set version

Given a CSV like the following...

concept_set_name parent_version_codeset_id action concept_id includeDescendants isExcluded includeMapped annotation vocabulary_concept_code vocabulary_id FIELD11 concept_name domain class_id
794639872 add/replace 4034962 FALSE FALSE FALSE 237613005 Hyperproinsulinemia
794639872 add/replace FALSE TRUE FALSE 703136005 SNOMED Diabetes mellitus in remission Condition Clinical Finding

...updates can be uploaded using the following Python code:

from enclave_wrangler.dataset_upload import upload_new_cset_version_with_concepts_from_csv
path = 'path/to/csv'  # replace with path to your CSV
upload_new_cset_version_with_concepts_from_csv(path)

There is also a unit test that demonstrates this functionality in tests/test_enclave_wrangler.py called TestEnclaveWrangler.test_upload().

To do:

  • Add documentation here for how this can be run by users with Python skills
  • Give upload_new_cset_version_with_concepts_from_csv features to allow:
    • Specifying user auth token so it can be run by people on their own behalf (who don't have the bulkimport user auth token)
    • Specifying whether version(s) should be finalized or left in draft state.

How to's

Access your security authorization token for the Enclave API:

Useful resources

Enclave API documentation links

More

  • About the enclave: https://covid.cd2h.org/enclave
  • Logging into the enclave: https://unite.nih.gov/workspace/slate/documents/dashboard
  • Logic that runs to create dataset generation used by TermHub: https://unite.nih.gov/workspace/data-integration/code/repos/ri.stemma.main.repository.aea80f94-828b-4795-9603-c3228b153414/contents/refs%2Fheads%2Fmaster/