-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Feature: SNOMED::ICD10CM Mapping Support #207
base: master
Are you sure you want to change the base?
Conversation
95c96e2
to
7d90e91
Compare
The actual extension is looking great! @hrshdhgd can take it for a ride when the time comes. Sorry we have not introduced you to our build system: we are using |
7398ac4
to
bae870a
Compare
Current output (updated 2022/03/03 4:25pm EST): |
9e1fd42
to
2810cba
Compare
# or is there a SKOS predicate I can map to in case where predicate is unknown? I think most of these | ||
# mappings are attempts at exact matches, but I can't be sure (at least not without using these fields | ||
# to determine: mapGroup, mapPriority, mapRule, mapAdvice). | ||
# mapCategoryId,mapCategoryName: Only these in set: 447637006 "MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mapCategoryId
and mapCategoryName
I'm not sure if there's any useful information here that could be put anywhere than the other
field. There are only 3 values that show up in the entire mapping dataset for both mapCategoryId
and mapCategoryName
, and I listed them out int he comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sssom/parsers.py
Outdated
# "IF LISSENCEPHALY TYPE 3 FAMILIAL FETAL AKINESIA SEQUENCE SYNDROME CHOOSE Q04.3 | MAP OF SOURCE CONCEPT | ||
# IS CONTEXT DEPENDENT" | ||
# "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA" | ||
'predicate_id': f'SNOMED:{row["mapCategoryId"]}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned elsewhere, I'm not 100% sure if predicate_*
is best taken from mapCategory*
. It's possible that mapRule
and mapAdvice
could be better. Would like to defer to others for judgement on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a much deeper understanding of these is needed.. Ask in the call later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am super excited about this, and I am very happy about how much care you are taking sorting out the mapping here. Most of the things in here we need to discuss in person, too much to unpack to type up now, but two things would be probably good:
- Start a new "parser" documentation as described in comments above
- Rebrand the method to "snomed_map" rather than "snomed_icd10_map". In fact, there are two very different ones: complex and simple map. So lets figure out which of the two you are dealing with at the moment, see also: Review and potentially align SSSOM & SNOMED complex map sssom#42 and rebrand your methods accordingly. So
read_snomed_simple_map()
orread_snomed_complex_map()
depending on what we are dealing with here.
sphinx/parsers.rst
Outdated
@@ -0,0 +1,126 @@ | |||
Parsers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty terrible at .rst
, but here you go. Based on what you said, this should be suitable for right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sssom/parsers.py
Outdated
file_path: str, | ||
prefix_map: Dict[str, str] = None, | ||
meta: Dict[str, str] = None, | ||
filter_by_confident_mappings=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your request to filter only mapping rows that contain 'ALWAYS <code>'
, and possible future flexibility to fine tune the mappings, I added a param here and just set it to True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sssom/parsers.py
Outdated
# - Note: joeflack4: I used this info as a reference for this pattern. | ||
# https://www.medicalbillingandcoding.org/icd-10-cm/#:~:text=ICD%2D10%2DCM%20is%20a,decimal%20point%20and%20the%20subcategory. | ||
always_confidence_pattern = 'ALWAYS [A-Z]{1}[0-9]{1,2}\.[0-9A-Z]{1,4}' | ||
always_confidence_antipattern = always_confidence_pattern + '\?' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found edge cases where 'ALWAYS <code>'
appears, but has a ?
mark after, indicating that it is not 100% confident. Example:
ALWAYS T50.905? | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | EPISODE OF CARE INFORMATION NEEDED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ms = _init_mapping_set(meta) | ||
|
||
# Filtering | ||
if filter_by_confident_mappings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the filtering you requested.
It turns out that I was wrong earlier when I thought that if mapAdvice
contained 'ALWAYS <code>'
, mapRule
would always be TRUE
. If this was the case, filtering by mapRule
would've been much easier, but unfortunately it did not turn out to be the case.
I was thinking about a simple .str.contains('ALWAYS')
, but it is possible that the word 'ALWAYS' could appear in cases other than 'ALWAYS <code>'
, so I decided to go with the stricter approach of a regex.
I created my own regex. But now I just realized that ICD10 is such a big thing that I should've just been googled a pre-baked regex. Here's one I found: https://www.johndcook.com/blog/2019/05/05/regex_icd_codes/. If you'd like me to use that or another regex instead, let me know.
I checked to see how many rows were removed by this filter, and it was about ~50%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c6a7610
to
dd52b99
Compare
- Added feature to allow for conversion of these premade mappings provided by SNOMED into SSSOM format. General updates - cli.py: Reorganized SSSOM_READ_FORMATS: Top half are plain data formats, and bottom half are special-case formats. Both halves of the list are alphabetically sorted.
…o 'SNOMED complex map'. - Docs: Added a page 'parsers.rst', and populated with 'SNOMED complex map' info.
dd52b99
to
ea18e6b
Compare
…of 'ALWAYS' level of confidence.
ea18e6b
to
2a29806
Compare
# SNOMED mappings, all of them had correlationId of 447561005, which also happens to be 'unspecified'. | ||
# If correlationId is indeed more appropriate for predicate_id, then I don't think there is a representative | ||
# field for 'mapping_justification'. | ||
# TODO: How to properly get mapping_justification? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 3: solved
Not sure how to use sssom_schema.slots.mapping_justification
here.
edit: Harshad recommended I use SEMAPV
Issue 4
It might take a significant amount of time (maybe) to map all the SNOMED mapping fields to SSSOM mapping fields. But I should probably look into that again.
@matentzn I recommend we merge this now while this is rebased and functional, and we can make improvements to this feature later.
I did some formatting Joe, I hope that was fine. The pending errors from
|
Fixed the errors:
by adding a I think the pending ones must have risen from using spaces instead of tabs in your IDE. Not sure, tried fixing it on my end, but it still persists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot review this with these spurious diffs.. Can we somehow avoid these?
# IS CONTEXT DEPENDENT" | ||
# "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA" | ||
'predicate_id': f'SNOMED:{row["mapCategoryId"]}', | ||
'predicate_label': row['mapCategoryName'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share an example of how the resulting sssom looks like, maybe, 10 rows in markdown here in this pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sure thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Table
I don't know if this table appears very strange / unreadable to others. It wasn't the case before, but as I am editing this today (2023/11/14) it is strange now.
subject_id | predicate_id | object_id | match_type | subject_label | subject_category | predicate_label | predicate_modifier | object_label | object_category | author_id | author_label | reviewer_id | reviewer_label | creator_id | creator_label | license | subject_source | subject_source_version | object_source | object_source_version | mapping_provider | mapping_cardinality | mapping_tool | mapping_tool_version | mapping_date | confidence | subject_match_field | object_match_field | match_string | subject_preprocessing | object_preprocessing | match_term_type | semantic_similarity_score | semantic_similarity_measure | see_also | other | comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SNOMED:10000006 | SNOMED:447637006 | ICD10CM:R07.9 | Unspecified | Radiating chest pain | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Chest pain, unspecified | 2021-09-01 | id=4c4d2755-e65f-52dc-8353-ecf437495eb8|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS R07.9 | |||||||||||||||||||||||||||||
SNOMED:10001005 | SNOMED:447639009 | ICD10CM:P36.9 | Unspecified | Bacterial sepsis | MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT | Bacterial sepsis of newborn, unspecified | 2021-09-01 | id=f7e1ed02-3107-5a6c-9439-21e9879fb746|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=IFA 445518008 | Age at onset of clinical finding (observable entity) | < 29.0 days|mapAdvice=IF AGE AT ONSET OF CLINICAL FINDING BEFORE 29.0 DAYS CHOOSE P36.9 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | DESCENDANTS NOT EXHAUSTIVELY MAPPED | MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT | |||||||||||||||||||||||||||||
SNOMED:10001005 | SNOMED:447637006 | ICD10CM:A41.9 | Unspecified | Bacterial sepsis | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Sepsis, unspecified organism | 2021-09-01 | id=907aa2b0-a832-5254-acdf-6f366e406d2b|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=2|mapRule=OTHERWISE TRUE|mapAdvice=ALWAYS A41.9 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | DESCENDANTS NOT EXHAUSTIVELY MAPPED | |||||||||||||||||||||||||||||
SNOMED:10007009 | SNOMED:447637006 | ICD10CM:Q04.8 | Unspecified | Coffin-Siris syndrome | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Other specified congenital malformations of brain | 2021-09-01 | id=0b918513-be22-5059-beb6-e8348dfc631f|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS Q04.8 | |||||||||||||||||||||||||||||
SNOMED:1001000119102 | SNOMED:447637006 | ICD10CM:I26.99 | Unspecified | Pulmonary embolism with pulmonary infarction | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Other pulmonary embolism without acute cor pulmonale | 2021-09-01 | id=7f3f0a95-0441-57ba-a58f-03d2f2736c60|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS I26.99 | |||||||||||||||||||||||||||||
SNOMED:10017004 | SNOMED:447637006 | ICD10CM:K03.0 | Unspecified | Occlusal wear of teeth | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Excessive attrition of teeth | 2021-09-01 | id=3fa74c85-56d0-5819-97e8-80c8d4c053c0|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS K03.0 | |||||||||||||||||||||||||||||
SNOMED:100191000119105 | SNOMED:447637006 | ICD10CM:N42.89 | Unspecified | Asymmetry of prostate | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Other specified disorders of prostate | 2021-09-01 | id=51537076-a3bf-59f8-aa5f-f571dfcf32cb|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS N42.89 | |||||||||||||||||||||||||||||
SNOMED:100211000119106 | SNOMED:447637006 | ICD10CM:M62.830 | Unspecified | Muscle spasm of thoracic back | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Muscle spasm of back | 2021-09-01 | id=a0263a65-4056-5cf9-8caa-669afce20189|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS M62.830 | |||||||||||||||||||||||||||||
SNOMED:1002229008 | SNOMED:447637006 | ICD10CM:L75.2 | Unspecified | Apocrine miliaria of areola | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Apocrine miliaria | 2021-09-01 | id=65fa44e5-272d-54e5-b787-cb977fc24720|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS L75.2 | |||||||||||||||||||||||||||||
SNOMED:1002253002 | SNOMED:447637006 | ICD10CM:Z28.3 | Unspecified | Immunization series incomplete | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Underimmunization status | 2021-09-01 | id=c2ab86f4-29c5-5ea3-8818-1e389d9c59a6|active=1|moduleId=5991000124107|refsetId=6011000124106|mapGroup=1|mapPriority=1|mapRule=TRUE|mapAdvice=ALWAYS Z28.3 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- match_type does not exist anymore
- Why all these empty columns?
- predicate_id is not legal - I looked this SNOMED rel up but could not understand what it means at all. Can you explain how "Map source concept is properly classified" is a mapping relation? how does it map to skos?
- you should add a test that runs
validate_file
on the a small sample of the output I think,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match_type
: Oh this is an old copy. I haven't re-ran it in awhile. The code has since been updated tomapping_justification
.- Empty columns: I haven't determined how to populate these SSSOM fields. For many of them, there may not be any information available. I have enumerated the SSSOM Mapping data model as of 2022/30 in my code comments (link). If you click that link and check it out, you'll see that I have some thoughts/suggestions/questions as to what/how to populate. Any feedback/suggestions to those questions/comments are appreciated.
predicate_id
: I opened a review comment in the first commit about populating this field from SNOMED'smapCategoryId
andmapCategoryName
fields. They don't use too many mapping predicates, unfortunately. In that linked review comment, I stated that there were only 3 values in the field, but the ICD10CM mapping dataset only includes 1:MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED
(447637006
). You might be able to find more information in the "codebook" that SNOMED has published on these mappings. That codebook is nameddoc_Icd10cmMapReleaseNotes_Current-en-US_US1000124_20210901.pdf
which I have zipped up for you (w/ some other related files) here:
SNOMED ICD10CM map technical guide etc.zip- test: Sure thing, I'll add one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joeflack4 TODO: Note to self: Add a test that does validate_file
on the output.
@matentzn wrote:
Just to clarify, by spurious, do you mean the review comments that are linked to code snippets that are marked with the I don't think that's really something that can easily be changed. The way to change that would be for me to write code once and for it to be perfect and then thus not have the need to change it. But I don't really think that's realistic. I think it's probably better to hack away at things fast, get something pushed, and then make future edits as needed; agile style. |
2.a. Simple vs Complex 2.b. SNOMED map vs SNOMED::ICD map |
Is there interest to finish this? |
@cthoyt I hate to see unfinished work. I think I had basically finished "complex map" or "simple map". Other priorities took hold and this was forgotten. I leave it to @matentzn to decide, but perhaps the best thing to do is (i) close this PR, (ii) open an issue about finishing this, linking to this PR. |
@joeflack4 ok! Feel free to do |
Let leave in draft stage, I can see this becoming important in mid 2024 |
Fixes #208
New Feature: SNOMED::ICD10CM Mapping Support
General updates
utils.py
: ReorganizedSSSOM_READ_FORMATS
: Top half are plain data formats, and bottom half are special-case formats. Both halves of the list are alphabetically sorted.Docs
parsers.rst
, and populated with 'SNOMED complex map' info.