Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DICOMSeriesSelectorOperator Enhancements #501

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

bluna301
Copy link

@bluna301 bluna301 commented Aug 12, 2024

Addressing some points raised in Issue #204.

Enhanced Matching for Numerical DICOM Tags

In addition to the previously supported exact matching, the following is now supported for numerical (i.e. integer and float type DICOM VRs, including IS and DS VRs) tags:

  • Range matching: when a list with two numerical values is provided, these values will serve as lower and upper bounds; match will be successful if DICOM tag value is between (inclusive) of the bounds. This was mainly tested for the SliceThickness tag.
  • RegEx matching: when a RegEx is provided, numerical DICOM tag value will be converted to a string; full match used provide more specificity (Note: RegEx matching was added as more of a proof of concept; this may not be utilized much in practice, but added it in this PR just to document. If this is not desired, I would be fine with removing).

I have also updated the example sample selection rules to include a 3rd selection ("CT Series 3"); this new selection includes an example of union matching for set type (functionality already present, but an example was not previously provided), as well as an example of the newly-added numerical range matching:

    "name": "CT Series 3",
    "conditions": {
        "StudyDescription": "(.*?)",
        "Modality": "(?i)CT",
        "ImageType": ["PRIMARY", "ORIGINAL", "AXIAL"],
        "SliceThickness": [3, 5]
    }

Sort by SOP Instance Count

In reviewing ~30 CT A/P scans from our hospital, filtering by StudyDescription, Modality, ImageType, and SliceThickness narrows down to a single, desired series in about 85% of cases (CT, Axial, 3-5 mm ST). For the remaining 15%, it is often the case that, of the series that pass the initial filtering, the series that is desired for inference is the one with the most number of DICOM images present (i.e. highest number of SOP instances). This is often due to the filtered series with more DICOM images having a higher likelihood of having the complete A/P anatomy present, as well as the presence of short series (e.g. Smart Prep Series) which technically pass the filters but are not desired for inference.

With this observation, I have implemented a new default parameter, sort_by_sop_instance_count, which, when True, and when multiple series pass the selection criteria (all_matched=True), will sort the selected series in descending SOP instance count, i.e. will list the filtered DICOM series with the most DICOM images first. This will give a higher probability that, despite downstream operators (the MONAIBundleInferenceOperator, I believe) being limited to only performing inference on the first selected series even if multiple series are selected, inference is performed on the desired series. sort_by_sop_instance_count is implemented in a very similar way to all_matched - as a default parameter with a default value of False.

Signed-off-by: bluna301 <[email protected]>
@bluna301 bluna301 changed the title [WIP] DICOMSeriesSelectorOperator enhancements [WIP] DICOMSeriesSelectorOperator Enhancements Aug 14, 2024
@MMelQin MMelQin requested a review from vikashg September 4, 2024 15:40
…p_folder declare location per pytype check

Signed-off-by: bluna301 <[email protected]>
@bluna301
Copy link
Author

bluna301 commented Sep 7, 2024

@vikashg @MMelQin - I appreciate your initial review of this PR, Elan provided me with your feedback earlier this week.

Regarding your comment about sorting and using ImagePositionPatient instead of SOP instances, I apologize as perhaps it's not clear from my explanation what I am trying to accomplish with this functionality. I am not trying to sort the slices within the selected series - I am trying to sort the series themselves.

For example, say all_matched=True, and we have 3 series that pass the selection criteria. Say these series have 29, 8, and 58 DICOM images each, and are matched in this order sequentially. Unsorted, the StudySelectedSeries list outputted would have an order of [29, 8, 58]; however, when sorted, the list would have an order of [58, 29, 8], i.e. placing the DICOM series with the highest number of DICOM images in the 0th position. Because of how downstream operators are currently configured, ultimately only a single DICOM SEG will be written for the "first" (0th) selected series in the list, even if multiple series are selected. Please refer to my original PR comment for why selecting the series with the higher number of DICOM images (if multiple series meet the criteria) could be desirable.

My code is using the get_sop_instances() method to grab the number of DICOM images present in the series, not to sort the DICOM series. I hope this helps to clarify, please let me know if you have any additional questions. I realize this feature may not be desired, or that it may be advisable to change the name of this parameter / add additional documentation to make this functionality clear.

In my latest commit, I also made a change to the STL conversion operator to address a failed pytype check:

Checking styles using pytype...
homebluna301monai-deploy-app-sdkmonaideployutilsimportutil.py20 DeprecationWarning pkg_resources is deprecated as an API. See httpssetuptools.pypa.ioenlatestpkg_resources.html
  import pkg_resources
2024-09-06 163710 $ python3 -m pytype --version
2024.04.11
2024-09-06 163711 $ python3 -m pytype -j 16 --python-version=3.10
Computing dependencies
Analyzing 54 sources with 1 local dependencies
ninja Entering directory `.pytype'
[56] check monai.deploy.operators.stl_conversion_operator
FAILED homebluna301monai-deploy-app-sdk.pytypepyimonaideployoperatorsstl_conversion_operator.pyi 
usrbinpython3 -m pytype.main --disable pyi-error,container-type-mismatch,attribute-error --imports_info homebluna301monai-deploy-app-sdk.pytypeimportsmonai.deploy.operators.stl_conversion_operator.imports --module-name monai.deploy.operators.stl_conversion_operator --platform linux -V 3.10 -o homebluna301monai-deploy-app-sdk.pytypepyimonaideployoperatorsstl_conversion_operator.pyi --analyze-annotated --nofail --quick homebluna301monai-deploy-app-sdkmonaideployoperatorsstl_conversion_operator.py
File homebluna301monai-deploy-app-sdkmonaideployoperatorsstl_conversion_operator.py, line 247, in convert Name 'temp_folder' is not defined [name-error]

For more details, see httpsgoogle.github.iopytypeerrors.html#name-error
ninja build stopped cannot make progress due to previous errors.
Leaving directory '.pytype'
pytype check failed!

I moved the declaration of temp_folder to outside of the try block to resolve this.

@bluna301 bluna301 changed the title [WIP] DICOMSeriesSelectorOperator Enhancements DICOMSeriesSelectorOperator Enhancements Sep 7, 2024
@bluna301 bluna301 marked this pull request as ready for review September 7, 2024 00:14
@vikashg
Copy link
Collaborator

vikashg commented Sep 10, 2024

Thanks @bluna301. I am looking at the code. Give me one more day and I will be able to comment on your PR.

@bluna301
Copy link
Author

Thanks @bluna301. I am looking at the code. Give me one more day and I will be able to comment on your PR.

Thanks @vikashg !

@vikashg
Copy link
Collaborator

vikashg commented Sep 11, 2024

Hey @bluna301, I understand the logic now and it makes sense too. Sometimes, I have also faced this issue and generally the series with highest number of dicoms is the right series.
I went through the code, and I will suggest that you should throw a warning or a log that you are choosing the series with highest number of dicom images as multiple series are matching the selection criteria.
This will be helpful for the user to know what is going on. Other than that I think it looks good. I have not run your code though.
@MMelQin thoughts ?

Copy link

sonarcloud bot commented Oct 1, 2024

@bluna301
Copy link
Author

bluna301 commented Oct 1, 2024

@vikashg Thank you for your review! I agree with your suggestion, it would be good to make it more explicit to the user what is going on during when multiple series are matched. I added an info log for when sorting occurs, and also added some additional detail in the documentation of this operator to make it explicit that enabling this feature will "return" (i.e. sort and place first in the returned List) the series with the highest # of DICOM images.

Screenshot 2024-10-01 at 13 54 58

I have tested the new series selection functionality locally, as well as the minor change to the STL operator, with no issues. Please let me know if you or @MMelQin would like to test out the functionality before merging the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants