
Add curation script for INSPIRED dataset BIDSification #185

Closed
wants to merge 38 commits into from


valosekj
Member

This PR adds a Python script that curates the INSPIRED dataset into a BIDS-compliant layout.
The PR is a draft and still in progress.

Fixes: #184

Member

@jcohenadad jcohenadad left a comment


Please put the script under the dataset, in the 'code/' folder

@valosekj
Member Author

valosekj commented Dec 5, 2022

The script is now ready for review. Notes below.

Input dataset

Input dataset (as stored in `duke/mri/INSPIRED`):
$ tree -L 3
.
├── 01
│   ├── csm
│   │   ├── 001.tar.gz
│   │   ├── 005.tar.gz
│   │   ├── 006.tar.gz
│   │   ├── 007.tar.gz
│   │   ├── 008.tar.gz
│   │   ├── 009.tar.gz
│   │   ├── 010.tar.gz
│   │   ├── 015.tar.gz
│   │   ├── 017.tar.gz
│   │   ├── 018.tar.gz
│   │   ├── 019.tar.gz
│   │   ├── 021.tar.gz
│   │   ├── 022.tar.gz
│   │   └── 023.tar.gz
│   ├── hc
│   │   ├── 001.tar.gz
│   │   ├── 002.tar.gz
│   │   ├── 004.tar.gz
│   │   ├── 005.tar.gz
│   │   ├── 006.tar.gz
│   │   ├── 007.tar.gz
│   │   ├── 009.tar.gz
│   │   ├── 010.tar.gz
│   │   ├── 011.tar.gz
│   │   ├── 012.tar.gz
│   │   ├── 013.tar.gz
│   │   ├── 014.tar.gz
│   │   ├── 015.tar.gz
│   │   └── 016.tar.gz
│   └── sci
│       ├── 003.tar.gz
│       └── 012.tar.gz
├── 02
│   ├── csm
│   │   ├── 001.tar.gz
│   │   ├── 002.tar.gz
│   │   ├── 003.tar.gz
│   │   ├── 004.tar.gz
│   │   ├── 005.tar.gz
│   │   ├── 006.tar.gz
│   │   ├── 007.tar.gz
│   │   ├── 008.tar.gz
│   │   ├── 009.tar.gz
│   │   ├── 010.tar.gz
│   │   ├── 011.tar.gz
│   │   ├── 012.tar.gz
│   │   ├── 013.tar.gz
│   │   ├── 014.tar.gz
│   │   ├── 015.tar.gz
│   │   ├── 016.tar.gz
│   │   ├── 017.tar.gz
│   │   ├── 018.tar.gz
│   │   ├── 019.tar.gz
│   │   ├── 020.tar.gz
│   │   ├── 021.tar.gz
│   │   ├── 022.tar.gz
│   │   ├── 023.tar.gz
│   │   └── 024.tar.gz
│   ├── hc
│   │   ├── 001.tar.gz
│   │   ├── 002.tar.gz
│   │   ├── 003.tar.gz
│   │   ├── 004.tar.gz
│   │   ├── 005.tar.gz
│   │   ├── 006.tar.gz
│   │   ├── 007.tar.gz
│   │   ├── 008.tar.gz
│   │   ├── 009.tar.gz
│   │   └── 010.tar.gz
│   └── sci
│       ├── 001.tar.gz
│       ├── 002.tar.gz
│       ├── 003.tar.gz
│       ├── 004.tar.gz
│       ├── 005.tar.gz
│       ├── 006.tar.gz
│       ├── 007.tar.gz
│       ├── 008.tar.gz
│       ├── 010.tar.gz
│       ├── 011.tar.gz
│       ├── 012.tar.gz
│       ├── 013.tar.gz
│       ├── 014.tar.gz
│       ├── 015.tar.gz
│       ├── 016.tar.gz
│       └── 017.tar.gz
└── README.txt
Files for a single subject (after unpacking with `tar -xf`):
/01/csm/001$ tree -L 5
.
└── bl
    ├── brain
    │   ├── dwi.bval
    │   ├── dwi.bvec
    │   ├── dwi.nii.gz
    │   ├── dwi_ad1000.nii.gz
    │   ├── dwi_md1000.nii.gz
    │   ├── dwi_rd1000.nii.gz
    │   ├── dwi_reversed_blip.nii.gz
    │   ├── mpm_A.nii.gz
    │   ├── mpm_MT.nii.gz
    │   ├── mpm_MT_std.nii.gz
    │   ├── mpm_MT_std_Cerebellum.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_BiasCorrected.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_Brain.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_Parcellation.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_Segmentation.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_Segmentation_clean.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_Segmentation_flat.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_TIV.nii.gz
    │   ├── mpm_MT_std_NeuroMorph_prior.nii.gz
    │   ├── mpm_R1_UNICORT.nii.gz
    │   ├── mpm_R2s_OLS.nii.gz
    │   └── mpm_raw
    │       ├── sINSPIREDTrial-0002-00001-000176-01.json
    │       ├── sINSPIREDTrial-0002-00001-000176-01.nii
    │       ├── sINSPIREDTrial-0002-00001-000352-02.json
    │       ├── sINSPIREDTrial-0002-00001-000352-02.nii
    │       ├── sINSPIREDTrial-0002-00001-000528-03.json
    │       ├── sINSPIREDTrial-0002-00001-000528-03.nii
    │       ├── sINSPIREDTrial-0002-00001-000704-04.json
    │       ├── sINSPIREDTrial-0002-00001-000704-04.nii
    │       ├── sINSPIREDTrial-0002-00001-000880-05.json
    │       ├── sINSPIREDTrial-0002-00001-000880-05.nii
    │       ├── sINSPIREDTrial-0002-00001-001056-06.json
    │       ├── sINSPIREDTrial-0002-00001-001056-06.nii
    │       ├── sINSPIREDTrial-0005-00001-000176-01.json
    │       ├── sINSPIREDTrial-0005-00001-000176-01.nii
    │       ├── sINSPIREDTrial-0005-00001-000352-02.json
    │       ├── sINSPIREDTrial-0005-00001-000352-02.nii
    │       ├── sINSPIREDTrial-0005-00001-000528-03.json
    │       ├── sINSPIREDTrial-0005-00001-000528-03.nii
    │       ├── sINSPIREDTrial-0005-00001-000704-04.json
    │       ├── sINSPIREDTrial-0005-00001-000704-04.nii
    │       ├── sINSPIREDTrial-0005-00001-000880-05.json
    │       ├── sINSPIREDTrial-0005-00001-000880-05.nii
    │       ├── sINSPIREDTrial-0005-00001-001056-06.json
    │       ├── sINSPIREDTrial-0005-00001-001056-06.nii
    │       ├── sINSPIREDTrial-0005-00001-001232-07.json
    │       ├── sINSPIREDTrial-0005-00001-001232-07.nii
    │       ├── sINSPIREDTrial-0005-00001-001408-08.json
    │       ├── sINSPIREDTrial-0005-00001-001408-08.nii
    │       ├── sINSPIREDTrial-0008-00001-000176-01.json
    │       ├── sINSPIREDTrial-0008-00001-000176-01.nii
    │       ├── sINSPIREDTrial-0008-00001-000352-02.json
    │       ├── sINSPIREDTrial-0008-00001-000352-02.nii
    │       ├── sINSPIREDTrial-0008-00001-000528-03.json
    │       ├── sINSPIREDTrial-0008-00001-000528-03.nii
    │       ├── sINSPIREDTrial-0008-00001-000704-04.json
    │       ├── sINSPIREDTrial-0008-00001-000704-04.nii
    │       ├── sINSPIREDTrial-0008-00001-000880-05.json
    │       ├── sINSPIREDTrial-0008-00001-000880-05.nii
    │       ├── sINSPIREDTrial-0008-00001-001056-06.json
    │       ├── sINSPIREDTrial-0008-00001-001056-06.nii
    │       ├── sINSPIREDTrial-0008-00001-001232-07.json
    │       ├── sINSPIREDTrial-0008-00001-001232-07.nii
    │       ├── sINSPIREDTrial-0008-00001-001408-08.json
    │       └── sINSPIREDTrial-0008-00001-001408-08.nii
    └── cord
        ├── dwi.bval
        ├── dwi.bvec
        ├── dwi.nii.gz
        ├── dwi_reversed_blip.nii.gz
        ├── pd_medic.nii.gz
        ├── sct_processing
        │   ├── dwi
        │   │   ├── dti_AD.nii.gz
        │   │   ├── dti_FA.nii.gz
        │   │   ├── dti_MD.nii.gz
        │   │   └── dti_RD.nii.gz
        │   ├── t2
        │   │   └── t2_seg.nii.gz
        │   └── t2s
        │       ├── gm_seg.nii.gz
        │       └── wm_seg.nii.gz
        ├── t1_sag.nii.gz
        ├── t2_sag.nii.gz
        └── t2_tra.nii.gz
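Since the input is organized as `<institution_id>/<pathology>/<data_id>.tar.gz`, unpacking can be scripted instead of running `tar -xf` by hand for every archive. The following is a minimal sketch using the standard-library `tarfile` module; the function names (`parse_archive_path`, `list_archives`, `extract_archive`) are hypothetical and are not taken from the PR's script.

```python
import tarfile
from pathlib import Path


def parse_archive_path(archive):
    """Split '<institution_id>/<pathology>/<data_id>.tar.gz' into its three parts."""
    archive = Path(archive)
    data_id = archive.name[: -len(".tar.gz")]
    return archive.parent.parent.name, archive.parent.name, data_id


def list_archives(path_in):
    """Yield (institution_id, pathology, data_id, archive_path) for every archive."""
    for archive in sorted(Path(path_in).glob("*/*/*.tar.gz")):
        yield (*parse_archive_path(archive), archive)


def extract_archive(archive, dest):
    """Python equivalent of `tar -xf <archive> -C <dest>`."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases: the "data" filter rejects absolute paths and ../ traversal
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

For example, `for inst, pathology, data_id, archive in list_archives(path_in): extract_archive(archive, Path(path_tmp) / inst / pathology / data_id)` would unpack everything into a scratch folder (`path_in` and `path_tmp` being placeholders).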

Conversion notes

So far, for the cord folder, I am converting all raw images (i.e., I am not converting the files in cord/sct_processing).
For the brain folder, I am converting the DWI files and the raw MPM images in brain/mpm_raw (i.e., I am not converting the processed mpm_ files).
To differentiate spine images from brain images, I am using the bp-cspine tag.

The filename conversion dictionaries are defined on lines 47-67 of the script.
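The conversion dictionaries themselves are not shown in this thread, so the entries below are a hypothetical excerpt inferred from the input and output trees (e.g., `dwi_reversed_blip.nii.gz` appears in the output as a `dir-PA` DWI file). The sketch shows one way such a dictionary could be applied: each entry maps an input filename to a BIDS suffix, the target folder (`dwi` or `anat`) is derived from the suffix, and files missing for a given subject are skipped. `bids_target`, `plan_conversion`, and `convert` are illustrative names, not the script's.

```python
import os
import shutil

# Hypothetical excerpts, inferred from the listings in this thread (not copied from the script)
spine_conv_dict = {
    'dwi.bval': 'dir-AP_bp-cspine_dwi.bval',
    'dwi.bvec': 'dir-AP_bp-cspine_dwi.bvec',
    'dwi.nii.gz': 'dir-AP_bp-cspine_dwi.nii.gz',
    'dwi_reversed_blip.nii.gz': 'dir-PA_bp-cspine_dwi.nii.gz',
}
brain_conv_dict = {
    'dwi.bval': 'dir-AP_dwi.bval',
    'dwi.bvec': 'dir-AP_dwi.bvec',
    'dwi.nii.gz': 'dir-AP_dwi.nii.gz',
    'dwi_reversed_blip.nii.gz': 'dir-PA_dwi.nii.gz',
}


def bids_target(path_output, subject, name_out):
    """Build <path_output>/<subject>/<datatype>/<subject>_<name_out>."""
    datatype = 'dwi' if '_dwi.' in name_out else 'anat'
    return os.path.join(path_output, subject, datatype, f"{subject}_{name_out}")


def plan_conversion(path_in, path_output, subject, conv_dict):
    """Return (source, target) pairs for the files that exist in path_in."""
    pairs = []
    for name_in, name_out in conv_dict.items():
        src = os.path.join(path_in, name_in)
        if os.path.isfile(src):  # some subjects may lack some files
            pairs.append((src, bids_target(path_output, subject, name_out)))
    return pairs


def convert(pairs):
    """Copy each source file to its BIDS target, creating folders as needed."""
    for src, dst in pairs:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
```

Splitting planning from copying makes it easy to print the planned renames and review them before touching the output folder.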

The tricky part was the MPM conversion. I had to load the JSON sidecar of each MPM file to fetch EchoTime, FlipAngle, and SeriesDescription. The BIDS-compliant MPM filenames are then constructed in the construct_mpm_bids_filename function:

def construct_mpm_bids_filename(mpm_files_dict, path_output, subject_out):
    """
    Construct BIDS compliant filename for MPM images, e.g. 'acq-T1w_echo-1_flip-1_mt-off_MPM'
    """
    ...

My notes on the MPM file conversion are available here.
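The MPM step described above could look roughly like the sketch below: read each JSON sidecar, group the files by acquisition (guessed from SeriesDescription), and number the echoes by sorting on EchoTime. This is not the body of construct_mpm_bids_filename. The SeriesDescription matching rule is an assumption, and the flip-/mt- labels are hardcoded per acquisition to mirror the output listing further down rather than derived from FlipAngle.

```python
import json
from pathlib import Path

# Hypothetical mapping, chosen to mirror the output listing in this thread:
# MTw -> flip-1/mt-on, PDw -> flip-2/mt-off, T1w -> flip-3/mt-off
ACQ_PARAMS = {"MTw": (1, "on"), "PDw": (2, "off"), "T1w": (3, "off")}


def acq_from_series_description(series_description):
    """Guess the acq- label from SeriesDescription (the naming convention is an assumption)."""
    desc = series_description.upper()
    if "MT" in desc:
        return "MTw"
    if "PD" in desc:
        return "PDw"
    if "T1" in desc:
        return "T1w"
    raise ValueError(f"Unrecognized SeriesDescription: {series_description!r}")


def load_sidecars(folder):
    """Read every JSON sidecar in a folder such as brain/mpm_raw."""
    return {str(p): json.loads(p.read_text()) for p in sorted(Path(folder).glob("*.json"))}


def build_mpm_names(sidecars):
    """Map each sidecar path to a BIDS-style MPM name stem.

    sidecars: {path: parsed JSON dict containing EchoTime and SeriesDescription}
    """
    groups = {}
    for path, meta in sidecars.items():
        acq = acq_from_series_description(meta["SeriesDescription"])
        groups.setdefault(acq, []).append((meta["EchoTime"], path))
    names = {}
    for acq, items in groups.items():
        flip, mt = ACQ_PARAMS[acq]
        # echo- index = 1-based rank of EchoTime within the acquisition
        for echo, (_, path) in enumerate(sorted(items), start=1):
            names[path] = f"acq-{acq}_echo-{echo}_flip-{flip}_mt-{mt}_MPM"
    return names
```

A call such as `build_mpm_names(load_sidecars("bl/brain/mpm_raw"))` returns one name stem per sidecar; the matching `.nii` file would then be renamed (and gzipped) with the same stem.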

Output dataset

I tested the script on a copy of the dataset (extrassd1/janvalosek/INSPIRED). The converted dataset is available in extrassd1/janvalosek/INSPIRED_bids.

The script also creates the participants.tsv, participants.json, dataset_description.json, and README files, and copies itself to the code/ folder.
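For reference, a minimal sketch of writing dataset_description.json. The BIDS specification requires the Name and BIDSVersion fields and recommends DatasetType; the default values below (including the version string "1.8.0") are placeholders, not the values used by the PR's script, and the function names are hypothetical.

```python
import json
import os


def build_dataset_description(name="INSPIRED", bids_version="1.8.0"):
    """Return the dataset_description.json content.

    Name and BIDSVersion are REQUIRED by BIDS; DatasetType is RECOMMENDED.
    The default values are placeholders.
    """
    return {"Name": name, "BIDSVersion": bids_version, "DatasetType": "raw"}


def write_dataset_description(path_output, **kwargs):
    """Write dataset_description.json at the root of the BIDS folder."""
    os.makedirs(path_output, exist_ok=True)
    path_file = os.path.join(path_output, "dataset_description.json")
    with open(path_file, "w") as f:
        json.dump(build_dataset_description(**kwargs), f, indent=4)
        f.write("\n")
    return path_file
```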

Output dataset (the first subject shown):
$ tree -L 3
.
├── bids_conversion.log
├── code
│   └── curate_data_inspired.py
├── dataset_description.json
├── participants.json
├── participants.tsv
├── README
├── sub-torontoDCM001
│   ├── anat
│   │   ├── sub-torontoDCM001_acq-axial_bp-cspine_T2w.json
│   │   ├── sub-torontoDCM001_acq-axial_bp-cspine_T2w.nii.gz
│   │   ├── sub-torontoDCM001_acq-coronal_bp-cspine_T2w.json
│   │   ├── sub-torontoDCM001_acq-coronal_bp-cspine_T2w.nii.gz
│   │   ├── sub-torontoDCM001_acq-MTw_echo-1_flip-1_mt-on_MPM.json
│   │   ├── sub-torontoDCM001_acq-MTw_echo-1_flip-1_mt-on_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-MTw_echo-2_flip-1_mt-on_MPM.json
│   │   ├── sub-torontoDCM001_acq-MTw_echo-2_flip-1_mt-on_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-MTw_echo-3_flip-1_mt-on_MPM.json
│   │   ├── sub-torontoDCM001_acq-MTw_echo-3_flip-1_mt-on_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-MTw_echo-4_flip-1_mt-on_MPM.json
│   │   ├── sub-torontoDCM001_acq-MTw_echo-4_flip-1_mt-on_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-MTw_echo-5_flip-1_mt-on_MPM.json
│   │   ├── sub-torontoDCM001_acq-MTw_echo-5_flip-1_mt-on_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-MTw_echo-6_flip-1_mt-on_MPM.json
│   │   ├── sub-torontoDCM001_acq-MTw_echo-6_flip-1_mt-on_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-1_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-1_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-2_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-2_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-3_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-3_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-4_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-4_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-5_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-5_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-6_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-6_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-7_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-7_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-PDw_echo-8_flip-2_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-PDw_echo-8_flip-2_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-1_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-1_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-2_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-2_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-3_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-3_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-4_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-4_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-5_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-5_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-6_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-6_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-7_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-7_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_acq-T1w_echo-8_flip-3_mt-off_MPM.json
│   │   ├── sub-torontoDCM001_acq-T1w_echo-8_flip-3_mt-off_MPM.nii.gz
│   │   ├── sub-torontoDCM001_bp-cspine_T1w.json
│   │   ├── sub-torontoDCM001_bp-cspine_T1w.nii.gz
│   │   ├── sub-torontoDCM001_bp-cspine_T2star.json
│   │   └── sub-torontoDCM001_bp-cspine_T2star.nii.gz
│   └── dwi
│       ├── sub-torontoDCM001_dir-AP_bp-cspine_dwi.bval
│       ├── sub-torontoDCM001_dir-AP_bp-cspine_dwi.bvec
│       ├── sub-torontoDCM001_dir-AP_bp-cspine_dwi.json
│       ├── sub-torontoDCM001_dir-AP_bp-cspine_dwi.nii.gz
│       ├── sub-torontoDCM001_dir-AP_dwi.bval
│       ├── sub-torontoDCM001_dir-AP_dwi.bvec
│       ├── sub-torontoDCM001_dir-AP_dwi.json
│       ├── sub-torontoDCM001_dir-AP_dwi.nii.gz
│       ├── sub-torontoDCM001_dir-PA_bp-cspine_dwi.json
│       ├── sub-torontoDCM001_dir-PA_bp-cspine_dwi.nii.gz
│       ├── sub-torontoDCM001_dir-PA_dwi.json
│       └── sub-torontoDCM001_dir-PA_dwi.nii.gz
├── sub-torontoDCM002
...

The participants.tsv file looks like:

participant_id pathology data_id institution_id institution
sub-torontoDCM001 DCM 001 01 toronto
sub-torontoDCM002 DCM 005 01 toronto
...
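The rows above can be generated from the input layout. The sketch below reproduces the draft listing (data_id 005 becomes sub-torontoDCM002 because subjects are numbered sequentially per institution and pathology); note that the reviewer later asked to keep the original subject IDs, so this illustrates the draft only. Only the mappings visible in this thread (01 to toronto, csm to DCM) are filled in; the others are left for the reader, and all names are hypothetical.

```python
import csv
import io

# Only the mappings visible in this thread are filled in (extend as needed)
INSTITUTIONS = {"01": "toronto"}
PATHOLOGIES = {"csm": "DCM"}
COLUMNS = ["participant_id", "pathology", "data_id", "institution_id", "institution"]


def participants_rows(archives):
    """archives: iterable of (institution_id, pathology_folder, data_id) tuples,
    e.g. ("01", "csm", "005") for 01/csm/005.tar.gz."""
    counters = {}
    rows = []
    for inst_id, folder, data_id in sorted(archives):
        institution, pathology = INSTITUTIONS[inst_id], PATHOLOGIES[folder]
        key = (inst_id, folder)
        counters[key] = counters.get(key, 0) + 1
        participant = f"sub-{institution}{pathology}{counters[key]:03d}"
        rows.append([participant, pathology, data_id, inst_id, institution])
    return rows


def participants_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()
```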

Questions

  • Should I also convert the processed files (i.e., the files in cord/sct_processing and the processed brain/mpm_ files) and save them to derivatives?

@valosekj
Member Author

valosekj commented Dec 5, 2022

Please put the script under the dataset, in the 'code/' folder

Done here:

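import os
import shutil

def copy_script(path_output):
    """
    Copy the script itself to the path_output/code folder
    :param path_output: path to the output BIDS folder
    :return:
    """
    # Body assumed for illustration; the original excerpt showed only the docstring
    path_code = os.path.join(path_output, 'code')
    os.makedirs(path_code, exist_ok=True)
    shutil.copy2(os.path.abspath(__file__), path_code)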

@valosekj valosekj marked this pull request as ready for review December 5, 2022 14:27
Member

@jcohenadad jcohenadad left a comment


  • Please keep the original subject ID
  • Update README (see my other comment in this section)

@jcohenadad
Member

Julien TODO: fix the README. Jan: ping Julien about the README.

@jcohenadad
Member

Does it pass the BIDS validator?

@valosekj valosekj closed this Dec 12, 2022
@valosekj
Copy link
Member Author

Sorry, accidentally closed --> reopening.

@valosekj valosekj reopened this Dec 12, 2022
@mariehbourget
Contributor

One issue is that the acq- entity is already used (we have acq-axial and acq-coronal for T2w, and acq-MTw, acq-PDw, and acq-T1w for MPM):

this is a very good point. So maybe we should use bp- indeed 🤷

If you absolutely need BIDS-compliant filenames, an alternative would be to combine both pieces of information in the acq- entity using camelCase.
Examples:
acq-axialCspine or acq-cspineAxial for T2w
and
acq-PDwCspine or acq-cspinePDw for MPM
A description of each label could also be added to the README to clarify the labeling.
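The camelCase convention suggested above can be captured in a tiny helper, sketched here under the assumption that the first part keeps its case and each following part gets its first letter capitalized (this reproduces all four examples). BIDS label values may only contain letters and digits, which is why a separator such as `-` or `_` cannot be used inside acq-. The name `camel_acq` is hypothetical.

```python
import re


def camel_acq(*parts):
    """Combine several labels into one camelCase acq- value.

    The first part is kept as-is; every following part gets its first
    letter capitalized, e.g. ('cspine', 'axial') -> 'cspineAxial'.
    """
    if not parts:
        raise ValueError("at least one part is required")
    first, *rest = parts
    label = first + "".join(p[:1].upper() + p[1:] for p in rest)
    # BIDS label values may only contain letters and digits
    if not re.fullmatch(r"[A-Za-z0-9]+", label):
        raise ValueError(f"not a valid BIDS label: {label!r}")
    return label
```

For example, `f"sub-torontoDCM001_acq-{camel_acq('cspine', 'axial')}_T2w.nii.gz"` gives `sub-torontoDCM001_acq-cspineAxial_T2w.nii.gz`.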

@jcohenadad
Member

#185 (comment) this is a good option as well

@valosekj
Member Author

Thank you @mariehbourget!

I switched to acq-cspineSagittal and acq-cspineAxial in 30ae997

@valosekj
Member Author

valosekj commented Jan 21, 2023

@jcohenadad, I updated the BIDSification script for the INSPIRED dataset. The dataset should now be ready for upload to git-annex.

README.md is attached to this PR for easy review.

The BIDSified dataset is available in: ~/extrassd1/janvalosek/INSPIRED_bids_20230121. It passes bids-validator.

Based on our recent discussion, I'm using label-xx_seg.nii.gz suffixes for the spinal cord (SC), white matter (WM), and gray matter (GM) segmentations:

derivatives_spine_conv_dict = {
    't2_seg.nii.gz': 'acq-cspineAxial_T2w_label-SC_seg.nii.gz',
    'gm_seg.nii.gz': 'acq-cspine_T2star_label-GM_seg.nii.gz',
    'wm_seg.nii.gz': 'acq-cspine_T2star_label-WM_seg.nii.gz'
}

The acq-cspine label, combined with camelCase (e.g., acq-cspineAxial), is used to differentiate spine images from brain images. Details here.
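To illustrate how the dictionary above could be used, here is a sketch that maps an SCT output such as cord/sct_processing/t2/t2_seg.nii.gz to a derivative filename. The `derivatives/labels/<subject>/anat/` layout is an assumption (this thread does not show where the segmentations are written), and `derivative_path` is a hypothetical helper.

```python
import os

derivatives_spine_conv_dict = {
    't2_seg.nii.gz': 'acq-cspineAxial_T2w_label-SC_seg.nii.gz',
    'gm_seg.nii.gz': 'acq-cspine_T2star_label-GM_seg.nii.gz',
    'wm_seg.nii.gz': 'acq-cspine_T2star_label-WM_seg.nii.gz',
}


def derivative_path(path_output, subject, file_in, derivatives_name="labels"):
    """Map an SCT output file to <path_output>/derivatives/<name>/<subject>/anat/<subject>_<suffix>."""
    # Look up the suffix by the input file's basename (KeyError if the file is not listed)
    suffix = derivatives_spine_conv_dict[os.path.basename(file_in)]
    return os.path.join(path_output, "derivatives", derivatives_name,
                        subject, "anat", f"{subject}_{suffix}")
```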

To address mismatched sform/qform matrices, I copy the qform to the sform:

import os
import subprocess


def copy_qform_to_sform(path_dir_out, file_out):
    """
    Copy qform to sform to address https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/3991#issuecomment-1378765661
    :param path_dir_out: output directory containing the image
    :param file_out: image filename (e.g., a .nii.gz file)
    :return:
    """
    path_file = os.path.join(path_dir_out, file_out)
    # Only process the image if it exists
    if os.path.isfile(path_file):
        # Pass arguments as a list so paths with spaces work; raise if sct_image fails
        subprocess.run(['sct_image', '-i', path_file, '-set-qform-to-sform'], check=True)


TODO: Once the dataset is git-annexed, I will add the clinical information (mJOA, etc.) and the manual disc labels (which I created for the OHBM abstract) as separate commits, to allow easy tracking with git log --stat git-annex.

Two resolved review threads on scripts/README.md
valosekj and others added 2 commits January 24, 2023 11:22
Co-authored-by: Julien Cohen-Adad
Co-authored-by: Julien Cohen-Adad
@valosekj valosekj mentioned this pull request Jan 26, 2023
@valosekj
Member Author

BIDSification completed (see #208) --> closing

@valosekj valosekj closed this Feb 15, 2023
@valosekj valosekj deleted the jv/curate_inspired branch February 27, 2023 17:38
Linked issue: Conversion of duke/mri/INSPIRED/ to BIDS
3 participants