
inspired - add data #208

Closed
valosekj opened this issue Jan 26, 2023 · 7 comments


valosekj commented Jan 26, 2023

Within the branch jv/add_files, I added the BIDSified version of the INSPIRED dataset.

Related PR.
The script I used for BIDSification is available here.

TODO after the merge of the jv/add_files branch:

UPDATE: I can actually do these TODOs within this branch. So @mguaypaq, please wait before merging.

  • add clinical data to participants.tsv
  • upload updated derivatives created in the context of the OHBM 2023 abstract
mguaypaq commented:

I added a few commits to this branch, after running bids-validator and fixing some of the errors it reports:

  • There were 761 empty JSON files, which is not allowed. Until they were removed, that was the only error reported by bids-validator. I removed them with the following command, which unlocked several more error messages from bids-validator:
    find . -type f -name '*.json' -size 0 -delete
    
  • participants.json is not a valid JSON file; it looks like it's missing a closing brace for the value of pathology.Levels. (Also, it contains some tab characters, which should probably be space characters instead.)
  • 44 JSON sidecar files contained invalid utf-8 characters. It looks like it was just because of bad encoding of the umlauts in the words "Universitätsklinik" and "Zürich".
  • 80 image files with file extension .nii.gz were not actually gzipped. I renamed them to .nii (but maybe I should have gzipped them instead?)
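One possible way to handle those not-actually-gzipped files automatically is to test each `*.nii.gz` for the gzip magic bytes (`1f 8b`) and recompress the plain ones in place. This is only a sketch, not what was actually run; `fix_fake_gz` is a made-up helper name:

```shell
# Sketch (hedged): repair files named *.nii.gz that are not actually
# gzip streams by compressing them in place. Paths are illustrative.
fix_fake_gz() {  # usage: fix_fake_gz <directory>
  find "$1" -type f -name '*.nii.gz' | while read -r f; do
    # read the first two bytes and compare against the gzip magic number 1f 8b
    if [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" != "1f8b" ]; then
      mv "$f" "${f%.gz}"   # drop the misleading .gz suffix
      gzip "${f%.gz}"      # recompress to a genuine .nii.gz
    fi
  done
}
```

Already-gzipped files pass the magic-byte test and are left untouched, so the helper should be safe to run repeatedly.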

Now, there are still some bids-validator errors, but I'm not sure what they mean, so maybe you can fix them:

	1: [ERR] The number of volumes in this scan does not match the number of
	         volumes in the corresponding .bvec and .bval files.
	         (code: 29 - VOLUME_COUNT_MISMATCH)
		./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM005/dwi/sub-torontoDCM005_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM006/dwi/sub-torontoDCM006_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM007/dwi/sub-torontoDCM007_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM008/dwi/sub-torontoDCM008_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM009/dwi/sub-torontoDCM009_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM010/dwi/sub-torontoDCM010_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM015/dwi/sub-torontoDCM015_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM017/dwi/sub-torontoDCM017_acq-cspine_dir-AP_dwi.nii.gz
		./sub-torontoDCM018/dwi/sub-torontoDCM018_acq-cspine_dir-AP_dwi.nii.gz
		... and 67 more files having this issue (Use --verbose to see them all).

	2: [ERR] DWI scans should have a corresponding .bvec file.
	         (code: 32 - DWI_MISSING_BVEC)
		./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM001/dwi/sub-torontoDCM001_dir-PA_dwi.nii.gz
		./sub-torontoDCM005/dwi/sub-torontoDCM005_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM005/dwi/sub-torontoDCM005_dir-PA_dwi.nii.gz
		./sub-torontoDCM006/dwi/sub-torontoDCM006_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM006/dwi/sub-torontoDCM006_dir-PA_dwi.nii.gz
		./sub-torontoDCM007/dwi/sub-torontoDCM007_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM007/dwi/sub-torontoDCM007_dir-PA_dwi.nii.gz
		./sub-torontoDCM008/dwi/sub-torontoDCM008_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM008/dwi/sub-torontoDCM008_dir-PA_dwi.nii.gz
		... and 143 more files having this issue (Use --verbose to see them all).

	3: [ERR] DWI scans should have a corresponding .bval file.
	         (code: 33 - DWI_MISSING_BVAL)
		./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM001/dwi/sub-torontoDCM001_dir-PA_dwi.nii.gz
		./sub-torontoDCM005/dwi/sub-torontoDCM005_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM005/dwi/sub-torontoDCM005_dir-PA_dwi.nii.gz
		./sub-torontoDCM006/dwi/sub-torontoDCM006_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM006/dwi/sub-torontoDCM006_dir-PA_dwi.nii.gz
		./sub-torontoDCM007/dwi/sub-torontoDCM007_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM007/dwi/sub-torontoDCM007_dir-PA_dwi.nii.gz
		./sub-torontoDCM008/dwi/sub-torontoDCM008_acq-cspine_dir-PA_dwi.nii.gz
		./sub-torontoDCM008/dwi/sub-torontoDCM008_dir-PA_dwi.nii.gz
		... and 143 more files having this issue (Use --verbose to see them all).

There are also a few warnings. If we don't care about them, we should add a .bids-validator-config.json file to ignore them explicitly.

	1: [WARN] Not all subjects contain the same files. Each subject should
	          contain the same number of files with the same naming unless some
	          files are known to be missing.
	          (code: 38 - INCONSISTENT_SUBJECTS)
		1842 files having this issue.

	2: [WARN] Not all subjects/sessions/runs have the same scanning parameters.
	          (code: 39 - INCONSISTENT_PARAMETERS)
		103 files having this issue.

	3: [WARN] The Authors field of dataset_description.json should contain an
	          array of fields - with one author per field. This was triggered
	          because there are no authors, which will make DOI registration from
	          dataset metadata impossible.
	          (code: 113 - NO_AUTHORS)


valosekj commented Feb 5, 2023

> I added a few commits to this branch, after running bids-validator and fixing some of the errors it reports:
>
> • There were 761 empty JSON files, which is not allowed. Until they were removed, that was the only error reported by bids-validator. I removed them with the following command, which unlocked several more error messages from bids-validator:
>   find . -type f -name '*.json' -size 0 -delete
>
> • participants.json is not a valid JSON file; it looks like it's missing a closing brace for the value of pathology.Levels. (Also, it contains some tab characters, which should probably be space characters instead.)

Thank you!

> • 44 JSON sidecar files contained invalid utf-8 characters. It looks like it was just because of bad encoding of the umlauts in the words "Universitätsklinik" and "Zürich".

Thank you! I copied the existing JSON files from the provided non-BIDS dataset, which is how this happened.

> • 80 image files with file extension .nii.gz were not actually gzipped. I renamed them to .nii (but maybe I should have gzipped them instead?)

Thanks! Sure, all .nii files should be .nii.gz.

> Now, there are still some bids-validator errors, but I'm not sure what they mean, so maybe you can fix them:
>
> 1: [ERR] The number of volumes in this scan does not match the number of
>          volumes in the corresponding .bvec and .bval files.
>          (code: 29 - VOLUME_COUNT_MISMATCH)
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM005/dwi/sub-torontoDCM005_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM006/dwi/sub-torontoDCM006_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM007/dwi/sub-torontoDCM007_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM008/dwi/sub-torontoDCM008_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM009/dwi/sub-torontoDCM009_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM010/dwi/sub-torontoDCM010_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM015/dwi/sub-torontoDCM015_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM017/dwi/sub-torontoDCM017_acq-cspine_dir-AP_dwi.nii.gz
>     ./sub-torontoDCM018/dwi/sub-torontoDCM018_acq-cspine_dir-AP_dwi.nii.gz
>     ... and 67 more files having this issue (Use --verbose to see them all).

This is weird! I checked the files, and the number of DWI volumes corresponds to the number of bval and bvec values:

$ sct_image -i sub-torontoDCM001_acq-cspine_dir-AP_dwi.nii.gz -header | grep dim | head -1 
dim		[4, 176, 40, 10, 100, 1, 1, 1]
$ cat sub-torontoDCM001_acq-cspine_dir-AP_dwi.bval | wc -w 
     100
$ cat sub-torontoDCM001_acq-cspine_dir-AP_dwi.bvec | wc -w
     300

In other words, sub-torontoDCM001_acq-cspine_dir-AP_dwi.nii.gz is a 4D file with 100 volumes (the fifth value in the dim field). Each volume is described by a b-value (a scalar) and a b-vector (a 1x3 vector); details for example here. This assumption is fulfilled: we have 100 values in sub-torontoDCM001_acq-cspine_dir-AP_dwi.bval and 300 values in sub-torontoDCM001_acq-cspine_dir-AP_dwi.bvec.
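That consistency rule (nvol b-values, 3*nvol b-vector components) can be checked mechanically. A minimal sketch, given the volume count already read from the NIfTI header; `check_bvec_bval` is a made-up helper name for illustration:

```shell
# Sketch (hedged): verify that a .bval/.bvec sidecar pair describes
# exactly nvol DWI volumes, mirroring the manual word counts above.
check_bvec_bval() {  # usage: check_bvec_bval <nvol> <path-without-extension>
  nvol=$1; base=$2
  # .bval should hold one scalar per volume
  [ "$(wc -w < "${base}.bval")" -eq "$nvol" ] &&
  # .bvec should hold three components (x, y, z) per volume
  [ "$(wc -w < "${base}.bvec")" -eq $((3 * nvol)) ]
}
```

For the file above, `check_bvec_bval 100 sub-torontoDCM001_acq-cspine_dir-AP_dwi` would succeed, so the validator error indeed looks spurious for this subject.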

> 2: [ERR] DWI scans should have a corresponding .bvec file.
>          (code: 32 - DWI_MISSING_BVEC)
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM005/dwi/sub-torontoDCM005_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM005/dwi/sub-torontoDCM005_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM006/dwi/sub-torontoDCM006_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM006/dwi/sub-torontoDCM006_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM007/dwi/sub-torontoDCM007_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM007/dwi/sub-torontoDCM007_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM008/dwi/sub-torontoDCM008_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM008/dwi/sub-torontoDCM008_dir-PA_dwi.nii.gz
>     ... and 143 more files having this issue (Use --verbose to see them all).
>
> 3: [ERR] DWI scans should have a corresponding .bval file.
>          (code: 33 - DWI_MISSING_BVAL)
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM005/dwi/sub-torontoDCM005_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM005/dwi/sub-torontoDCM005_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM006/dwi/sub-torontoDCM006_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM006/dwi/sub-torontoDCM006_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM007/dwi/sub-torontoDCM007_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM007/dwi/sub-torontoDCM007_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM008/dwi/sub-torontoDCM008_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM008/dwi/sub-torontoDCM008_dir-PA_dwi.nii.gz
>     ... and 143 more files having this issue (Use --verbose to see them all).

These errors are relevant: each DWI .nii.gz should be accompanied by .bval and .bvec files. However, in the case of dir-PA files (i.e., files acquired with posterior-anterior phase encoding), the .nii.gz file is just a 3D file (i.e., a 4D file with a single DWI volume). These dir-PA files were acquired without any diffusion weighting (i.e., b-value=0 and b-vector=[0,0,0]) for the purpose of susceptibility artifact correction (details for example here). It is common to omit .bval and .bvec files for such DWI files, and the dcm2niix tool (which was probably used to create the original non-BIDS dataset) did not create them.
If we really want to resolve these errors, I can create the .bval and .bvec files manually within the script.
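For reference, the sidecars for a single-volume b=0 image are trivial to generate. A minimal sketch (the `write_b0_sidecars` helper and its usage are illustrative, not part of the actual BIDSification script):

```shell
# Sketch (hedged): write the .bval/.bvec pair for a single-volume b=0
# DWI image. write_b0_sidecars is a made-up helper name.
write_b0_sidecars() {  # usage: write_b0_sidecars <image>.nii.gz
  base=${1%.nii.gz}
  printf '0\n' > "${base}.bval"        # one b-value: 0
  printf '0\n0\n0\n' > "${base}.bvec"  # one [0,0,0] b-vector, one row per axis
}
```

Applied to each dir-PA image, this would satisfy the DWI_MISSING_BVEC/DWI_MISSING_BVAL checks without changing any imaging data.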

> There are also a few warnings. If we don't care about them, we should add a .bids-validator-config.json file to ignore them explicitly.
>
> 1: [WARN] Not all subjects contain the same files. Each subject should
>           contain the same number of files with the same naming unless some
>           files are known to be missing.
>           (code: 38 - INCONSISTENT_SUBJECTS)
>     1842 files having this issue.
>
> 2: [WARN] Not all subjects/sessions/runs have the same scanning parameters.
>           (code: 39 - INCONSISTENT_PARAMETERS)
>     103 files having this issue.
>
> 3: [WARN] The Authors field of dataset_description.json should contain an
>           array of fields - with one author per field. This was triggered
>           because there are no authors, which will make DOI registration from
>           dataset metadata impossible.
>           (code: 113 - NO_AUTHORS)

Might these warnings be caused by the fact that the dataset contains subjects from two centres? That would explain, for example, why the naming is not the same (sub-toronto vs sub-zurich).


mguaypaq commented Feb 7, 2023

> Thanks! Sure, all .nii files should be .nii.gz.

Done in a new commit 738ac34.

> 1: [ERR] The number of volumes in this scan does not match the number of
>          volumes in the corresponding .bvec and .bval files.
>          (code: 29 - VOLUME_COUNT_MISMATCH)
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-AP_dwi.nii.gz
>
> This is weird! I checked the files, and the number of DWI volumes corresponds to the number of bval and bvec values:

It looks like this may be a bug in bids-validator. According to this forum post, it can happen when the folder contains multiple .bvec and .bval files, and bids-validator wants all of them to match (even though they are for different images).

The best fix would be to open an issue and/or a PR in bids-validator, but in the meantime we can configure it to ignore this error. Or, that forum post says maybe some renaming of the files could fix the problem, but I'm not sure what the implications are.

> 2: [ERR] DWI scans should have a corresponding .bvec file.
>          (code: 32 - DWI_MISSING_BVEC)
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_dir-PA_dwi.nii.gz
>
> 3: [ERR] DWI scans should have a corresponding .bval file.
>          (code: 33 - DWI_MISSING_BVAL)
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_acq-cspine_dir-PA_dwi.nii.gz
>     ./sub-torontoDCM001/dwi/sub-torontoDCM001_dir-PA_dwi.nii.gz
>
> [...] I can create the bval and bvec files manually within the script.

Ah! I hadn't paid attention to the fact that there was a script to BIDSify the dataset (even though you clearly wrote so in the description). Is it a problem that I've been adding "manual" commits on top, instead of improving the script?

Regardless, we should probably create these missing files with the appropriate zero values, I think.

> There are also a few warnings. If we don't care about them, we should add a .bids-validator-config.json file to ignore them explicitly.
>
> 1: [WARN] Not all subjects contain the same files.
> 2: [WARN] Not all subjects/sessions/runs have the same scanning parameters.
> 3: [WARN] The Authors field of dataset_description.json should contain an

For the author warning, I think we haven't been caring about that for our internal datasets, so I'm happy to ignore that.

For the other warnings, I don't think it's because of the naming scheme. I think the files and/or parameters are genuinely different. For example, looking at just the sub-*/dwi/ folders, I see that some expected files just don't exist:

sub-zurichSCI001/dwi/sub-zurichSCI001_dir-PA_dwi.nii.gz

sub-zurichSCI011/dwi/sub-zurichSCI011_acq-cspine_dir-PA_dwi.nii.gz

sub-zurichSCI014/dwi/sub-zurichSCI014_dir-AP_dwi.bval
sub-zurichSCI014/dwi/sub-zurichSCI014_dir-AP_dwi.bvec
sub-zurichSCI014/dwi/sub-zurichSCI014_dir-AP_dwi.nii.gz
sub-zurichSCI014/dwi/sub-zurichSCI014_dir-PA_dwi.nii.gz

sub-zurichSCI015/dwi/sub-zurichSCI015_dir-AP_dwi.bval
sub-zurichSCI015/dwi/sub-zurichSCI015_dir-AP_dwi.bvec
sub-zurichSCI015/dwi/sub-zurichSCI015_dir-AP_dwi.nii.gz
sub-zurichSCI015/dwi/sub-zurichSCI015_dir-PA_dwi.nii.gz

sub-zurichSCI016/dwi/sub-zurichSCI016_acq-cspine_dir-PA_dwi.nii.gz

I know we expect some of our other datasets to be heterogeneous, but do we expect the inspired dataset to be more uniform?


valosekj commented Feb 15, 2023

> It looks like this may be a bug in bids-validator. According to this forum post, it can happen when the folder contains multiple .bvec and .bval files, and bids-validator wants all of them to match (even though they are for different images).
>
> The best fix would be to open an issue and/or a PR in bids-validator, but in the meantime we can configure it to ignore this error. Or, that forum post says maybe some renaming of the files could fix the problem, but I'm not sure what the implications are.

Agree. Could you please configure the bids-validator to ignore this? (Does it mean just adding these files to .bidsignore?)

> Ah! I hadn't paid attention to the fact that there was a script to BIDSify the dataset (even though you clearly wrote so in the description). Is it a problem that I've been adding "manual" commits on top, instead of improving the script?

I hope not. We have everything documented here on GitHub anyway.

> Regardless, we should probably create these missing files with the appropriate zero values, I think.

Agree; I will create these files and add them in a new commit. I also found a relevant discussion about bval and bvec files for b=0 images here.

> For the author warning, I think we haven't been caring about that for our internal datasets, so I'm happy to ignore that.

Okay!

> For the other warnings, I don't think it's because of the naming scheme. I think the files and/or parameters are genuinely different. For example, looking at just the sub-*/dwi/ folders, I see that some expected files just don't exist:
>
> sub-zurichSCI001/dwi/sub-zurichSCI001_dir-PA_dwi.nii.gz
>
> sub-zurichSCI011/dwi/sub-zurichSCI011_acq-cspine_dir-PA_dwi.nii.gz
>
> sub-zurichSCI014/dwi/sub-zurichSCI014_dir-AP_dwi.bval
> sub-zurichSCI014/dwi/sub-zurichSCI014_dir-AP_dwi.bvec
> sub-zurichSCI014/dwi/sub-zurichSCI014_dir-AP_dwi.nii.gz
> sub-zurichSCI014/dwi/sub-zurichSCI014_dir-PA_dwi.nii.gz
>
> sub-zurichSCI015/dwi/sub-zurichSCI015_dir-AP_dwi.bval
> sub-zurichSCI015/dwi/sub-zurichSCI015_dir-AP_dwi.bvec
> sub-zurichSCI015/dwi/sub-zurichSCI015_dir-AP_dwi.nii.gz
> sub-zurichSCI015/dwi/sub-zurichSCI015_dir-PA_dwi.nii.gz
>
> sub-zurichSCI016/dwi/sub-zurichSCI016_acq-cspine_dir-PA_dwi.nii.gz
>
> I know we expect some of our other datasets to be heterogeneous, but do we expect the inspired dataset to be more uniform?

Most of the subjects have 8 dwi files.

Details
$ for sub in sub*;do cd $sub/dwi;echo "$sub $(ls -1 | wc -l)";cd ../..;done  
sub-torontoDCM001        8
sub-torontoDCM005        8
sub-torontoDCM006        8
sub-torontoDCM007        8
sub-torontoDCM008        8
sub-torontoDCM009        8
sub-torontoDCM010        8
sub-torontoDCM015        8
sub-torontoDCM017        8
sub-torontoDCM018        8
sub-torontoDCM019        8
sub-torontoDCM021        8
sub-torontoDCM022        8
sub-torontoDCM023        8
sub-torontoHC001        8
sub-torontoHC002        8
sub-torontoHC004        8
sub-torontoHC005        8
sub-torontoHC006        8
sub-torontoHC007        8
sub-torontoHC009        8
sub-torontoHC010        8
sub-torontoHC011        8
sub-torontoHC012        8
sub-torontoHC013        8
sub-torontoHC014        8
sub-torontoHC015        8
sub-torontoHC016        8
sub-torontoSCI003        8
sub-torontoSCI012        8
sub-zurichDCM001        8
sub-zurichDCM002        8
sub-zurichDCM003        8
sub-zurichDCM004        8
sub-zurichDCM005        8
sub-zurichDCM006        8
sub-zurichDCM007        8
sub-zurichDCM008        8
sub-zurichDCM009        8
sub-zurichDCM010        8
sub-zurichDCM011        8
sub-zurichDCM012        8
sub-zurichDCM013        8
sub-zurichDCM014        8
sub-zurichDCM015        8
sub-zurichDCM016        8
sub-zurichDCM017        8
sub-zurichDCM018        8
sub-zurichDCM019        8
sub-zurichDCM020        8
sub-zurichDCM021        8
sub-zurichDCM022        8
sub-zurichDCM023        8
sub-zurichDCM024        8
sub-zurichHC001        8
sub-zurichHC002        8
sub-zurichHC003        8
sub-zurichHC004        8
sub-zurichHC005        8
sub-zurichHC007        8
sub-zurichHC008        8
sub-zurichHC009        8
sub-zurichHC010        8
sub-zurichSCI001        7
sub-zurichSCI002        8
sub-zurichSCI003        8
sub-zurichSCI004        8
sub-zurichSCI005        8
sub-zurichSCI006        8
sub-zurichSCI007        8
sub-zurichSCI008        8
sub-zurichSCI010        8
sub-zurichSCI011        7
sub-zurichSCI012        8
sub-zurichSCI013        8
sub-zurichSCI014        4
sub-zurichSCI015        4
sub-zurichSCI016        7
sub-zurichSCI017        8

I checked the subjects with <8 DWI files; the DWI files are also missing in the provided input dataset for those subjects. So I think we are fine with this.

valosekj commented:

> Agree; I will create these files and add them in a new commit. I also found a relevant discussion about bval and bvec files for b=0 images here.

Done in commit 882816cac029471ed22e7c761538d0a70551c23f. I also added the bash script I used to /code.

valosekj commented:

> • upload updated derivatives created in the context of the OHBM 2023 abstract

Addressed in commit 461634062462961b84e9c819e451d6d8ae3691ef.

mguaypaq commented:

I added two commits:

  • One commit to delete some empty JSON files that had reappeared.
  • Another commit to add a .bids-validator-config.json file with the following warnings ignored:
    {"ignore": ["VOLUME_COUNT_MISMATCH", "INCONSISTENT_SUBJECTS", "INCONSISTENT_PARAMETERS", "NO_AUTHORS"]}
    

At this point, git annex get and bids-validator are happy, so I merged into master.
