Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mito QC stats JSON conversion crash for bam started-runs when BAM-derived sample id deviates from rerun sample id #232

Merged
merged 6 commits into from
Sep 27, 2024

Conversation

alkc
Copy link
Contributor

@alkc alkc commented Sep 26, 2024

Description and reviewer info

Update to process and python script that converts mito sentieon coverage TSV to JSON.

The script previously required sample-id input to properly access specific fields in the TSV.
This has now been changed so that the same values are accessed by suffix, under the strong
assumption that the TSV contains data from only one sample.

Fixes #169

Type of change

  • Documentation
  • Patch
  • Minor change
  • Major change

Checklist

  • Self-review of my code
  • Update the CHANGELOG
  • Tag the latest commit (vX.Y.Z format)
  • Log samples used for testing in the Verification_samples_log Excel sheet

Minor change / patch-ish

  • bam-started wgs run completes with custom sample id and mito QC data checks out
  • wgs trio run completes started from bjorn/fastq and mito QC data checks out for all samples
  • Updated mito script fetches the correct values and outputs correct JSON to stdout
  • At least one other person has reviewed and approved my code
  • I have made corresponding changes to the documentation (software versions, etc.)

Test/review documentation

Review performed by

  • Alexander
  • Jakob
  • Paul
  • Ryan
  • Viktor

(Add if missing)

Testing performed by

  • Alexander
  • Jakob
  • Paul
  • Ryan
  • Viktor

@alkc alkc requested a review from Jakob37 September 26, 2024 07:39
bin/mito_tsv_to_json.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Jakob37 Jakob37 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Just some small reflections.

bin/mito_tsv_to_json.py Outdated Show resolved Hide resolved
args = parser.parse_args()


def extract_value_by_suffix(row: dict, key_suffix: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict[str, str] ?

Copy link
Contributor Author

@alkc alkc Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. the python3 in our wgs singularity container is too old for that.

typing.Dict might work though. i'll check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could try this:

https://peps.python.org/pep-0585/

I.e. importing

from __future__ import annotations

And then

dict[str, str]

Not sure it works. But curious to know

bin/mito_tsv_to_json.py Outdated Show resolved Hide resolved
@alkc
Copy link
Contributor Author

alkc commented Sep 26, 2024

Script confirmed to work for commit 844ae20:

Singularity wgs_active.sif:/mnt/beegfs/nextflow/GIAB... > cat ./GIAB-NA24385-D1-230622-20240924_mito_coverage.tsv | cut -f 5,9                                           
GIAB-NA24385-D1-230622_mean_cvg GIAB-NA24385-D1-230622_%_above_500                                                                                                                                                                                 
24654.35        100.0                                                                                                                                                                                                                              
Singularity wgs_active.sif:/mnt/beegfs/nextflow/GIAB... > python3 /fs1/alkc/proj/nextflow_wgs/bin/mito_tsv_to_json.py ./GIAB-NA24385-D1-230622-20240924_mito_coverage.tsv
{                                                                                                                                                                                                                                                  
    "mito_qc": {                                                                                                                                                                                                                                   
        "mean_coverage": 24654.35,                                                                                                                                                                                                                 
        "pct_above_x": {                                                                                                                                                                                                                           
            "500": 100.0                                                                                                                                                                                                                           
        }                                                                                                                                                                                                                                          
    }                                                                                                                                                                                                                                              
}                                                                                           

@alkc
Copy link
Contributor Author

alkc commented Sep 27, 2024

Scrpt confirmed to work for commit 844ae20 in wgs trio.

@alkc
Copy link
Contributor Author

alkc commented Sep 27, 2024

I'm omitting scout testing here, as this part of the pipeline does not affect scout loads.

I've confirmed that the QC data for CDM is a merged parseable json for both wgs single and trio.

@alkc
Copy link
Contributor Author

alkc commented Sep 27, 2024

The update resolves one particular obstacle in bam-started pipeline runs, but there is work remaining that will have to be resolved in a separate patch: #233

@alkc alkc merged commit c8d9e11 into master Sep 27, 2024
1 check passed
@alkc alkc deleted the alkc/fix-bam-starts-fail-diff-sample-id branch September 27, 2024 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Starting from bam with modified group/sample ids fails at build_mitochondrial_qc_json
2 participants