We use TRGT to call short repeat expansions from PacBio HiFi genomes. We would like to develop a 'database' that tells us the minimum and maximum repeat size we've seen in our samples for each repeat locus.
This will be a text file that contains the minimum and maximum repeat size for each repeat, as well as sample names corresponding to those carrying the minimum and maximum allele sizes. Example line:
| repeat | min_size | max_size | min_size_sample | max_size_sample |
| --- | --- | --- | --- | --- |
| chr10_100000834_100000912_A | 50 | 80 | HG00639 | HG00099 |
You may generate this from the HPRC (Human Pangenome Reference Consortium) and C4R TRGT VCFs in: /hpf/largeprojects/ccmbio/ccmmarvin_shared/pacbio_longread/TRGT/proband_only_workflow/HPRC-C4R-VCFs. Alternatively, generate it from this text file: /hpf/largeprojects/ccmbio/mcouse/pacbio_report_dev/results/test_outlier_expansions_full/repeat_outliers/sorted_alleles_db.gz, which was derived from the HPRC and C4R VCFs (this is probably the easiest place to start).
See section 'Find outliers' in this notebook for possible inspiration on how to iterate through/handle the file.
Note: HPRC TRGT VCFs came from Egor. We do not have HPRC BAMs on the hpf.
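As a starting point, the min/max aggregation described above could be sketched roughly as follows. This is only a sketch under assumptions: the column layout of `sorted_alleles_db.gz` is not stated in this issue, so the code assumes tab-separated rows of `repeat`, `sample`, `allele_size` (one row per allele), and the function names `build_min_max_db` and `write_db` are hypothetical. Adjust the column indices to match the real file.

```python
import csv
import gzip
import io


def build_min_max_db(lines):
    """Track the smallest and largest allele per repeat.

    ASSUMPTION: each row is 'repeat<TAB>sample<TAB>allele_size'.
    The real layout of sorted_alleles_db.gz may differ.
    """
    extremes = {}  # repeat -> [min_size, max_size, min_sample, max_sample]
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        repeat, sample, size = row[0], row[1], int(row[2])
        rec = extremes.get(repeat)
        if rec is None:
            extremes[repeat] = [size, size, sample, sample]
            continue
        if size < rec[0]:
            rec[0], rec[2] = size, sample
        if size > rec[1]:
            rec[1], rec[3] = size, sample
    return extremes


def write_db(extremes, out):
    """Write one tab-separated line per repeat, in the column order requested above."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["repeat", "min_size", "max_size", "min_size_sample", "max_size_sample"]
    )
    for repeat in sorted(extremes):
        writer.writerow([repeat, *extremes[repeat]])


if __name__ == "__main__":
    # Tiny in-memory demo; SAMPLE_X is a made-up sample name.
    demo = io.StringIO(
        "chr10_100000834_100000912_A\tHG00099\t80\n"
        "chr10_100000834_100000912_A\tSAMPLE_X\t65\n"
        "chr10_100000834_100000912_A\tHG00639\t50\n"
    )
    write_db(build_min_max_db(demo), io.StringIO())
    # For the real file, pass gzip.open(path, "rt") instead of the StringIO.
```

Reading line by line keeps memory use proportional to the number of repeats rather than the number of alleles. When several samples tie for the minimum or maximum, this sketch keeps the first one encountered; whether ties should instead list all samples is a design choice left open here.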