Create tumor/normal pairs with CNVs #76

Open
popicka opened this issue Jul 7, 2020 · 3 comments

popicka commented Jul 7, 2020

Hi,
We are trying to use NEAT-genreads to generate realistic WGS/WES tumor and normal samples.
genReadsTumorTutorial is very clear, and we were able to generate both somatic and germline SNPs, but we are not sure how to generate somatic CNVs in the tumor sample.

We would like to benchmark CNV callers. Issue #30 mentions that the -v parameter should be used to include CNVs.
However, most CNV callers do not use VCF; they report CNVs in a BED-like format, most commonly like the example below:

chr	start	end	length	copy_number
20	21655679	22029964	374286	3

What would be the recommended representation of CNVs?

Great tool!

Thank you,
Ana


popicka commented Jul 7, 2020

We have also tested NEAT with CNVs in VCF format, like this:

20	29956380	.	N	<DUP>	.	.	IMPRECISE;SVTYPE=DUP;END=32442249;SVLEN=2485869;FOLD_CHANGE=2.022472;FOLD_CHANGE_LOG=1.016120;PROBES=408	GT:GQ:CN:CNQ	0/1:0:5:408
20	32442749	.	N	<DUP>	.	.	IMPRECISE;SVTYPE=DUP;END=37663008;SVLEN=5220259;FOLD_CHANGE=1.349033;FOLD_CHANGE_LOG=0.431926;PROBES=772	GT:GQ:CN:CNQ	0/1:0:3:772
20	37667055	.	N	<DEL>	.	.	IMPRECISE;SVTYPE=DEL;END=62959382;SVLEN=-25292327;FOLD_CHANGE=0.812778;FOLD_CHANGE_LOG=-0.299067;PROBES=2121	GT:GQ	0/1:2121

However, the golden VCF file was empty.
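
The records above use symbolic ALT alleles (`<DUP>`, `<DEL>`) rather than explicit sequences. As a minimal sketch for spotting such records in an input VCF before a run, the snippet below lists every data line whose ALT field contains a symbolic allele. `find_symbolic_records` is a hypothetical helper written for illustration; it is not part of NEAT, and it assumes plain tab-separated VCF data lines.

```python
def find_symbolic_records(vcf_lines):
    """Return (chrom, pos, alt) for records whose ALT holds a symbolic allele like <DUP>."""
    flagged = []
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF data line
        alts = fields[4].split(",")
        if any(a.startswith("<") and a.endswith(">") for a in alts):
            flagged.append((fields[0], int(fields[1]), fields[4]))
    return flagged


# Abbreviated versions of the records from this comment, plus one explicit insertion.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t29956380\t.\tN\t<DUP>\t.\t.\tSVTYPE=DUP;END=32442249",
    "20\t37667055\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=62959382",
    "20\t100\t.\tA\tACGT\t.\t.\t.",
]
for chrom, pos, alt in find_symbolic_records(example):
    print(chrom, pos, alt)
```

To check a real file, the same function can be fed the lines of an open file handle, for example `find_symbolic_records(open("input.vcf"))`, where `input.vcf` is a placeholder path.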

zstephens (Owner) commented

Greetings,

It has been on my todo list to facilitate different representations for input SVs, but at the moment only the standard REF/ALT format is supported. So any SV needs to be boiled down to its constituent insertions/deletions.

E.g. if you wanted a large duplication, it would have to be formatted as: chr1 1000000 A ACGTACGTACGT... where CGT... is explicitly the duplicated sequence. It's kind of a pain, I admit, but I haven't yet worked up the courage to tackle all the different <> cases.
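
A rough sketch of that conversion, assuming the chromosome sequence is already loaded as a Python string and the CNV coordinates are 1-based and inclusive: a tandem duplication becomes an insertion of the region after the base preceding it, and a deletion becomes a REF that spans the anchor base plus the region. `cnv_to_ref_alt` and its parameters are hypothetical names for illustration, not NEAT functionality, and the line it builds carries only the eight fixed VCF columns; whether NEAT needs additional fields (for example a genotype) is not covered here.

```python
def cnv_to_ref_alt(ref_seq, chrom, start, end, kind, extra_copies=1):
    """Build one explicit REF/ALT VCF data line for a tandem DUP or a DEL.

    ref_seq is the chromosome sequence as a string; start and end are
    1-based inclusive coordinates (an assumption about the input table).
    """
    if not (2 <= start <= end <= len(ref_seq)):
        raise ValueError("region must lie inside the sequence and leave room for an anchor base")
    anchor = ref_seq[start - 2]      # base immediately before the region
    region = ref_seq[start - 1:end]  # the duplicated or deleted sequence
    if kind == "DUP":
        # Insert extra copies of the region right after the anchor base.
        ref, alt = anchor, anchor + region * extra_copies
    elif kind == "DEL":
        # Remove the region, keeping only the anchor base.
        ref, alt = anchor + region, anchor
    else:
        raise ValueError(f"unsupported kind: {kind}")
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    return "\t".join([chrom, str(start - 1), ".", ref, alt, ".", "PASS", "."])


# Toy 10-base reference; the region at positions 3-5 is "GTA", the anchor is "C".
toy = "ACGTACGTAA"
print(cnv_to_ref_alt(toy, "chrT", 3, 5, "DUP"))
print(cnv_to_ref_alt(toy, "chrT", 3, 5, "DEL"))
```

For multi-megabase events like the ones in this thread, the ALT (or REF) string holds the entire region, so the resulting VCF can get very large. How a reported copy number maps to `extra_copies` depends on ploidy and zygosity assumptions that the BED-style table does not state.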

-Zach


popicka commented Jul 8, 2020

Thank you so much!
