The program mason_materializer
takes a reference FASTA file and a VCF variant file and
applies the variants to the reference. The input VCF can contain all variants
that can be generated by the mason_variator
program.
This functionality is useful if you want to look at the actual sequence of a personalized genome, for example, or when applying the variants from a previously simulated VCF file to put it into an external read simulator.
There are example files in the examples
directory.
The command:
$ mason_materializer --help
prints the help for Mason Materializer.
We take the files adeno_virus.fa and apply the VCF file adeno_virus.vcf to it. We write the resulting FASTA file to adeno_out.fa
$ mason_materializer -ir adeno_virus.fa -iv adeno_virus.vcf -o adeno_out.fa
...
$ head adeno_out.fa
>gi|56160436|ref|AC_000005.1|/1
CCTATCTAATAATTTACCTTATACTGGACTAGTGCCAATATTAAAATGAAGTGGGCGTAGTGTGTAATTT
GATTGGGTGGAGGTGTGGCTTTGGCGTGCTTGTAAGTTTGGGCGGATGAGGAAGTGGGGCGCGGCGTGGG
AGCCGGGCGCGCCGGATGTGACGTTTTAGACGCCATTTTACACGGAAATGATGTTTTTTGGGCGTTGTTT
GTGCAAATTTTGTGTTTTAGGCGCGAAAACTGAAATGCGGAAGTGAAAATTGATGACGGCAATTTTATTA
TAGGCGCGGAATATTTACCGAGGGCAGAGTGAACTCTGAGCCTCTACGTGTGGGTTTCGATACGTGAGCG
ACGGGGAAACTCCACGTTGGCGCTCAAAGGGCGCGTTTATTGTTCTGTCAGCTGATCGTTTGGGTATTTA
ATGCCGCCGTGTTCGTCAAGAGGCCACTCTTGAGTGCCAGCGAGAAGAGTTTTCTCTGCCAGCTCATTTT
CACGGCGCCATTATGAGAACTGAAATGACTCCCTTGGTCCTGTCGTATCAGGAAGCTGACGACATATTGG
AGCATTTGGTGGACAACTTTTTTAACGAGGTACCCAGTGATGATGATCTTTATGTTCCGTCTCTTTACGA
$ grep '^>gi' adeno_out.fa
>gi|56160436|ref|AC_000005.1|/1
Note that there is only one haplotype described in the VCF file.
mason_materializer
generates a haplotype into the output FASTA file for each
haplotype in the VCF and sequence in the reference file. The name of the
haplotype is the reference name with a suffix consisting of a dash and the
number of the haplotype.