Mason Materializer

Overview

The program mason_materializer takes a reference FASTA file and a VCF variant file and applies the variants to the reference. The input VCF file can contain any variant that the mason_variator program can generate.

This functionality is useful, for example, if you want to inspect the actual sequence of a personalized genome, or if you want to apply the variants from a previously simulated VCF file and feed the resulting sequence into an external read simulator.

Examples

There are example files in the examples directory.

Help

The command:

  $ mason_materializer --help

prints the help for Mason Materializer.

Materializing a VCF file

We take the file adeno_virus.fa, apply the variants in the VCF file adeno_virus.vcf to it, and write the resulting FASTA file to adeno_out.fa.

  $ mason_materializer -ir adeno_virus.fa -iv adeno_virus.vcf -o adeno_out.fa
  ...
  $ head adeno_out.fa
  >gi|56160436|ref|AC_000005.1|/1
  CCTATCTAATAATTTACCTTATACTGGACTAGTGCCAATATTAAAATGAAGTGGGCGTAGTGTGTAATTT
  GATTGGGTGGAGGTGTGGCTTTGGCGTGCTTGTAAGTTTGGGCGGATGAGGAAGTGGGGCGCGGCGTGGG
  AGCCGGGCGCGCCGGATGTGACGTTTTAGACGCCATTTTACACGGAAATGATGTTTTTTGGGCGTTGTTT
  GTGCAAATTTTGTGTTTTAGGCGCGAAAACTGAAATGCGGAAGTGAAAATTGATGACGGCAATTTTATTA
  TAGGCGCGGAATATTTACCGAGGGCAGAGTGAACTCTGAGCCTCTACGTGTGGGTTTCGATACGTGAGCG
  ACGGGGAAACTCCACGTTGGCGCTCAAAGGGCGCGTTTATTGTTCTGTCAGCTGATCGTTTGGGTATTTA
  ATGCCGCCGTGTTCGTCAAGAGGCCACTCTTGAGTGCCAGCGAGAAGAGTTTTCTCTGCCAGCTCATTTT
  CACGGCGCCATTATGAGAACTGAAATGACTCCCTTGGTCCTGTCGTATCAGGAAGCTGACGACATATTGG
  AGCATTTGGTGGACAACTTTTTTAACGAGGTACCCAGTGATGATGATCTTTATGTTCCGTCTCTTTACGA
  $ grep '^>gi' adeno_out.fa
  >gi|56160436|ref|AC_000005.1|/1

Note that the VCF file describes only one haplotype, so the output contains a single sequence. mason_materializer writes one sequence to the output FASTA file for each combination of haplotype in the VCF file and sequence in the reference file. The name of each output sequence is the reference name followed by a suffix consisting of a slash and the number of the haplotype, as in the "/1" of the output shown above.
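
The naming scheme can be inspected from the shell. The following is a minimal sketch, assuming the "/" separator seen in the example output above. The file demo_out.fa and its sequence names (ref_a, ref_b) are hypothetical stand-ins for a materialized file with two reference sequences and two haplotypes; they are not produced by mason_materializer here.

```shell
# Hypothetical stand-in for a materialized FASTA file: two reference
# sequences (ref_a, ref_b), each written once per haplotype (1 and 2).
cat > demo_out.fa <<'EOF'
>ref_a/1
ACGTACGT
>ref_a/2
ACCTACGT
>ref_b/1
TTGACA
>ref_b/2
TTGGCA
EOF

# Collect the sequence names (header lines without the leading ">").
names=$(grep '^>' demo_out.fa | sed 's/^>//')

# Keep only the text after the last "/" to get the distinct haplotype numbers.
haplotypes=$(printf '%s\n' "$names" | sed 's|.*/||' | sort -u)

# Count the sequences: one per haplotype and reference sequence.
count=$(grep -c '^>' demo_out.fa)

echo "sequences:  $count"
echo "haplotypes:" $haplotypes
```

To run the same check on your own output, replace demo_out.fa with the file passed to -o (for example adeno_out.fa) and skip the step that creates the demo file.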