Small pipeline to call methylation in Streptococcus suis using standard PacBio tools
This pipeline was constructed to call m6A and m4C methylation in Streptococcus suis mutants. These mutants are expected to differ in their methylation status, which we wanted to investigate using PacBio sequencing. For an introduction to identifying methylation from PacBio data, see this Nature Methods paper.
This pipeline has the following requirements:
- Snakemake and conda are available on your system
- ipdSummary and MultiMotifMaker are installed on your system. Neither can be installed through conda. See https://www.pacb.com/support/software-downloads/ for ipdSummary (part of smrttools) and https://github.com/bioinfomaticsCSU/MultiMotifMaker for MultiMotifMaker. Update the paths to the executables as appropriate.
- PacBio read data is available in fasta format in the directory `pacbio_fa`, and paired-end Illumina data is available in gzipped fastq format in the directory `illumina_fastq`
- Enough computational power. The analysis was originally run on a computing node with 96 GB RAM and 16 CPUs, which was sufficient.
Unicycler and pbmm2 will be installed through conda, integrated in Snakemake.
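The requirements above can be checked before starting a run. The following is a minimal sketch, not part of the pipeline itself: the function name `missing_requirements` is hypothetical, and it assumes that `snakemake`, `conda` and `ipdSummary` are meant to be found on your `PATH`. MultiMotifMaker is not checked here, because how it is invoked depends on your installation.

```python
import shutil
from pathlib import Path

# Executables assumed to be on PATH (assumption: adjust to your installation).
REQUIRED_TOOLS = ["snakemake", "conda", "ipdSummary"]
# Input directories named in the requirements list.
REQUIRED_DIRS = ["pacbio_fa", "illumina_fastq"]


def missing_requirements(tools=REQUIRED_TOOLS, dirs=REQUIRED_DIRS, root="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [
        f"executable not on PATH: {tool}"
        for tool in tools
        if shutil.which(tool) is None
    ]
    problems += [
        f"directory not found: {name}"
        for name in dirs
        if not (Path(root) / name).is_dir()
    ]
    return problems
```

Calling `missing_requirements()` from the directory that holds the Snakefile returns one message per missing executable or input directory.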
This pipeline combines several tools to identify sequence motifs associated with particular methylation types. The tools and their functions are:
Tool | Purpose | Outputs | Link |
---|---|---|---|
Unicycler | Hybrid assembly of complete genome | Assembly fasta (assembly.fasta) and some quality control files | https://github.com/rrwick/Unicycler |
pbmm2 | Mapping of PacBio reads to assembly | Bam file ("native PacBio" format) | https://github.com/PacificBiosciences/pbmm2 |
ipdSummary | Identification of m6A, m4C and unknown modifications from mapped reads | Modifications file in GFF3 format and an exhaustive overview of kinetics per nt in csv format | https://github.com/PacificBiosciences/kineticsTools |
MultiMotifMaker | Identification of motifs associated with modifications | Csv file containing motifs | https://github.com/bioinfomaticsCSU/MultiMotifMaker |
The final csv file containing motifs is the main output.
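As a quick sanity check before interpreting the motifs, it can help to count how many modifications of each type ipdSummary reported. The sketch below is an illustration rather than part of the pipeline. It assumes the modifications file follows the usual nine-column, tab-separated GFF3 layout with the modification type (for example `m6A`, `m4C` or `modified_base`) in the third column. The function name `count_modifications` is hypothetical.

```python
from collections import Counter


def count_modifications(gff_lines):
    """Count records per feature type (GFF3 column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header, directive, or empty line
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF3 record
        counts[fields[2]] += 1
    return counts
```

Passing an open file handle for the GFF3 file, for example `count_modifications(open("modifications.gff"))` with a hypothetical file name, returns a `Counter` keyed by modification type. A mutant that lacks a methyltransferase would be expected to show a lower count for the corresponding type.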