Output files
RLM offers several output files that can be requested together or separately. Which output files are computed is set via the -s option, which accepts single_read, entropy, pdr or all.
The single_read option only writes an output file containing information for every read or read pair that passed all filters and can be considered for read-level analysis. This file is written for every score option, since the information it contains is needed for all downstream scores. It has the following tab-delimited format:
#chr start end read_name CpG_pattern n_CpGs n_CpGs_methyl discordance_score transitions_score mean_methylation
chr_test 2384 2518 A00442:HFH2KDSXX190418:HFH2KDSXX:1:1108:19334:12555 gGg 3 1 1 1 0.333333
chr_test 5286 5421 A00442:HFH2KDSXX190418:HFH2KDSXX:1:1103:6334:22670 GGG 3 3 0 0 1
chr_test 7418 7553 A00442:HFH2KDSXX190418:HFH2KDSXX:1:1108:17463:25692 gGGG 4 3 1 0.333333 0.75
chr_test 7444 7579 A00442:HFH2KDSXX190418:HFH2KDSXX:1:1104:11957:9768 gGgg 4 1 1 0.666667 0.25
For every read, 10 different fields are reported:
- The chromosome the read aligned to.
- The start position of the read with respect to the chromosome (0-based, half-open intervals).
- The end position of the read with respect to the chromosome (0-based, half-open intervals).
- The read name. This will be the same for mates of the same pair.
- The methylation pattern for all CpGs spanned by the read. Capital G indicates methylation, lower case g refers to unmethylated CpGs.
- The number of CpGs spanned by the read.
- The number of methylated CpGs spanned by the read.
- The discordance score of the read (0 if the CpGs are all unmethylated or all methylated, 1 otherwise).
- The transition score of the read (the number of times the pattern switches between methylated and unmethylated at consecutive CpGs, divided by the number of possible transitions, n - 1).
- The mean methylation of the read based on all CpGs spanned by it.
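The per-read fields can be recomputed from the CpG pattern alone. The following is a minimal sketch, not RLM's own code: the function name `read_scores` is hypothetical, and returning 0 for the transition score of a read with a single CpG is an assumption. It reproduces the values in the example rows above.

```python
def read_scores(pattern):
    """Return (n_CpGs, n_CpGs_methyl, discordance, transitions, mean_methylation)
    for a CpG pattern such as "gGgg" (G = methylated, g = unmethylated)."""
    if not pattern or set(pattern) - {"G", "g"}:
        raise ValueError(f"unexpected pattern: {pattern!r}")
    n_cpgs = len(pattern)
    n_methyl = pattern.count("G")
    # A read is discordant if it mixes methylated and unmethylated CpGs.
    discordance = 0 if n_methyl in (0, n_cpgs) else 1
    # Count state switches between neighbouring CpGs, in either direction.
    switches = sum(a != b for a, b in zip(pattern, pattern[1:]))
    # Assumption: a read with one CpG has no possible transition, so report 0.
    transitions = switches / (n_cpgs - 1) if n_cpgs > 1 else 0.0
    return n_cpgs, n_methyl, discordance, transitions, n_methyl / n_cpgs


if __name__ == "__main__":
    for pattern in ("gGg", "GGG", "gGGG", "gGgg"):
        print(pattern, read_scores(pattern))
```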
Note: If trimming of reads in the RRBS mode is enabled, the start and end position of the reads will match the sequence considered for RLM and will be truncated either at the 3' end (reads originating from the original forward/reverse strand) or 5' end (reads originating from the reverse complement of the original forward/reverse strand).
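Because the single read file is tab-delimited with a header line starting with `#`, it can be loaded with Python's standard `csv` module. The sketch below assumes the ten columns appear in the order shown above; the function name `parse_single_read` and the dictionary keys are illustrative choices, not part of RLM.

```python
import csv
import io


def parse_single_read(handle):
    """Yield one dict per read from a single read output file."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#chr ...' header line.
        if not row or row[0].startswith("#"):
            continue
        chrom, start, end, name, pattern, n_cpgs, n_methyl, disc, trans, mean = row
        yield {
            "chr": chrom,
            "start": int(start),  # 0-based, half-open interval
            "end": int(end),
            "read_name": name,
            "CpG_pattern": pattern,
            "n_CpGs": int(n_cpgs),
            "n_CpGs_methyl": int(n_methyl),
            "discordance_score": int(disc),
            "transitions_score": float(trans),
            "mean_methylation": float(mean),
        }


# Two rows copied from the example above, joined with tabs.
SAMPLE = (
    "#chr\tstart\tend\tread_name\tCpG_pattern\tn_CpGs\tn_CpGs_methyl\t"
    "discordance_score\ttransitions_score\tmean_methylation\n"
    "chr_test\t2384\t2518\tA00442:HFH2KDSXX190418:HFH2KDSXX:1:1108:19334:12555\t"
    "gGg\t3\t1\t1\t1\t0.333333\n"
    "chr_test\t5286\t5421\tA00442:HFH2KDSXX190418:HFH2KDSXX:1:1103:6334:22670\t"
    "GGG\t3\t3\t0\t0\t1\n"
)

if __name__ == "__main__":
    for record in parse_single_read(io.StringIO(SAMPLE)):
        print(record["read_name"], record["CpG_pattern"], record["mean_methylation"])
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object.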
When entropy is chosen as the score option, a second file of the following form is written in addition to the single read output file:
#chr start end entropy epipolymorphism gggg gggG ggGg ggGG gGgg gGgG gGGg gGGG Gggg GggG GgGg GgGG GGgg GGgG GGGg GGGG mean_methylation coverage
chr_test 7390304 7390306 0.663386 0.792899 5 2 1 1 1 1 0 1 0 0 0 1 0 0 0 0 0.269231 13
chr_test 7390619 7390621 0.42511 0.579882 8 0 1 0 2 0 0 0 1 0 0 1 0 0 0 0 0.134615 13
chr_test 7390646 7390648 0.584893 0.764444 6 2 0 0 2 1 0 2 2 0 0 0 0 0 0 0 0.233333 15
chr_test 7390665 7390667 0.508735 0.662722 7 0 2 0 0 0 0 0 1 0 0 1 0 0 1 1 0.25 13
For every 4-mer of consecutive CpGs that is spanned by a user-defined minimum number of reads (default: 10), the following fields are reported:
- The chromosome.
- The start position of the first CpG in the 4-mer (0-based, half-open intervals).
- The end position of the first CpG in the 4-mer (0-based, half-open intervals).
- The methylation entropy calculated for the 4-mer based on the reads that span the complete 4-mer. For more information on this score see Xie et al.
- The methylation epipolymorphism calculated for the 4-mer based on the reads that span the complete 4-mer. For more information on this score see Landan et al.
- The count of reads for each of the 16 possible epialleles that underlie the entropy and epipolymorphism calculations (16 columns; the header defines the epiallele per column. Capital G indicates methylation, lower case g refers to unmethylated CpGs).
- The mean methylation of the 4-mer based on all 4 CpGs across all considered reads. This might deviate slightly from the value obtained by standard methylation calling, since RLM excludes certain reads that standard methylation callers might consider, such as reads with indels or low-quality reads.
- The coverage defined as the number of reads considered that span the complete 4-mer.
4-mers are reported using the first CpG as position in order to allow creating browser tracks but the value refers to the complete 4-mer starting with this CpG.
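The entropy, epipolymorphism and mean methylation columns can be recomputed from the 16 epiallele counts. The sketch below uses the usual definitions: Shannon entropy in bits divided by the number of CpGs (4), and one minus the sum of squared epiallele frequencies. These formulas reproduce the example rows above to the printed precision, but that RLM computes the values in exactly this way is an assumption, and the name `four_mer_scores` is hypothetical.

```python
import math
from itertools import product

# Epialleles in header order: gggg, gggG, ggGg, ..., GGGG.
EPIALLELES = ["".join(p) for p in product("gG", repeat=4)]


def four_mer_scores(counts):
    """Return (entropy, epipolymorphism, mean_methylation, coverage)
    from the 16 epiallele read counts of one 4-mer."""
    if len(counts) != 16:
        raise ValueError("expected 16 epiallele counts")
    coverage = sum(counts)
    if coverage == 0:
        raise ValueError("no reads span this 4-mer")
    freqs = [c / coverage for c in counts]
    # Shannon entropy in bits, scaled by the number of CpGs in the 4-mer.
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0) / 4
    # Probability that two randomly drawn reads carry different epialleles.
    epipolymorphism = 1 - sum(p * p for p in freqs)
    # Methylated CpG observations divided by all CpG observations.
    methylated = sum(c * a.count("G") for c, a in zip(counts, EPIALLELES))
    mean_methylation = methylated / (4 * coverage)
    return entropy, epipolymorphism, mean_methylation, coverage


if __name__ == "__main__":
    # Counts from the first example row (chr_test 7390304).
    row = [5, 2, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    print(four_mer_scores(row))
```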
When pdr is chosen as the score option, a second file of the following form is written in addition to the single read output file:
#chr start end PDR RTS mean_methylation coverage
chr_test 7390271 7390273 0.727273 0.360606 0.636364 11
chr_test 7390275 7390277 0.727273 0.360606 0.363636 11
chr_test 7390304 7390306 0.655172 0.366667 0.103448 29
chr_test 7390346 7390348 0.619048 0.372222 0.214286 42
For every CpG that is spanned by a user-defined minimum number of reads (default: 10), the following fields are reported:
- The chromosome.
- The start position of the CpG (0-based, half-open intervals).
- The end position of the CpG (0-based, half-open intervals).
- The percent of discordant reads (PDR) calculated based on the reads that span the CpG. The number of discordant reads (neither completely unmethylated nor completely methylated reads) is normalized by the total number of considered reads. For more information on this score see Landau et al.
- The average read transition score (RTS) calculated based on the reads that span the CpG. The transition scores of the individual reads (see single read output) are summed and divided by the total number of reads spanning the CpG. For more information on this score see Charlton et al.
- The mean methylation of the CpG across all considered reads. This might deviate slightly from the value obtained by standard methylation calling, since RLM excludes certain reads that standard methylation callers might consider, such as reads with indels or low-quality reads.
- The coverage defined as the number of reads considered that span the CpG.
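The PDR and RTS definitions above can be sketched as follows for the reads spanning a single CpG. This is an illustration under stated assumptions rather than RLM's implementation: the function name `cpg_scores` is hypothetical, the input is simply a list of per-read CpG patterns, any additional read filters RLM applies are ignored, and a read with a single CpG is given a transition score of 0.

```python
def transition_score(pattern):
    """Switches between neighbouring CpG states divided by n - 1."""
    if len(pattern) < 2:
        return 0.0  # assumption: no transition is possible with one CpG
    switches = sum(a != b for a, b in zip(pattern, pattern[1:]))
    return switches / (len(pattern) - 1)


def cpg_scores(patterns):
    """Return (PDR, RTS, coverage) for the reads spanning one CpG.

    patterns: CpG pattern strings (G = methylated, g = unmethylated),
    one per read that spans the CpG.
    """
    coverage = len(patterns)
    if coverage == 0:
        raise ValueError("no reads span this CpG")
    # Discordant reads are neither fully methylated nor fully unmethylated.
    discordant = sum(1 for p in patterns if len(set(p)) > 1)
    pdr = discordant / coverage
    rts = sum(transition_score(p) for p in patterns) / coverage
    return pdr, rts, coverage


if __name__ == "__main__":
    # Illustrative patterns only; they are not taken from one real CpG.
    print(cpg_scores(["gGg", "GGG", "gGGG", "gGgg"]))
```

In practice a CpG would only be reported once the coverage reaches the user-defined minimum (default: 10), so a caller would check the returned coverage before using the scores.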
When all is chosen as the score option, all three output files (single read, entropy and pdr) are created.