
Output files

sarahet edited this page Aug 10, 2021 · 10 revisions

RLM can write several output files, either all together or individually. The -s option selects which output files are computed and accepts one of single_read, entropy, pdr or all.

Single read output

The single_read option writes a single output file containing information for every read or read pair that passed all filters and is eligible for read-level analysis. This file is written for every score option, since all downstream scores are computed from the information it contains. It has the following tab-delimited format:

#chr	start	end	read_name	CpG_pattern	n_CpGs	n_CpGs_methyl	discordance_score	transitions_score	mean_methylation
chr_test	2384	2518	A00442:HFH2KDSXX190418:HFH2KDSXX:1:1108:19334:12555	gGg	3	1	1	1	0.333333
chr_test	5286	5421	A00442:HFH2KDSXX190418:HFH2KDSXX:1:1103:6334:22670	GGG	3	3	0	0	1
chr_test	7418	7553	A00442:HFH2KDSXX190418:HFH2KDSXX:1:1108:17463:25692	gGGG	4	3	1	0.333333	0.75
chr_test	7444	7579	A00442:HFH2KDSXX190418:HFH2KDSXX:1:1104:11957:9768	gGgg	4	1	1	0.666667	0.25

For every read, the following 10 fields are reported:

  1. The chromosome the read aligned to.
  2. The start position of the read with respect to the chromosome (0-based, half-open intervals).
  3. The end position of the read with respect to the chromosome (0-based, half-open intervals).
  4. The read name. This will be the same for mates of the same pair.
  5. The methylation pattern for all CpGs spanned by the read. A capital G indicates a methylated CpG, a lowercase g an unmethylated CpG.
  6. The number of CpGs spanned by the read.
  7. The number of methylated CpGs spanned by the read.
  8. The discordance score of the read (0 if all CpGs are either unmethylated or methylated, 1 otherwise).
  9. The transition score of the read: the number of times the pattern switches between methylated and unmethylated at consecutive CpGs, divided by the number of possible transitions n - 1, where n is the number of CpGs spanned by the read.
  10. The mean methylation of the read based on all CpGs spanned by it.
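
Fields 6 to 10 can all be derived from the CpG pattern in field 5. The following is a minimal Python sketch of that derivation, not RLM's actual implementation. The function name `read_scores` is hypothetical, and returning a transition score of 0 for a read with a single CpG is an assumption made here to avoid a division by zero.

```python
def read_scores(pattern):
    """Derive fields 6-10 from a CpG pattern such as "gGGG".

    'G' marks a methylated CpG, 'g' an unmethylated CpG.
    Returns (n_CpGs, n_CpGs_methyl, discordance, transitions, mean_methylation).
    """
    n = len(pattern)
    if n == 0:
        raise ValueError("pattern must contain at least one CpG")

    n_methyl = sum(1 for c in pattern if c == "G")

    # Discordant reads are neither fully methylated nor fully unmethylated.
    discordance = 0 if n_methyl in (0, n) else 1

    # Count state switches between neighbouring CpGs, in either direction.
    switches = sum(1 for a, b in zip(pattern, pattern[1:]) if a != b)

    # Assumption: a read with one CpG has no possible transition, so score 0.
    transitions = switches / (n - 1) if n > 1 else 0.0

    return n, n_methyl, discordance, transitions, n_methyl / n


if __name__ == "__main__":
    for pattern in ("gGg", "GGG", "gGGG", "gGgg"):
        print(pattern, read_scores(pattern))
```

Applied to the four patterns in the example file above, this yields the same counts and scores as the corresponding rows, up to the rounding used in the file.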

Note: If read trimming is enabled in RRBS mode, the reported start and end positions match the trimmed sequence that RLM actually considered. Reads are truncated either at the 3' end (reads originating from the original forward/reverse strand) or at the 5' end (reads originating from the reverse complement of the original forward/reverse strand).

Entropy output

When entropy is chosen as the score option, a second file of the following form is written in addition to the single read output file:

#chr	start	end	entropy	epipolymorphism	gggg	gggG	ggGg	ggGG	gGgg	gGgG	gGGg	gGGG	Gggg	GggG	GgGg	GgGG	GGgg	GGgG	GGGg	GGGG	mean_methylation	coverage
chr_test	7390304	7390306	0.663386	0.792899	5	2	1	1	1	1	0	1	0	0	0	1	0	0	0	0	0.269231	13
chr_test	7390619	7390621	0.42511	0.579882	8	0	1	0	2	0	0	0	1	0	0	1	0	0	0	0	0.134615	13
chr_test	7390646	7390648	0.584893	0.764444	6	2	0	0	2	1	0	2	2	0	0	0	0	0	0	0	0.233333	15
chr_test	7390665	7390667	0.508735	0.662722	7	0	2	0	0	0	0	0	1	0	0	1	0	0	1	1	0.25	13

For every 4-mer of consecutive CpGs that is spanned by at least a user-defined minimum number of reads (default: 10), the following fields are reported:

  1. The chromosome.
  2. The start position of the first CpG in the 4-mer (0-based, half-open intervals).
  3. The end position of the first CpG in the 4-mer (0-based, half-open intervals).
  4. The methylation entropy calculated for the 4-mer based on the reads that span the complete 4-mer. For more information on this score see Xie et al.
  5. The methylation epipolymorphism calculated for the 4-mer based on the reads that span the complete 4-mer. For more information on this score see Landan et al.
  6. The read counts for all 16 possible epialleles that underlie the entropy and epipolymorphism calculations (16 columns; the header defines the epiallele of each column. A capital G indicates a methylated CpG, a lowercase g an unmethylated CpG).
  7. The mean methylation of the 4-mer based on all 4 CpGs across all considered reads. This might slightly deviate from the value that can be calculated by standard methylation calling since RLM excludes certain reads that might be considered by standard methylation callers such as reads with indels, low quality reads, etc.
  8. The coverage defined as the number of reads considered that span the complete 4-mer.

4-mers are reported at the position of their first CpG so that browser tracks can be created, but each value refers to the complete 4-mer starting at this CpG.
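
The entropy, epipolymorphism, mean methylation and coverage columns can be recomputed from the 16 epiallele counts. The Python sketch below assumes that entropy is the Shannon entropy (base 2) of the epiallele frequencies divided by the 4 CpGs, and that epipolymorphism is 1 minus the sum of squared frequencies. These formulas are an assumption that reproduces the example rows above; the cited publications and the RLM source remain the authoritative definitions. The function name `four_mer_scores` is hypothetical.

```python
import math
from itertools import product

# The 16 epialleles in header column order: gggg, gggG, ggGg, ..., GGGG.
EPIALLELES = ["".join(p) for p in product("gG", repeat=4)]


def four_mer_scores(counts):
    """Return (entropy, epipolymorphism, mean_methylation, coverage).

    counts: 16 read counts, ordered like EPIALLELES.
    """
    if len(counts) != 16:
        raise ValueError("expected 16 epiallele counts")
    coverage = sum(counts)
    if coverage == 0:
        raise ValueError("no reads span this 4-mer")

    freqs = [c / coverage for c in counts]

    # Assumed definition: Shannon entropy in bits, normalized by 4 CpGs.
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0) / 4

    # Assumed definition: probability that two random reads differ.
    epipolymorphism = 1 - sum(p * p for p in freqs)

    # Methylated CpG calls over all CpG calls (4 per read).
    methylated = sum(c * a.count("G") for c, a in zip(counts, EPIALLELES))
    mean_methylation = methylated / (4 * coverage)

    return entropy, epipolymorphism, mean_methylation, coverage


if __name__ == "__main__":
    # Epiallele counts copied from the first example row.
    row = [5, 2, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    print(four_mer_scores(row))
```

For the first example row, the result agrees with the reported entropy, epipolymorphism, mean methylation and coverage to the precision shown in the file.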

PDR output

When pdr is chosen as the score option, a second file of the following form is written in addition to the single read output file:

#chr	start	end	PDR	RTS	mean_methylation	coverage
chr_test	7390271	7390273	0.727273	0.360606	0.636364	11
chr_test	7390275	7390277	0.727273	0.360606	0.363636	11
chr_test	7390304	7390306	0.655172	0.366667	0.103448	29
chr_test	7390346	7390348	0.619048	0.372222	0.214286	42

For every CpG that is spanned by at least a user-defined minimum number of reads (default: 10), the following fields are reported:

  1. The chromosome.
  2. The start position of the CpG (0-based, half-open intervals).
  3. The end position of the CpG (0-based, half-open intervals).
  4. The percent of discordant reads (PDR) calculated based on the reads that span the CpG, reported as a fraction between 0 and 1. The number of discordant reads (reads that are neither completely unmethylated nor completely methylated) is divided by the total number of considered reads. For more information on this score see Landau et al.
  5. The average read transition score (RTS) calculated based on the reads that span the CpG. The per-read transition scores (see single read output) are summed and divided by the total number of reads spanning the CpG. For more information on this score see Charlton et al.
  6. The mean methylation of the CpG across all considered reads. This might slightly deviate from the value that can be calculated by standard methylation calling since RLM excludes certain reads that might be considered by standard methylation callers such as reads with indels, low quality reads, etc.
  7. The coverage defined as the number of reads considered that span the CpG.
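
PDR and RTS are both averages over the reads that span a CpG, so they can be sketched from the per-read discordance and transition scores of the single read output. The Python example below is an illustration under that assumption only. The function name `cpg_scores` and the toy input are hypothetical, and the sketch does not show how RLM assigns reads to a CpG or applies the minimum coverage filter.

```python
def cpg_scores(reads):
    """Return (PDR, RTS, coverage) for one CpG.

    reads: list of (discordance_score, transitions_score) tuples,
    one per read spanning the CpG.
    """
    coverage = len(reads)
    if coverage == 0:
        raise ValueError("no reads span this CpG")

    # PDR: fraction of spanning reads that are discordant.
    pdr = sum(d for d, _ in reads) / coverage

    # RTS: mean of the per-read transition scores.
    rts = sum(t for _, t in reads) / coverage

    return pdr, rts, coverage


if __name__ == "__main__":
    # Toy input for illustration only: four reads assumed to span one CpG.
    toy_reads = [(1, 1.0), (0, 0.0), (1, 1 / 3), (1, 2 / 3)]
    print(cpg_scores(toy_reads))
```

With the toy input, three of the four reads are discordant, so the PDR is 0.75, and the four transition scores average to about 0.5.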

All output

When all is chosen as the score option, all three output files (single read, entropy and pdr) are created.
