Commit

Description of output files. Part II
ksenia-krasheninnikova authored Nov 14, 2023
1 parent c2de1a5 commit 4fa07c4
Showing 1 changed file with 32 additions and 31 deletions.
63 changes: 32 additions & 31 deletions docs/output.md
@@ -41,11 +41,11 @@ This subworkflow generates a KMER database and coverage model used in [PURGE_DUP
<details markdown="1">
<summary>Output files</summary>

- <code>\*hifiasm\*/*p_ctg.[g]fa</code>
- <code>.\*hifiasm.\*/.*p_ctg.[g]fa</code>
- primary assembly in GFA and FASTA format; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
- <code>\*hifiasm\*/a_ctg.[g]fa</code>
- <code>.\*hifiasm.\*/.*a_ctg.[g]fa</code>
- haplotigs in GFA and FASTA format; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
- <code>\*hifiasm\*/*bin</code>
- <code>.\*hifiasm.\*/.*bin</code>
- internal binary hifiasm files; for more details refer to the [hifiasm FAQ](https://hifiasm.readthedocs.io/en/latest/faq.html#id12)

</details>
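
As an illustration of how the patterns listed above could be used to sort files, here is a minimal Python sketch. It rests on two assumptions that the documentation does not state: that `[g]fa` is meant as "gfa or fa" (written `g?fa` below), and that the patterns apply to paths relative to the pipeline output directory. The function name and the example paths are hypothetical.

```python
import re

# Assumed reading of the documented patterns: "[g]fa" means "gfa or fa",
# and paths are relative to the pipeline output directory.
HIFIASM_PATTERNS = {
    "primary": re.compile(r".*hifiasm.*/.*p_ctg\.g?fa$"),
    "haplotigs": re.compile(r".*hifiasm.*/.*a_ctg\.g?fa$"),
    "binary": re.compile(r".*hifiasm.*/.*bin$"),
}


def classify_hifiasm_output(relative_path):
    """Return the output category for a path, or None if no pattern matches."""
    for label, pattern in HIFIASM_PATTERNS.items():
        if pattern.match(relative_path):
            return label
    return None
```

For example, a hypothetical path such as `sample.hifiasm.run1/sample.p_ctg.fa` would be classified as `primary`, while a path outside any hifiasm directory would return `None`.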
@@ -60,9 +60,9 @@ In case hifiasm HiC mode is switched on, it is performed as an extra step with r
<details markdown="1">
<summary>Output files</summary>

- <code>\*.hifiasm.\*/purged.fa</code>
- <code>\*.hifiasm..\*/purged.fa</code>
- purged primary contigs
- <code>\*.hifiasm.\*/purged.htigs.fa</code>
- <code>\*.hifiasm..\*/purged.htigs.fa</code>
- haplotigs after purging
- other files from the purge_dups pipeline
- for details refer to the [purge_dups repository](https://github.com/dfguan/purge_dups)
@@ -79,13 +79,13 @@ The subworkflow relies on kmer coverage model to identify coverage thresholds. F
<details markdown="1">
<summary>Output files</summary>

- <code>polishing/*consensus.fa</code>
- <code>\*.hifiasm..\*/polishing/.*consensus.fa</code>
- polished assembly of the joined primary contigs and haplotigs
- <code>polishing/merged.vcf.gz</code>
- <code>\*.hifiasm..\*/polishing/merged.vcf.gz</code>
- unfiltered variants
- <code>polishing/merged.vcf.gz.tbi</code>
- <code>\*.hifiasm..\*/polishing/merged.vcf.gz.tbi</code>
- index file
- <code>polishing/refdata-*</code>
- <code>\*.hifiasm..\*/polishing/refdata-*</code>
- Longranger assembly indices


@@ -100,19 +100,10 @@ This subworkflow uses read mapping of the Illumina 10X short read data to fix sh
<details markdown="1">
<summary>Output files</summary>

- <code>bed</code>
- <code>\*.hifiasm..\*/scaffolding/.*_merged_sorted.bed</code>
- bed file obtained from merged mkdup bam
- <code>cram</code>
- reads mapped to the reference
- <code>crai</code>
- index file for the mapped cram
- <code>stats</code>
- see [`CONVERT_STATS`](#convert_stats) output section
- <code>idxstats</code>
- output of samtools stats
- <code>flagstat</code>
- output of samtools flagstat

- <code>\*.hifiasm..\*/scaffolding/.*mkdup.bam</code>
- final read mapping bam with mapped reads
</details>

This subworkflow implements alignment of the Illumina HiC short reads to the primary assembly. It uses [`CONVERT_STATS`](#convert_stats) as an internal subworkflow to calculate read mapping stats.</p>
@@ -124,11 +115,11 @@ This subworkflow implements alignment of the Illumina HiC short reads to the pri

<details markdown="1">
<summary>Output files</summary>
- <code>stats</code>
- <code>\*.hifiasm..\*/scaffolding/.*.stats</code>
- output of samtools stats
- <code>idxstats</code>
- <code>\*.hifiasm..\*/scaffolding/.*.idxstats</code>
- output of samtools idxstats
- <code>flagstat</code>
- <code>\*.hifiasm..\*/scaffolding/.*.flagstat</code>
- output of samtools flagstat
</details>

@@ -138,8 +129,18 @@ This subworkflow produces statistics for a bam file containing read mapping. It i
<details markdown="1">
<summary>Output files</summary>

- <code>scaffolds</code>
- <code>\*.hifiasm..\*/scaffolding/yahs/out.break.yahs/out_scaffolds_final.fa</code>
- scaffolds in FASTA format
- <code>\*.hifiasm..\*/scaffolding/yahs/out.break.yahs/out_scaffolds_final.agp</code>
- coordinates of contigs relative to scaffolds
- <code>\*.hifiasm..\*/scaffolding/yahs/out.break.yahs/alignments_sorted.txt</code>
- Alignments for Juicer in text format
- <code>\*.hifiasm..\*/scaffolding/yahs/out.break.yahs/yahs_scaffolds.hic</code>
- Juicer HiC map
- <code>\*.hifiasm..\*/scaffolding/yahs/out.break.yahs/*cool</code>
- HiC map for cooler
- <code>\*.hifiasm..\*/scaffolding/yahs/out.break.yahs/*.FullMap.png</code>
- Pretext snapshot

</details>
The subworkflow performs scaffolding of the primary contigs using the HiC mapping generated in [`HIC_MAPPING`](#hic_mapping). It also performs some postprocessing steps, such as generating cooler and pretext files.</p>
@@ -151,11 +152,11 @@ The subworkflow performs scaffolding of the primary contigs using HiC mapping ge
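
To make "coordinates of contigs relative to scaffolds" concrete, the sketch below reads AGP lines and collects, per scaffold, the contigs placed on it. It assumes the file follows the standard tab-separated, nine-column AGP layout, with gap lines marked `N` or `U` in the component type column; the function name and the contig names in the usage note are hypothetical.

```python
def contigs_per_scaffold(agp_lines):
    """Map each scaffold to a list of (contig, start, end, orientation).

    Assumes standard 9-column, tab-separated AGP records; start and end are
    1-based coordinates on the scaffold. Gap lines (component type N or U)
    and comment lines are skipped.
    """
    placements = {}
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[4] in ("N", "U"):
            continue
        scaffold, start, end = fields[0], int(fields[1]), int(fields[2])
        contig, orientation = fields[5], fields[8]
        placements.setdefault(scaffold, []).append(
            (contig, start, end, orientation)
        )
    return placements
```

Passing an open file handle for `out_scaffolds_final.agp` (or any iterable of lines) would return a dictionary keyed by scaffold name, which can be used to look up where a given contig ended up after scaffolding.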
<details markdown="1">
<summary>Output files</summary>

- <code>*.assembly_summary</code>
- <code>.*.assembly_summary</code>
- numeric statistics for pri and alt sequences
- <code>*ccs.merquryk</code>
- <code>.*ccs.merquryk</code>
- folder with merqury plots and kmer statistics
- <code>*busco</code>
- <code>.*busco</code>
- folder with BUSCO results

</details>
@@ -169,11 +170,11 @@ This subworkflow is used to evaluate the quality of sequences. It is performed a
<details markdown="1">
<summary>Output files</summary>

- <code>final_mitogenome.fasta</code>
- <code>\*.hifiasm.\*/mito..*/final_mitogenome.fasta</code>
- organelle assembly
- <code>final_mitogenome.[gb,gff]</code>
- <code>\*.hifiasm.\*/mito..*/final_mitogenome.[gb,gff]</code>
- organelle gene annotation
- <code>contigs_stats.tsv</code>
- <code>\*.hifiasm.\*/mito..*/contigs_stats.tsv</code>
- summary of mitochondrial findings
- output also includes other output files produced by MitoHiFi
