Description of output files. Part I
ksenia-krasheninnikova authored Nov 14, 2023
1 parent 15ac1c4 commit c2de1a5
Showing 1 changed file with 19 additions and 18 deletions.
37 changes: 19 additions & 18 deletions docs/output.md
@@ -14,19 +14,21 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) DSL2.


### PREPARE_INPUT
Here the input YAML is being processed. Thr subworkflow generate the input channels used as by the other subworkflows.</p>
Here the input YAML is processed. This subworkflow generates the input channels used by the other subworkflows.</p>


### GENOMESCOPE_MODEL
<details markdown="1">
<summary>Output files</summary>

- <code>model</code>
- kmer coverage model
- <code>ktab</code>
- <code>kmer/*ktab</code>
- kmer table file
- <code>hist</code>
- <code>kmer/*hist</code>
- kmer histogram file
- <code>kmer/*model.txt</code>
- genomescope model in text format
- <code>kmer/*[linear,log]_plot.png</code>
- genomescope kmer plots

</details>
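
As a rough illustration of how the file list above could be checked after a run, the sketch below reports which of the listed patterns have no matching file. The function name `missing_outputs` and the idea of pointing it at a single results directory are assumptions made for this example, not part of the pipeline. The `[linear,log]` shorthand is expanded by hand, because in glob syntax square brackets denote a single-character class rather than a choice between words.

```python
from pathlib import Path

# Patterns copied from the GENOMESCOPE_MODEL output list.
# "[linear,log]" is written out as two patterns: as a literal glob it
# would match one character, not the words "linear" or "log".
EXPECTED_PATTERNS = [
    "kmer/*ktab",
    "kmer/*hist",
    "kmer/*model.txt",
    "kmer/*linear_plot.png",
    "kmer/*log_plot.png",
]


def missing_outputs(outdir, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file under outdir (hypothetical helper)."""
    root = Path(outdir)
    return [pattern for pattern in patterns if not any(root.glob(pattern))]
```

Calling `missing_outputs("path/to/results")` returns an empty list when every pattern has at least one match; the path here is a placeholder.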

@@ -39,19 +41,17 @@ This subworkflow generates a KMER database and coverage model used in [PURGE_DUP
<details markdown="1">
<summary>Output files</summary>

- <code>primary_contigs</code>
- primary assembly in FASTA format
- <code>alternate_contigs</code>
- haplotigs in FASTA format
- <code>primary_hic_contigs</code>
- primary assembly in FASTA format for hifiasm-hic mode
- <code>alternate_hic_contigs</code>
- haplotigs in FASTA format for hifiasm-hic mode
- <code>\*hifiasm\*/*p_ctg.[g]fa</code>
- primary assembly in GFA and FASTA format; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
- <code>\*hifiasm\*/a_ctg.[g]fa</code>
- haplotigs in GFA and FASTA format; for more details refer to [hifiasm output](https://hifiasm.readthedocs.io/en/latest/interpreting-output.html)
- <code>\*hifiasm\*/*bin</code>
- internal binary hifiasm files; for more details refer [here](https://hifiasm.readthedocs.io/en/latest/faq.html#id12)

</details>

Raw assembly(-ies) is generated here. hifiasm is run on the input HiFi reads then raw contigs are converted from GFA into FASTA format.
In case hifiasm HiC mode is switched on tun hifiasm with HiC data</p>
This subworkflow generates a raw assembly(-ies). First, hifiasm is run on the input HiFi reads, then the raw contigs are converted from GFA into FASTA format. This assembly is subsequently purged, optionally polished, and scaffolded further down the pipeline.
If hifiasm HiC mode is switched on, it is run as an extra step with results stored in the hifiasm-hic folder.</p>
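
The GFA-to-FASTA conversion mentioned above can be sketched as follows. This is a minimal illustration of the general technique (take each segment, or `S`, record and emit its name and sequence), not the pipeline's own implementation, which is not shown here. The function name `gfa_to_fasta`, the `width` parameter and the contig name in the usage note are assumptions made for the example.

```python
def gfa_to_fasta(gfa_lines, width=60):
    """Yield FASTA lines for every segment (S) record in an iterable of GFA lines.

    Hypothetical helper: an S record is tab-separated, with the segment
    name in the second column and the sequence in the third.
    """
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip headers, links and other record types
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue  # sequence not stored in the GFA, nothing to write
        yield f">{name}"
        # Wrap the sequence at a fixed line width
        for start in range(0, len(seq), width):
            yield seq[start:start + width]
```

For example, `list(gfa_to_fasta(["S\tctg1\tACGTACGT\tLN:i:8\n"], width=4))` gives `[">ctg1", "ACGT", "ACGT"]`.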

![Raw assembly subworkflow](https://raw.githubusercontent.com/sanger-tol/genomeassembly/documentation/docs/images/v1/raw_assembly.png)

@@ -60,11 +60,12 @@ In case hifiasm HiC mode is switched on tun hifiasm with HiC data</p>
<details markdown="1">
<summary>Output files</summary>

- <code>pri</code>
- <code>\*.hifiasm.\*/purged.fa</code>
- purged primary contigs
- <code>alt</code>
- <code>\*.hifiasm.\*/purged.htigs.fa</code>
- haplotigs after purging

- other files from the purge_dups pipeline
- for details refer [here](https://github.com/dfguan/purge_dups)
</details>

Retained haplotype is identified in primary assembly. The alternate contigs are updated correspondingly.