Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hifiasm assembler #70

Merged
merged 14 commits into from
Feb 28, 2024
Merged

Conversation

scintilla9
Copy link
Contributor

Hi
I've made few changes to incorporate Hifiasm for long read assembler.
The environment was tested by building a local singularity sif from Dockerfile and running with -profile singularity. I can successfully generated a hifiasm-assembled genome and polished it with gcpp.

Please check if it worked, and I am not sure which branch could be used to merge, so I selected the latest developing branch.

@fmalmeida fmalmeida changed the base branch from 68-when-not-giving-genome_size-for-long-reads-message-goes-to-log-and-not-console to dev February 27, 2024 06:54
@fmalmeida
Copy link
Owner

Hi @scintilla9 ,
Many thanks for the PR.
I will review it during this week. I will probably create a new branch which I can merge your code before merging to dev, so that, if I have to modify something due conflicts, I can easily do so.

Thanks.

@fmalmeida
Copy link
Owner

One a very first (fast) assessment, the code looks awesome, fitting perfectly with the pipe.
So, I will focus in reviewing it and merging it during this week, so I can roll-out a new release in the next 14 days. There is already quite some important fixes I added in dev which I would like to roll out asap.

@fmalmeida fmalmeida self-requested a review February 27, 2024 07:01
@fmalmeida fmalmeida added enhancement New feature or request new tool priority labels Feb 27, 2024
@fmalmeida fmalmeida linked an issue Feb 27, 2024 that may be closed by this pull request
@fmalmeida fmalmeida removed their request for review February 27, 2024 07:02
@fmalmeida
Copy link
Owner

Hi @scintilla9 ,
Which testing data did you use?
With the 15X subsets I have, it does not work. It generates an empty assembly file most likely due non-sufficient data.
I would like to generate at least two full runs before merging, so I can confidently make the small testing profiles skip this assembler.
😄

@scintilla9
Copy link
Contributor Author

Hi @fmalmeida ,
I tested on my own pacbio data.
After converting bam into CCS reads with ngs-preprocess, it has about 66M bases, and yes, a very high coverage dataset for the target species (~1200X).

@fmalmeida
Copy link
Owner

Hi @scintilla9 so I will try to find something public to test it. Thx.

Another thing is, maybe would be worthy to do one of the following?

  • Either make it skipped by default so the user can turn it on if needed?
  • Or only run if the parameters saying that long reads are corrected or high quality are true?
  • Another possibility?

I have some considerations on the sense that, if the tool is proper only for high quality reads and would fail on the more general (frequent) runs, then, would make sense to not always run it for everything.

What do you think? Any ideas?

@scintilla9
Copy link
Contributor Author

Hi @fmalmeida
According to the instruction of Hifiasm, typically having at least 13X coverage with HiFi reads should be sufficient. Not sure why assembly failed when using a 15X subset.

I think the first option will be more straightforward, however, will the inconsistency of having most of the assemblers were run by default while one skippied by default confuse users?

@fmalmeida
Copy link
Owner

My subset is 15X but is not hifi, they are normal reads.
And yes, I think it may confuse users so that is why I am wondering on what.
If (by the name) it is only for hifi, maybe the second scenario would be the most appropriate to skip it when not running with high quality long reads.

@fmalmeida
Copy link
Owner

Code has been working fine. Docs were updated. Testing profile for hifi reads subset has been added.
So, I will merge this to dev, and start running some testings with full datasets to guarantee it is proper for a new release.

Finally, now that hifiasm is included, I will open follow-up tickets to investigate how the pipeline can be modified to allow one to also pass on parental data or hi-c data as specified in the tool's webpage.

@fmalmeida fmalmeida merged commit 11004e1 into fmalmeida:dev Feb 28, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add hifiasm for long reads assemble
2 participants