The bioboxes validator task should ensure a metric TSV file is generated #205

Open
michaelbarton opened this issue Feb 17, 2017 · 6 comments


@michaelbarton
Contributor

When #204 is completed, the bioboxes file validator should check that the mandatory metrics file is produced by the assembly validator.
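A minimal Python sketch of such a check follows. The file layout is an assumption and is not taken from the RFC. It assumes the metrics file is a tab-separated table with one header row and at least one data row, where every row has the same number of columns. `check_metrics_tsv` is a hypothetical helper and is not part of the existing bioboxes validator.

```python
import csv
from pathlib import Path


def check_metrics_tsv(path):
    """Return a list of problems with the metrics TSV at `path` (empty list if OK).

    Hypothetical helper. It assumes the file is tab-separated, has a header row
    followed by at least one data row, and that all rows share one column count.
    """
    path = Path(path)
    # The mandatory metrics file must exist after the assembly validator has run.
    if not path.is_file():
        return [f"metrics file not found: {path}"]
    with path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Skip blank lines; the header counts as row 1.
        rows = [row for row in reader if row]
    if len(rows) < 2:
        return [f"metrics file has no data rows: {path}"]
    width = len(rows[0])
    errors = []
    # Every data row should have the same number of columns as the header.
    for index, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            errors.append(f"row {index}: expected {width} columns, found {len(row)}")
    return errors
```

A validator task could call this on whatever path the interface specifies for the metrics file and fail if the returned list is non-empty. Returning a list of problems, rather than raising on the first one, lets the validator report every issue in a single run.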

@michaelbarton
Contributor Author

@pbelmann

> evaluate a genome assembly in FASTA format using optional multiple reference genome sequences in FASTA format

Do you use the reference assembly biobox without a reference? I forgot that we originally defined this as being optional. The GAET biobox requires a reference as it compares the two sets of genome annotations.

@michaelbarton
Contributor Author

I also noticed that we define contig and scaffold as the possible options for the fasta value. Would the software act differently depending on which one it is? This is a minor point, but I think it's generally good to simplify the RFCs wherever possible.

@pbelmann
Member

> @pbelmann
>
> evaluate a genome assembly in FASTA format using optional multiple reference genome sequences in FASTA format
>
> Do you use the reference assembly biobox without a reference? I forgot that we originally defined this as being optional. The GAET biobox requires a reference as it compares the two sets of genome annotations.

Yes and I think we should leave it optional for evaluating assemblies where you don't have a reference.

> I also noticed that we define contig and scaffold as the possible options for the fasta value. Would the software act differently depending on which one it is? This is a minor point, but I think it's generally good to simplify the RFCs wherever possible.

Well, the idea was to define the input according to the short read assembly output definition. But I don't think we are using it, so I would say we can remove it here, and possibly also from the output of the short read assembler interface.

@michaelbarton
Contributor Author

> Yes and I think we should leave it optional for evaluating assemblies where you don't have a reference.

I think this is fine for QUAST, but GAET cannot run without a reference. Some tools may be able to generate metrics without a reference, but others will need one.

@michaelbarton
Contributor Author

> Well, the idea was to define the input according to the short read assembly output definition. But I don't think we are using it, so I would say we can remove it here, and possibly also from the output of the short read assembler interface.

I agree. Going further, it might be useful to have a list of the terms we use and what each one specifically means.

@pbelmann
Member

> Yes and I think we should leave it optional for evaluating assemblies where you don't have a reference.
>
> I think this is fine for QUAST, but GAET cannot run without a reference. Some tools may be able to generate metrics without a reference, but others will need one.

We could make the reference mandatory in the reference-based interface and introduce a third, reference-free interface. QUAST could implement both, and GAET just the reference-based one.
