Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extract_md5sum check #177

Merged
merged 5 commits into from
Jun 26, 2023
Merged

Add extract_md5sum check #177

merged 5 commits into from
Jun 26, 2023

Conversation

wholtz
Copy link
Contributor

@wholtz wholtz commented Jun 22, 2023

Checklist

  • Pull request details were added to HISTORY.rst

I frequently have workflows that generate *.fastq.gz files as outputs. Because gzip embeds a timestamp in the file, md5sum comparisons fail. gzip with -n can be used to omit the timestamp, but the gzipping is often part of third-party tool and not configurable. This PR adds a ungzip_md5sum option that runs md5sum on the ungzipped contents of the file.

@rhpvorderman
Copy link
Member

This is a good idea, but let's prevent the need for unxz_md5sum and unbzip2_md5sum. I opt for extract_md5sum. xopen can be used to transparently open files with all these compressions.

Then the dichotomy becomes:

  • md5sum open as is, even if it is an archive and just sum the bytes.
  • extract_md5sum extract the archive using xopen and sum those bytes.

That seems pretty maintainable and doable to me. Only minor changes to this PR are needed. Do you want to do it yourself or do you rather want me take it over?

@wholtz wholtz changed the title Add ungzip_md5sum check Add extract_md5sum check Jun 23, 2023
@wholtz
Copy link
Contributor Author

wholtz commented Jun 23, 2023

Thanks for the review @rhpvorderman, I have made the requested modifications. Feel free to tweak as needed.

@rhpvorderman
Copy link
Member

Excellent. This is exactly what I had in mind. Great feature idea and implementation. Thanks!

@rhpvorderman rhpvorderman merged commit 61f2641 into LUMC:develop Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants