Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The kputuw function is considerably faster as it encodes 2 digits at a time and also utilises __builtin_clz. This changes kputll to use the same 2 digits at a time trick. I have a __builtin_clzll variant too, but with longer numbers it's not the main bottleneck and we fall back to kputuw for small numbers. This avoids complicating the code with builtin checks and alternate versions.
An alternative, purely for sam_format1_append would be something like:
This works as BAM/CRAM only support 32-bit numbers for POS, PNEXT and TLEN anyway, so ll vs w is an irrelevant distinction. However I chose to modify the header file so it fixes other callers.
Overall compressed BAM to uncompressed SAM conversion is about 5% quicker (tested on 10 million short-read seqs; it'll be minimal on long seqs). This includes decode time and other functions too. The sam_format1_append only component of that is about 15-25% quicker depending on compiler and version.