Update to MemoryViews 0.2
jakobnissen committed Jul 3, 2024
1 parent 53cdff7 commit cdd6e24
Showing 12 changed files with 296 additions and 149 deletions.
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -0,0 +1,5 @@
Thank you for making an issue.
If you are submitting a bug report, it will help us if you include the following information:

- Your version of Julia and all packages in your activated Julia environment
- A small example that demonstrates the bug. If possible, please make the code copy-pastable into a fresh REPL.
7 changes: 7 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,7 @@
Thank you for your contribution!
If you have any questions about your PR, or need help completing it, you can ping the maintainers of this repository, who will be happy to help if they can find time.

You can optionally use the following checklist when you work on your PR:
- [ ] I have updated any relevant documentation and docstrings.
- [ ] I have added unit tests, and the CodeCov bot shows tests cover my new code.
- [ ] I have mentioned my changes in the CHANGELOG.md file.
4 changes: 2 additions & 2 deletions Project.toml
@@ -4,12 +4,12 @@ version = "0.1.0"
authors = ["Jakob Nybo Nissen <[email protected]>"]

[deps]
MemViews = "a791c907-b98b-4e44-8f4d-e4c2362c6b2f"
MemoryViews = "a791c907-b98b-4e44-8f4d-e4c2362c6b2f"
StringViews = "354b36f9-a18e-4713-926e-db85100087ba"

[compat]
FormatSpecimens = "1.1.0"
MemViews = "0.2"
MemoryViews = "0.2"
StringViews = "1.3.3"

[extras]
5 changes: 4 additions & 1 deletion docs/Project.toml
@@ -1,4 +1,7 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MemViews = "a791c907-b98b-4e44-8f4d-e4c2362c6b2f"
MemoryViews = "a791c907-b98b-4e44-8f4d-e4c2362c6b2f"
XAMAuxData = "e99d641e-1821-45d7-9150-ecb7bf333fe1"

[sources]
XAMAuxData = {path = ".."}
2 changes: 1 addition & 1 deletion docs/make.jl
@@ -2,7 +2,7 @@ using Documenter, XAMAuxData

meta = quote
using XAMAuxData: SAM, BAM, AuxTag, Hex, Errors, Error
using MemViews: MemView
using MemoryViews: MemoryView
line = "AK:z:some string\ts1:i:2512\tst:A:+\tas:f:211.2\tar:B:c3,-16,21,-100"
end

97 changes: 43 additions & 54 deletions docs/src/index.md
@@ -2,7 +2,7 @@
CurrentModule = XAMAuxData
DocTestSetup = quote
using XAMAuxData: BAM, SAM, AuxTag, Hex, Errors, Error
using MemViews: MemView
using MemoryViews: MemoryView
end
```

@@ -17,20 +17,22 @@ PAF and GFA files. `BAM` is used for the binary encoded aux format in BAM files.
Most examples in this documentation will use the SAM format, since it's more human readable.
Any differences to the BAM format will be explicitly mentioned.

!!! note
Annoyingly, the specification of GFA auxiliary fields differs slightly from that of SAM
auxiliary fields. Hence, in the future, a dedicated GFA module may be introduced.

The single auxiliary field `AN:i:1234` is encoded as the key-value pair `AuxTag("AN") => 1234`.
A collection of aux fields are represented by a `SAM.Auxiliary` (or `BAM.Auxiliary`), which are instances of `AbstractDict{AuxTag, Any}`.
A collection of aux fields are represented by a `SAM.Auxiliary` (or `BAM.Auxiliary`), which are subtypes of `AbstractDict{AuxTag, Any}`.

The package may be used like this:
```jldoctest
# Import the module you want to use
using XAMAuxData: SAM
julia> using XAMAuxData: SAM
data = "AN:A:z\ta1:Z:abc def \tkv:i:-25234\tzz:f:-14.466e-3"
aux = SAM.Auxiliary(data)
aux
julia> data = "AN:A:z\ta1:Z:abc def \tkv:i:-25234\tzz:f:-14.466e-3";
# output
4-element XAMAuxData.SAM.Auxiliary{MemViews.ImmutableMemView{UInt8}}:
julia> aux = SAM.Auxiliary(data)
4-element XAMAuxData.SAM.Auxiliary{MemoryViews.ImmutableMemoryView{UInt8}}:
"AN" => 'z'
"a1" => "abc def "
"kv" => -25234
@@ -46,37 +48,29 @@ Also, the construction of an `AuxTag` validates that it conforms to the regex `[
Strings can be converted to `AuxTag` for convenience, as in below:

```jldoctest
v = AuxTag[] # implicit conversion
push!(v, "AB")
v
# output
julia> push!(AuxTag[], "AB") # implicit conversion
1-element Vector{AuxTag}:
AuxTag("AB")
```

Attempting to construct an invalid `AuxTag` will error:
```jldoctest
AuxTag("11")
# output
julia> AuxTag("11")
ERROR: Invalid AuxTag. Tags must conform to r"^[A-Za-z][A-Za-z0-9]$".
[...]
```
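
The validation rule quoted in the error message above can be sketched on its own, without the package. The helper name `is_valid_tag` is hypothetical and not part of XAMAuxData; only the regex is taken from the error message.

```julia
# Hypothetical standalone helper; XAMAuxData performs its own check inside AuxTag.
# A tag is exactly two characters: a letter followed by a letter or a digit.
const TAG_REGEX = r"^[A-Za-z][A-Za-z0-9]$"

is_valid_tag(s::AbstractString) = occursin(TAG_REGEX, s)

is_valid_tag("AB")   # letter + letter
is_valid_tag("a1")   # letter + digit
is_valid_tag("11")   # starts with a digit, so rejected
```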

## Constructing `Auxiliary` objects
`SAM.Auxiliary` and `BAM.Auxiliary` are constructed the same two ways.

Immutable auxiliaries are constructed from any bytes-like object which has a `MemView` method.
Immutable auxiliaries are constructed from any bytes-like object which has a `MemoryView` method.
This may be a `String`, `SubString{String}`, `Memory{UInt8}` etc.
Auxiliary objects are constructed directly from these:

```jldoctest
# Make an IMMUTABLE Auxiliary
SAM.Auxiliary("AB:i:12\tKN:A:z")
# output
2-element XAMAuxData.SAM.Auxiliary{MemViews.ImmutableMemView{UInt8}}:
julia> aux = SAM.Auxiliary("AB:i:12\tKN:A:z")
2-element XAMAuxData.SAM.Auxiliary{MemoryViews.ImmutableMemoryView{UInt8}}:
"AB" => 12
"KN" => 'z'
```
@@ -89,11 +83,9 @@ In the example below, the first 22 bytes of the vector (the `some not-aux data h
corresponds to the data before the aux data, and hence the first index of the aux data in the vector is 23.

```jldoctest
data = collect(codeunits("some not-aux data hereAB:i:12\tKN:A:z"))
# Make a MUTABLE Auxiliary
aux = SAM.Auxiliary(data, 23)
julia> data = collect(codeunits("some not-aux data hereAB:i:12\tKN:A:z"));
# output
julia> aux = SAM.Auxiliary(data, 23) # Make a MUTABLE Auxiliary
2-element XAMAuxData.SAM.Auxiliary{Vector{UInt8}}:
"AB" => 12
"KN" => 'z'
@@ -102,7 +94,7 @@ aux = SAM.Auxiliary(data, 23)
No matter whether constructed from a memory view or from a `Vector`, there cannot be any unused bytes at or after the starting index in an `Auxiliary`.
Any trailing bytes will be considered part of the auxiliary data, and may possibly be considered invalid:
```jldoctest
julia> bad_aux = SAM.Auxiliary("AB:A:p\t\t"); # trailing tabs
julia> bad_aux = SAM.Auxiliary("AB:A:p\t\t"); # trailing tabs
julia> isvalid(bad_aux)
false
@@ -112,29 +104,24 @@ false
`Auxiliary`'s can be read and written like a normal `AbstractDict{AuxTag, Any}`:

```jldoctest
# Create empty mutable SAM.Auxiliary
aux = SAM.Auxiliary(UInt8[], 1)
julia> aux = SAM.Auxiliary(UInt8[], 1); # empty Auxiliary
# Note: The strings are implicitly `convert`ed to AuxTag
aux["AX"] = 'y'
aux["cm"] = 12.1
aux["G1"] = [-1.24, 33.1]
julia> # Note: The strings are implicitly `convert`ed to AuxTag
aux["AX"] = 'y'; aux["AX"]
'y': ASCII/Unicode U+0079 (category Ll: Letter, lowercase)
println(length(aux))
println(aux["AX"])
println(aux["cm"])
println(aux["G1"])
julia> aux["cm"] = 12.1; aux["cm"]
12.1f0
# overwrite AX key
aux["AX"] = [0x01, 0x02]
println(aux["AX"])
julia> aux["G1"] = [-1.24, 33.1]; aux["G1"]
2-element Memory{Float32}:
-1.24
33.1
# output
3
y
12.1
Float32[-1.24, 33.1]
UInt8[0x01, 0x02]
julia> aux["AX"] = [0x01, 0x02]; aux["AX"] # overwrite AX key
2-element Memory{UInt8}:
0x01
0x02
```
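
The `12.1f0` result in the example above follows from floats being stored as `Float32`. A minimal plain-Julia sketch of that round trip, independent of the package (the helper `roundtrips` is hypothetical, introduced only for illustration):

```julia
# Writing a Float64 and reading it back as Float32 loses precision
# whenever the value has no exact Float32 representation.
x = 12.1
stored = Float32(x)          # what a Float32-backed field would hold

# Hypothetical helper: does a value survive conversion through Float32 unchanged?
roundtrips(v::Real) = Float64(Float32(v)) == Float64(v)

roundtrips(12.1)   # false: 12.1 is not exactly representable as a Float32
roundtrips(0.5)    # true: 0.5 is a power of two
```

Comparing a value read back from an `Auxiliary` against the original `Float64` with `==` can therefore fail; comparing against `Float32(x)` is the safer check under this assumption.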

Like `Dict`, the order of key/value pairs in auxiliaries is arbitrary and cannot be relied on.
@@ -174,8 +161,8 @@ Hence, the value written to an `Auxiliary` may not be the same value when being
| `AbstractVector{Int8}` | `B:c` |
| `AbstractVector{UInt16}` | `B:S` |
| `AbstractVector{Int16}` | `B:s` |
| `AbstractVector{<:Signed}` | `B:i` |
| `AbstractVector{<:Unsigned}` | `B:I` |
| `AbstractVector{<:Signed}` | `B:i` |
| `AbstractVector{<:Unsigned}` | `B:I` |
| `AbstractVector{<:AbstractFloat}`| `B:f` |
| `Hex` | `H` |

@@ -185,12 +172,14 @@ Hence, the value written to an `Auxiliary` may not be the same value when being
outside the recommended range can still be read on 64-bit systems.
- ✝ Permitted `Char` values are only those in `'!':'~'`.
- ‡ Only values representable by a `Float32` are allowed.
- § Only characters in `'!':'~'` and spaces (`' '`) are permitted in strings
- § Only characters in `'!':'~'` and spaces (`' '`) are permitted in strings
- ¶ These are stored as `Int32` and `UInt32` for `Signed` and `Unsigned`, respectively.

### BAM element types
The only difference between SAM and BAM types is that the latter format permits different types of integers.
Hence, except the types mentioned below, all the SAM types in the table above are also supported in BAM,
with the same Julia <-> BAM type correspondence.
Further, reading a value of `i` will return an `Int32` instead of an `Int`.

| Input type | BAM type | Julia type read|
| -----------|----------|--------------- |
@@ -212,24 +201,24 @@ However, if you want to write an `AbstractVector{UInt8}` value explicitly as an
aux = SAM.Auxiliary(UInt8[], 1)
aux["AB"] = UInt8[0x01, 0x02]
using MemViews
using MemoryViews
# Print the memory content of the aux.
# The array was written as a value of the type B:c
println(String(MemView(aux)))
println(String(MemoryView(aux)))
# Wrap input type in the Hex type
aux["AB"] = Hex(UInt8[0x01, 0x02])
# It is now written as a H instead
println(String(MemView(aux)))
println(String(MemoryView(aux)))
# output
AB:B:C,1,2
AB:H:0102
```

## Writing `Auxiliary`s to files
Calling `MemView` on an `Auxiliary` will return a view of the underlying data.
Calling `MemoryView` on an `Auxiliary` will return a view of the underlying data.
This data is guaranteed to be valid SAM/BAM auxiliary data:

```jldoctest
@@ -238,8 +227,8 @@ aux = SAM.Auxiliary(field1 * '\t' * field2)
# Get a view of the data underlying `aux`.
# This is guaranteed to be valid SAM data (and likewise for BAM)
using MemViews
mem = MemView(aux)
using MemoryViews
mem = MemoryView(aux)
# We make no guarantees about which order the two fields are,
# but we DO guarantee the memory is a valid SAM aux data
18 changes: 9 additions & 9 deletions src/XAMAuxData.jl
@@ -1,6 +1,6 @@
module XAMAuxData

using MemViews: MemViews, MemView, MutableMemView, ImmutableMemView
using MemoryViews: MemoryViews, MemoryView, MutableMemoryView, ImmutableMemoryView
using StringViews: StringView

struct Unsafe end
@@ -26,14 +26,14 @@ julia> aux = SAM.Auxiliary(UInt8[], 1);
julia> aux["AB"] = Hex([0xae, 0xf8, 0x6c]);
julia> String(MemView(aux))
julia> String(MemoryView(aux))
"AB:H:AEF86C"
julia> aux = BAM.Auxiliary(UInt8[], 1);
julia> aux["AB"] = Hex([0xae, 0xf8, 0x6c]);
julia> String(MemView(aux))
julia> String(MemoryView(aux))
"ABHAEF86C\\0"
```
"""
@@ -49,7 +49,7 @@ end
# Must encode to uppercase A-F
hexencode_nibble(u::UInt8)::UInt8 = u < 0x0a ? UInt8('0') + u : UInt8('A') - 0x0a + u

function hexencode!(mem::MutableMemView, hex::Hex)
function hexencode!(mem::MutableMemoryView, hex::Hex)
@inbounds for (byte_no, byte) in enumerate(hex.x)
mem[2 * byte_no - 1] = hexencode_nibble(byte >> 4)
mem[2 * byte_no] = hexencode_nibble(byte & 0x0f)
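
The nibble encoding in the hunk above can be tried in isolation. The sketch below copies `hexencode_nibble` from the diff and wraps it in a hypothetical `hexencode` function that writes into a plain `Vector{UInt8}` instead of a `MutableMemoryView`; it is an illustration, not the package's own `hexencode!`.

```julia
# Copied from the diff: map a nibble (0x00-0x0f) to an uppercase hex digit.
hexencode_nibble(u::UInt8)::UInt8 = u < 0x0a ? UInt8('0') + u : UInt8('A') - 0x0a + u

# Hypothetical wrapper: encode each byte as two hex digits, high nibble first.
function hexencode(bytes::AbstractVector{UInt8})
    out = Vector{UInt8}(undef, 2 * length(bytes))
    for (byte_no, byte) in enumerate(bytes)
        out[2 * byte_no - 1] = hexencode_nibble(byte >> 4)
        out[2 * byte_no] = hexencode_nibble(byte & 0x0f)
    end
    return String(out)
end

hexencode([0xae, 0xf8, 0x6c])  # "AEF86C", matching the docstring example
```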
@@ -95,16 +95,16 @@ as_aux_value(x::Hex) = x

function as_aux_value(s::AbstractString)
cu = codeunits(s)
auxs = if MemViews.MemKind(typeof(cu)) isa MemViews.IsMemory
auxs = if MemoryViews.MemoryKind(typeof(cu)) isa MemoryViews.IsMemory
s
else
String(s)
end
# Take view of codeunits, because StringViews' codeunits return
# the underlying array, so this makes it work for StringViews,
# without having to implement MemView(::StringView), which would
# without having to implement MemoryView(::StringView), which would
# be piracy in this package.
mem = ImmutableMemView(codeunits(auxs))
mem = ImmutableMemoryView(codeunits(auxs))
if is_printable(mem)
auxs
else
@@ -184,7 +184,7 @@ function Base.copy(aux::AbstractAuxiliary)
v = if x isa Vector{UInt8}
x[aux.start:end]
else
copy(MemView(aux))
copy(MemoryView(aux))
end
typeof(aux)(v, 1)
end
@@ -330,7 +330,7 @@ function iter_encodings end
Base.IteratorSize(::Type{<:AbstractEncodedIterator}) = Base.SizeUnknown()
Base.eltype(::Type{AbstractEncodedIterator}) = Union{Error, Tuple{AuxTag, UInt8, UnitRange{Int}}}

function load_hex(mem::ImmutableMemView)::Union{Memory{UInt8}, Error}
function load_hex(mem::ImmutableMemoryView)::Union{Memory{UInt8}, Error}
len = length(mem)
# Note: According to specs, Hex can't be empty, but we load it anyway
# because we should be generous in what we accept