#360 Bwt #411

Merged: 62 commits, Jan 18, 2024
Changes from 59 commits
fcaa7d3
progress
TwFlem Nov 10, 2023
8813062
basic bitvector
TwFlem Nov 23, 2023
e0c2eec
jacobsons start and refactor to uint for accurate machine words
TwFlem Nov 23, 2023
2fe342f
basic rank test
TwFlem Nov 25, 2023
f73bc77
confident that jacobson rank is working
TwFlem Nov 26, 2023
4e75786
reusing the incoming bitvector instead of copying everyithing for
TwFlem Nov 26, 2023
b7ddaef
access and bounds checking
TwFlem Nov 26, 2023
0406520
just do uint64 for simplicity. bound checking and access
TwFlem Nov 26, 2023
2ad117e
bit vector fixes, rsa good enough, wavelet start
TwFlem Dec 2, 2023
a9ba8ae
Simple wavelet tree with access
TwFlem Dec 3, 2023
79fefe2
wavelet fix access, add select, fix rsa bitvector select
TwFlem Dec 3, 2023
8280e7a
fix select again, select for wavelet tree
TwFlem Dec 4, 2023
6802a87
simple FM count
TwFlem Dec 5, 2023
5ee364b
got count working, but had to throw out jacobsons
TwFlem Dec 6, 2023
7468d73
rsa fixes and refactors
TwFlem Dec 6, 2023
2d09fa9
bwt locate
TwFlem Dec 6, 2023
bb3cf93
extract
TwFlem Dec 6, 2023
249867a
add 1 more test for reconstruction
TwFlem Dec 6, 2023
9aea859
doc BWT, refactor, and return a possible error during construction
TwFlem Dec 7, 2023
3866434
add TODO about sorting and the nullChar
TwFlem Dec 7, 2023
154db95
bwt examples, remove TODO that does not matter
TwFlem Dec 8, 2023
ff0fec9
wavelet tree doc and address todos
TwFlem Dec 8, 2023
8589198
wavelet tree explination
TwFlem Dec 8, 2023
5d63789
doc and note for waveletTree
TwFlem Dec 8, 2023
51d8bfd
typo
TwFlem Dec 8, 2023
3a62d7b
extract changes, move around and add to wavelet doc
TwFlem Dec 9, 2023
f25c851
add bwt high level. move wavelet tree's some rsa bv docs
TwFlem Dec 12, 2023
4915695
simplify bitvector, docs for bitvector and rsaBitvector
TwFlem Dec 12, 2023
5c97fa7
fix wavelet select
TwFlem Dec 12, 2023
a4548e3
lint
TwFlem Dec 12, 2023
0168d0e
more lint
TwFlem Dec 12, 2023
42abe6c
doc adjustments
TwFlem Dec 12, 2023
8d661a1
changelog and ensure correct nullChar sorting
TwFlem Dec 12, 2023
f95065c
Merge remote-tracking branch 'poly-upstream/main' into bwt
TwFlem Dec 12, 2023
a520093
fix changelog
TwFlem Dec 12, 2023
f6c2bb9
golanglintci fixes
TwFlem Dec 12, 2023
f5b459b
bubble up errs instead of panics
TwFlem Dec 13, 2023
dcd5aff
fix examples
TwFlem Dec 13, 2023
4dde901
add recovery to other public API
TwFlem Dec 14, 2023
be19a2d
Merge branch 'main' into pr/TwFlem/411
carreter Dec 20, 2023
ecc78d3
Cite Ben Langmead.
TwFlem Dec 21, 2023
5b1ce0b
fix typo
TwFlem Dec 21, 2023
660dbdb
Update bwt/bwt.go
TwFlem Dec 21, 2023
e4fef76
Update bwt/wavelet.go
TwFlem Dec 21, 2023
dd29d3e
Update bwt/wavelet.go
TwFlem Dec 21, 2023
a69d8c5
Update bwt/wavelet.go
TwFlem Dec 21, 2023
63e10a4
Update bwt/wavelet.go
TwFlem Dec 21, 2023
9e02f5d
Update bwt/wavelet.go
TwFlem Dec 21, 2023
92ed9da
Update bwt/wavelet.go
TwFlem Dec 21, 2023
08749e3
Update bwt/wavelet.go
TwFlem Dec 21, 2023
a4eb771
doc improvement
TwFlem Dec 21, 2023
0791c49
doc improvement
TwFlem Dec 21, 2023
e52ba57
requested changes, fix edgcases, test edgecases
TwFlem Dec 21, 2023
6269764
Fix BWT Locate explanation. Typos and English.
TwFlem Dec 22, 2023
307d881
fix wavelet tree example and docs
TwFlem Dec 22, 2023
b3f27f4
fix rsa docs and provide better examples
TwFlem Dec 22, 2023
3e834dd
Fix select. Problems appeared when it started actually getting used in
TwFlem Dec 22, 2023
ff74a1e
put back the shortcut for rank of char at max position in bv
TwFlem Dec 22, 2023
69c5089
wt reconstruct and bwt GetTransform with example
TwFlem Dec 29, 2023
94f2a15
added unit tests for reachable panics.
TimothyStiles Jan 2, 2024
1cc3755
fix err messages, add basic example
TwFlem Jan 3, 2024
77ca52d
fix test
TwFlem Jan 3, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Basic BWT for sub-sequence count and offset for sequence alignment. Only supports exact matches for now.


## [0.30.0] - 2023-12-18
Oops, we weren't keeping a changelog before this tag!
75 changes: 75 additions & 0 deletions bwt/bitvector.go
@@ -0,0 +1,75 @@
package bwt

import (
	"math"
)

const wordSize = 64

// bitvector is a sequence of 1's and 0's. You can also think
// of this as an array of bits. This allows us to encode
// data in a memory efficient manner.
type bitvector struct {
	bits         []uint64
	numberOfBits int
}

// newBitVector will return an initialized bitvector with
// the specified number of zeroed bits.
func newBitVector(initialNumberOfBits int) bitvector {
	capacity := getNumOfBitSetsNeededForNumOfBits(initialNumberOfBits)
	bits := make([]uint64, capacity)
	return bitvector{
		bits:         bits,
		numberOfBits: initialNumberOfBits,
	}
}

// getBitSet gets the whole word at some offset from the
// bitvector. Useful if you'd prefer to work with the
// word rather than with individual bits.
func (b bitvector) getBitSet(bitSetPos int) uint64 {
	return b.bits[bitSetPos]
}

// getBit returns the value of the bit at a given offset
// True represents 1
// False represents 0
func (b bitvector) getBit(i int) bool {
	b.checkBounds(i)

	chunkStart := i / wordSize
	offset := i % wordSize

	return (b.bits[chunkStart] & (uint64(1) << (63 - offset))) != 0
}

// setBit sets the value of the bit at a given offset
// True represents 1
// False represents 0
func (b bitvector) setBit(i int, val bool) {
	b.checkBounds(i)

	chunkStart := i / wordSize
	offset := i % wordSize

	if val {
		b.bits[chunkStart] |= uint64(1) << (63 - offset)
	} else {
		b.bits[chunkStart] &= ^(uint64(1) << (63 - offset))
	}
}

func (b bitvector) checkBounds(i int) {
	if i >= b.len() || i < 0 {
		panic("bitvector: index out of bounds")
	}
}

func (b bitvector) len() int {
	return b.numberOfBits
}

func getNumOfBitSetsNeededForNumOfBits(n int) int {
	return int(math.Ceil(float64(n) / wordSize))
}
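
The addressing scheme in `bitvector.go` can be illustrated in isolation. The following is a minimal sketch, not the code from this PR: `wordsNeeded` and `locate` are hypothetical names introduced here. It assumes the same layout as `getBit` and `setBit` (bit `i` lives in word `i / 64`, counted from the most significant bit), and it uses integer ceiling division as one possible alternative to the `math.Ceil` call on a `float64`.

```go
package main

import "fmt"

const wordSize = 64

// wordsNeeded returns how many 64-bit words are needed to hold n bits.
// Hypothetical helper: integer ceiling division, no float conversion.
func wordsNeeded(n int) int {
	return (n + wordSize - 1) / wordSize
}

// locate splits a non-negative bit index into the index of its word
// and a mask selecting that bit, counting from the most significant
// bit of the word (the same convention as 63 - offset in the diff).
func locate(i int) (word int, mask uint64) {
	return i / wordSize, uint64(1) << (wordSize - 1 - i%wordSize)
}

func main() {
	// Same size as in the PR's test: ten full words plus one extra bit.
	bits := make([]uint64, wordsNeeded(wordSize*10+1))
	fmt.Println(len(bits)) // 11

	// Bits 0 and 64 are each the most significant bit of their word.
	for _, i := range []int{0, 64} {
		w, m := locate(i)
		bits[w] |= m
	}
	fmt.Printf("%#x %#x\n", bits[0], bits[1]) // 0x8000000000000000 0x8000000000000000
}
```

Because a slice header shares its backing array when copied, writing through `bits[w]` is also why `setBit` can mutate the vector despite its value receiver.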
119 changes: 119 additions & 0 deletions bwt/bitvector_test.go
@@ -0,0 +1,119 @@
package bwt

import (
	"testing"
)

type GetBitTestCase struct {
Collaborator

Should this be exposed?

Contributor Author

It shouldn't be. Definitions in _test.go files aren't visible to other packages.

package poly

import (
	"fmt"

	"github.com/bebop/poly/bwt"
)

func polyRoot() {
	thing := bwt.GetBitTestCase{}
	fmt.Println(thing)
}

^ That won't build

	position int
	expected bool
}

func TestBitVector(t *testing.T) {
	initialNumberOfBits := wordSize*10 + 1

	bv := newBitVector(initialNumberOfBits)

	if bv.len() != initialNumberOfBits {
		t.Fatalf("expected len to be %d but got %d", initialNumberOfBits, bv.len())
	}

	for i := 0; i < initialNumberOfBits; i++ {
		bv.setBit(i, true)
	}

	bv.setBit(3, false)
	bv.setBit(11, false)
	bv.setBit(13, false)
	bv.setBit(23, false)
	bv.setBit(24, false)
	bv.setBit(25, false)
	bv.setBit(42, false)
	bv.setBit(63, false)
	bv.setBit(64, false)
	bv.setBit(255, false)
	bv.setBit(256, false)

	getBitTestCases := []GetBitTestCase{
		{0, true},
		{1, true},
		{2, true},
		{3, false},
		{4, true},
		{7, true},
		{8, true},
		{9, true},
		{10, true},
		{11, false},
		{12, true},
		{13, false},
		{23, false},
		{24, false},
		{25, false},
		{42, false},
		{15, true},
		{16, true},
		{62, true},
		{63, false},
		{64, false},
		// Test past the first word
		{65, true},
		{72, true},
		{79, true},
		{80, true},
		{255, false},
		{256, false},
		{511, true},
		{512, true},
	}

	for _, v := range getBitTestCases {
		actual := bv.getBit(v.position)
		if actual != v.expected {
			t.Fatalf("expected %dth bit to be %t but got %t", v.position, v.expected, actual)
		}
	}
}

func TestBitVectorBoundPanic_GetBit_Lower(t *testing.T) {
	defer func() { _ = recover() }()

	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.getBit(-1)

	t.Fatalf("expected get bit lower bound panic")
}

func TestBitVectorBoundPanic_GetBit_Upper(t *testing.T) {
	defer func() { _ = recover() }()
	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.getBit(initialNumberOfBits)

	t.Fatalf("expected get bit upper bound panic")
}

func TestBitVectorBoundPanic_SetBit_Lower(t *testing.T) {
	defer func() {
		if r := recover(); r != nil {
			return
		}
		t.Fatalf("expected set bit lower bound panic")
	}()
	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.setBit(-1, true)
}

func TestBitVectorBoundPanic_SetBit_Upper(t *testing.T) {
	defer func() {
		if r := recover(); r != nil {
			return
		}
		t.Fatalf("expected set bit upper bound panic")
	}()
	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.setBit(initialNumberOfBits, true)
}
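
The four bound-check tests above use two different `recover` patterns to assert that a call panics. One way this could be unified is a small helper that reports whether a function panicked. This is only a sketch under assumptions: `panics` is a hypothetical name that does not appear in the PR, and the example triggers the panic with a plain slice index rather than the package's `bitvector`, so that it runs on its own.

```go
package main

import "fmt"

// panics reports whether calling f panics.
// Hypothetical helper, not part of the PR under review.
func panics(f func()) (didPanic bool) {
	defer func() {
		// recover returns a non-nil value only if f panicked.
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	words := make([]uint64, 2)
	outOfRange, inRange := 5, 1

	// Indexing past the end of the slice panics at run time.
	fmt.Println(panics(func() { _ = words[outOfRange] })) // true
	// A valid index does not panic.
	fmt.Println(panics(func() { _ = words[inRange] })) // false
}
```

In a test, the same helper could be used as `if !panics(func() { bv.getBit(-1) }) { t.Fatalf(...) }`, which keeps the expected-panic assertion in one place.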