#360 Bwt #411
Merged
Changes from 39 commits
62 commits
- fcaa7d3 progress (TwFlem)
- 8813062 basic bitvector (TwFlem)
- e0c2eec jacobsons start and refactor to uint for accurate machine words (TwFlem)
- 2fe342f basic rank test (TwFlem)
- f73bc77 confident that jacobson rank is working (TwFlem)
- 4e75786 reusing the incoming bitvector instead of copying everyithing for (TwFlem)
- b7ddaef access and bounds checking (TwFlem)
- 0406520 just do uint64 for simplicity. bound checking and access (TwFlem)
- 2ad117e bit vector fixes, rsa good enough, wavelet start (TwFlem)
- a9ba8ae Simple wavelet tree with access (TwFlem)
- 79fefe2 wavelet fix access, add select, fix rsa bitvector select (TwFlem)
- 8280e7a fix select again, select for wavelet tree (TwFlem)
- 6802a87 simple FM count (TwFlem)
- 5ee364b got count working, but had to throw out jacobsons (TwFlem)
- 7468d73 rsa fixes and refactors (TwFlem)
- 2d09fa9 bwt locate (TwFlem)
- bb3cf93 extract (TwFlem)
- 249867a add 1 more test for reconstruction (TwFlem)
- 9aea859 doc BWT, refactor, and return a possible error during construction (TwFlem)
- 3866434 add TODO about sorting and the nullChar (TwFlem)
- 154db95 bwt examples, remove TODO that does not matter (TwFlem)
- ff0fec9 wavelet tree doc and address todos (TwFlem)
- 8589198 wavelet tree explination (TwFlem)
- 5d63789 doc and note for waveletTree (TwFlem)
- 51d8bfd typo (TwFlem)
- 3a62d7b extract changes, move around and add to wavelet doc (TwFlem)
- f25c851 add bwt high level. move wavelet tree's some rsa bv docs (TwFlem)
- 4915695 simplify bitvector, docs for bitvector and rsaBitvector (TwFlem)
- 5c97fa7 fix wavelet select (TwFlem)
- a4548e3 lint (TwFlem)
- 0168d0e more lint (TwFlem)
- 42abe6c doc adjustments (TwFlem)
- 8d661a1 changelog and ensure correct nullChar sorting (TwFlem)
- f95065c Merge remote-tracking branch 'poly-upstream/main' into bwt (TwFlem)
- a520093 fix changelog (TwFlem)
- f6c2bb9 golanglintci fixes (TwFlem)
- f5b459b bubble up errs instead of panics (TwFlem)
- dcd5aff fix examples (TwFlem)
- 4dde901 add recovery to other public API (TwFlem)
- be19a2d Merge branch 'main' into pr/TwFlem/411 (carreter)
- ecc78d3 Cite Ben Langmead. (TwFlem)
- 5b1ce0b fix typo (TwFlem)
- 660dbdb Update bwt/bwt.go (TwFlem)
- e4fef76 Update bwt/wavelet.go (TwFlem)
- dd29d3e Update bwt/wavelet.go (TwFlem)
- a69d8c5 Update bwt/wavelet.go (TwFlem)
- 63e10a4 Update bwt/wavelet.go (TwFlem)
- 9e02f5d Update bwt/wavelet.go (TwFlem)
- 92ed9da Update bwt/wavelet.go (TwFlem)
- 08749e3 Update bwt/wavelet.go (TwFlem)
- a4eb771 doc improvement (TwFlem)
- 0791c49 doc improvement (TwFlem)
- e52ba57 requested changes, fix edgcases, test edgecases (TwFlem)
- 6269764 Fix BWT Locate explanation. Typos and English. (TwFlem)
- 307d881 fix wavelet tree example and docs (TwFlem)
- b3f27f4 fix rsa docs and provide better examples (TwFlem)
- 3e834dd Fix select. Problems appeared when it started actually getting used in (TwFlem)
- ff74a1e put back the shortcut for rank of char at max position in bv (TwFlem)
- 69c5089 wt reconstruct and bwt GetTransform with example (TwFlem)
- 94f2a15 added unit tests for reachable panics. (TimothyStiles)
- 1cc3755 fix err messages, add basic example (TwFlem)
- 77ca52d fix test (TwFlem)
@@ -0,0 +1,75 @@

```go
package bwt

import (
	"fmt"
	"math"
)

const wordSize = 64

// bitvector is a sequence of 1s and 0s. You can also think
// of it as an array of bits. This allows us to encode
// data in a memory-efficient manner.
type bitvector struct {
	bits         []uint64
	numberOfBits int
}

// newBitVector returns an initialized bitvector with
// the specified number of zeroed bits.
func newBitVector(initialNumberOfBits int) bitvector {
	capacity := getNumOfBitSetsNeededForNumOfBits(initialNumberOfBits)
	bits := make([]uint64, capacity)
	return bitvector{
		bits:         bits,
		numberOfBits: initialNumberOfBits,
	}
}

// getBitSet gets the whole word at some offset from the
// bitvector. Useful if you'd prefer to work with the
// word rather than with individual bits.
func (b bitvector) getBitSet(bitSetPos int) uint64 {
	return b.bits[bitSetPos]
}

// getBit returns the value of the bit at a given offset.
// True represents 1; false represents 0.
// Bits are stored most-significant first: bit i lives in
// word i/64 at shift 63 - i%64.
func (b bitvector) getBit(i int) bool {
	b.checkBounds(i)

	chunkStart := i / wordSize
	offset := i % wordSize

	return (b.bits[chunkStart] & (uint64(1) << (63 - offset))) != 0
}

// setBit sets the value of the bit at a given offset.
// True represents 1; false represents 0.
// A value receiver is enough here because the slice
// shares its backing array with the caller's bitvector.
func (b bitvector) setBit(i int, val bool) {
	b.checkBounds(i)

	chunkStart := i / wordSize
	offset := i % wordSize

	if val {
		b.bits[chunkStart] |= uint64(1) << (63 - offset)
	} else {
		b.bits[chunkStart] &= ^(uint64(1) << (63 - offset))
	}
}

// checkBounds panics if i is not a valid bit offset.
func (b bitvector) checkBounds(i int) {
	if i >= b.len() || i < 0 {
		panic(fmt.Sprintf("bitvector: index %d out of bounds [0, %d)", i, b.len()))
	}
}

// len returns the number of bits in the bitvector.
func (b bitvector) len() int {
	return b.numberOfBits
}

// getNumOfBitSetsNeededForNumOfBits returns how many 64-bit
// words are needed to hold n bits, i.e. ceil(n / 64).
func getNumOfBitSetsNeededForNumOfBits(n int) int {
	return int(math.Ceil(float64(n) / wordSize))
}
```
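The commit history mentions rank queries (Jacobson's rank, the RSA bitvector) built on top of this type. As a rough illustration of how rank can be computed over the same most-significant-bit-first word layout, here is a minimal sketch: count whole words with `math/bits.OnesCount64`, then mask the leading bits of the last partial word. `setBit` and `rank1` are hypothetical helpers operating on a bare `[]uint64`; they are not the PR's RSA bitvector, and a practical rank structure would precompute per-block counts instead of scanning linearly.

```go
package main

import (
	"fmt"
	"math/bits"
)

const wordSize = 64

// setBit is a hypothetical helper that sets bit i to 1 using the same
// layout as bitvector: bit i lives in word i/64 at shift 63 - i%64.
func setBit(words []uint64, i int) {
	words[i/wordSize] |= uint64(1) << (wordSize - 1 - i%wordSize)
}

// rank1 is a hypothetical helper that returns the number of 1 bits in
// positions [0, i). It scans linearly, so each query costs O(i/64).
func rank1(words []uint64, i int) int {
	count := 0
	full := i / wordSize
	for w := 0; w < full; w++ {
		count += bits.OnesCount64(words[w])
	}
	if rem := i % wordSize; rem > 0 {
		// Positions 0..rem-1 of the partial word are its top rem bits.
		mask := ^uint64(0) << (wordSize - rem)
		count += bits.OnesCount64(words[full] & mask)
	}
	return count
}

func main() {
	words := make([]uint64, 2) // 128 zeroed bits
	for _, pos := range []int{0, 3, 63, 64, 100} {
		setBit(words, pos)
	}
	fmt.Println(rank1(words, 4))   // 2 (bits 0 and 3)
	fmt.Println(rank1(words, 64))  // 3 (bits 0, 3 and 63)
	fmt.Println(rank1(words, 101)) // 5 (all five set bits)
}
```

Because bit 0 is the most significant bit of word 0, the "first `rem` positions" of a partial word are selected with a left-shifted all-ones mask rather than the more familiar `(1 << rem) - 1`.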
@@ -0,0 +1,112 @@

```go
package bwt

import (
	"testing"
)

type GetBitTestCase struct {
	position int
	expected bool
}

func TestBitVector(t *testing.T) {
	initialNumberOfBits := wordSize*10 + 1

	bv := newBitVector(initialNumberOfBits)

	if bv.len() != initialNumberOfBits {
		t.Fatalf("expected len to be %d but got %d", initialNumberOfBits, bv.len())
	}

	for i := 0; i < initialNumberOfBits; i++ {
		bv.setBit(i, true)
	}

	bv.setBit(3, false)
	bv.setBit(11, false)
	bv.setBit(13, false)
	bv.setBit(23, false)
	bv.setBit(24, false)
	bv.setBit(25, false)
	bv.setBit(42, false)
	bv.setBit(63, false)
	bv.setBit(64, false)

	getBitTestCases := []GetBitTestCase{
		{0, true},
		{1, true},
		{2, true},
		{3, false},
		{4, true},
		{7, true},
		{8, true},
		{9, true},
		{10, true},
		{11, false},
		{12, true},
		{13, false},
		{23, false},
		{24, false},
		{25, false},
		{42, false},
		{15, true},
		{16, true},
		{62, true},
		{63, false},
		{64, false},
		{65, true},
		{72, true},
		{79, true},
		{80, true},
	}

	for _, v := range getBitTestCases {
		actual := bv.getBit(v.position)
		if actual != v.expected {
			t.Fatalf("expected %dth bit to be %t but got %t", v.position, v.expected, actual)
		}
	}
}

func TestBitVectorBoundPanic_GetBit_Lower(t *testing.T) {
	defer func() { _ = recover() }()

	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.getBit(-1)

	t.Fatalf("expected get bit lower bound panic")
}

func TestBitVectorBoundPanic_GetBit_Upper(t *testing.T) {
	defer func() { _ = recover() }()
	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.getBit(initialNumberOfBits)

	t.Fatalf("expected get bit upper bound panic")
}

func TestBitVectorBoundPanic_SetBit_Lower(t *testing.T) {
	defer func() {
		if r := recover(); r != nil {
			return
		}
		t.Fatalf("expected set bit lower bound panic")
	}()
	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.setBit(-1, true)
}

func TestBitVectorBoundPanic_SetBit_Upper(t *testing.T) {
	defer func() {
		if r := recover(); r != nil {
			return
		}
		t.Fatalf("expected set bit upper bound panic")
	}()
	initialNumberOfBits := wordSize*10 + 1
	bv := newBitVector(initialNumberOfBits)
	bv.setBit(initialNumberOfBits, true)
}
```
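The four bound tests repeat the same `defer`/`recover` scaffolding, written in two different styles. One possible way to consolidate them is a small helper that reports whether a function panicked. `didPanic` and `checkIndex` are hypothetical names used only for illustration; `checkIndex` mimics `bitvector.checkBounds` so the sketch runs on its own.

```go
package main

import "fmt"

// didPanic is a hypothetical helper: it runs fn, recovers any panic,
// and reports whether one occurred.
func didPanic(fn func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

// checkIndex mimics bitvector.checkBounds: it panics unless 0 <= i < n.
func checkIndex(i, n int) {
	if i < 0 || i >= n {
		panic(fmt.Sprintf("index %d out of bounds [0, %d)", i, n))
	}
}

func main() {
	n := 64*10 + 1 // same size as the bitvector in the tests
	fmt.Println(didPanic(func() { checkIndex(-1, n) }))  // true: below the lower bound
	fmt.Println(didPanic(func() { checkIndex(n, n) }))   // true: at the upper bound
	fmt.Println(didPanic(func() { checkIndex(n-1, n) })) // false: last valid index
}
```

Inside the test file, each bound test would then shrink to a single check such as `if !didPanic(func() { bv.getBit(-1) }) { t.Fatalf("expected get bit lower bound panic") }`.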
should this be exposed?
It shouldn't be. Definitions in `_test` files aren't public.
^ That won't build