update readme files
tibvdm committed Apr 15, 2024
1 parent 457437d commit 3ff7233
Showing 2 changed files with 20 additions and 8 deletions.
15 changes: 7 additions & 8 deletions fa-compression/README.md
![Codecov](https://img.shields.io/codecov/c/github/unipept/unipept-index?token=IZ75A2FY98&flag=fa-compression&logo=codecov)
![Static Badge](https://img.shields.io/badge/doc-rustdoc-blue)

The `fa-compression` library offers two compression algorithms for Unipept's functional annotation strings. These strings follow a very specific format that the first compression algorithm exploits to achieve a guaranteed minimal compression of **50%** for both very large and very small input strings. The compression ratio typically sits around **68-71%**. This algorithm never has to allocate extra memory to build an encoding table or anything similar: each string can be encoded separately. This is particularly useful when all strings have to be encoded or decoded on their own; there is no need to decode an entire database just to fetch a single entry.
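One way to see why a 50% floor is achievable (a hypothetical illustration, not the actual `fa-compression` scheme): annotation strings draw on a small alphabet of digits and punctuation plus a handful of prefixes, so a fixed 4-bit code per symbol already halves the size of any input, with no shared table required.

```rust
// Hypothetical sketch: a fixed 4-bit code per symbol halves the size of any
// input, giving a 50% compression floor. This is NOT the actual fa-compression
// scheme, just an illustration of why such a guarantee is achievable.

const ALPHABET: &[u8] = b"0123456789.-:;$"; // '$' stands in for a prefix token

fn pack_nibbles(codes: &[u8]) -> Vec<u8> {
    codes
        .chunks(2)
        // Pack two 4-bit codes per byte; pad an odd trailing code with 0x0f.
        .map(|pair| (pair[0] << 4) | pair.get(1).copied().unwrap_or(0x0f))
        .collect()
}

fn main() {
    // Map each symbol to its 4-bit index in the alphabet.
    let input = b"1.1.1.-;0009279";
    let codes: Vec<u8> = input
        .iter()
        .map(|c| ALPHABET.iter().position(|a| a == c).unwrap() as u8)
        .collect();

    let packed = pack_nibbles(&codes);

    // 15 input bytes fit in 8 packed bytes: at most half, rounded up.
    println!("{} -> {} bytes", input.len(), packed.len());
}
```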

The second algorithm builds a global decoding table that maps 32-bit integers onto functional annotations. Each annotation in an input string can then be represented by only 3 bytes. The compression ratio is slightly higher, at around **76%**, but the performance is slightly worse, and the allocation of a table makes the algorithm a little more complex to use.

## Example

```rust
use fa_compression;

fn main() {
    let encoded: Vec<u8> = fa_compression::algorithm1::encode(
        "IPR:IPR016364;EC:1.1.1.-;IPR:IPR032635;GO:0009279;IPR:IPR008816"
    );

    // [ 44, 44, 44, 189, 17, 26, 56, 173, 18, 116, 117, 225, 67, 116, 110, 17, 153, 39 ]
    println!("{:?}", encoded);

    let decoded: String = fa_compression::algorithm1::decode(&encoded);

    // "EC:1.1.1.-;GO:0009279;IPR:IPR016364;IPR:IPR032635;IPR:IPR008816"
    println!("{:?}", decoded);
}
```
13 changes: 13 additions & 0 deletions sa-mappings/README.md
# Suffix Array Mappings

![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/unipept/unipept-index/test.yml?logo=github)
![Codecov](https://img.shields.io/codecov/c/github/unipept/unipept-index?token=IZ75A2FY98&flag=sa-mappings&logo=codecov)
![Static Badge](https://img.shields.io/badge/doc-rustdoc-blue)

A suffix array search returns a range of matches, and this range has to be mapped onto informational data. The `sa-mappings` library offers a few utilities to build these mapping tables, with functionality to map each suffix array (SA) output onto its proteins, taxonomy and functional annotations.

- `sa_mappings::taxonomy::TaxonAggregator` can aggregate a list of taxa.
- `sa_mappings::functionality::FunctionAggregator` can aggregate a list of functional annotations.
- `sa_mappings::proteins::Proteins` can map an SA entry to a protein and all its information.
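Conceptually, consuming a suffix array match range looks something like the following (hypothetical types and names, not the `sa-mappings` API): each matched suffix position is looked up against the sorted start offsets of the proteins in the concatenated text.

```rust
// Hypothetical illustration of mapping a suffix array match range back to
// protein records; not the actual sa-mappings API.

struct Protein {
    name: String,
    taxon_id: u32,
}

/// Map each suffix position in the matched range to the protein that owns it.
/// `starts` holds the sorted start offset of every protein in the text.
fn proteins_for_range<'a>(
    suffix_array: &[usize],
    range: std::ops::Range<usize>,
    starts: &[usize],
    proteins: &'a [Protein],
) -> Vec<&'a Protein> {
    suffix_array[range]
        .iter()
        .map(|&pos| {
            // partition_point counts the proteins starting at or before `pos`;
            // the last of those is the protein containing the suffix.
            let i = starts.partition_point(|&s| s <= pos) - 1;
            &proteins[i]
        })
        .collect()
}

fn main() {
    let proteins = vec![
        Protein { name: "P1".into(), taxon_id: 9606 },
        Protein { name: "P2".into(), taxon_id: 10090 },
    ];
    let starts = vec![0, 6]; // P1 occupies text[0..6], P2 occupies text[6..]
    let sa = vec![7, 2, 0, 9, 4]; // toy suffix array

    for p in proteins_for_range(&sa, 1..3, &starts, &proteins) {
        println!("{} (taxon {})", p.name, p.taxon_id);
    }
}
```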
