Define the format, metadata for shared probability tables, strings, etc. #164

Closed
dominiccooney opened this issue Sep 27, 2018 · 2 comments

Comments

@dominiccooney
Member

The binast-node repo and this repo's entropy branch are experimenting with entropy-based encoding methods. To reduce file sizes, one idea is to share probability tables and other data between files in a corpus. We need to define the format of that shared data.

Because much of the data in this shared file depends on the grammar, perhaps the grammar should be part of this "dictionary" file, or the dictionary file should refer to the grammar.

Because a given file would not be usable with a different dictionary, it probably makes sense for a file to have metadata pointing to its dictionary, instead of or in addition to out-of-band metadata such as an HTTP header.
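
To make the in-band option concrete, here is a minimal sketch, not the actual BinAST format. It assumes a dictionary is identified by the SHA-256 digest of its bytes and that the file begins with a magic number followed by that digest. The names `MAGIC`, `dictionary_id`, `pack`, and `unpack` are hypothetical and exist only for illustration.

```python
import hashlib

MAGIC = b"DEMO"  # hypothetical magic bytes, not the real BinAST header
DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def dictionary_id(dictionary: bytes) -> bytes:
    """Identify a dictionary by the SHA-256 digest of its bytes (an assumption)."""
    return hashlib.sha256(dictionary).digest()


def pack(dictionary: bytes, payload: bytes) -> bytes:
    """Prefix the payload with in-band metadata naming its dictionary."""
    return MAGIC + dictionary_id(dictionary) + payload


def unpack(data: bytes, dictionary: bytes) -> bytes:
    """Return the payload, refusing to decode against the wrong dictionary."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("not a file in this hypothetical format")
    start = len(MAGIC)
    expected = data[start:start + DIGEST_LEN]
    if expected != dictionary_id(dictionary):
        raise ValueError("file was encoded against a different dictionary")
    return data[start + DIGEST_LEN:]
```

With a reference like this inside the file, a decoder can detect a mismatched dictionary before it starts decoding, even if an out-of-band hint such as an HTTP header is missing or stale. A real format might store a URL or a version number rather than a digest.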

@Yoric
Collaborator

Yoric commented Oct 9, 2018

Discussion moved here.

@Yoric Yoric closed this as completed Oct 9, 2018
@Yoric
Collaborator

Yoric commented Oct 9, 2018

Because a given file would not be usable with a different dictionary, it probably makes sense for a file to have metadata pointing to its dictionary, instead of or in addition to out-of-band metadata such as an HTTP header.

That specific point moved here.
