Storing tokenized single cells #119

Open
nleroy917 opened this issue May 28, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@nleroy917
Member

It would be nice to be able to store individual, tokenized cells so that we can share tokenized datasets that are used for training machine learning models.

The data might look like this:

cell1 = [999, 101, 22]
cell2 = [123, 456, 89, 99999]
cell3 = [1]

These are all tokenized against a single universe, but there is no actual "bed file" behind each cell that gets "tokenized", so this case needs to be addressed. From the Discord discussion, this is also known as an "unlinked tokenized bedset":

unlinked: because the individual bed files don't actually exist as entities in bedbase.
tokenized: because they're all defined in terms of a universe.
bedset: because there are many of them (on the order of 100,000), which is also why each one is not its own entity in the db.
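
To make the idea concrete, here is a minimal sketch of one possible in-memory layout for such a set. It is not an existing bedbase API. The class name `UnlinkedTokenizedBedset`, the `universe_id` field, and the flat `tokens` plus `offsets` layout are all assumptions made for illustration. The point is that many variable-length cells can live in one object tied to one universe, without each cell being its own entity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnlinkedTokenizedBedset:
    """Hypothetical container: many tokenized cells sharing one universe."""

    # Identifier of the shared universe (assumed field, not a bedbase name).
    universe_id: str
    # Token IDs of every cell, concatenated into one flat list.
    tokens: List[int] = field(default_factory=list)
    # Cell i occupies tokens[offsets[i]:offsets[i + 1]].
    offsets: List[int] = field(default_factory=lambda: [0])

    def add_cell(self, cell_tokens: List[int]) -> None:
        """Append one tokenized cell to the set."""
        self.tokens.extend(cell_tokens)
        self.offsets.append(len(self.tokens))

    def get_cell(self, i: int) -> List[int]:
        """Return the token IDs of cell i."""
        if not 0 <= i < len(self):
            raise IndexError(f"cell index out of range: {i}")
        return self.tokens[self.offsets[i]:self.offsets[i + 1]]

    def __len__(self) -> int:
        """Number of cells stored."""
        return len(self.offsets) - 1


# The three example cells from above, stored as one set.
bedset = UnlinkedTokenizedBedset(universe_id="example-universe")
for cell in ([999, 101, 22], [123, 456, 89, 99999], [1]):
    bedset.add_cell(cell)
```

With this layout, 100,000 cells would be two lists and one universe reference instead of 100,000 separate records. Whether the actual storage should look like this is an open question.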

@khoroshevskyi khoroshevskyi added the enhancement New feature or request label Jun 7, 2024