Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIME type sniffing #30

Merged
merged 12 commits into from
Jun 7, 2024
Merged

MIME type sniffing #30

merged 12 commits into from
Jun 7, 2024

Conversation

PlamenHristov
Copy link
Contributor

No description provided.

@banyancomputer banyancomputer deleted a comment from linear bot Jun 5, 2024
Cargo.toml Outdated Show resolved Hide resolved
Comment on lines 7 to 9
pub enum MetadataKey {
MimeType,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that being able to store arbitrary bytes with an arbitrary key was an intentional design choice to maximize flexibility, but that might be inaccurate. But obviously restricting the key type to an enum has value in terms of organization and consistency.

@sstelfox any thoughts?

Copy link
Contributor Author

@PlamenHristov PlamenHristov Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought the same, but then realized Node is not exposed to the client (if I remember correctly)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata entries are going to need to support arbitrary keys. We intend to allow users to set and access any key by string (case-sensitive, string limit of 255 characters) it just isn't fully exposed out to the public facing APIs yet (just on the Node type). There are going to be some "reserved" keywords for things like this.

I do like the idea of using enums for the reserved type, maybe we could just add an extra variant like "User" or "Custom" that contains a String if it doesn't match a reserved type.

@jscatena88
Copy link
Contributor

At a high level, I don't know how I feel about adding the concept of mime types into the filesystem. It seems one or two logical layers above the layer of a "file system". And requires bringing in two crates, at least one of which has a questionable future. I wonder if this could be its own crate that the fuse crate or others could import that is built on top of banyanFS, maybe defining constants that are used for the metadata key.

Or maybe less extreme it could probably be an optional feature that banyanFS can be built without?

@PlamenHristov
Copy link
Contributor Author

PlamenHristov commented Jun 5, 2024

At a high level, I don't know how I feel about adding the concept of mime types into the filesystem. It seems one or two logical layers above the layer of a "file system". And requires bringing in two crates, at least one of which has a questionable future. I wonder if this could be its own crate that the fuse crate or others could import that is built on top of banyanFS, maybe defining constants that are used for the metadata key.

Or maybe less extreme it could probably be an optional feature that banyanFS can be built without?

I believe the main use case of all this was for wasm, therefore moving the implementation to that module makes the most sense.

@sstelfox ?

@PlamenHristov PlamenHristov marked this pull request as ready for review June 6, 2024 13:53
@PlamenHristov PlamenHristov changed the title Preliminary implementation of MIME type sniffing MIME type sniffing Jun 6, 2024
@sstelfox
Copy link
Contributor

sstelfox commented Jun 6, 2024

At a high level, I don't know how I feel about adding the concept of mime types into the filesystem. It seems one or two logical layers above the layer of a "file system". And requires bringing in two crates, at least one of which has a questionable future. I wonder if this could be its own crate that the fuse crate or others could import that is built on top of banyanFS, maybe defining constants that are used for the metadata key.
Or maybe less extreme it could probably be an optional feature that banyanFS can be built without?

I believe the main use case of all this was for wasm, therefore moving the implementation to that module makes the most sense.

@sstelfox ?

I'm in the process of reviewing this but yeah we largely want this as part of this implementation and not part of the protocol / spec itself. Our WASM front-end needs to have this mime guessed information using more than just the file extension. We don't want to have to retrieve the data from storage to perform the mime type guessing whenever we need it so it needs to be part of the metadata associated with entries.

We could have done this in javascript itself, but previously discussed an asset pipeline living at a higher level inside BanyanFS for doing things like image preview generation (to store as associated data), likewise things like embeddings could easily be stored in associated data and would probably need the mime type for selecting the type of embedding models to use for that type.

So yes I think Mime belong in here, but not the spec. It's probably worth feature gating whether this is present or not to allow for use in small or minimal environments but I don't think that is necessary for this PR.

Copy link
Contributor

@sstelfox sstelfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor tweaks but looks really solid!

Cargo.toml Outdated Show resolved Hide resolved
Comment on lines 7 to 9
pub enum MetadataKey {
MimeType,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata entries are going to need to support arbitrary keys. We intend to allow users to set and access any key by string (case-sensitive, string limit of 255 characters) it just isn't fully exposed out to the public facing APIs yet (just on the Node type). There are going to be some "reserved" keywords for things like this.

I do like the idea of using enums for the reserved type, maybe we could just add an extra variant like "User" or "Custom" that contains a String if it doesn't match a reserved type.

None
}

fn pattern_match(&self) -> Option<mime::MediaType> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang I wasn't expecting to see an actual byte based mime guesser in here that's pretty neat. I've never thought of matching variable length byte arrays like this, that's pretty neat and probably pretty efficient.

Copy link
Contributor

@sstelfox sstelfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@sstelfox
Copy link
Contributor

sstelfox commented Jun 7, 2024

Does looks like some of the new tests are legitimately breaking, need a touch of clean up there but otherwise 👍🏼

@PlamenHristov PlamenHristov added this pull request to the merge queue Jun 7, 2024
Merged via the queue into main with commit e95b1bc Jun 7, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants