Support multiple GGUF files #379

Merged
merged 7 commits into from
Jun 5, 2024
Conversation

EricLBuehler
Owner

Support multiple GGUF files by refactoring the GGUF Content usage.

github-actions bot commented Jun 4, 2024

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    9           21           21            0            0
 Python                 24          864          731           25          108
 TOML                   15          403          365            1           37
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               16         1056            0          782          274
 |- BASH                 6          203          190            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1565          472          791          302
-------------------------------------------------------------------------------
 Rust                   93        29569        26925          433         2211
 |- Markdown            47          468            0          455           13
 (Total)                          30037        26925          888         2224
===============================================================================
 Total                 161        32389        28436         1241         2712
===============================================================================
  

@EricLBuehler EricLBuehler linked an issue Jun 5, 2024 that may be closed by this pull request
@EricLBuehler EricLBuehler merged commit a8c2b41 into master Jun 5, 2024
11 checks passed
@EricLBuehler EricLBuehler deleted the multi_gguf_files branch June 5, 2024 00:12
@polarathene
Contributor

Oh no I was a bit too slow getting my PR ready 💀

I'm not used to contributing on a project where the change I work on is so prone to conflicts from other activity 😓 Should I have a draft PR that I iterate on instead to better communicate this?

Comment on lines +286 to +292
Some GGUF models are very large and are sharded into multiple files. Mistral.rs supports this, and to use it, delimit the `.gguf` filenames with a space as such:

```bash
./mistralrs-server --chat-template <chat_template> gguf -m . -f "a.gguf b.gguf"
```

For the Python API, a list of strings is also accepted for this case.
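To illustrate the space-delimited convention described above, here is a minimal sketch of how a filename spec could be normalized into a list. The helper name `split_gguf_filenames` is hypothetical and is not part of mistral.rs or its Python API; it only shows the idea of accepting either one space-delimited string or a list of strings.

```python
from typing import List, Sequence, Union


def split_gguf_filenames(spec: Union[str, Sequence[str]]) -> List[str]:
    """Normalize a filename spec into a list of `.gguf` filenames.

    Hypothetical helper (an assumption for illustration, not mistral.rs code):
    accepts one space-delimited string or an already-split sequence of strings.
    """
    # A single string is split on whitespace; a sequence is copied as-is.
    names = spec.split() if isinstance(spec, str) else list(spec)
    if not names:
        raise ValueError("expected at least one GGUF filename")
    for name in names:
        if not name.endswith(".gguf"):
            raise ValueError(f"not a .gguf filename: {name!r}")
    return names
```

One consequence of a space delimiter is that a filename containing spaces cannot be expressed in the single-string form, whereas a list of strings does not have that limitation.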
Contributor

Do you have an example GGUF file where this is actually the case? Or was this a misunderstanding from referenced motivation in #380 ?

Owner Author

Contributor

I don't understand why the sharding is necessary other than, as this comment suggests, to work around size restrictions on a file host?

Seems like it'd be more appropriate to have a tool that concatenates the files together if there's a UX issue you want to address, but I don't really see the point in adding additional complexity / maintenance to support something that should realistically be addressed outside of mistral.rs (runtime) 🤷‍♂️

Similar to the tokenizer patching you added recently, these are workarounds that seem more appropriate as CLI tools to apply "fixes" 🤔

@EricLBuehler
Owner Author

> I'm not used to contributing on a project where the change I work on is so prone to conflicts from other activity 😓 Should I have a draft PR that I iterate on instead to better communicate this?

Would that be for developing a new feature? If so, that sounds good. Otherwise, we have a Discord for this purpose :)

@polarathene
Contributor

> Would that be for developing a new feature?

No, I was just refactoring the GGUF tokenizer, but this PR touched enough of it that rebasing my work isn't something I'm interested in doing.

> Otherwise, we have a Discord for this purpose :)

I think you saw my GitHub comment on candle about what I was doing before this PR was merged; you had given it a 👍

Good to know though, my mistake.

@EricLBuehler
Owner Author

Ah sorry, I thought that was a different refactor. I will roll back these changes, as it would be great to see your changes.

Generally, having multiple GGUF files is a rare occurrence. Would your PR include support for those?

@polarathene
Contributor

> Would your PR include support for those?

No, it was just focused on tidying up the GGUF metadata tokenizer file you already put together, and would then have had the `from_gguf()` model methods leverage it so they could be tidied up a little as well.

On Linux, from what I've read, the file is split into parts that you can rejoin by running something like:

`cat goliath-120b.Q6_K.gguf-split-* > goliath-120b.Q6_K.gguf`

A small CLI tool could do something similar: glob the related files and merge them into a single output file. I don't know for sure, but I assume users would be fine with piecing it back into a single file.
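As a rough sketch of what such a merge tool could look like, the function below globs the part files and concatenates them in sorted order, mirroring the `cat` command above. The name `merge_split_files` is hypothetical, and the sketch assumes the parts are raw byte slices of one original file whose lexicographic filename order matches the intended byte order. It would not apply to shards that are each self-contained GGUF files with their own headers.

```python
import glob
import shutil
from pathlib import Path
from typing import List


def merge_split_files(pattern: str, output: str) -> List[str]:
    """Concatenate every file matching `pattern` into `output`, in sorted order.

    Hypothetical helper (not part of mistral.rs). Assumes the parts are raw
    byte slices of a single file and that sorting their names gives the
    correct order. Returns the list of parts that were merged.
    """
    out_path = Path(output).resolve()
    # Skip the output file itself in case the pattern happens to match it.
    parts = sorted(p for p in glob.glob(pattern) if Path(p).resolve() != out_path)
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                # copyfileobj streams in chunks, so large parts are never
                # loaded into memory all at once.
                shutil.copyfileobj(src, dst)
    return parts
```

For the example above, this would be called as `merge_split_files("goliath-120b.Q6_K.gguf-split-*", "goliath-120b.Q6_K.gguf")`.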

I haven't spent much time reading over your PR changes for the feature, so I could be mistaken about how you integrated the multi-part file workaround, but at a glance it seemed to interleave a bit with the existing code (multiple loops and other additions to carry the support).

Since I didn't implement my PR on the model files, there aren't many conflicting files. I just don't have the energy atm to compare the changes I'd need to make.

@EricLBuehler
Owner Author

Nice! Perhaps we can add such a tool. This PR was a bit hacky, so I actually just rolled it back. There is just 1 (easy) conflict now in #389.

EricLBuehler added a commit that referenced this pull request Jun 7, 2024
* Intial work on phi3v

* Add the image embedding layer

* Lints

* Implement the loader

* Add infrastructure for phi3 image processor

* Merge

* Merge

* Merge

* Merge

* Partially implement padding

* Implement the hd transform step

* Work on the image processor

* Clippy

* Complete the phi3v inputs processor

* Rename

* Merge

* Merge

* Rename to phi3v and fix deser

* Fix varbuilder

* Fix varbuilder

* Default for do convert rgb

* Some defaults

* Allow no processor config

* Setup debug flag

* Add phi3v

* Implement messages flattening

* Update

* Rewrite the pad, hd transform

* Clippy

* Detect num channels

* Fix reshape

* Fix global image channel dim

* Fix assert

* Fix dtype

* Fix gt

* Fix image id neg

* Fix dim0 of pixel values

* Fix dtype

* Check if model supports gemm

* Fix some shape errors

* Fix some shape errors

* Fix rank of slice_assign

* Fix image toks

* Properly downcase

* Fix response

* Fix response

* Allow no images in prompt

* Output correct hidden state

* Fix nonzero and add test

* Fix n image toks

* Add mistralrs_vision

* Typo

* Fix and add tests

* Fix indexing

* Fix test condition

* Fix unsqueeze

* Fix dtype for norm

* Update clip

* Clippy

* Run clip in f32

* Run in bf16

* Run in bf16 again

* Fix dtype

* Set toks to have correct context lens

* Set toks to have correct context lens

* Support multiple GGUF files (#379)

* Move to gguf module

* Add content abstraction for multiple gguf files

* Fix test

* Allow specifying and loading multiple gguf files

* Update docs and examples

* Print some info

* Merge

* Organize normal loading metadata (#381)

* Organize normal loading metadata

* Fix

* Bump version 0.1.13 -> 0.1.14 (#382)

* Patch incorrect unwrap and bump version (#383)

* Patch incorrect unwrap

* Bump version to 0.1.15

* More verbose logging during loading (#385)

* More verbose logging when loading

* More logging

* Refactor enabling debug logging (#387)

* Refactor enabling debug logging

* Fix reversed order

* Merge

* Merge

* Merge

* Use precise gelu

* Use correct kernel

* Debugging commit

* Add fused bias linear

* Finish merge

* Use fused layer in clip

* Save progress

* Remove debugs

* Update example

* Resize exact

* Update interpolate

* Fix batch dim

* Update test and transform

* It works

* Add some examples

* Allow more than one image

* Add support in python api

* Add to toml selector

* Update python api

* Overhaul readme and docs

* Update

* Export vision arch

* Export vision arch

* Export vision arch

* Fix max img dim

* Fix unwrap
EricLBuehler added a commit that referenced this pull request Jun 8, 2024
@EricLBuehler EricLBuehler restored the multi_gguf_files branch August 18, 2024 12:06
@EricLBuehler EricLBuehler deleted the multi_gguf_files branch August 18, 2024 12:17
Development

Successfully merging this pull request may close these issues.

Support loading multiple GGUF files
2 participants