Support multiple GGUF files #379
Conversation
Code Metrics Report

| Language | Files | Lines | Code | Comments | Blanks |
|---|---:|---:|---:|---:|---:|
| Dockerfile | 1 | 34 | 25 | 0 | 9 |
| Happy | 1 | 442 | 369 | 0 | 73 |
| JSON | 9 | 21 | 21 | 0 | 0 |
| Python | 24 | 864 | 731 | 25 | 108 |
| TOML | 15 | 403 | 365 | 1 | 37 |
| Jupyter Notebooks | 1 | 0 | 0 | 0 | 0 |
| \|- Markdown | 1 | 60 | 30 | 22 | 8 |
| \|- Python | 1 | 96 | 87 | 1 | 8 |
| (Total) | | 156 | 117 | 23 | 16 |
| Markdown | 16 | 1056 | 0 | 782 | 274 |
| \|- BASH | 6 | 203 | 190 | 0 | 13 |
| \|- Python | 6 | 121 | 110 | 0 | 11 |
| \|- Rust | 3 | 185 | 172 | 9 | 4 |
| (Total) | | 1565 | 472 | 791 | 302 |
| Rust | 93 | 29569 | 26925 | 433 | 2211 |
| \|- Markdown | 47 | 468 | 0 | 455 | 13 |
| (Total) | | 30037 | 26925 | 888 | 2224 |
| Total | 161 | 32389 | 28436 | 1241 | 2712 |
Oh no, I was a bit too slow getting my PR ready 💀 I'm not used to contributing to a project where the change I'm working on is so prone to conflicts from other activity 😓 Should I open a draft PR that I iterate on instead, to better communicate this?
Some GGUF models are very large and are sharded into multiple files. Mistral.rs supports this, and to use it, delimit the `.gguf` filenames with a space as such:

```bash
./mistralrs-server --chat-template <chat_template> gguf -m . -f "a.gguf b.gguf"
```

For the Python API, a list of strings is also accepted for this case.
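As a rough sketch of that Python path: the snippet below assumes a `Runner` / `Which.GGUF` constructor in the `mistralrs` Python bindings with `tok_model_id` / `quantized_model_id` / `quantized_filename` parameters; those names, and the `a.gguf` / `b.gguf` shard names, are placeholders rather than anything confirmed in this thread.

```python
# Hypothetical sketch: pass a list of shard filenames instead of a single string.
# The constructor and parameter names here are assumptions about the Python bindings.
from mistralrs import Runner, Which

runner = Runner(
    which=Which.GGUF(
        tok_model_id=".",                         # local directory, mirroring `-m .` above
        quantized_model_id=".",
        quantized_filename=["a.gguf", "b.gguf"],  # sharded GGUF files, in order
    )
)
```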
Do you have an example GGUF file where this is actually the case? Or was this a misunderstanding of the motivation referenced in #380?
I don't understand why the sharding is necessary, other than, as this comment suggests, to work around size restrictions on a file host.

It seems like it'd be more appropriate to have a tool that concatenates the files together if there's a UX issue you want to address, but I don't really see the point in adding additional complexity / maintenance to support something that should realistically be addressed outside of mistral.rs (the runtime) 🤷‍♂️

Similar to the tokenizer patching you added recently, these are workarounds that seem more appropriate as CLI tools to apply "fixes" 🤔
Would that be for developing a new feature? If so, that sounds good. Otherwise, we have a Discord for this purpose :)
No, I was just refactoring the GGUF tokenizer, but this PR touched quite a bit, so rebasing my work onto it isn't something I'm interested in doing.

I think you saw my GitHub comment on candle about what I was doing before this PR was merged; you had given it a 👍. Good to know though, my mistake.
Ah sorry, I thought that was a different refactor. I will roll back these changes, as it would be great to see your changes. Generally, multiple GGUF files are a rare occurrence. Would your PR include support for those?
No, it was just focused on tidying up the GGUF metadata tokenizer file you already put together, and would then have had the

On Linux, from what I've read, the file is split into parts such that you can just run something like:

`cat goliath-120b.Q6_K.gguf-split-* > goliath-120b.Q6_K.gguf`

A small CLI tool could do something similar: glob the related files and merge the output. I don't know for sure, but I assume users would be fine with piecing it back into a single file.

I haven't spent much time reading over your PR changes for the feature, so I could be mistaken about your integration to work around multi-part files, but at a glance it seemed to interleave a bit (multiple loops and other additions to carry support). Since I didn't implement my PR on the model files, the conflicting files aren't much. I just don't have the energy atm to compare the changes I'd need to make.
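For illustration, here is a minimal sketch of such a merge tool in Python. It simply mirrors the `cat` invocation above (glob the part names, sort them, and stream the bytes into one output file); whether plain byte concatenation is valid depends on how the shards were produced, and nothing like this ships with mistral.rs.

```python
#!/usr/bin/env python3
"""Hypothetical shard-merge helper, equivalent to `cat model.gguf-split-* > model.gguf`.

Assumes the parts were made by plain byte splitting (e.g. the `split` command),
so concatenating them in lexicographic order reproduces the original file.
"""
import argparse
import glob
import shutil


def merge_shards(pattern: str, output: str) -> None:
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise SystemExit(f"no files matched {pattern!r}")
    with open(output, "wb") as out:
        for part in parts:
            print(f"appending {part}")
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream bytes, no full-file buffering


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Concatenate split GGUF parts into one file.")
    parser.add_argument("pattern", help="glob for the parts, e.g. 'goliath-120b.Q6_K.gguf-split-*'")
    parser.add_argument("output", help="path for the merged .gguf file")
    args = parser.parse_args()
    merge_shards(args.pattern, args.output)
```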
Nice! Perhaps we can add such a tool. This PR was a bit hacky, so I actually just rolled it back. There is just 1 (easy) conflict now in #389.
* Intial work on phi3v
* Add the image embedding layer
* Lints
* Implement the loader
* Add infrastructure for phi3 image processor
* Merge
* Merge
* Merge
* Merge
* Partially implement padding
* Implement the hd transform step
* Work on the image processor
* Clippy
* Complete the phi3v inputs processor
* Rename
* Merge
* Merge
* Rename to phi3v and fix deser
* Fix varbuilder
* Fix varbuilder
* Default for do convert rgb
* Some defaults
* Allow no processor config
* Setup debug flag
* Add phi3v
* Implement messages flattening
* Update
* Rewrite the pad, hd transform
* Clippy
* Detect num channels
* Fix reshape
* Fix global image channel dim
* Fix assert
* Fix dtype
* Fix gt
* Fix image id neg
* Fix dim0 of pixel values
* Fix dtype
* Check if model supports gemm
* Fix some shape errors
* Fix some shape errors
* Fix rank of slice_assign
* Fix image toks
* Properly downcase
* Fix response
* Fix response
* Allow no images in prompt
* Output correct hidden state
* Fix nonzero and add test
* Fix n image toks
* Add mistralrs_vision
* Typo
* Fix and add tests
* Fix indexing
* Fix test condition
* Fix unsqueeze
* Fix dtype for norm
* Update clip
* Clippy
* Run clip in f32
* Run in bf16
* Run in bf16 again
* Fix dtype
* Set toks to have correct context lens
* Set toks to have correct context lens
* Support multiple GGUF files (#379)
* Move to gguf module
* Add content abstraction for multiple gguf files
* Fix test
* Allow specifying and loading multiple gguf files
* Update docs and examples
* Print some info
* Merge
* Organize normal loading metadata (#381)
* Organize normal loading metadata
* Fix
* Bump version 0.1.13 -> 0.1.14 (#382)
* Patch incorrect unwrap and bump version (#383)
* Patch incorrect unwrap
* Bump version to 0.1.15
* More verbose logging during loading (#385)
* More verbose logging when loading
* More logging
* Refactor enabling debug logging (#387)
* Refactor enabling debug logging
* Fix reversed order
* Merge
* Merge
* Merge
* Use precise gelu
* Use correct kernel
* Debugging commit
* Add fused bias linear
* Finish merge
* Use fused layer in clip
* Save progress
* Remove debugs
* Update example
* Resize exact
* Update interpolate
* Fix batch dim
* Update test and transform
* It works
* Add some examples
* Allow more than one image
* Add support in python api
* Add to toml selector
* Update python api
* Overhaul readme and docs
* Update
* Export vision arch
* Export vision arch
* Export vision arch
* Fix max img dim
* Fix unwrap
Support multiple GGUF files by refactoring the GGUF `Content` usage.