A Rust wrapper around the llama.cpp library, aiming for a high-level API that provides enough functionality to be versatile and useful. The basic, higher-level C functions that this library builds upon are provided by the woolycore library.
Supported Operating Systems: Windows, macOS, Linux
MIT licensed, like the core upstream llama.cpp it wraps. See LICENSE for details.
- Simple high-level Rust interface to use for text generation (`Llama`); see the sketch after this list.
- Basic samplers of llama.cpp, including: temp, top-k, top-p, min-p, tail free sampling, locally typical sampling, mirostat.
- Support for llama.cpp's BNF-like grammar rules for sampling.
- Ability to cache the current prediction state, which can be used to keep processed prompt data in memory so that it can be reused to speed up regeneration with the exact same prompt. It can also be used to continue predictions that have been frozen.
- Tokenize text or just get the number of tokens for a given text string.
- Generate embeddings in a batched process using models such as `nomic-ai/nomic-embed-text-v1.5-GGUF` on HuggingFace.
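
To give a feel for the intended high-level interface, here is a minimal usage sketch. Only the `Llama` type name comes from the feature list above; the crate name `woolyrust` and the constructor, loading, and prediction calls shown are assumptions for illustration, so check the crate's documentation and unit tests for the actual API.

```rust
// Minimal usage sketch. NOTE: everything below except the `Llama` type name
// is an assumption for illustration and may not match the real API;
// see the crate docs and unit tests for the actual calls.
use woolyrust::Llama;

fn main() {
    // Hypothetical: create the wrapper and load a GGUF model from disk.
    let mut llama = Llama::new();
    llama.load_model("models/example-llama-3-8b.gguf");

    // Hypothetical: run a text prediction with default sampler settings
    // (temp, top-k, top-p, etc. from the feature list above).
    let reply = llama.predict_text("Write a haiku about the sea.");
    println!("{}", reply);

    // Hypothetical: count the tokens in a prompt without generating anything.
    let count = llama.token_count("Write a haiku about the sea.");
    println!("prompt tokens: {}", count);
}
```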
A change log covering API changes can be found here.
The upstream `llama.cpp` code is built through the `build.rs` build script using the `cmake` crate, along with the API bindings code, all automatically, so a simple `cargo build` suffices.
```bash
cargo build --release
```
This should automatically include Metal support and embed the shaders when the library is built on macOS.
For CUDA systems, a `cuda` feature has been added, which needs to be enabled for CUDA acceleration. This will greatly increase the compile time of the project. An example build command to enable CUDA would be:

```bash
cargo build --release --features cuda
```
NOTE: Upstream `llama.cpp` makes heavy use of cmake build files, and woolycore has adopted them to avoid duplication of effort and to greatly ease maintenance. This unfortunately means cmake is a required dependency to build the library.
This project uses submodules for upstream projects, so make sure to update them with the appropriate parameters:

```bash
git pull --recurse-submodules
```
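
These are standard git commands rather than anything specific to this project, but for a fresh checkout it may be easier to pull the submodules in explicitly:

```bash
# Clone with submodules included (replace the URL with this repository's URL).
git clone --recurse-submodules <repository-url>

# Or, if the repository was already cloned without them:
git submodule update --init --recursive
```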
The unit tests require an environment variable (`WOOLY_TEST_MODEL_FILE`) to be set with the path to the GGUF file for the model to use during testing. Passing `--nocapture` as a parameter to cargo allows the predicted text to show up on stdout for your viewing pleasure.
```bash
export WOOLY_TEST_MODEL_FILE=models/example-llama-3-8b.gguf
cargo test --release -- --nocapture --test-threads 1
```
On Windows, setting the environment variable may look something like this:

```bat
set WOOLY_TEST_MODEL_FILE=models/example-llama-3-8b.gguf
```
Don't forget to add `--features cuda` for CUDA acceleration on Windows/Linux platforms if that is desired.
To run the embeddings unit test, you have to set the environment variable `WOOLY_TEST_EMB_MODEL_FILE` as well. This should be set to a GGUF file for the embedding model to use for those tests. At present, the tests are designed for `nomic-ai/nomic-embed-text-v1.5-GGUF`.
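
Putting it together, a full test run with both models configured might look like the following; the embedding model filename below is only a placeholder for whichever quantization of the nomic model you downloaded.

```bash
export WOOLY_TEST_MODEL_FILE=models/example-llama-3-8b.gguf
# Placeholder filename; point this at your downloaded nomic embedding GGUF.
export WOOLY_TEST_EMB_MODEL_FILE=models/nomic-embed-text-v1.5.Q8_0.gguf
cargo test --release -- --nocapture --test-threads 1
```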
- Was unsuccessful getting woolycore to build statically for CUDA targets...