Releases: av/harbor
v0.1.29
Misc
boost
now supports standalone usage, without the rest of the harbor
Full Changelog: v0.1.28...v0.1.29
v0.1.28
STT - faster-whisper-service integration
Harbor now has a dedicated stt
backend, in addition to the already present tts
. Open WebUI will be configured to use it automatically instead of "local" whisper, when running together. The server will use GPU automatically, if possible on the given platform and CPU otherwise.
# Start the service
harbor up stt
# Convigure model/version
harbor stt model Systran/faster-distil-whisper-large-v3
harbor stt version latest
Misc
- OpenHands integration, the service is not very configurable atm, with only basic support for Ollama URL, file an issue if that changes in the future!
- CLI linter
Full Changelog: v0.1.27...v0.1.28
v0.1.27 - Harbor Boost
v0.1.27 - Harbor Boost
Harbor can now boost small llamas to be better at creative and reasoning tasks. I'm happy to present Harbor Boost - optimizing LLM proxy with OpenAI-compatible API.
It allows implementing workflows like below:
- When "random" is mentioned in the message, klmbr will rewrite 35% of message characters to increase the entropy and produce more diverse completion
- Launch self-reflection reasoning chain when the message ends with a question mark
- Expand the conversation context with the "inner monologue" of the model, where it can iterate over your question a few times before giving the final answer
Count "r"s in "strawberry"this problem is solved
See how Harbor can boost the creativity randomness in a small llama beyound the infinite "Turquoise", using klmbr
:
Screencast.from.22-09-24.17.41.52.webm
klmbr
will process your inputs to inject some randomness into them, so even with 0
temperature - LLM output will be varied (sometimes in a very unexpected way). Harbor allows to configure various parameters of klmbr
via both CLI and .env
.
You can also use rcn
(brand new technique) an g1
CoT to make your llama more reasonable.
This works, essentially, by just giving an LLM more time to "think" about its answer and improves reasoning in many cases at the expense of larger amount of tokens consumed.
Misc
harbor size
- shows the size of caches from Harbor services on your system (we don't recomment running it, it hurts)harbor bench
- better logs with ETA and service pointers, fixed issue with parameter propagation for reproducible results, added BBH256/32 examplesharbor update
should now allow updating past 0.1.9 on MacOS (granted you'll manage to update past it in the first place 🙃)
Full Changelog: v0.1.26...v0.1.27
v0.1.26
v0.1.26 - Run Harbor with external Ollama
It's now possible to configure Harbor to use external Ollama installation. The URL is relative to the container internal network.
# URL is internal to the container network
harbor config get ollama.internal_url
# Suitable default, when running built-in Ollama
harbor url -i ollama # http://ollama:11434
# Linux
# 172.17.0.1 is the IP of your host within the container
harbor config set ollama.internal_url http://172.17.0.1:33821
# Windows, MacOS
# Should have additional default host out of the box
harbor config set ollama.internal_url http://docker.host.internal:33821
Full Changelog: v0.1.25...v0.1.26
v0.1.25
v0.1.25 - KTransformers integration
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
🔥 Show Cases | 🚀 Quick Start | 📃 Tutorial | 💬 DiscussionStarting
# [Optional] Pre-build the image
# This is very large, as it's based on pytorch+cuda
# go grab a coffee!
harbor build ktransformers
# Start the service
harbor up ktransformers
Harbor's version was monkey-patched to be compatible with Open WebUI and will appears as ktransformers
in the model selector upon successful start.
https://github.com/av/harbor/wiki/ktransformers-webui.png
Full Changelog: v0.1.24...v0.1.25
v0.1.24
v0.1.24 - "But we have o1 at home!"
Based on the reference work from:
Minimal streamlit-based service with Ollama as a backend, that implements the o1-like reasoning chains.
Starting
# Start the service
harbor up ol1
# Open ol1 in the browser
harbor open ol1
Configuration
# Get/set desired Ollama model for ol1
harbor ol1 model
# Set the temperature
harbor ol1 args set temperature 0.5
Full Changelog: v0.1.23...v0.1.24
v0.1.23
v0.1.23 - harbor history
Harbor remembers a number of most recently executed CLI commands. You can search/re-run the commands via the harbor history
command.
This is an addition to the native history in your shell, that'll persist longer and is specific to the Harbor CLI.
Use history.size
config option to adjust the number of commands stored in the history.
# Set current history size
harbor history size 50
History is stored in the .history
file in the Harbor workspace, you can also edit/access it manually.
# Using a built-in helper
harbor history ls | grep ollama
# Manually, using the file
cat $(harbor home)/.history | grep ollama
You can clear the history with the harbor history clear
command.
# Clear the history
harbor history clear
# Empty
harbor history
Full Changelog: v0.1.22...v0.1.23
v0.1.22
v0.1.22 - JupyterLab intergration
# [Optional] pre-build the image
harbor build jupyter
# Start the service
harbor up jupyter
# Open JupyterLab in the browser
harbor open jupyter
Your notebooks are stored in the Harbor workspace, under the jupyter
directory.
# Opens workspace folder in the File Mangager
harbor jupyter workspace
# See workspace location,
# relative to $(harbor home)
harbor config get juptyer.workspace
Additionally, you can configure service to install additional packages.
# See deps help
# It's a manager for underlying array
harbor jupyter deps -h
# Add packages to install, supports the same
# specifier syntax as pip
harbor jupyter deps add numpy
harobr jupyter deps add SomeProject@git+https://git.repo/[email protected]
harbor jupyter deps add SomePackage[PDF,EPUB]==3.1.4
Full Changelog: v0.1.21...v0.1.22
v0.1.21
v0.1.21 - Harbor profiles
Profiles is a way to save/load a complete configuration for the specific task. For example, to quickly switch between the models that take a few commands to configure. Profiles include all options that can be set via harbor config
(which is aliased by most of the CLI helpers).
Usage
harbor
profile|profiles|p [ls|rm|add] - Manage Harbor profiles
profile ls|list - List all profiles
profile rm|remove <name> - Remove a profile
profile add|save <name> - Add current config as a profile
profile set|use|load <name> - Use a profile
There are a few considerations when using profiles:
- When the profile is loaded, modifications are not saved by default and will be lost when switching to another profile (or reloading the current one). Use
harbor profile save <name>
to persist the changes after making them - Profiles are stored in the Harbor workspace and can be shared between different Harbor instances
- Profiles are not versioned and are not guaranteed to work between different Harbor versions
- You can also edit profiles as
.env
files in the workspace, it's not necessary to use the CLI
Example
# 1. Switch to the default for a "clean" state
harbor profile use default
# 2. Configure services as needed
harbor defaults remove ollama
harbor defaults add llamacpp
harbor llamacpp model https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
harbor llamacpp args -ngl 99 --ctx-size 8192 -np 4 -ctk q8_0 -ctv q8_0 -fa
# 3. Save profile for future use
harbor profile add cpp8b
# 4. Up - runs in the background
harbor up
# 5. Adjust args - no parallelism, no kv quantization, no flash attention
# These changes are not saved in "cpp8b"
harbor llamacpp args -ngl 99 --ctx-size 2048
# 6. Save another profile
harbor profile add cpp8b-smart
# 7. Restart with "smart" settings
harbor profile use cpp8b-smart
harbor restart llamacpp
# 8. Switch between created profiles
harbor profile use default
harbor profile use cpp8b-smart
harbor profile use cpp8b
Full Changelog: v0.1.20...v0.1.21
v0.1.20
v0.1.20 - SGLang integration
SGLang is a fast serving framework for large language models and vision language models.
Starting
# [Optional] Pre-pull the image
harbor pull sglang
# Download with HF CLI
harbor hf download google/gemma-2-2b-it
# Set the model to run using HF specifier
harbor sglang model google/gemma-2-2b-it
# See original CLI help for available options
harbor run sglang --help
# Set the extra arguments via "harbor args"
harbor sglang args --context-length 2048 --disable-cuda-graph
Full Changelog: v0.1.19...v0.1.20