This project is an implementation of a modular Telegram bot based on aiogram, designed for local ML inference with remote service support. Currently integrated with:
- Stable Diffusion (via the stable-diffusion-webui API)
- TTS text-to-speech engines (TTS (VITS) and so-vits-SVC) as well as OS voices
- STT speech-to-text with multiple recognition engines, including whisper.cpp, whisperS2T, silero and wav2vec2
- LLMs such as llama (1-3), gpt-j and gpt-2, with assistant mode via instruct-tuned lora models and multimodality via adapter models
- TTA experimental text-to-audio support via audiocraft
Accelerated LLM inference support: llama.cpp, mlc-llm and llama-mps
Remote LLM inference support: oobabooga/text-generation-webui, LostRuins/koboldcpp and llama.cpp server
A compatibility table is available here.
Evolved from its predecessor, Botality I.
Shipped with an easy-to-use webui, where you can run commands and talk with the bot directly.
You can find it here (coming soon).
Some versions have breaking changes; see the Changelog file for more information.
[Bot]
- User-based queues and delayed task processing
- Multiple modes to filter access scopes (WL/BL/Both/Admin-only)
- Support of accelerated inference on M1 Macs
- Memory manager that keeps track of concurrently loaded models and loads/unloads them on demand
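The access-scope modes listed above (WL/BL/Both/Admin-only) can be sketched as a small filter. This is an illustrative assumption about the logic, not the bot's actual internals; the mode names and data structures here are hypothetical:

```python
# Hypothetical access filter for the WL/BL/Both/Admin-only modes described
# above; the real bot's option names and checks may differ.
def is_allowed(user_id, mode, whitelist=(), blacklist=(), admins=()):
    if user_id in admins:  # admins always pass
        return True
    if mode == "admin-only":
        return False
    if mode in ("whitelist", "both") and user_id not in whitelist:
        return False
    if mode in ("blacklist", "both") and user_id in blacklist:
        return False
    return True
```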
[LLM]
- Supports dialog mode: the bot casually plays a role described in a character file, keeping chat history either with all users in a group chat or with each user separately
- Character files can be easily localized into any language for non-english models
- Assistant mode via /ask command or with direct replies (configurable)
- Single-reply short-term memory for assistant feedback
- Supports visual question answering when a multimodal adapter is available
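The two history-grouping behaviours above (shared group history vs. per-user history) boil down to how conversation storage is keyed. A minimal sketch, with hypothetical names chosen for illustration:

```python
# Illustrative history-key helper for the grouping modes described above
# ("chat" = one shared history per group chat, "user" = one per user);
# the function and key shape are assumptions, not the bot's real code.
def history_key(grouping, chat_id, user_id):
    if grouping == "user":
        return (chat_id, user_id)  # separate history for each user
    return (chat_id,)              # one shared history for the whole chat
```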
[SD]
- CLI-like way to pass stable diffusion parameters
- pre-defined prompt wrappers
- lora integration with easy syntax: `lora_name100` => `<lora:lora_name:1.0>`, plus custom lora activators
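The lora shorthand above can be expanded with a tiny regex helper. This is a hedged sketch of the documented mapping (trailing digits read as a weight in hundredths), not the bot's actual parser:

```python
import re

# Sketch of the documented shorthand: lora_name100 -> <lora:lora_name:1.0>.
# Trailing digits are interpreted as the weight divided by 100.
def expand_lora(token):
    m = re.fullmatch(r"(.+?)(\d+)", token)
    if not m:
        return token  # no trailing digits: leave the token untouched
    name, digits = m.groups()
    return f"<lora:{name}:{int(digits) / 100}>"

expand_lora("lora_name100")  # -> "<lora:lora_name:1.0>"
```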
[TTS]
- can be run remotely, or on the same machine
- tts output is sent as voice messages
- can be applied to voice messages (speech and a cappella songs) to dub them with a different voice
[STT]
- can be activated as a speech recognition tool via the /stt command by replying to voice messages
- if the `stt_autoreply_mode` parameter is not `none`, it recognizes voice messages and replies to them with the LLM and TTS modules
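The autoreply behaviour amounts to a simple dispatch on `stt_autoreply_mode`. In this sketch only the `none` (disabled) value comes from the text above; the routing structure and return values are illustrative assumptions:

```python
# Illustrative routing for incoming voice messages; "none" disables
# autoreply (per the docs), everything else triggers the STT -> LLM -> TTS
# pipeline. The pipeline representation here is hypothetical.
def route_voice_message(stt_autoreply_mode):
    if stt_autoreply_mode in (None, "none"):
        return ["stt-on-demand"]  # only react to explicit /stt replies
    # otherwise: transcribe, answer with the LLM, voice the answer with TTS
    return ["stt", "llm", "tts"]
```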
[TTA]
- can be used with the /sfx and /music commands after adding `tta` to `active_modules`
[Installation]
- copy the `.env.example` file and rename the copy to `.env`; do NOT add the `.env` file to your commits!
- set up your telegram bot token and other configuration options in the `.env` file
- install requirements: `pip install -r requrements.txt`
- install optional requirements:
  - `pip install -r requrements-tts.txt` if you want to use tts and tts_server
  - `pip install -r requrements-llm.txt` if you want to use llm; you'll probably also need a fresh version of pytorch
  - `pip install -r requrements-stt.txt` for speech-to-text
  - `pip install -U git+https://[email protected]/facebookresearch/audiocraft#egg=audiocraft` for text-to-audio
- you can continue configuration in the webui; it has helpful tips for each configuration option
- for the stable diffusion module, make sure that you have webui installed and running with the `--api` flag
- for the text-to-speech module, download VITS models, put their names in the `tts_voices` configuration option and the path to their directory in `tts_path`
- for the llm module, see the LLM Setup section below
- if you want to use the webui + api, run it with `python dashboard.py`; otherwise run the bot with `python bot.py`
Python 3.10+ is recommended due to aiogram compatibility. If you are experiencing problems with whisper or logging, please update numpy.
[LLM Setup]
Supported models and backends:
- original llama (the 7B version was tested on the llama-mps fork for Macs); requires running the bot with `python3.10 -m torch.distributed.launch --use_env bot.py`. Assistant mode for original llama is available with LLaMa-Adapter; to use both chat and assistant mode, some changes[1][2] are necessary for non-mac users.
- hf llama (tests outdated) + alpaca-lora / ru-turbo-alpaca-lora
- gpt-2 (tested on ru-gpt3), nanoGPT (tested on minChatGPT [weights])
- gpt-j (tested on a custom model)
- llama.cpp (tested on a lot of models) [models]
- mlc-llm-chat (tested using prebuilt binaries on the demo-vicuna-v1-7b-int3 model, M1 GPU acceleration confirmed, integrated via mlc-chatbot)
- oobabooga webui with the `remote_ob` backend
- kobold.cpp with the same `remote_ob` backend
- llama.cpp server with the `remote_lcpp` llm backend option (Obsidian model w/ multimodality tested)
- Make sure that you have enough RAM / vRAM to run models.
- Download the weights (and the code, if needed) for any large language model.
- In the `.env` file, make sure that `"llm"` is in `active_modules`, then set:
  - `llm_paths` - change the path(s) to the model(s) that you downloaded
  - `llm_backend` - select from `pytorch`, `llama.cpp`, `mlc_pb`, `remote_ob`, `remote_lcpp`
  - `llm_python_model_type` - if you set `pytorch` in the previous option, set the model type that you want to use; it can be `gpt2`, `gptj`, `llama_orig`, `llama_hf` or `auto_hf`
  - `llm_character` - a character of your choice from the `characters` directory, for example `characters.gptj_6B_default`; character files also have prompt templates and model configuration options optimal for a specific model, so feel free to change the character files, edit their personality and use them with other models
  - `llm_assistant_chronicler` - an input/output formatter/parser for the assistant task; can be `instruct` or `raw`; do not change it if you do not use `mlc_pb`
  - `llm_history_grouping` - `user` to store history with each user separately, or `chat` to store group chat history with all users in that chat
  - `llm_assistant_use_in_chat_mode` - `True`/`False`; when False, use the /ask command to ask the model questions without any input history; when True, all messages are treated as questions
- For llama.cpp: make sure that you have a c++ compiler, set all the necessary flags to enable GPU support, install it with `pip install llama-cpp-python`, then download model weights and change the path in `llm_paths`.
- For mlc-llm: follow the installation instructions from the docs, then clone mlc-chatbot and put 3 paths in `llm_paths`. Use with `llm_assistant_use_in_chat_mode=True` and the `raw` chronicler.
- For oobabooga webui and kobold.cpp: instead of specifying `llm_paths`, set `llm_host`, set `llm_active_model_type` to `remote_ob` and set `llm_character` to one that has the same prompt format / preset as your model. Run the server with the `--api` flag.
- For the llama.cpp c-server: start `./server`, set its URL in `llm_host` and set `llm_active_model_type` to `remote_lcpp`; for multimodality please refer to this thread.
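Putting the options above together, a hypothetical `.env` excerpt for a local llama.cpp setup might look like the following. The keys come from this section, but every value is a placeholder and the exact expected formats may differ; check the webui tips for each option:

```
active_modules=llm
llm_backend=llama.cpp
llm_paths=/path/to/your/model.bin
llm_character=characters.gptj_6B_default
llm_assistant_chronicler=instruct
llm_history_grouping=user
llm_assistant_use_in_chat_mode=False
```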
Send a message to your bot with the command /tti -h for more info on how to use stable diffusion in the bot, and /tts -h for the tts module. For tts, the bot uses the same commands as the voice names in the configuration file. Try the /llm command for llm module details. LLM defaults to chat mode for models that support it; the assistant can be called with the /ask command.
License: the code of this project is currently distributed under the CC BY-NC-SA 4.0 license; third-party libraries may have different licenses.