v5.3.0
This release adds embeddings-guided and prompt-driven search, along with a number of new methods for training language models.
🔎 Prompt-driven search is a big step toward conversational search in txtai. With this release, complex prompts can be passed to txtai to customize how search results are returned. There are lots of exciting possibilities on this front, so stay tuned.
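As a quick illustration, here is a rough sketch of what prompt-driven search can look like, combining an embeddings index with the Extractor pipeline. The model names, sample data and prompt wording below are placeholders chosen for illustration, and the exact call pattern may differ from the prompt-driven search notebook linked in this release.

```python
from txtai.embeddings import Embeddings
from txtai.pipeline import Extractor

# Build a small embeddings index with content storage enabled
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})
embeddings.index([(uid, text, None) for uid, text in enumerate([
    "US tops 5 million confirmed virus cases",
    "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    "Maine man wins $1M from $25 lottery ticket"
])])

# Extractor pipeline runs embeddings-guided search, then passes the results
# to a language model along with the prompt
extractor = Extractor(embeddings, "google/flan-t5-base")

def prompt(question):
    # Custom prompt controlling how search results are returned
    return f"""Answer the following question using only the context below. Give a detailed answer.
Question: {question}
Context: """

# Queue of (name, query, prompt, snippet) tuples
print(extractor([("answer", "feel good story", prompt("What is the happiest story?"), False)]))
```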
💡 The trainer pipeline now has support for training language models from scratch. It supports masked language modeling (MLM), causal language modeling (CLM) and replaced token detection (ELECTRA-style). This is part of the micromodels effort.
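For reference, below is a minimal sketch of a language modeling training run with the trainer pipeline. The base model, sample data and task value are assumptions made for illustration; see the language modeling notebook in this release for the worked example.

```python
from txtai.pipeline import HFTrainer

# Placeholder training data - for language modeling, each record only needs a "text" field
data = [{"text": "txtai is an open-source embeddings database"}] * 32

trainer = HFTrainer()

# Masked language modeling run (assumed task name "language-modeling");
# causal language modeling and replaced token detection are separate tasks
# selected with their own task values
model, tokenizer = trainer("bert-base-uncased", data, task="language-modeling")
```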
See below for full details on the new features, improvements and bug fixes.
New Features
- Add language modeling task to HFTrainer (#403)
- Add language modeling example notebook (#408)
- Add FAQ section to documentation (#413)
- Add language generation task to HFTrainer (#414)
- Add replaced token detection task to HFTrainer (#415)
- Add generator pipeline for text generation (#416)
- Add notebook for embeddings-guided and prompt-driven search with LLMs (#418)
Improvements
- Normalize BM25 and TF-IDF scores (#401)
- Add note to restart kernel if running in Google Colab - Thank you @hsm207! (#410)
- Add clear error when starting API and config file not found (#412)
- Extractor pipeline 2.0 (#417)
- Make texts parameter optional for extractor pipeline in applications (#420)
Bug Fixes
- Fix issue with ORDER BY case sensitivity (#405)