v7.4.0
This release adds the SQLite ANN, new text extraction features and a programming language-neutral embeddings index format.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add SQLite ANN (#780)
- Enhance markdown support for Textractor (#758)
- Update txtai index format to remove Python-specific serialization (#769)
- Add new functionality to RAG application (#753)
- Add bm25s library to benchmarks (#757) Thank you @a0346f102085fe9f!
- Add serialization package for handling supported data serialization methods (#770)
- Add MessagePack serialization as a top level dependency (#771)
Improvements
- Support `<pre>` blocks with Textractor (#749)
- Update HF LLM to reduce noisy warnings (#752)
- Update NLTK model downloads (#760)
- Refactor benchmarks script (#761)
- Update documentation to use base imports (#765)
- Update examples to use RAG pipeline instead of Extractor when paired with LLMs (#766)
- Modify NumPy and Torch ANN components to use np.load/np.save (#772)
- Persist Embeddings index ids (only used when content storage is disabled) with MessagePack (#773)
- Persist Reducer component with skops library (#774)
- Persist NetworkX graph component with MessagePack (#775)
- Persist Scoring component metadata with MessagePack (#776)
- Modify vector transforms to load/save data using np.load/np.save (#777)
- Refactor embeddings configuration into separate component (#778)
- Document txtai index format (#779)
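
Several of the items above (#772, #777) move array persistence to np.load/np.save. As a rough illustration of that general pattern, and not txtai's actual code, the sketch below round-trips an array through the `.npy` format with pickling disabled, so the stored bytes do not depend on Python-specific serialization. The helper names `save_array` and `load_array` are hypothetical and exist only for this example.

```python
import io

import numpy as np


def save_array(array):
    """Serialize an array to .npy bytes (hypothetical helper, not a txtai API)."""
    buffer = io.BytesIO()
    # allow_pickle=False makes NumPy refuse object arrays instead of pickling them
    np.save(buffer, array, allow_pickle=False)
    return buffer.getvalue()


def load_array(data):
    """Load an array from .npy bytes (hypothetical helper, not a txtai API)."""
    # allow_pickle=False means no pickled payload is ever deserialized on load
    return np.load(io.BytesIO(data), allow_pickle=False)


# Small stand-in for a matrix of embeddings vectors
vectors = np.arange(6, dtype=np.float32).reshape(2, 3)

data = save_array(vectors)
restored = load_array(data)
```

Because `.npy` is a documented binary layout (a short header followed by raw array data), files written this way can in principle be read by tools outside Python, which is the motivation behind the index format changes in this release.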