v7.4.0

davidmezzetti released this 05 Sep 17:25
· 48 commits to master since this release

This release adds the SQLite ANN, new text extraction features, and a programming-language-neutral embeddings index format.

See below for full details on the new features, improvements and bug fixes.

New Features

  • Add SQLite ANN (#780)
  • Enhance markdown support for Textractor (#758)
  • Update txtai index format to remove Python-specific serialization (#769)
  • Add new functionality to RAG application (#753)
  • Add bm25s library to benchmarks (#757) Thank you @a0346f102085fe9f!
  • Add serialization package for handling supported data serialization methods (#770)
  • Add MessagePack serialization as a top level dependency (#771)
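To illustrate what removing Python-specific serialization means in practice, here is a minimal sketch that contrasts `pickle` with a language-neutral encoding. It is not txtai's code. JSON from the standard library stands in for MessagePack so the sketch has no third-party dependencies, and the `ids` list and `load_ids` helper are hypothetical names used only for illustration.

```python
import json
import pickle

# Hypothetical index ids (illustrative data, not from txtai)
ids = ["doc-1", "doc-2", "doc-3"]

# pickle produces a byte stream that only Python can reliably read back
pickled = pickle.dumps(ids)

# JSON (standing in for MessagePack here) can be decoded from any language
neutral = json.dumps(ids).encode("utf-8")


def load_ids(data: bytes) -> list:
    """Decode ids from the language-neutral payload."""
    return json.loads(data.decode("utf-8"))


restored = load_ids(neutral)
```

Both payloads round-trip to the same list in Python, but only the second can be consumed by a reader written in another language. MessagePack plays the same role with a compact binary encoding.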

Improvements

  • Support <pre> blocks with Textractor (#749)
  • Update HF LLM to reduce noisy warnings (#752)
  • Update NLTK model downloads (#760)
  • Refactor benchmarks script (#761)
  • Update documentation to use base imports (#765)
  • Update examples to use RAG pipeline instead of Extractor when paired with LLMs (#766)
  • Modify NumPy and Torch ANN components to use np.load/np.save (#772)
  • Persist Embeddings index ids (only used when content storage is disabled) with MessagePack (#773)
  • Persist Reducer component with skops library (#774)
  • Persist NetworkX graph component with MessagePack (#775)
  • Persist Scoring component metadata with MessagePack (#776)
  • Modify vector transforms to load/save data using np.load/np.save (#777)
  • Refactor embeddings configuration into separate component (#778)
  • Document txtai index format (#779)
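Several of the items above move array persistence to `np.save`/`np.load`. The following is a rough sketch of that round trip, assuming NumPy is installed. It is not the txtai implementation: the `vectors` array is made-up data, and an in-memory buffer replaces a file on disk to keep the example self-contained.

```python
import io

import numpy as np

# Hypothetical 3 x 4 matrix of embedding vectors (illustrative data)
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)

# Write the array in the .npy format; allow_pickle=False rejects object arrays
buffer = io.BytesIO()
np.save(buffer, vectors, allow_pickle=False)

# Rewind and read it back; allow_pickle=False means nothing is unpickled
buffer.seek(0)
loaded = np.load(buffer, allow_pickle=False)
```

The `.npy` format stores the dtype, the shape, and the raw array data, so the loaded array matches the original without relying on Python object serialization.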

Bug Fixes

  • Translation: AttributeError: 'ModelInfo' object has no attribute 'modelId' (#750)
  • Change RAGTask to RagTask (#763)
  • Notebook 42 error (#768)