
Releases: AnswerDotAI/RAGatouille

0.0.8post1

19 Mar 13:45

Minor fix: corrects a `from time import time` import introduced in the indexing overhaul, which caused crashes because `time` was then used improperly.

0.0.8

18 Mar 19:49
d27b693

0.0.8 is finally here!

Major changes:

  • Indexing overhaul contributed by @jlscheerer #158
  • Relaxed dependencies to ensure lower install load #173
  • Indexing for under 100k documents will, by default, no longer use Faiss, performing K-Means in pure PyTorch instead. This is a somewhat experimental change, but benchmark results are encouraging and compatibility is greatly increased. #173
  • CRUD improvements by @anirudhdharmarajan. Feature is still experimental/not fully supported, but rapidly improving!
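
The pure-PyTorch K-Means change above can be pictured with a minimal sketch of Lloyd's algorithm; the function below is illustrative only and does not reflect RAGatouille's actual implementation.

```python
import torch

def kmeans_torch(x, k, n_iters=20, seed=0):
    """Minimal Lloyd's K-Means in pure PyTorch (illustrative sketch)."""
    g = torch.Generator().manual_seed(seed)
    # Initialise centroids from k randomly chosen points
    centroids = x[torch.randperm(x.shape[0], generator=g)[:k]].clone()
    for _ in range(n_iters):
        # Assign each point to its nearest centroid by Euclidean distance
        assignments = torch.cdist(x, centroids).argmin(dim=1)
        # Recompute each centroid as the mean of its assigned points
        for j in range(k):
            mask = assignments == j
            if mask.any():
                centroids[j] = x[mask].mean(dim=0)
    return centroids, assignments

points = torch.randn(1000, 8)
centroids, assignments = kmeans_torch(points, k=10)
```

Running clustering like this in plain PyTorch removes the Faiss dependency at the cost of some speed, which matters less at under 100k documents.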

Fixes:

  • Many small bug fixes, mainly around typing
  • Training triplets improvement (already present in 0.0.7 post versions) by @JoshuaPurtell

0.0.7post3

16 Feb 19:30
  • Improvements for data preprocessing issues and fixes for broken training example by @jonppe (#138) 🙏

0.0.7post2

13 Feb 21:45
b7ae28a

Fixes & tweaks to the previous release:

  • Automatically adjust batch size on longer contexts (32 for 512 tokens, 16 for 1024, 8 for 2048, halving like this down to a minimum of 1)
  • Apply dynamic max context length to reranking
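
The halving schedule above could be expressed roughly as follows; `adjust_batch_size` is a hypothetical helper name, not RAGatouille's actual API.

```python
def adjust_batch_size(max_tokens, base_batch=32, base_len=512, min_batch=1):
    """Halve the batch size each time the context length doubles past 512 tokens."""
    batch, length = base_batch, base_len
    while length < max_tokens and batch > min_batch:
        length *= 2
        batch = max(batch // 2, min_batch)
    return batch

# 512 tokens -> 32, 1024 -> 16, 2048 -> 8, and so on down to a floor of 1
```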

0.0.7post1

13 Feb 20:55

Release focusing on length adjustments. Much more dynamism and on-the-fly adaptation, both for query length and maximum document length!

  • Remove hardcoded maximum length: it is now inferred from your base model's maximum position encodings. This enables support for longer-context ColBERT, such as Jina ColBERT
  • Upstreamed changes to colbert-ai to allow any base model to be used, rather than only pre-defined ones.
  • Query length now adjusts dynamically, from 32 (hardcoded minimum) to your model's maximum context window for longer queries.
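
Taken together, the length behaviour above amounts to clamping: the maximum is read from the model's position encodings (Hugging Face configs expose this as `max_position_embeddings`) and the query length floats between the hardcoded floor of 32 and that maximum. A rough sketch, with hypothetical function names:

```python
def infer_max_length(config):
    """Read the maximum context length from a Hugging Face model config."""
    return getattr(config, "max_position_embeddings", 512)

def dynamic_query_length(n_query_tokens, model_max_len, floor=32):
    """Clamp the query length between the hardcoded floor and the model max."""
    return max(floor, min(n_query_tokens, model_max_len))
```

Inferring the maximum from the base model rather than hardcoding it is what enables longer-context ColBERT variants such as Jina ColBERT.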

0.0.6c2

11 Feb 21:05
5409914

(Notes encompassing changes in the last few PyPI releases that were undocumented until now.)

Changes:

  • Query only a subset of documents based on doc ids by @PrimoUomo89 #94
  • Return chunk ids in results thanks to @PrimoUomo89 #125
  • Lower k-means iterations when running more is unnecessary #129
  • Properly license the library as Apache-2.0 on PyPI

Fixes:

  • Dynamically increase search hyperparameters for large k values and lower doc counts, reducing the number of situations where the total number of documents returned is substantially below k #131
  • Fix to enable training data processing with hard negatives turned off, by @corrius #117
  • Proper handling of different input types when pre-processing training triplets by @GautamR-Samagra #115

0.0.6b5

05 Feb 17:06

Minor fixes & improvements release.

Community contribs:

0.0.6b2

29 Jan 21:04
5dbac07
  • Fix newly introduced dependency issue

0.0.6b0

28 Jan 19:47
  • Fixes training-triplet shuffling occasionally being skipped
  • Fixes accidental duplicates when input training data has many more positives than negatives
  • Bump to colbert-ai 0.2.18, fully removing multiprocessing calls when indexing

0.0.6a1

27 Jan 16:08
2ac1b1d

Fixes & minor improvements:

  • Better verbosity control, especially for high-CRUD encode() scenarios
  • Fixed max document length being often set too low when using in-memory encode() or rerank()
  • Allow forced overwrite of indexes (#63)
  • Fix wrong argument being passed to negative miner (should not have had any impact in practice)