Chunking by using Sliding Window #3

SInginc · 2024-08-22T04:05:51Z

Thank you so much for the wonderful pre-print and sharing the source code!

The pre-print says that a sliding window method is used during chunking:

To reduce noise generated by sequential processing, we implement a sliding window technique, managing five paragraphs at a time. We continuously adjust the window by removing the first paragraph and adding the next, maintaining focus on topic consistency.
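To make sure I am reading that description correctly, here is a minimal sketch of what I understand by it. It is my own illustration, not code from the repository: the function name `sliding_windows` and the `size` parameter are assumptions, and yielding a single shorter window for texts with fewer than five paragraphs is my guess, since the pre-print does not cover that case.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_windows(paragraphs: Iterable[str], size: int = 5) -> Iterator[List[str]]:
    """Yield overlapping windows of `size` consecutive paragraphs.

    Hypothetical helper: each step drops the first paragraph of the
    window and appends the next one, as the pre-print describes.
    """
    window = deque(maxlen=size)  # the oldest paragraph falls out automatically
    count = 0
    for paragraph in paragraphs:
        window.append(paragraph)
        count += 1
        if count >= size:
            yield list(window)
    # Assumption: a text shorter than one full window is yielded once, as is.
    if 0 < count < size:
        yield list(window)
```

With seven paragraphs and `size=5`, this yields three windows: paragraphs 1 to 5, 2 to 6, and 3 to 7.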

In data_chunk.py, I observed the following sequential process:

Split the text on \n\n.
Extract propositions from each paragraph.
Call add_propositions of AgenticChunker to do the chunking.
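The three steps above, as I read them, amount to roughly the following. This is a paraphrase for discussion and not the actual contents of data_chunk.py: `chunk_sequentially` and the `extract` callable are hypothetical names, and the only thing assumed about the chunker is that it exposes an `add_propositions` method.

```python
from typing import Callable, List, Protocol


class Chunker(Protocol):
    """Anything with an add_propositions method, e.g. AgenticChunker."""

    def add_propositions(self, propositions: List[str]) -> None: ...


def chunk_sequentially(
    text: str,
    extract: Callable[[str], List[str]],  # hypothetical proposition extractor
    chunker: Chunker,
) -> None:
    """Process paragraphs strictly one after another, with no window."""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty paragraphs left over from the split
        chunker.add_propositions(extract(paragraph))
```

Nothing in this flow looks at more than one paragraph at a time, which is why I could not locate the window.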

add_propositions, in turn, adds the propositions one at a time.

In add_proposition of AgenticChunker, I observed that how a proposition is added depends on:

Whether it is the first proposition
Whether any relevant chunk exists

And in _find_relevant_chunk, I think all existing chunks are searched to find the most relevant one.
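To be concrete about the behaviour I think I am seeing, here is a toy version of that decision flow. It is not the repository's AgenticChunker: `SketchChunker` and `shares_word` are hypothetical names, and a simple word-overlap test stands in for whatever relevance check the real `_find_relevant_chunk` performs.

```python
from typing import Callable, Dict, List, Optional


class SketchChunker:
    """Toy stand-in for the flow described above."""

    def __init__(self, is_relevant: Callable[[str, List[str]], bool]) -> None:
        self.chunks: Dict[int, List[str]] = {}
        self.is_relevant = is_relevant  # assumed relevance test

    def _find_relevant_chunk(self, proposition: str) -> Optional[int]:
        # Every existing chunk is a candidate; no window limits the search.
        for chunk_id, propositions in self.chunks.items():
            if self.is_relevant(proposition, propositions):
                return chunk_id
        return None

    def add_proposition(self, proposition: str) -> None:
        if not self.chunks:  # case 1: the first proposition
            self.chunks[0] = [proposition]
            return
        chunk_id = self._find_relevant_chunk(proposition)
        if chunk_id is None:  # case 2: no relevant chunk, start a new one
            self.chunks[len(self.chunks)] = [proposition]
        else:  # case 3: append to the relevant chunk
            self.chunks[chunk_id].append(proposition)


def shares_word(proposition: str, propositions: List[str]) -> bool:
    """Placeholder relevance test: any word in common with the chunk."""
    words = set(proposition.lower().split())
    return any(words & set(p.lower().split()) for p in propositions)
```

In this sketch the candidate set in `_find_relevant_chunk` is always the full `self.chunks`, which matches what I believe the current code does.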

I would really appreciate it if you could point me to the part of the code that implements the sliding window!

Thank you so much!
