Chunking by using Sliding Window #3

Open
SInginc opened this issue Aug 22, 2024 · 0 comments

Thank you so much for the wonderful pre-print and for sharing the source code!

The pre-print mentions that a sliding window method is used during chunking:

To reduce noise generated by sequential processing, we implement a sliding window technique, managing five paragraphs at a time. We continuously adjust the window by removing the first paragraph and adding the next, maintaining focus on topic consistency.
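
To check that I am reading this correctly, here is a minimal sketch of what I understand the quoted description to mean. The function name `sliding_windows` and the `size` parameter are my own assumptions for illustration; none of this is taken from the repository.

```python
from collections import deque
from typing import Iterator, List, Tuple


def sliding_windows(paragraphs: List[str], size: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield overlapping windows of up to `size` consecutive paragraphs.

    Hypothetical helper: one possible reading of the pre-print's
    "five paragraphs at a time" description, not the authors' code.
    """
    window = deque(maxlen=size)
    for paragraph in paragraphs:
        # Once the deque is full, appending drops the oldest paragraph,
        # i.e. "removing the first paragraph and adding the next".
        window.append(paragraph)
        if len(window) == size:
            yield tuple(window)
    # Texts shorter than `size` paragraphs still produce one partial window.
    if 0 < len(window) < size:
        yield tuple(window)
```

With seven paragraphs and the default size, this would produce three windows: paragraphs 1-5, 2-6, and 3-7.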

In data_chunk.py, I observed the following sequential process:

  1. Split the text on \n\n
  2. Extract propositions from each paragraph
  3. Call add_propositions of AgenticChunker to do the chunking

add_propositions, in turn, adds the propositions sequentially.
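
In other words, my understanding of the flow is roughly the sketch below. This is a paraphrase, not the actual code in data_chunk.py: `chunk_sequentially` is a name I made up, the two callables are stand-ins for the proposition extractor and for add_propositions, and skipping empty paragraphs is my own assumption.

```python
from typing import Callable, List


def chunk_sequentially(
    text: str,
    extract_propositions: Callable[[str], List[str]],
    add_propositions: Callable[[List[str]], None],
) -> None:
    """Hypothetical paraphrase of the sequential flow described above."""
    # Step 1: split the text into paragraphs on blank lines
    # (empty paragraphs are skipped; that filter is an assumption).
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for paragraph in paragraphs:
        # Step 2: extract propositions from this single paragraph.
        propositions = extract_propositions(paragraph)
        # Step 3: hand them to the chunker; no window of neighbouring
        # paragraphs is involved anywhere in this loop.
        add_propositions(propositions)
```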

In add_proposition of AgenticChunker, I observed that how a proposition is added depends on:

  1. Whether it is the first one
  2. Whether there is any relevant chunk

And in _find_relevant_chunk, I think all existing chunks are considered when finding the most relevant chunk.

I would really appreciate it if you could point me to the part of the code that implements the sliding window!

Thank you so much!
