Add example to text chunking processor documentation #6794

yuye-aws · 2024-03-27T09:59:32Z

Description

Add example to text chunking processor documentation

Issues Resolved

Add example to text chunking processor documentation #6707

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: yuye-aws <[email protected]>

Signed-off-by: Fanit Kolchina <[email protected]>

Signed-off-by: yuye-aws <[email protected]>

natebower

@yuye-aws @kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

natebower · 2024-03-29T12:21:59Z

_ingest-pipelines/processors/text-chunking.md

-}
-```
-{% include copy-curl.html %}
+Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).


The text after the comma in the first sentence is a bit redundant. Can we simplify?

Should we also change this sentence in other files, such as text-image-embedding.md?

Creating an index is a separate step so I think separating the two makes sense, even though "ingest" appears in both parts of the sentence.


natebower · 2024-03-29T12:27:51Z

_search-plugins/text-chunking.md

+
+## Step 2: Create an index for ingestion
+
+In order to use the ingest pipeline, you need to create a k-NN index. The `passage_chunk_embedding` field must be of a `nested` type. The `knn.dimension` field must contain the number of dimensions for your model:


"the" nested type?

Already changed
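
For context on the hunk above, the mapping it describes could look like the following minimal sketch. The field name `passage_chunk_embedding` and the `nested` type come from the text under review, and the `knn` subfield name is inferred from the `knn.dimension` wording. The helper name `build_index_body`, the `passage_text` field, and the dimension of 768 are illustrative assumptions, not values taken from this PR.

```python
import json


def build_index_body(dimension: int) -> dict:
    """Build a k-NN index body with a nested embedding field (illustrative sketch)."""
    return {
        # Enable k-NN search on the index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Hypothetical source text field; the name is an assumption.
                "passage_text": {"type": "text"},
                # One nested object per chunk, each holding one vector.
                "passage_chunk_embedding": {
                    "type": "nested",
                    "properties": {
                        # The dimension must match the embedding model's output size.
                        "knn": {"type": "knn_vector", "dimension": dimension}
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # 768 is a placeholder; use the dimension of your own model.
    print(json.dumps(build_index_body(768), indent=2))
```

The printed JSON is meant to be used as the body of the index creation request. Taking the dimension as a parameter keeps the one model-specific value in one place.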

natebower · 2024-03-29T12:29:16Z

_search-plugins/text-chunking.md

+
+## Step 4: Search the index using neural search
+
+You can use a `nested` query to perform vector search on your index. We recommend setting `score_mode` to `max`, where the document score is set to the maximum of the scores from all passage embeddings:


Is "maximum" the right word here, or do we mean "highest"?

Either way. Changed to "highest".
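
To make the `score_mode` wording concrete, here is a small sketch of what `max` means for a document with several chunks, alongside a request body of the shape the hunk describes. Only `nested`, `score_mode`, `max`, and `passage_chunk_embedding` come from the text under review. The function names, the example scores, the `knn` subfield, and the `<model_id>` placeholder are assumptions made for illustration.

```python
from __future__ import annotations


def document_score_max(chunk_scores: list[float]) -> float:
    """Score a document by its best-matching chunk, mirroring score_mode "max"."""
    if not chunk_scores:
        raise ValueError("a document needs at least one chunk score")
    return max(chunk_scores)


def build_nested_query(query_text: str, model_id: str) -> dict:
    """Build a nested neural query body (sketch; the model ID is supplied by the caller)."""
    return {
        "query": {
            "nested": {
                # The document score is the highest score among its chunks.
                "score_mode": "max",
                "path": "passage_chunk_embedding",
                "query": {
                    "neural": {
                        # Assumed subfield holding each chunk's vector.
                        "passage_chunk_embedding.knn": {
                            "query_text": query_text,
                            "model_id": model_id,
                        }
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Made-up chunk scores for one document.
    print(document_score_max([0.42, 0.87, 0.65]))  # prints 0.87
    print(build_nested_query("document", "<model_id>"))
```

With `max`, one strongly matching chunk is enough to rank a long document highly, regardless of how many weaker chunks it contains.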


Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Yuye Zhu <[email protected]>

Signed-off-by: yuye-aws <[email protected]>

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws

LGTM. Thank you, @yuye-aws!

yuye-aws added 2 commits March 27, 2024 17:54

add search document example for text chunking and embedding pipeline

567bc45

Signed-off-by: yuye-aws <[email protected]>

tune document

f4cb100

Signed-off-by: yuye-aws <[email protected]>

yuye-aws requested review from hdhalter, kolchfa-aws, Naarcha-AWS, vagimeli, AMoo-Miki, natebower, dlvenable and stephen-crawford as code owners March 27, 2024 09:59

hdhalter added the release-notes (PR: Include this PR in the automated release notes) and v2.13.0 labels Mar 27, 2024

kolchfa-aws self-assigned this Mar 28, 2024

kolchfa-aws and others added 2 commits March 28, 2024 20:32

Add the text chunking page

e361da1

Signed-off-by: Fanit Kolchina <[email protected]>

correct example

5b840f0

Signed-off-by: yuye-aws <[email protected]>

natebower reviewed Mar 29, 2024


hdhalter added the 5 - Editorial review (PR: Editorial review in progress) label Mar 29, 2024

yuye-aws and others added 5 commits March 29, 2024 23:00

Update _search-plugins/text-chunking.md

2e6201e

Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Yuye Zhu <[email protected]>

Update _search-plugins/text-chunking.md

5c260e7

Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Yuye Zhu <[email protected]>

resolve review comments

6b867c4

Signed-off-by: yuye-aws <[email protected]>

Move cascading section to processor file

1b6ae0d

Signed-off-by: Fanit Kolchina <[email protected]>

Merge branch 'main' into text_chunking_nested_example

3002729

kolchfa-aws approved these changes Mar 29, 2024


kolchfa-aws merged commit d676a79 into opensearch-project:main Mar 29, 2024
3 checks passed

hdhalter added the 3 - Tech review (PR: Tech review in progress) and 3 - Done (Issue is done/complete) labels and removed the 3 - Tech review (PR: Tech review in progress) and 5 - Editorial review (PR: Editorial review in progress) labels Mar 29, 2024

yuye-aws deleted the text_chunking_nested_example branch August 29, 2024 05:14
