Add example to text chunking processor documentation #6794

Merged

@yuye-aws yuye-aws commented Mar 27, 2024

Description

Add example to text chunking processor documentation

Issues Resolved

Add example to text chunking processor documentation #6707

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@hdhalter hdhalter added the release-notes and v2.13.0 labels Mar 27, 2024
@kolchfa-aws kolchfa-aws self-assigned this Mar 28, 2024
kolchfa-aws and others added 2 commits March 28, 2024 20:32
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
Collaborator

@natebower natebower left a comment

@yuye-aws @kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

}
```
{% include copy-curl.html %}
Once you have created an ingest pipeline, you need to create an index for ingestion and ingest documents into the index. To learn more, see [Text chunking]({{site.url}}{{site.baseurl}}/search-plugins/text-chunking/).
Collaborator

The text after the comma in the first sentence is a bit redundant. Can we simplify?

Member Author

Shall we also change the sentence from other files like text-image-embedding.md?

Collaborator

Creating an index is a separate step so I think separating the two makes sense, even though "ingest" appears in both parts of the sentence.

_search-plugins/text-chunking.md Outdated

## Step 2: Create an index for ingestion

In order to use the ingest pipeline, you need to create a k-NN index. The `passage_chunk_embedding` field must be of a `nested` type. The `knn.dimension` field must contain the number of dimensions for your model:
Collaborator

"the" nested type?

Member Author

Already changed
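
To illustrate the Step 2 requirements quoted above, here is a minimal Python sketch of an index body in which `passage_chunk_embedding` is a `nested` field and the vector subfield carries the model's dimension. The helper name `build_chunk_index_body`, the `passage_text` field, the `knn` subfield name, and the example dimension of 768 are assumptions for illustration and do not come from this PR.

```python
def build_chunk_index_body(dimension: int) -> dict:
    """Build a k-NN index body with a nested field for chunk embeddings.

    `dimension` must match the output size of the embedding model in use.
    Field names other than `passage_chunk_embedding` are illustrative.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        # Enable k-NN on the index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Hypothetical source text field.
                "passage_text": {"type": "text"},
                # One nested object per chunk, each holding a vector.
                "passage_chunk_embedding": {
                    "type": "nested",
                    "properties": {
                        "knn": {"type": "knn_vector", "dimension": dimension}
                    },
                },
            }
        },
    }
```

Calling `build_chunk_index_body(768)` returns a dictionary that could be serialized as the body of an index creation request, assuming a model that produces 768-dimensional embeddings.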


## Step 4: Search the index using neural search

You can use a `nested` query to perform vector search on your index. We recommend setting `score_mode` to `max`, where the document score is set to the maximum of the scores from all passage embeddings:
Collaborator

Is "maximum" the right word here, or do we mean "highest"?

Collaborator

Either way. Changed to "highest".
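
As a sketch of the Step 4 recommendation discussed in this thread, the following Python builds a `nested` query body with `score_mode` set to `max` and shows what that mode means for a document's score. The helper names, the `passage_chunk_embedding.knn` subfield path, and the `query_text` and `model_id` values passed in are assumptions for illustration, not taken from this PR.

```python
from typing import Sequence


def build_nested_neural_query(query_text: str, model_id: str) -> dict:
    """Build a nested neural query over per-chunk embeddings."""
    return {
        "query": {
            "nested": {
                # Score each document by its best-matching chunk.
                "score_mode": "max",
                "path": "passage_chunk_embedding",
                "query": {
                    "neural": {
                        # Assumed vector subfield inside the nested field.
                        "passage_chunk_embedding.knn": {
                            "query_text": query_text,
                            "model_id": model_id,
                        }
                    }
                },
            }
        }
    }


def document_score_max(chunk_scores: Sequence[float]) -> float:
    """Mimic score_mode=max: the document takes its highest chunk score."""
    return max(chunk_scores)
```

With `score_mode` set to `max`, a document whose chunks score 0.2, 0.9, and 0.5 is ranked as if it scored 0.9, so one strongly matching passage is enough to surface a long document.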

_search-plugins/text-chunking.md Outdated
@hdhalter hdhalter added the 5 - Editorial review label Mar 29, 2024
Collaborator

@kolchfa-aws kolchfa-aws left a comment

LGTM. Thank you, @yuye-aws!

@kolchfa-aws kolchfa-aws merged commit d676a79 into opensearch-project:main Mar 29, 2024
3 checks passed
@hdhalter hdhalter added the 3 - Tech review and 3 - Done labels and removed the 3 - Tech review and 5 - Editorial review labels Mar 29, 2024
@yuye-aws yuye-aws deleted the text_chunking_nested_example branch August 29, 2024 05:14
Labels: 3 - Done, release-notes, v2.13.0
4 participants