Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch docs update #42422

Merged
merged 1 commit into from
Sep 30, 2024
Merged

Elasticsearch docs update #42422

merged 1 commit into from
Sep 30, 2024

Conversation

lasteris
Copy link
Contributor

@lasteris lasteris commented Aug 8, 2024

Added code snippet about Bulk API usage via low level RestClient.

I've thought, that this is the important thing, which needs to be described in docs, while It's not so obvious how to use (official docs demonstrate only Java API Bulk API usage).
User should know how to do it, till this is really convenient way to index data with the best performance.

In code snippet, you can see ByteArrayEntity. I've tried InputStreamEntity instead, but got Stream already closed error.

@quarkus-bot

This comment has been minimized.

Copy link
Contributor

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the suggestion!

Since bulk API allows more operations than just adding documents to the index, I'm thinking that it probably would be better to update the example a bit more ....

What if instead of receiving the file/input stream at the Resource level, we accept a lit of fruits, pass them to the service and let the service create a bulk x-ndjson request.

Accepting a file as an input, opens up the possibility to do any kind of operations, which may be too much ...

We also should probably add a corresponding example in Java API as well. Even though, as you've mentioned the Elasticsearch guide already has that example, the Quarkus docs suggest this just above:

if you want to use the Java API client instead, follow the instructions in the Using the Elasticsearch Java Client paragraph instead

so we'd want to keep the two examples perform the same operations...

@lasteris
Copy link
Contributor Author

lasteris commented Aug 9, 2024

Hey, thanks for the suggestion!

Since bulk API allows more operations than just adding documents to the index, I'm thinking that it probably would be better to update the example a bit more ....

What if instead of receiving the file/input stream at the Resource level, we accept a lit of fruits, pass them to the service and let the service create a bulk x-ndjson request.

Accepting a file as an input, opens up the possibility to do any kind of operations, which may be too much ...

We also should probably add a corresponding example in Java API as well. Even though, as you've mentioned the Elasticsearch guide already has that example, the Quarkus docs suggest this just above:

if you want to use the Java API client instead, follow the instructions in the Using the Elasticsearch Java Client paragraph instead

so we'd want to keep the two examples perform the same operations...

Ok, why not, i'll do it )

@quarkus-bot

This comment has been minimized.

@lasteris
Copy link
Contributor Author

@marko-bekhta made some changes, feel free to review 🤗

@lasteris
Copy link
Contributor Author

lasteris commented Aug 11, 2024

Sorry for that. I've done a couple of typos and have not noticed it before commit. Now it fixed, I think...
I've tested code in my project, but copied code inattentively. I hope next time such bad thing not happen to me(

@quarkus-bot

This comment has been minimized.

Copy link
Contributor

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the updates. You've made good progress on this one.
I didn't notice that the services in the corresponding integration tests were unmodified. I'm sorry about that. I've pointed to those in the inline comments, along with some more suggestions.
Also, please squash your commits once you are done with the updates. Thanks!

@@ -151,13 +174,49 @@ public class FruitService {
restClient.performRequest(request); //<4>
}

public void indexBulk(List<Fruit> list) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could find the corresponding source files for these services in

and

could you add the code there, and then just copy it back to the guide, that way you'll be sure that all works ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry... i've missed anywhere in guideline, that i should touch some tests...
Sure, i'll do it )

Comment on lines 117 to 139
Also create simple utility class, which help us create valid `ndjson`
for proper work with https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#docs-bulk-api-desc[Bulk API].

[source, java]
----
public class NDJson {
private final List<JsonObject> list;

public NDJson(List<JsonObject> list) {
this.list = list;
}

public String encode() {
var sb = new StringBuilder();
for (JsonObject json : list) {
sb.append(json.encode());
sb.append("\n");
}
return sb.toString();
}
}
----

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just add a simple util method to the fruit service instead:

private static String toNdJsonString(List<JsonObject> objects) {
    return objects.stream()
            .map(JsonObject::encode)
            .collect(Collectors.joining("\n", "", "\n"));
}

the prefix/suffix (2d/3d parameters) are applied to the joined string, so adding the suffix will ensure that there is a new line at the end.

for (var fruit : list) {

entityList.add(new JsonObject().put("index", new JsonObject() //<5>
.put("_index", "fruits").put("_id", observation.id())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.put("_index", "fruits").put("_id", observation.id())));
.put("_index", "fruits").put("_id", fruit.id)));

Comment on lines 177 to 174
public void indexBulk(List<Fruit> list) throws IOException {

var entityList = new ArrayList<JsonObject>();

for (var fruit : list) {

entityList.add(new JsonObject().put("index", new JsonObject() //<5>
.put("_index", "fruits").put("_id", observation.id())));
entityList.add(JsonObject.mapFrom(fruit));
}

Request request = new Request(
"POST", "fruits/_bulk?pretty");
request.setEntity(new StringEntity(
new NDJson(entityList).encode(), //<6>
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type
restClient.performRequest(request);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void indexBulk(List<Fruit> list) throws IOException {
var entityList = new ArrayList<JsonObject>();
for (var fruit : list) {
entityList.add(new JsonObject().put("index", new JsonObject() //<5>
.put("_index", "fruits").put("_id", observation.id())));
entityList.add(JsonObject.mapFrom(fruit));
}
Request request = new Request(
"POST", "fruits/_bulk?pretty");
request.setEntity(new StringEntity(
new NDJson(entityList).encode(), //<6>
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type
restClient.performRequest(request);
}
public void indexBulk(List<Fruit> list) throws IOException {
var entityList = new ArrayList<JsonObject>();
for (var fruit : list) {
entityList.add(new JsonObject().put("index", new JsonObject() //<5>
.put("_index", "fruits").put("_id", observation.id())));
entityList.add(JsonObject.mapFrom(fruit));
}
Request request = new Request("POST", "fruits/_bulk?pretty");
request.setEntity(new StringEntity(
new NDJson(entityList).encode(), //<6>
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type
restClient.performRequest(request);
}

Let's try keeping the code format similar to what the rest of the service is doing. BTW, once you add the code to the service in the integration tests I've mentioned above and build it, Maven will take care of formatting it for you.

"POST", "fruits/_bulk?pretty");
request.setEntity(new StringEntity(
new NDJson(entityList).encode(), //<6>
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type
ContentType.create("application/x-ndjson"))); // <7>
....
<7> Pass the content type that is expected by the search backend for bulk requests.

let's be consistent and use the callouts just like you've done in the line above.

restClient.performRequest(request);
}

public void delBulk(List<Long> identityList) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void delBulk(List<Long> identityList) throws IOException {
public void delete(List<Long> identityList) throws IOException {

I'd use simple index(List<Fruit> fruits)/ delete(List<Long> ids) method names. Caller not necessarily need to know that we are using the bulk API to perform the operation.

Comment on lines 270 to 267
<7> Delete operation in Bulk API requires no following body, so we iterate over fruit's identity list and construct body like shown below
+
[source, json]
----
{"delete", {"_index" : "fruits", "_id", "1"}}\n
{"delete", {"_index" : "fruits", "_id", "2"}}\n
... ... ... ...\n
{"delete", {"_index" : "fruits", "_id", "N"}}\n
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<7> Delete operation in Bulk API requires no following body, so we iterate over fruit's identity list and construct body like shown below
+
[source, json]
----
{"delete", {"_index" : "fruits", "_id", "1"}}\n
{"delete", {"_index" : "fruits", "_id", "2"}}\n
... ... ... ...\n
{"delete", {"_index" : "fruits", "_id", "N"}}\n
----
<7> The bulk API's delete operation JSON already contains all the required information; hence, there is no request body following this operation in the Bulk API request body.
+
[source, json]
----
{"delete", {"_index" : "fruits", "_id", "1"}}
{"delete", {"_index" : "fruits", "_id", "2"}}
... ... ... ...
{"delete", {"_index" : "fruits", "_id", "N"}}
----

I'd remove the \n from the example and add the new line at the end, as we are not "escaping" any other characters.


for (var fruit : list) {
br.operations(op -> op
.index(idx -> idx.index("fruits").id(fruit.id()).document(fruit)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.index(idx -> idx.index("fruits").id(fruit.id()).document(fruit)));
.index(idx -> idx.index("fruits").id(fruit.id).document(fruit)));

The bean in the example only has public fields.

Comment on lines 517 to 518
if (result.errors()) {
for (BulkResponseItem item: result.items()) {
if (item.error() != null) {
Log.error(item.error().reason());
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging an exception wouldn't let the caller know that something went wrong. For the sake of this example, I'd just throw an exception that something went wrong:

Suggested change
if (result.errors()) {
for (BulkResponseItem item: result.items()) {
if (item.error() != null) {
Log.error(item.error().reason());
}
}
}
if (result.errors()) {
throw new RuntimException("The indexing operation encountered errors.");
}

@lasteris
Copy link
Contributor Author

@marko-bekhta
The changes you requested have been made.
I merged the commits.
And I also have a question.
What is the usual review communication procedure?
Should I reply to every comment?
Or will you look at my changes yourself and clarify the status if you have any questions?

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

Copy link

github-actions bot commented Aug 17, 2024

🙈 The PR is closed and the preview is expired.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@marko-bekhta
Copy link
Contributor

Thanks for making the changes, I'll try to take a look later today or tomorrow.

Should I reply to every comment?

No, that's not necessary, unless you want to do so. Or if you want to discuss something in more details 🙂

Copy link
Contributor

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think we are nearly there 😃
We'd have to update the ndjson examples as they are missing the new-line at the end; but, otherwise, this looks nice.

Comment on lines 251 to 256
{"id": "N", "name": "dragonfruit", "color": "pink"}
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{"id": "N", "name": "dragonfruit", "color": "pink"}
----
{"id": "N", "name": "dragonfruit", "color": "pink"}
----

I think we have to have that "extra" new line at the end as that's what Elasticsearch expects to mark the end of the request body. Without it I think the request is rejected, no? (and the same with the other ndjson example)

import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we try to stay away from the star imports 🙈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it's mistake. Usually we too..

Comment on lines 60 to 75
given()
.contentType("application/json")
.body(List.of(fruit))
.when().post("/fruits/bulk")
.then()
.statusCode(200);

given()
.contentType("application/json")
.body(List.of(fruit.id))
.when().delete("/fruits/bulk")
.then()
.statusCode(200);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably wouldn't suggest this one, but since we'd have to update the ndjson examples in the doc file ...
Maybe we can create a new fruit index / check it exists / delete / check it doesn't exist anymore ?

see how just above Fruit result = get("/fruits/1").as(Fruit.class); is used to get the fruit (we can use that to check if it exists)
you'll need to wait for the index to refresh so you may want to use awaitlity for that instatead of the Thread.sleep(), e.g.:

await().atMost(5, TimeUnit.SECONDS).untilAsserted(() -> {
            get("/fruits/1").statusCode(200);
        });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, that's nicee

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@lasteris
Copy link
Contributor Author

lasteris commented Aug 24, 2024

@marko-bekhta I've built locally (My machine on Windows 11)
изображение

As can see, tests are successful. In checks, also tests on Windows are ok. What does it mean

  • Quarkus CI / JVM Tests - JDK 17 (Failed)
  • Quarkus CI / JVM Tests - JDK 21 (Failed)
  • Quarkus CI / Native Tests - Data6 (Failed)

And how i can fix it ((

@quarkus-bot

This comment has been minimized.

@quarkus-bot

This comment has been minimized.

@lasteris
Copy link
Contributor Author

Finally, build successful ) @marko-bekhta
Thank you for helping me throughout all the problems.

@lasteris
Copy link
Contributor Author

Also, i think, got an idea why my code not works but your does.
Am i right that its because whole process need to be wrapped in await, or else we just call search in a middle of indexing process ?

@marko-bekhta
Copy link
Contributor

Also, i think, got an idea why my code not works but your does. Am i right that its because whole process need to be wrapped in await, or else we just call search in a middle of indexing process ?

See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-get.html#realtime and elastic/elasticsearch#48843

Calling get can start returning the document before the index refresh is completed. If the index has not yet been refreshed, the search will not find the document. So it was either put all the asserts in the await block or put the search query in the await and then get by id outside after the search completes successfully.
Having all the asserts together seemed like a nicer option 😃.

It worked okay in the past since there was a thread.sleep() which waited for the refresh to complete before sending requests.

Copy link
Contributor

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks nice, I've added a couple comments inline, I assume the compiler plugin change was introduced while testing and wasn't removed before the push?

Comment on lines 125 to 132
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>11</source>
<target>11</target>
</configuration>
</plugin>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit unexpected ...
I'd expect that the compiler plugin configuration would come from the parent poms. We shouldn't override the versions in a test module... was there a particular reason for adding it?

Suggested change
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>11</source>
<target>11</target>
</configuration>
</plugin>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not do this intentionally.
I think I just used some suggestion of IDE and missed, when pushed commit

Copy link
Contributor Author

@lasteris lasteris Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inattention. I accepted the IDE's offer, but then overlooked it when reviewing the commit.
I've deleted these lines, and now I am trying to Quarkus build again.


@QuarkusTest
public class FruitResourceTest {
private static final TypeRef<List<Fruit>> LIST_OF_FRUIT_TYPE_REF = new TypeRef<>() {
};

@Test
public void testEndpoint() throws InterruptedException {
public void testEndpoint() {
RestAssured.defaultParser = Parser.JSON;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we are just not missing a content type header in some request/response ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite understand the meaning of the question. Do you mean that in this context this line makes no sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😃 what I was trying to say ... I'd expect that RestAssured figures out the parser on its own, and we don't need to specify one explicitly. But if it fails somewhere with an exception that it cannot parse the body, then maybe we are just missing a @Produces annotation somewhere in our rest endpoint...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, year, got your point finally. Thank you for clarification :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😃 what I was trying to say ... I'd expect that RestAssured figures out the parser on its own, and we don't need to specify one explicitly. But if it fails somewhere with an exception that it cannot parse the body, then maybe we are just missing a @Produces annotation somewhere in our rest endpoint...

I've made all the changes.🌚

Added code snippet about Bulk API usage via low level RestClient

expanded Bulk API usage description

rewritten low level Rest Client example
added Java API example

added awaitility
added test cases for bulk operations
@quarkus-bot
Copy link

quarkus-bot bot commented Aug 29, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit c679d63.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

Warning

There are other workflow runs running, you probably need to wait for their status before merging.

You can consult the Develocity build scans.

@quarkus-bot
Copy link

quarkus-bot bot commented Aug 29, 2024

Status for workflow Quarkus Documentation CI

This is the status report for running Quarkus Documentation CI on commit c679d63.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

Copy link
Contributor

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @lasteris 😃
@gsmet I think this one is ready now.

@gsmet gsmet merged commit 792033d into quarkusio:main Sep 30, 2024
23 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.16 - main milestone Sep 30, 2024
@gsmet
Copy link
Member

gsmet commented Sep 30, 2024

Thanks to both of you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants