-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elasticsearch docs update #42422
Elasticsearch docs update #42422
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for the suggestion!
Since bulk API allows more operations than just adding documents to the index, I'm thinking that it probably would be better to update the example a bit more ....
What if instead of receiving the file/input stream at the Resource level, we accept a lit of fruits, pass them to the service and let the service create a bulk x-ndjson
request.
Accepting a file as an input, opens up the possibility to do any kind of operations, which may be too much ...
We also should probably add a corresponding example in Java API as well. Even though, as you've mentioned the Elasticsearch guide already has that example, the Quarkus docs suggest this just above:
if you want to use the Java API client instead, follow the instructions in the Using the Elasticsearch Java Client paragraph instead
so we'd want to keep the two examples perform the same operations...
Ok, why not, i'll do it ) |
This comment has been minimized.
This comment has been minimized.
@marko-bekhta made some changes, feel free to review 🤗 |
Sorry for that. I've done a couple of typos and have not noticed it before commit. Now it fixed, I think... |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for the updates. You've made good progress on this one.
I didn't notice that the services in the corresponding integration tests were unmodified. I'm sorry about that. I've pointed to those in the inline comments, along with some more suggestions.
Also, please squash your commits once you are done with the updates. Thanks!
@@ -151,13 +174,49 @@ public class FruitService { | |||
restClient.performRequest(request); //<4> | |||
} | |||
|
|||
public void indexBulk(List<Fruit> list) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry... i've missed anywhere in guideline, that i should touch some tests...
Sure, i'll do it )
Also create simple utility class, which help us create valid `ndjson` | ||
for proper work with https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#docs-bulk-api-desc[Bulk API]. | ||
|
||
[source, java] | ||
---- | ||
public class NDJson { | ||
private final List<JsonObject> list; | ||
|
||
public NDJson(List<JsonObject> list) { | ||
this.list = list; | ||
} | ||
|
||
public String encode() { | ||
var sb = new StringBuilder(); | ||
for (JsonObject json : list) { | ||
sb.append(json.encode()); | ||
sb.append("\n"); | ||
} | ||
return sb.toString(); | ||
} | ||
} | ||
---- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just add a simple util method to the fruit service instead:
private static String toNdJsonString(List<JsonObject> objects) {
return objects.stream()
.map(JsonObject::encode)
.collect(Collectors.joining("\n", "", "\n"));
}
the prefix/suffix (2d/3d parameters) are applied to the joined string, so adding the suffix will ensure that there is a new line at the end.
for (var fruit : list) { | ||
|
||
entityList.add(new JsonObject().put("index", new JsonObject() //<5> | ||
.put("_index", "fruits").put("_id", observation.id()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.put("_index", "fruits").put("_id", observation.id()))); | |
.put("_index", "fruits").put("_id", fruit.id))); |
public void indexBulk(List<Fruit> list) throws IOException { | ||
|
||
var entityList = new ArrayList<JsonObject>(); | ||
|
||
for (var fruit : list) { | ||
|
||
entityList.add(new JsonObject().put("index", new JsonObject() //<5> | ||
.put("_index", "fruits").put("_id", observation.id()))); | ||
entityList.add(JsonObject.mapFrom(fruit)); | ||
} | ||
|
||
Request request = new Request( | ||
"POST", "fruits/_bulk?pretty"); | ||
request.setEntity(new StringEntity( | ||
new NDJson(entityList).encode(), //<6> | ||
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type | ||
restClient.performRequest(request); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void indexBulk(List<Fruit> list) throws IOException { | |
var entityList = new ArrayList<JsonObject>(); | |
for (var fruit : list) { | |
entityList.add(new JsonObject().put("index", new JsonObject() //<5> | |
.put("_index", "fruits").put("_id", observation.id()))); | |
entityList.add(JsonObject.mapFrom(fruit)); | |
} | |
Request request = new Request( | |
"POST", "fruits/_bulk?pretty"); | |
request.setEntity(new StringEntity( | |
new NDJson(entityList).encode(), //<6> | |
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type | |
restClient.performRequest(request); | |
} | |
public void indexBulk(List<Fruit> list) throws IOException { | |
var entityList = new ArrayList<JsonObject>(); | |
for (var fruit : list) { | |
entityList.add(new JsonObject().put("index", new JsonObject() //<5> | |
.put("_index", "fruits").put("_id", observation.id()))); | |
entityList.add(JsonObject.mapFrom(fruit)); | |
} | |
Request request = new Request("POST", "fruits/_bulk?pretty"); | |
request.setEntity(new StringEntity( | |
new NDJson(entityList).encode(), //<6> | |
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type | |
restClient.performRequest(request); | |
} |
Let's try keeping the code format similar to what the rest of the service is doing. BTW, once you add the code to the service in the integration tests I've mentioned above and build it, Maven will take care of formatting it for you.
"POST", "fruits/_bulk?pretty"); | ||
request.setEntity(new StringEntity( | ||
new NDJson(entityList).encode(), //<6> | ||
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ContentType.create("application/x-ndjson"))); //pass apropriate Content-Type | |
ContentType.create("application/x-ndjson"))); // <7> | |
.... | |
<7> Pass the content type that is expected by the search backend for bulk requests. |
let's be consistent and use the callouts just like you've done in the line above.
restClient.performRequest(request); | ||
} | ||
|
||
public void delBulk(List<Long> identityList) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void delBulk(List<Long> identityList) throws IOException { | |
public void delete(List<Long> identityList) throws IOException { |
I'd use simple index(List<Fruit> fruits)
/ delete(List<Long> ids)
method names. Caller not necessarily need to know that we are using the bulk API to perform the operation.
<7> Delete operation in Bulk API requires no following body, so we iterate over fruit's identity list and construct body like shown below | ||
+ | ||
[source, json] | ||
---- | ||
{"delete", {"_index" : "fruits", "_id", "1"}}\n | ||
{"delete", {"_index" : "fruits", "_id", "2"}}\n | ||
... ... ... ...\n | ||
{"delete", {"_index" : "fruits", "_id", "N"}}\n | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<7> Delete operation in Bulk API requires no following body, so we iterate over fruit's identity list and construct body like shown below | |
+ | |
[source, json] | |
---- | |
{"delete", {"_index" : "fruits", "_id", "1"}}\n | |
{"delete", {"_index" : "fruits", "_id", "2"}}\n | |
... ... ... ...\n | |
{"delete", {"_index" : "fruits", "_id", "N"}}\n | |
---- | |
<7> The bulk API's delete operation JSON already contains all the required information; hence, there is no request body following this operation in the Bulk API request body. | |
+ | |
[source, json] | |
---- | |
{"delete", {"_index" : "fruits", "_id", "1"}} | |
{"delete", {"_index" : "fruits", "_id", "2"}} | |
... ... ... ... | |
{"delete", {"_index" : "fruits", "_id", "N"}} | |
---- |
I'd remove the \n
from the example and add the new line at the end, as we are not "escaping" any other characters.
|
||
for (var fruit : list) { | ||
br.operations(op -> op | ||
.index(idx -> idx.index("fruits").id(fruit.id()).document(fruit))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.index(idx -> idx.index("fruits").id(fruit.id()).document(fruit))); | |
.index(idx -> idx.index("fruits").id(fruit.id).document(fruit))); |
The bean in the example only has public fields.
if (result.errors()) { | ||
for (BulkResponseItem item: result.items()) { | ||
if (item.error() != null) { | ||
Log.error(item.error().reason()); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging an exception wouldn't let the caller know that something went wrong. For the sake of this example, I'd just throw an exception that something went wrong:
if (result.errors()) { | |
for (BulkResponseItem item: result.items()) { | |
if (item.error() != null) { | |
Log.error(item.error().reason()); | |
} | |
} | |
} | |
if (result.errors()) { | |
throw new RuntimException("The indexing operation encountered errors."); | |
} |
@marko-bekhta |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
🙈 The PR is closed and the preview is expired. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Thanks for making the changes, I'll try to take a look later today or tomorrow.
No, that's not necessary, unless you want to do so. Or if you want to discuss something in more details 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think we are nearly there 😃
We'd have to update the ndjson examples as they are missing the new-line
at the end; but, otherwise, this looks nice.
{"id": "N", "name": "dragonfruit", "color": "pink"} | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"id": "N", "name": "dragonfruit", "color": "pink"} | |
---- | |
{"id": "N", "name": "dragonfruit", "color": "pink"} | |
---- |
I think we have to have that "extra" new line at the end as that's what Elasticsearch expects to mark the end of the request body. Without it I think the request is rejected, no? (and the same with the other ndjson example)
import jakarta.ws.rs.Path; | ||
import jakarta.ws.rs.PathParam; | ||
import jakarta.ws.rs.QueryParam; | ||
import jakarta.ws.rs.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we try to stay away from the star imports 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it's mistake. Usually we too..
given() | ||
.contentType("application/json") | ||
.body(List.of(fruit)) | ||
.when().post("/fruits/bulk") | ||
.then() | ||
.statusCode(200); | ||
|
||
given() | ||
.contentType("application/json") | ||
.body(List.of(fruit.id)) | ||
.when().delete("/fruits/bulk") | ||
.then() | ||
.statusCode(200); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably wouldn't suggest this one, but since we'd have to update the ndjson examples in the doc file ...
Maybe we can create a new fruit index / check it exists / delete / check it doesn't exist anymore ?
see how just above Fruit result = get("/fruits/1").as(Fruit.class);
is used to get the fruit (we can use that to check if it exists)
you'll need to wait for the index to refresh so you may want to use awaitlity for that instatead of the Thread.sleep()
, e.g.:
await().atMost(5, TimeUnit.SECONDS).untilAsserted(() -> {
get("/fruits/1").statusCode(200);
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, that's nicee
This comment has been minimized.
This comment has been minimized.
2c210f5
to
f31a148
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
37b5550
to
0f5a321
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
0f5a321
to
cb2cc54
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
cb2cc54
to
3a623cd
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
3a623cd
to
b8fd2b6
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@marko-bekhta I've built locally (My machine on Windows 11) As can see, tests are successful. In checks, also tests on Windows are ok. What does it mean
And how i can fix it (( |
14893fb
to
12fb35b
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Finally, build successful ) @marko-bekhta |
Also, i think, got an idea why my code not works but your does. |
See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-get.html#realtime and elastic/elasticsearch#48843 Calling get can start returning the document before the index refresh is completed. If the index has not yet been refreshed, the search will not find the document. So it was either put all the asserts in the await block or put the search query in the await and then get by id outside after the search completes successfully. It worked okay in the past since there was a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks nice, I've added a couple comments inline, I assume the compiler plugin change was introduced while testing and wasn't removed before the push?
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-compiler-plugin</artifactId> | ||
<configuration> | ||
<source>11</source> | ||
<target>11</target> | ||
</configuration> | ||
</plugin> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bit unexpected ...
I'd expect that the compiler plugin configuration would come from the parent poms. We shouldn't override the versions in a test module... was there a particular reason for adding it?
<plugin> | |
<groupId>org.apache.maven.plugins</groupId> | |
<artifactId>maven-compiler-plugin</artifactId> | |
<configuration> | |
<source>11</source> | |
<target>11</target> | |
</configuration> | |
</plugin> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not do this intentionally.
I think I just used some suggestion of IDE and missed, when pushed commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inattention. I accepted the IDE's offer, but then overlooked it when reviewing the commit.
I've deleted these lines, and now I am trying to Quarkus build again.
|
||
@QuarkusTest | ||
public class FruitResourceTest { | ||
private static final TypeRef<List<Fruit>> LIST_OF_FRUIT_TYPE_REF = new TypeRef<>() { | ||
}; | ||
|
||
@Test | ||
public void testEndpoint() throws InterruptedException { | ||
public void testEndpoint() { | ||
RestAssured.defaultParser = Parser.JSON; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we are just not missing a content type header in some request/response ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand the meaning of the question. Do you mean that in this context this line makes no sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😃 what I was trying to say ... I'd expect that RestAssured figures out the parser on its own, and we don't need to specify one explicitly. But if it fails somewhere with an exception that it cannot parse the body, then maybe we are just missing a @Produces
annotation somewhere in our rest endpoint...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, year, got your point finally. Thank you for clarification :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😃 what I was trying to say ... I'd expect that RestAssured figures out the parser on its own, and we don't need to specify one explicitly. But if it fails somewhere with an exception that it cannot parse the body, then maybe we are just missing a
@Produces
annotation somewhere in our rest endpoint...
I've made all the changes.🌚
Added code snippet about Bulk API usage via low level RestClient expanded Bulk API usage description rewritten low level Rest Client example added Java API example added awaitility added test cases for bulk operations
12fb35b
to
c679d63
Compare
Status for workflow
|
Status for workflow
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to both of you! |
Added code snippet about Bulk API usage via low level
RestClient
.I've thought, that this is the important thing, which needs to be described in docs, while It's not so obvious how to use (official docs demonstrate only Java API Bulk API usage).
User should know how to do it, till this is really convenient way to index data with the best performance.
In code snippet, you can see
ByteArrayEntity
. I've triedInputStreamEntity
instead, but gotStream already closed
error.