-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy import for SERVICE #1491
Lazy import for SERVICE #1491
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1491 +/- ##
==========================================
- Coverage 94.15% 94.15% -0.01%
==========================================
Files 347 348 +1
Lines 25627 25698 +71
Branches 3445 3453 +8
==========================================
+ Hits 24130 24196 +66
- Misses 1455 1460 +5
Partials 42 42 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this already looks very nice.
There are some small improvements that I have suggested.
test/ServiceTest.cpp
Outdated
@@ -202,6 +202,7 @@ TEST_F(ServiceTest, computeResult) { | |||
[&](const std::string& result, std::string_view errorMsg, | |||
boost::beast::http::status status = boost::beast::http::status::ok, | |||
std::string contentType = "application/sparql-results+json") { | |||
LOG(INFO) << "MSG: " << errorMsg << '\n'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like some debugging stuff that can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the direction where this is headed, only a few small things are missing.
@@ -20,15 +20,19 @@ namespace ad_utility { | |||
*/ | |||
class LazyJsonParser { | |||
public: | |||
// Generator detail, the first 100 input characters for better error context. | |||
struct Details { | |||
std::string first100_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also want to have the last 100 characters in addition,
it might also happen, that everything starts alright, but in the middle something goes wrong, we also need context for that.
src/util/LazyJsonParser.cpp
Outdated
for (const auto& bytes : partialJson) { | ||
co_yield reinterpret_cast<const char*>(bytes.data()); | ||
co_yield std::string(reinterpret_cast<const char*>(bytes.data()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
co_yield std::string(reinterpret_cast<const char*>(bytes.data()), | |
co_yield std::string_view(reinterpret_cast<const char*>(bytes.data()), |
src/engine/Service.cpp
Outdated
// ____________________________________________________________________________ | ||
void Service::verifyVariables( | ||
const nlohmann::json& j, | ||
const ad_utility::LazyJsonParser::Generator& gen) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It suffices to pass const Details&
as the second argument, I was confused what you need the generator for here and how this should work:)
src/engine/Service.cpp
Outdated
throwErrorWithContext("JSON result does not have the expected structure", | ||
gen.details().first100_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also include the complete "head"
clause in this error message, because that's what causes the problem here, so something like
... does not have the expected structere, as its "head" section is not according to the SPARQL standard. The "head"section is" .... dump the subjson here
.
|
||
if (responseVars != expectedVars) { | ||
throwErrorWithContext( | ||
absl::StrCat("Header row of JSON result for SERVICE query is \"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this message (as well as the one before) also add a "Probable cause: the remote endpoint sent a JSON response that is not according to the SPARQL standard".
src/engine/Service.cpp
Outdated
">: ", sv, ". First 100 bytes of the response: ", | ||
ctx.substr(0, std::min(100, (int)ctx.size())))); | ||
this->throwErrorWithContext(sv, | ||
ctx.substr(0, std::min(100, (int)ctx.size()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.substr(0, std::min(100, (int)ctx.size()))); | |
ctx.substr(0, 100)); |
substr
automatically handles the out-of-bounds case.
src/engine/Service.cpp
Outdated
}; | ||
|
||
// Verify status and content-type of the response. | ||
if (response.status_ != boost::beast::http::status::ok) { | ||
LOG(INFO) << serviceUrl.asString() << '\n'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that remaing here, or was that just for debugging?
src/engine/Service.cpp
Outdated
if (!resultExists || !varsChecked) { | ||
throwErrorWithContext("JSON result does not have the expected structure", | ||
body.details().first100_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make two distinct checks with distinct error messages
(head section was missing
) or results section was missing
.
test/ServiceTest.cpp
Outdated
expectThrowOrSilence( | ||
"{\"head\": {\"vars\": 1}," | ||
"\"results\": {\"bindings\": {}}}", | ||
"JSON result does not have the expected structure."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test case for results section missing
and results section has the wrong structure
is missing.
In particular we probably want to catch the internal exceptions of the LazyJson parser and make something user-friendly out of them. (People will complain to us, even if its the fault of the other endpoint, so we should help our future selves as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small things left.
src/util/LazyJsonParser.cpp
Outdated
details.last100_ = chunk.substr( | ||
std::max(0, static_cast<int>(chunk.size() - 100)), 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to have the last 100 always.
So overwrite them directly after setting the first100
, but outside the if of course.
That way we also get the context if the parsing of the chunk failed, which is probably the most useful.
src/util/LazyJsonParser.cpp
Outdated
@@ -250,7 +252,7 @@ std::optional<nlohmann::json> LazyJsonParser::constructResultFromParsedChunk( | |||
size_t nextChunkStart = | |||
materializeEnd == 0 ? 0 : std::min(materializeEnd + 1, input_.size()); | |||
if (input_.size() - nextChunkStart >= 1'000'000) { | |||
throw std::runtime_error("Ill formed Json."); | |||
throw Error("Ill formed JSON."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message should be something like
QLever currently doesn't support SERVICE results where a single result row is larger than 1 MB
.
Quality Gate passedIssues Measures |
Integrate the
LazyJsonParser
introduced in #1412 into theSERVICE
Operation, which will help to reduce RAM usage for the import of large results. In particular, the (possibly large) JSON result of a SERVICE will not be fully materialized, but converted to a (possibly much smaller)IdTable
on the fly. This is a preparation for making the SERVICE operation completely lazy.