-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BASE
declarations are now handled correctly
#1660
Conversation
7c98092
to
62f7f96
Compare
When a BASE IRI is defined, relative IRIs should be resolved against it. For example, with `BASE <http://example.org/>`, the relative IRI `<a>` should be resolved to `<http://example.org/a>`. This is a fist step in this direction. So far, most of the test fail because they use relative IRIs extensively, assuming that they are left untouched. PS: UniProt uses BASE extensively, so this is a blocker for parsing UniProt via Turtle.
62f7f96
to
a2f12a4
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1660 +/- ##
==========================================
+ Coverage 89.64% 89.68% +0.04%
==========================================
Files 383 383
Lines 36889 36942 +53
Branches 4167 4174 +7
==========================================
+ Hits 33068 33132 +64
+ Misses 2522 2510 -12
- Partials 1299 1300 +1 ☔ View full report in Codecov by Sentry. |
BASE
declarations are now handled correctly
TODO: I still don't understand how the `prefixMap_` can become empty again, despite the default initialization in the `TurtleParser` class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An initial and thorough round of reviews.
src/parser/RdfParser.h
Outdated
// TODO<hannah> I would prefer to just call `prefixMap_.at(...)`, but then | ||
// some of the tests fails because the keys are not in the map (despite the | ||
// initialization above). | ||
const TripleComponent::Iri& baseForRelativeIri() { | ||
// return prefixMap_.at(baseForRelativeIriKey_); | ||
return prefixMap_ | ||
.try_emplace(baseForRelativeIriKey_, TripleComponent::Iri{}) | ||
.first->second; | ||
} | ||
const TripleComponent::Iri& baseForAbsoluteIri() { | ||
// return prefixMap_.at(baseForAbsoluteIriKey_); | ||
return prefixMap_ | ||
.try_emplace(baseForAbsoluteIriKey_, TripleComponent::Iri{}) | ||
.first->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should definitely fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me, which tests those are (or I can see myself after the next iteration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect, that these test are FRIENDS and explicitly set the prefix map. This can be reimplemented by a setter function which inserts the prefixes if necessary.
src/index/CompressedRelation.cpp
Outdated
@@ -1332,7 +1332,7 @@ auto CompressedRelationWriter::createPermutationPair( | |||
// TODO<joka921> Use `CALL_FIXED_SIZE`. | |||
ad_utility::CompressedExternalIdTableSorter<decltype(compare), 0> | |||
twinRelationSorter(basename + ".twin-twinRelationSorter", numColumns, | |||
4_GB, alloc); | |||
30_GB, alloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely part of a different PR.
src/parser/Iri.cpp
Outdated
pos = iri_.find('/', pos + 3); | ||
} | ||
// If there is no `/` after the scheme or the first such `/` is the last | ||
// character, the base IRI is the IRI itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This also is the return for the case
no scheme found
. Do you want to assert that we either have an absolute IRI as the base, or the empty base (for backwards compatibility) but nothing ill-formed either ?
(might be that you have this check elsewhere).
test/RdfParserTest.cpp
Outdated
forAllParsers(testWithParser, "<missing> <object> ."); | ||
// Make sure that the `.` is not the final character because we (need to) | ||
// handle that as a special case in `RdfStreamParser<T>::getLineImpl`. | ||
forAllParsers(testWithParser, "<missing> <object> . "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be working again with the discussed end regex fix
- it should be possible to add a test that checks, that we have fixed the uniprot bug with dots in the middle of an IRIref correctly (e.g. in the StreamAndParallelparser test, where we manually fiddle with blocks anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hjhhh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much!
Co-authored-by: Johannes Kalmbach <[email protected]>
Conformance check passed ✅No test result changes. |
Quality Gate passedIssues Measures |
When a
BASE
IRI is defined, IRIs without a scheme (like<http://...>
) should be resolved. For example, withBASE <http://purl.uniprot.org/uniprot/>
, the relative IRI<UPI001AF4585D>
should be resolved to<http://purl.uniprot.org/uniprot/UPI001AF4585D>
, and the absolute IRI</prosite/PS51927>
should be resolved to<http://purl.uniprot.org/prosite/PS51927>
.NOTE: Without
BASE
declaration, relative IRIs like<a>
are accepted and left unchanged. This is used extensively in our tests. This is not 100% compliant with the SPARQL standard, but we consider being slightly more accepting than the standard harmless in this case.