Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing geo:wktLiteral datatype #1521

Merged
merged 1 commit into from
Sep 27, 2024

Conversation

joka921
Copy link
Member

@joka921 joka921 commented Sep 27, 2024

After the previous commit, all WKT literals except POINTs were missing the geo:wktLiteral datatype. This is now fixed again.

Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>
@joka921 joka921 requested a review from ullingerc September 27, 2024 15:25
Copy link

codecov bot commented Sep 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.12%. Comparing base (85793e3) to head (b174038).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1521      +/-   ##
==========================================
- Coverage   88.12%   88.12%   -0.01%     
==========================================
  Files         357      357              
  Lines       26764    26765       +1     
  Branches     3606     3606              
==========================================
- Hits        23587    23586       -1     
  Misses       1941     1941              
- Partials     1236     1238       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ullingerc ullingerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm I missed this detail and all WKT literals except points are currently missing their datatypes in the output. Thank you for spotting and fixing this.

Copy link

@hannahbast hannahbast changed the title Fix the wkt literals again. Fix missing WKT datatype bug from previous commit Sep 27, 2024
@hannahbast hannahbast changed the title Fix missing WKT datatype bug from previous commit Fix missing geo:wktLiteral datatype Sep 27, 2024
Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix and the corresponding test!

@hannahbast hannahbast merged commit f39907c into ad-freiburg:master Sep 27, 2024
19 of 20 checks passed
@hannahbast
Copy link
Member

@ullingerc Even after this fix, there is still an inconsistency. Namely, in the TSV export, the points appear as POINT(...) without quotes and without datatype, whereas all other geometry types appear in quotes and with datatype. The TSV format does not require a particular format, but we should of course make it consistent. Do you know how to fix this?

@ullingerc
Copy link
Collaborator

This may be caused by this review: #1506 (comment)
I then removed the full string representation with quotes and datatype. Tomorrow I will look into it and check if this is actually the cause.

@ullingerc
Copy link
Collaborator

@hannahbast I have checked it and the bug has nothing to do specifically with my implementation in #1506 . It also affects the date type. See for example:

"1950-01-01T00:00:00\n",
// should be
// "\"1950-01-01T00:00:00\"^^<http://www.w3.org/2001/XMLSchema#dateTime>\n",
// but that is a bug in the TSV export for another PR. Note: the duplicate
// quotes are due to the escaping for CSV.

@hannahbast
Copy link
Member

@ullingerc Thanks for the info, than this is work for a separate PR. But I just hit another problem, when trying to build the index for the latest version of Wikidata (which never had problems with coordinates in the past):

2024-09-28 15:26:12.867 - ERROR: Parse error at byte position 116541359260: /local/data-ssd/qlever/qlever-code/src/parser/GeoPoint.cpp, line 19: The given value 100 is out of range for latitude coordinates.

Any idea, what the problem is here? Anyway, it would be good to output the whole POINT... string here.

@hannahbast
Copy link
Member

@ullingerc PS: Shortly before the parse fails, I find the following in the dataset: "Point(0 99.999999)"^^geo:wktLiteral. I understand that a latitude should not be larger than 90. But it would be good to be more lenient here and either omit the triple or not "fold" it but leave it as an ordinary literal (maybe without datatype). We already have warning messages for that at other places, for example:

2024-09-20 22:17:08.122 - WARN: IRI ref not standard-compliant: <http://rdf.ncbi.nlm.nih.gov/pubchem/patentipc/C4B 16-02>

@ullingerc
Copy link
Collaborator

@hannahbast It's very helpful that you found the problematic item, thanks. I think we should keep the verification for coordinates, otherwise the folding will no longer work in a meaningful way. However, I agree, the error should be handled more gracefully, similar to the example you have pointed out.

@ldp2211479
Copy link

@ullingerc I am also experiencing the same problem while parsing the latest wikidata, is there any solution for this at the moment?
For example:

ERROR:  Parse error at byte position 106729189004: /app/src/parser/GeoPoint.cpp, line 19: The given value 100 is out of range for latitude coordinates.
The next 500 bytes are:
 ;
        psv:P625 v:34805a86778b1accf78d4c9a01e74da2 ;
        pq:P2241 wd:Q29998666 ;
        prov:wasDerivedFrom ref:f36f7aab3afd293c532f86affede54cadeaee06d .

wd:Q113370244 p:P625 s:Q113370244-e4611765-45df-46e7-6f91-e6bb2c2d27c2 .

@ullingerc
Copy link
Collaborator

@ldp2211479 Yes, I have proposed a solution in PR #1525 , which is unfortunately not yet merged into master.

@ldp2211479
Copy link

@ldp2211479是的,我在 PR #1525中提出了一个解决方案,但不幸的是它还没有合并到 master 中。

Thank you for your reply. I don't know much about this area, could you please provide a direct way to temporarily solve this problem for wikidata? For example, should I need to change the code /app/src/parser/GeoPoint.cpp?

@ullingerc
Copy link
Collaborator

@ldp2211479 You could either directly pull the code from the PR, or alternatively add this code manually for a quick fix:

} catch (const CoordinateOutOfRangeException& ex) {
LOG(DEBUG) << normalizedLiteralContent
<< " could not be parsed as GeoPoint object for the following "
"reason: "
<< ex.what()
<< ". It is treated as a plain string literal without datatype "
"instead"
<< std::endl;
parser->lastParseResult_ = std::move(literal);

@ldp2211479
Copy link

Thank you!

@joka921
Copy link
Member Author

joka921 commented Oct 4, 2024

@ldp2211479 The fix for this will probably be merged into the master today, I have just apllied a final small tweak to it and am now waiting for the CI pipeline.

@joka921 joka921 deleted the fix-wkt-literals branch October 4, 2024 07:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants