SDO namespace does not match predicates from http #1576
Replies: 16 comments
-
Agreed the SDO forcing https breaks queries. As the OP stated, it seems to be "against the norm". I can add:
I may be missing something important, but I would support/endorse the OP: SDO should refer to SDO = Namespace("http://schema.org/") and NOT SDO = Namespace("https://schema.org/")
-
I made the arbitrary choice to use HTTPS instead of HTTP for SDO since I've been using the HTTPS version in all recent work, e.g. W3C groups, but it was a flip of the coin. I do take the point that schema.org's own examples do seem to tend towards HTTP, but I also think that, going forward, they will be using more HTTPS. Nothing is stopping people from doing this: SCHEMA = Namespace("http://schema.org/") I guess if you really want it changed, please put in a PR! (perhaps SDO & SDO2 (for HTTPS))
-
We are migrating towards https-everywhere for schema.org, but until recently all machine-readable representations have used http: for URIs. Only with our most recent release have we made schema dumps available that use https too (see https://schema.org/docs/developers.html). In the schema.org FAQ we have said for a few years that there is nothing wrong with using https in markup (RDFa, Microdata, JSON-LD); this puts the onus onto data consumers to normalize or map. If I had a time machine I'd go back to 2011 and argue for https from day 1, but that's not where we are today. There is a lot of http://-based schema.org out there. My recommendation would be to normalize into one or the other; https: is the future, but there is a vast amount of http: out there too.
-
Thanks @danbri
-
maybe use OLDSDO for the http one? Could someone sketch the best-practice few lines of rdflib code that would be needed to canonicalize all URIs in a Graph that begin http://schema.org* to begin https://schema.org instead?
-
Do you mean something like this:

    from rdflib import URIRef

    # Snapshot the triples first so the graph is not mutated while iterating
    for s, p, o in list(g.triples((None, None, None))):
        if str(s).startswith("http://schema.org"):
            g.remove((s, p, o))
            g.add((URIRef(str(s).replace("http", "https")), p, o))
        if str(p).startswith("http://schema.org"):
            g.remove((s, p, o))
            g.add((s, URIRef(str(p).replace("http", "https")), o))
        if str(o).startswith("http://schema.org"):
            g.remove((s, p, o))
            g.add((s, p, URIRef(str(o).replace("http", "https"))))
-
I think so! I was going to suggest s/elif/if/ but it looks like you already fixed that :)
-
@nicholascar yep I think that's the simplest way of doing it at a high level.
-
For modest (e.g. web page sized) graphs this seems useful. There might be other techniques, e.g. SPARQL update, for bigger databases of triples. https://www.semanticarts.com/sparql-changing-instance-uris/ has an example that could be adapted?
-
@nicholascar this approach can make unintended changes to data. For example, applying that code to the following would alter the description literal:
Adding a test for the object type will help:
-
Actually, another adjustment is needed:

    import rdflib

    # Snapshot the triples first so the graph is not mutated while iterating
    for s, p, o in list(g.triples((None, None, None))):
        changed = False
        new_s = s
        if str(s).startswith("http://schema.org"):
            new_s = rdflib.URIRef(str(s).replace("http", "https"))
            changed = True
        new_p = p
        if str(p).startswith("http://schema.org"):
            new_p = rdflib.URIRef(str(p).replace("http", "https"))
            changed = True
        new_o = o
        # Only rewrite URI objects, so literals that mention schema.org are left alone
        if isinstance(o, rdflib.URIRef):
            if str(o).startswith("http://schema.org"):
                new_o = rdflib.URIRef(str(o).replace("http", "https"))
                changed = True
        if changed:
            g.remove((s, p, o))
            g.add((new_s, new_p, new_o))

Full worked example at: https://gist.github.com/datadavev/994e6a39bded38b75f4e46e88cd70850
-
@datadavev yes, that's better: keeping track of changes and then just doing one remove/add. I did think of one edge case: if a URI like this was ever constructed, http://schemaorg/something?uri=http://example.org, then the string replace would erroneously alter both http bits, but I don't think schema.org can be used like that so the replace() is fine as it is.
-
we don't currently encourage any URLs with that pattern
-
Could also make the replace more specific, e.g.:
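One possible way to do that, offered as an illustrative sketch rather than the snippet originally posted here (the helper name `to_https` and the sample URI are made up), is to swap only the leading prefix:

```python
OLD_PREFIX = "http://schema.org/"
NEW_PREFIX = "https://schema.org/"


def to_https(uri: str) -> str:
    """Swap only a leading http://schema.org/ prefix; leave the rest of the string alone."""
    if uri.startswith(OLD_PREFIX):
        return NEW_PREFIX + uri[len(OLD_PREFIX):]
    return uri


# Made-up URI with a second "http" inside the query string.
tricky = "http://schema.org/something?uri=http://example.org"
naive = tricky.replace("http", "https")  # rewrites both occurrences
specific = to_https(tricky)              # rewrites only the prefix
```

With the naive replace the embedded `http://example.org` also becomes https; the prefix-only version leaves it untouched.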
-
Yeah, that's what I was thinking, but this is an extreme edge case so I don't think it's needed!
-
Yep, it's an edge for sure. I did see a
-
In namespace.py, SDO is defined as SDO = Namespace("https://schema.org/"), and it won't match predicates from HTTP.
For example, if I create a graph like this:
I would expect this to return a list of length 1, but it returns an empty list instead:
According to the schema.org FAQ you can use either http://schema.org or https://schema.org in namespaces. So they should be equivalent.
I'm not sure whether there is a way to treat the two as equivalent in the library. In current usage from the Web Data Commons it seems like http://schema.org is more common, but both occur.