Implement bind_namespaces strategy for Prefix.cc #2239

Open. Wants to merge 11 commits into base branch 8.x.
18 changes: 18 additions & 0 deletions examples/bind_prefix_cc.py
@@ -0,0 +1,18 @@
"""
Prefix.cc is a community-curated prefix map. By using `bind_namespaces="cc"`,
you can set a namespace manager or graph to dynamically load prefixes from
this resource.
"""

import rdflib

graph = rdflib.Graph(bind_namespaces="cc")

# The Gene Ontology is a biomedical ontology describing
# biological processes, cellular locations, and cellular components.
# It is typically abbreviated with the prefix "go" and uses PURLs
# issued by the Open Biological and Biomedical Ontologies Foundry.
prefix_map = {prefix: str(ns) for prefix, ns in graph.namespaces()}
assert "go" in prefix_map
assert prefix_map["go"] == "http://purl.obolibrary.org/obo/GO_"
assert graph.qname("http://purl.obolibrary.org/obo/GO_0032571") == "go:0032571"
2 changes: 1 addition & 1 deletion rdflib/_type_checking.py
@@ -27,5 +27,5 @@
else:
from typing_extensions import Literal as PyLiteral

_NamespaceSetString = PyLiteral["core", "rdflib", "none"]
_NamespaceSetString = PyLiteral["core", "rdflib", "none", "cc"]
_MulPathMod = PyLiteral["*", "+", "?"] # noqa: F722
20 changes: 15 additions & 5 deletions rdflib/namespace/__init__.py
@@ -7,6 +7,7 @@
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Set, Tuple, Union
from unicodedata import category
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen

from rdflib.term import URIRef, Variable, _is_valid_uri

@@ -372,7 +373,6 @@ class NamespaceManager(object):
* note this is NOT default behaviour
* cc:
* using prefix bindings from prefix.cc, which is an online prefix database
* not implemented yet - this is aspirational

See the
Sample usage
@@ -418,11 +418,14 @@ def __init__(self, graph: "Graph", bind_namespaces: "_NamespaceSetString" = "cor
for prefix, ns in _NAMESPACE_PREFIXES_CORE.items():
self.bind(prefix, ns)
elif bind_namespaces == "cc":
Member

Why use the TLD as the name for prefix.cc? "cc" is also the conventional prefix for Creative Commons, so this might be confusing.

(I would also prefer having this as a utility, e.g. graph.bind_namespaces(util.get_prefix_cc()), rather than a flag to Graph. That would be more explicit and lets the user control network access and caching.)
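
A minimal sketch of what such an explicit utility might look like, assuming nothing beyond the standard library. The names `fetch_prefix_cc`, `get_prefix_cc`, `bind_all`, and the `cache_path` parameter are hypothetical and not part of RDFLib; the only RDFLib call assumed is `Graph.bind(prefix, namespace)`.

```python
import json
from pathlib import Path
from typing import Callable, Dict
from urllib.request import urlopen

PREFIX_CC_URL = "https://prefix.cc/context.jsonld"


def fetch_prefix_cc(url: str = PREFIX_CC_URL, timeout: float = 10.0) -> Dict[str, str]:
    """Download the prefix.cc JSON-LD context and return its prefix map."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read())["@context"]


def get_prefix_cc(
    cache_path: Path,
    fetch: Callable[[], Dict[str, str]] = fetch_prefix_cc,
) -> Dict[str, str]:
    """Return the prefix map from the cache file if it exists, else fetch and cache it."""
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    prefix_map = fetch()  # the only place network access can happen
    cache_path.write_text(json.dumps(prefix_map), encoding="utf-8")
    return prefix_map


def bind_all(graph, prefix_map: Dict[str, str]) -> None:
    """Bind every prefix on any object exposing bind(prefix, namespace)."""
    for prefix, namespace in prefix_map.items():
        graph.bind(prefix, namespace)
```

With this shape the caller decides when the network is touched and where the cache lives, for example `bind_all(graph, get_prefix_cc(Path("prefix_cc.json")))`.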

Contributor Author

@niklasl I am happy to do whatever the RDFLib team decides on, but this interface and nomenclature were already predefined, so I just filled in an implementation as suggested.

I agree that this is a bit of a tricky situation, since prefix.cc relies on a network connection.

Member

Actually, you can achieve the same effect using RDFLib as is, using just:

from rdflib import Graph

graph = Graph()
graph.parse('https://prefix.cc/context.jsonld')

for pfx, ns in graph.namespaces():
    print(pfx, ns)

This yields one very interesting difference though: the go prefix won't work, because its namespace ends in a _. JSON-LD 1.1 does not treat such a term as a namespace prefix, since the IRI does not end with a URI gen-delim character (it has to be explicitly declared using "@prefix": true in the context to be treated as a prefix anyway).
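
As a rough illustration of that rule, the check below tests whether a namespace ends in one of the RFC 3986 gen-delims characters. It is a simplified sketch: the helper name `usable_as_jsonld_prefix` is hypothetical, and it deliberately ignores the explicit "@prefix": true override and the other conditions of JSON-LD 1.1 term-definition processing.

```python
# RFC 3986 gen-delims: ":" / "/" / "?" / "#" / "[" / "]" / "@"
GEN_DELIMS = frozenset(":/?#[]@")


def usable_as_jsonld_prefix(namespace: str) -> bool:
    """Return True if a simple string term mapping to `namespace` would be
    usable as a prefix under the trailing gen-delim rule alone."""
    return bool(namespace) and namespace[-1] in GEN_DELIMS


# The GO namespace ends in "_", so it is not picked up as a prefix.
print(usable_as_jsonld_prefix("http://purl.obolibrary.org/obo/GO_"))
# A namespace ending in "#" passes the check.
print(usable_as_jsonld_prefix("http://www.w3.org/1999/02/22-rdf-syntax-ns#"))
```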

for prefix, ns in _NAMESPACE_PREFIXES_RDFLIB.items():
self.bind(prefix, ns)
for prefix, ns in _NAMESPACE_PREFIXES_CORE.items():
self.bind(prefix, ns)
# bind any prefix that can be found with lookups to prefix.cc
# first bind core and rdflib ones
# work out remainder - namespaces without prefixes
# only look those ones up
raise NotImplementedError("Haven't got to this option yet")
for prefix, ns in _get_prefix_cc().items():
# note that prefixes are lowercase-only in prefix.cc
self.bind(prefix, ns)
elif bind_namespaces == "core":
# bind a few core RDF namespaces - default
for prefix, ns in _NAMESPACE_PREFIXES_CORE.items():
@@ -719,6 +722,13 @@ def absolutize(self, uri: str, defrag: int = 1) -> URIRef:
return URIRef(result)


def _get_prefix_cc():
    """Get the context from Prefix.cc."""
    response = urlopen("https://prefix.cc/context.jsonld")
    context = json.loads(response.read())
    return context["@context"]
Comment on lines +734 to +738
Member

One concern here is that there are some valid "@context" values [ref] that will not be handled correctly here. It may be that they don't appear in prefix.cc, but it would be good to confirm that somehow.

Contributor Author

Yes, there are no special values in this context dictionary, given the way the context is constructed (https://github.com/cygri/prefix.cc/blob/cbc85c00e59e00cf4fee697374109fdd9027231a/templates/format/jsonld.php) and the strict requirements on prefixes (though right now I am having a hard time finding where it's documented that these have to be lowercase alphanumeric strings of length <= 10).
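
If a defensive check were wanted anyway, one possible sketch is to keep only the entries that are plain prefix-to-IRI strings and skip everything else a JSON-LD context may legally contain (keywords such as "@vocab", expanded term definitions given as objects, null mappings, or a context that is a list or a URL string). The function name `extract_prefix_map` is hypothetical and this is not what the PR implements.

```python
from typing import Any, Dict


def extract_prefix_map(document: Dict[str, Any]) -> Dict[str, str]:
    """Return only the simple term -> IRI string entries of a JSON-LD context."""
    context = document.get("@context", {})
    if not isinstance(context, dict):
        # A context may also be a URL string or a list; nothing to bind then.
        return {}
    return {
        term: value
        for term, value in context.items()
        # Skip keywords ("@vocab", "@base", ...) and non-string definitions.
        if not term.startswith("@") and isinstance(value, str)
    }
```

Passing the parsed response through such a filter before calling `bind` would make the assumption about the prefix.cc output explicit rather than implicit.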



# From: http://www.w3.org/TR/REC-xml#NT-CombiningChar
#
# * Name start characters must have one of the categories Ll, Lu, Lo,
16 changes: 16 additions & 0 deletions test/test_namespacemanager.py
@@ -149,6 +149,22 @@ def test_graph_bind_namespaces(
assert namespaces is None


def test_graph_bind_cc():
    """Test binding Prefix.cc.

    Since prefix.cc is an inherently dynamic resource,
    checking for exact equivalence is not applicable.
    """
    graph = Graph(bind_namespaces="cc")
    bound_prefixes = {prefix for prefix, _ in graph.namespaces()}
    for expected_prefixes in [
        _NAMESPACE_PREFIXES_CORE,
        _NAMESPACE_PREFIXES_RDFLIB,
        {"go", "atcc"},  # represent some prefixes in Prefix.cc
    ]:
        # Iterating a dict or a set yields the prefix strings themselves.
        assert all(prefix in bound_prefixes for prefix in expected_prefixes)


@pytest.mark.parametrize(
["selector", "expected_result"],
[