This documentation is to assist developers of Prez, not users or installers.
- Developer README
Prez is open source software, and contributions are welcome.
To ensure that the commit messaging and linting rules below are enforced, install pre-commit and then install the hooks for this repository:
pre-commit install --hook-type pre-commit
pre-commit install --hook-type commit-msg
Use of Conventional Commits is enforced via pre-commit hooks. This helps the maintainers keep the commit history clean and automate the release of new versions.
In short:
- fix: and perf: commits will automatically trigger a new patch version, e.g. 0.0.1 --> 0.0.2
- feat: commits will automatically trigger a new minor version, e.g. 0.0.1 --> 0.1.0
- commits with a BREAKING CHANGE: message will cause a new major version, e.g. 0.0.1 --> 1.0.0
These rules are as per the default settings for python semantic release.
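The bump rules above can be illustrated with a minimal sketch. This is not how Python Semantic Release is implemented; the helper name next_version is hypothetical, and the sketch assumes a single commit message is inspected at a time.

```python
import re

# Commit types that trigger each bump, as listed above.
PATCH_TYPES = {"fix", "perf"}
MINOR_TYPES = {"feat"}

# Matches a Conventional Commits header such as "feat(api): add endpoint".
HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?: ")


def next_version(current: str, commit_message: str) -> str:
    """Return the version implied by one commit message (hypothetical helper)."""
    major, minor, patch = (int(part) for part in current.split("."))
    match = HEADER.match(commit_message)
    commit_type = match.group("type") if match else None
    if "BREAKING CHANGE:" in commit_message:
        return f"{major + 1}.0.0"
    if commit_type in MINOR_TYPES:
        return f"{major}.{minor + 1}.0"
    if commit_type in PATCH_TYPES:
        return f"{major}.{minor}.{patch + 1}"
    return current  # other commit types do not trigger a release


print(next_version("0.0.1", "feat: add listing endpoint"))
```

A commit such as "docs: fix typo" leaves the version unchanged in this sketch, since only the types listed above trigger a release.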
To keep the codebase consistent, pre-commit hooks are used to enforce formatting and linting standards. The following tools are run before a commit:
tool | description |
---|---|
ruff | check for syntax and logical errors |
black | apply consistent formatting |
isort | import sorting |
SHACL NodeShapes and PropertyShapes are utilised to determine which properties of Focus Nodes should be rendered.
Each Profile must have the following properties:
- A prof:Profile type
- A type of either prez:ObjectProfile (for rendering objects) or prez:ListingProfile (for rendering lists of objects), or both.
- A title, identifier (with datatype xsd:token), and description, using the DCTERMS namespace.
Extensions to SHACL are used to specify the mediatypes and resource formats available for a given profile. These are specified as follows. The namespace used is http://www.w3.org/ns/dx/connegp/altr-ext#, and the prefix used for this namespace is altr-ext.
The default resource format for a profile can be set with altr-ext:hasDefaultResourceFormat.
For example:
prez:OGCSchemesObjectProfile
a prof:Profile , prez:ObjectProfile , sh:NodeShape ;
altr-ext:hasDefaultResourceFormat "text/turtle" ;
.
The available resource formats for a profile can be set with altr-ext:hasResourceFormat.
For example:
prez:OGCSchemesObjectProfile
a prof:Profile , prez:ObjectProfile , sh:NodeShape ;
altr-ext:hasResourceFormat "text/turtle" , "application/ld+json" ;
.
The classes of object which a profile constrains can be specified with altr-ext:constrainsClass.
Prez utilises this information when determining whether a requested profile can be used, and which alternate profiles are available to render a resource. An example is given below:
prez:OGCSchemesObjectProfile
a prof:Profile , prez:ObjectProfile , sh:NodeShape ;
altr-ext:constrainsClass dcat:Catalog ;
.
A default profile can be specified using the altr-ext:hasNodeShape and altr-ext:hasDefaultProfile predicates. This is typically done on an "umbrella" profile which indicates default profiles for all classes the API can render. An example is given below:
prez:OGCRecordsProfile
a prof:Profile ;
dcterms:identifier "ogc"^^xsd:token ;
dcterms:description "A system profile for OGC Records conformant API" ;
dcterms:title "OGC Profile" ;
altr-ext:constrainsClass prez:CatPrez ;
altr-ext:hasDefaultResourceFormat "text/anot+turtle" ;
altr-ext:hasNodeShape [
a sh:NodeShape ;
sh:targetClass prof:Profile , dcat:Catalog , dcat:Resource , skos:Concept , geo:Feature , geo:FeatureCollection
, skos:Collection , rdf:Resource , prez:SearchResult , prez:CQLObjectList ;
altr-ext:hasDefaultProfile prez:OGCListingProfile
] , [
a sh:NodeShape ;
sh:targetClass skos:ConceptScheme ;
altr-ext:hasDefaultProfile prez:OGCSchemesListProfile
] , [
a sh:NodeShape ;
sh:targetClass skos:ConceptScheme ;
altr-ext:hasDefaultProfile prez:OGCSchemesObjectProfile
] , [
a sh:NodeShape ;
sh:targetClass prof:Profile , dcat:Catalog , dcat:Resource , skos:Concept , geo:Feature , geo:FeatureCollection
, skos:Collection , rdf:Resource ;
altr-ext:hasDefaultProfile prez:OGCItemProfile
]
.
Note the target classes are shared across both listings of items, and items themselves; the API determines whether a listing or object profile is appropriate based on the endpoint a request is received at.
Direct properties of a focus node are specified via sh:path.
Example:
sh:property [
sh:path prov:qualifiedDerivation
]
A convenience predicate is provided to specify the inclusion of all predicates, shext:allPredicateValues.
Example:
sh:property [
sh:path shext:allPredicateValues ;
]
Sequence paths are specified as property shapes with a path representing the linked list of properties from a focus node.
sh:property [
sh:path ( prov:qualifiedDerivation prov:hadRole )
]
Inverse paths are specified on a nested blank node where the first property is sh:inversePath.
sh:property [
sh:path [ sh:inversePath dcterms:hasPart ] ;
]
Multiple paths can be specified at once using sh:union.
sh:property [
sh:path (
sh:union (
dcterms:publisher
reg:status
( prov:qualifiedDerivation prov:hadRole )
( prov:qualifiedDerivation prov:entity )
)
)
]
When specified without a min or max count, the above property paths must be present for the focus node and its properties to be returned. The following constructs can be used to specify excluded or optional properties.
Specification: sh:maxCount 0
Interpretation: do not include these paths from the focus node, even if they exist in the data.
Example:
sh:property [
sh:maxCount 0 ;
sh:path dcterms:hasPart
]
Specification: sh:minCount 0
Interpretation: include these paths from the focus node if they exist.
Example:
sh:property [
sh:minCount 0 ;
sh:path dcterms:hasPart
]
A convenience predicate is provided to specify the inclusion of blank nodes to a given depth, shext:bnode-depth. Note this is specified directly on the profile and not on a property shape, as it does not relate to any particular property shape.
Specification: shext:bnode-depth
Example:
prez:OGCSchemesObjectProfile
a prof:Profile , prez:ObjectProfile , sh:NodeShape ;
shext:bnode-depth 2 ;
.
The following logic is used to determine the profile and mediatype to be returned:
- If a profile and mediatype are requested, they are returned if a matching profile which has the requested mediatype is found, otherwise the default profile for the most specific class is returned, with its default mediatype.
- If a profile only is requested, if it can be found it is returned, otherwise the default profile for the most specific class is returned. In both cases the default mediatype is returned.
- If a mediatype only is requested, the default profile for the most specific class is returned, and if the requested mediatype is available for that profile, it is returned, otherwise the default mediatype for that profile is returned.
- If neither a profile nor mediatype is requested, the default profile for the most specific class is returned, with the default mediatype for that profile.
The SPARQL query used to select the profile is given in Appendix D.
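The four rules above can be sketched in plain Python. Prez implements this selection as a SPARQL query, so the following is only an illustration of the decision logic; the Profile dataclass, the select function, and the profile tokens used are hypothetical. Where the rules say "the default mediatype", the sketch assumes this means the default mediatype of whichever profile is returned.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical in-memory stand-in for a profile definition."""
    token: str
    default_mediatype: str
    mediatypes: tuple


def select(profiles, default, requested_profile=None, requested_mediatype=None):
    """Return (profile, mediatype).

    `profiles` maps profile token -> Profile; `default` is the default
    profile for the most specific class of the requested resource.
    """
    found = profiles.get(requested_profile) if requested_profile else None
    if requested_profile and requested_mediatype:
        # Both requested: honour them only if the profile offers the mediatype.
        if found and requested_mediatype in found.mediatypes:
            return found, requested_mediatype
        return default, default.default_mediatype
    if requested_profile:
        # Profile only: use it if found, else the default; default mediatype.
        chosen = found or default
        return chosen, chosen.default_mediatype
    if requested_mediatype and requested_mediatype in default.mediatypes:
        # Mediatype only: default profile, requested mediatype if available.
        return default, requested_mediatype
    # Nothing requested, or the requested mediatype is unavailable.
    return default, default.default_mediatype
```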
For object endpoints, the (single) focus node is specified at runtime, either as a URL path parameter or query string argument.
For listing endpoints, the following inputs are used to determine which nodes to select:
- Endpoint - an endpoint is mapped to one or more endpoint SHACL NodeShapes.
- CQL JSON - only used as filter, i.e. filters on specified predicates.
- Search term - filters the label properties of focus nodes to a given search term. Currently a REGEX search is supported.
All listing endpoints support these inputs. The inputs are transformed into the SPARQL Grammar, merged together, and combined with SPARQL Grammar for profiles to create a single query. Profiles specify the inclusion/exclusion of properties on focus nodes, and are detailed in "Profile Design".
Determine which nodes to select.
This forms the inner select part of the SPARQL query. The inputs are one or more of: the URL path a query is sent to; a CQL filter expression; and a search term. The outputs are three sets of SPARQL Grammar objects: TriplesSameSubject, TriplesSameSubjectPath, and GraphPatternNotTriples. The TriplesSameSubject are used to form the ConstructTriples part of the query; the TriplesSameSubjectPath and GraphPatternNotTriples form the WhereClause.
Considerations:
- Class of objects to list (e.g. for /catalogs, list all items of class dcat:Catalog)
- Relationship to parent objects (e.g. for /catalogs/{parent_catalog_curie}/collections/, list all items that have the relationship dcterms:hasPart from the parent catalog curie)
Implementation:
SHACL shapes are used to represent how URL path parameters are translated into a SPARQL query.
One endpoint maps to one or more endpoint NodeShapes. For example, the items endpoint can render resources of type skos:Concept and geo:Feature. An example of an endpoint definition is:
ogce:item-object
a ont:ObjectEndpoint ;
ont:relevantShapes ex:Feature , ex:ConceptSchemeConcept , ex:CollectionConcept , ex:Resource ;
.
To determine which NodeShape (under ont:relevantShapes) should be used to render resources, the class of parents in the URL path is first determined. The logic for this is:
1. Get the classes of all parents in the URL path. Prez caches this class information.
2. Match these to sh:class statements on the PropertyShapes for the NodeShape. sh:class is used on nested PropertyShapes to specify a constraint on the class of related nodes, that is, nodes related via the property shape (e.g. "the class of the first parent is dcat:Resource, the class of the second parent is dcat:Catalog, therefore the applicable NodeShape for the listing is the ex:Resource NodeShape").
The NodeShape information, once determined, is used for:
1. Query generation - which class of nodes to list (e.g. rdf:Resource below)
2. Link generation - to determine which endpoints can render a resource of a given class, and how to find the parents of a given object in order to generate a link (e.g. the parents are all related via dcterms:hasPart in the example below).
An example NodeShape for describing an endpoint is:
ex:Resource
a sh:NodeShape ;
ont:hierarchyLevel 3 ;
sh:targetClass rdf:Resource ;
sh:property [
sh:path [ sh:inversePath dcterms:hasPart ] ;
sh:class dcat:Resource ;
] , [
sh:path ( [ sh:inversePath dcterms:hasPart ] [ sh:inversePath dcterms:hasPart ] );
sh:class dcat:Catalog ;
] .
The hierarchyLevel is used to filter the set of potentially relevant NodeShapes - when a request comes from an endpoint, that endpoint has a corresponding hierarchy level.
A further example for the collections endpoint is provided:
ex:Collections
a sh:NodeShape ;
ont:hierarchyLevel 2 ;
sh:targetClass geo:FeatureCollection , skos:ConceptScheme , skos:Collection , dcat:Resource ;
sh:property [
sh:path [ sh:inversePath dcterms:hasPart ] ;
sh:class dcat:Catalog ;
] .
This selects nodes of class dcat:Resource, geo:FeatureCollection, skos:ConceptScheme, or skos:Collection which are related to a parent node of class dcat:Catalog by the relationship dcterms:hasPart.
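The mapping from an endpoint NodeShape to triple patterns can be sketched as follows. This is a simplified illustration using plain strings, not Prez's implementation (which builds SPARQL Grammar objects). The function name inner_select_patterns, the ?parent_N and ?focus_node_class variable names, and the (predicate, class) input format are assumptions made for the example.

```python
def inner_select_patterns(target_classes, parents):
    """Build triple patterns for a listing (illustrative only).

    `target_classes` are the sh:targetClass values of the NodeShape.
    `parents` is a list of (predicate, parent_class) pairs ordered from the
    nearest parent outwards; each hop is an inverse path of `predicate`.
    """
    lines = [
        "VALUES ?focus_node_class { " + " ".join(target_classes) + " }",
        "?focus_node a ?focus_node_class .",
    ]
    child = "?focus_node"
    for level, (predicate, parent_class) in enumerate(parents, start=1):
        parent = f"?parent_{level}"
        # An inverse path from the child means the parent is the subject.
        lines.append(f"{parent} {predicate} {child} .")
        lines.append(f"{parent} a {parent_class} .")
        child = parent
    return "\n".join(lines)


# The ex:Collections shape: one parent hop via dcterms:hasPart.
print(inner_select_patterns(
    ["geo:FeatureCollection", "skos:ConceptScheme", "skos:Collection", "dcat:Resource"],
    [("dcterms:hasPart", "dcat:Catalog")],
))
```

For the ex:Resource shape, passing two (predicate, class) pairs would chain ?parent_2 to ?parent_1 in the same way, mirroring the sequence of two inverse paths.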
Considerations:
- Mapping of JSON values to URIs.
- Filtering of properties vs. graph pattern matching; the latter supports more complex operations (sequence, inverse paths etc.).
Implementation:
CQL JSON expressions are translated to JSON LD to allow easy mapping to URIs. Examples are provided in the API docs using test data. A demo instance is available here. CQL JSON documentation is available here:
Properties are assumed to be URIs. Values for properties can be specified as URIs using a JSON LD like "@id", for example:
{
"op": "=",
"args": [
{
"property": "http://www.w3.org/2000/01/rdf-schema#member"
},
{ "@id": "http://example.com/datasets/sandgate/facilities" }
]
}
The following context is "inserted" into CQL JSON to create "CQL JSON-LD".
{
"@version": 1.1,
"@base": "http://example.com/",
"@vocab": "http://example.com/vocab/",
"cql": "http://www.opengis.net/doc/IS/cql2/1.0/",
"sf": "http://www.opengis.net/ont/sf#",
"geo": "http://www.opengis.net/ont/geosparql#",
"landsat": "http://example.com/landsat/",
"ro": "http://example.com/ro/",
"args": {
"@container": "@set",
"@id": "cql:args"
},
"property": {
"@type": "@id",
"@id": "cql:property"
},
"op": {
"@id": "cql:operator"
},
"type": {
"@id": "sf:type"
}
}
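As a rough illustration of "inserting" the context, the sketch below adds it under the standard JSON-LD "@context" key. How Prez performs this step internally is an assumption here; the helper name to_cql_jsonld is hypothetical, and the context is abbreviated to the terms the example needs.

```python
import json

# Abbreviated copy of the context shown above; the full context has more terms.
CQL_CONTEXT = {
    "@version": 1.1,
    "@base": "http://example.com/",
    "@vocab": "http://example.com/vocab/",
    "cql": "http://www.opengis.net/doc/IS/cql2/1.0/",
    "args": {"@container": "@set", "@id": "cql:args"},
    "property": {"@type": "@id", "@id": "cql:property"},
    "op": {"@id": "cql:operator"},
}


def to_cql_jsonld(cql_json: dict) -> dict:
    """Return a copy of the CQL JSON with the context added (hypothetical helper)."""
    return {"@context": CQL_CONTEXT, **cql_json}


cql = {
    "op": "=",
    "args": [
        {"property": "http://www.w3.org/2000/01/rdf-schema#member"},
        {"@id": "http://example.com/datasets/sandgate/facilities"},
    ],
}
print(json.dumps(to_cql_jsonld(cql), indent=2))
```

The input dictionary is left unmodified, so the same CQL JSON can be reused or logged as received.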
The following has been implemented:
- Spatial functions
- String pattern matching
- Property filtering
The following has not yet been implemented:
- Time filtering
Implementation:
The search term is inserted into three different regex expressions which match the search term in different ways and weight the results. A full search query is generated, and then relevant parts are extracted (TriplesSameSubject etc. as listed above) to generate a final query.
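The idea of weighting several regex matches can be sketched in Python. The three patterns (whole label, prefix, anywhere), the weights, and the choice to sum them are all assumptions for illustration; the source does not specify which expressions or weights Prez uses, and Prez evaluates its regexes in SPARQL rather than in Python.

```python
import re


def search_weight(label: str, term: str) -> int:
    """Score a label against a search term (illustrative weights only)."""
    escaped = re.escape(term)  # treat the term literally, not as a pattern
    patterns = [
        (rf"^{escaped}$", 10),  # whole label equals the term
        (rf"^{escaped}", 5),    # label starts with the term
        (escaped, 1),           # term appears anywhere in the label
    ]
    return sum(
        weight
        for pattern, weight in patterns
        if re.search(pattern, label, flags=re.IGNORECASE)
    )


labels = ["Groundwater", "Water quality", "Water", "Soil"]
ranked = sorted(labels, key=lambda label: search_weight(label, "water"), reverse=True)
print(ranked)
```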
Prez utilises the sparql-grammar-pydantic library to generate SPARQL queries.
Prez returns:
- RDF data for specified objects
- RDF data for lists of objects
- Annotated RDF* for specified objects
- Annotated RDF* for lists of objects
- Available Profiles
- An alternates profile for every object or listing, listing all available profiles and mediatypes
- OpenAPI documentation for the API
* Annotated RDF is RDF which includes labels, descriptions, and other explanatory properties for all RDF terms. The predicates Prez looks for are rdfs:label, dcterms:description, and dcterms:provenance. The list of predicates Prez looks for can be extended in the profiles.
The objects Prez delivers RDF for have URIs that uniquely identify them. Prez delivers RDF for these objects at URLs on the web. These URLs and URIs are not required to be the same, and frequently are not. For objects that Prez holds information for, it is helpful if Prez tells users the URL of these when they are referenced elsewhere in the API. This occurs in two places:
- Listings of objects, for example dcat:Catalog at the /catalogs endpoint; and
- Links to related objects, where the API holds information on the related object.
In these cases, in the annotated RDF mediatype (text/anot+turtle), URL paths are provided which link to the related object.
For cases where URIs and URLs for a given object differ, URL redirection can be used to send users to the Prez URL instance which displays information for the object.
Internal links use CURIEs. Prez uses the default RDFLib prefixes, covering common namespaces. Additional prefixes can be specified using the Vann ontology property "vann:preferredNamespacePrefix". These can be added to turtle files in the prez/reference_data/prefixes directory. Any turtle files in this directory will be loaded on startup.
When Prez encounters a URI which is required for an internal link but is not in the current known prefixes, it will generate a prefix using the following logic:
- Get the "second to last part" of the URI; either the part before a fragment if it exists, or the second to last path segment otherwise.
- If this second to last part is less than six characters, use it as is, else:
- Remove vowels from the second to last part and use this as the prefix.
- If this prefix fails to bind for any reason, use RDFLib's default "ns1", "ns2" etc. prefixes.
To get "sensible" or "nice" prefixes, it is recommended to add all prefixes which will be required to turtle files in prez/reference_data/prefixes. A future change could allow the prefixes to be specified alongside data in the backend, as profiles currently can be.
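The prefix generation steps above can be sketched as follows. This is one reading of the rules rather than Prez's actual code: the function name generate_prefix is hypothetical, "the part before a fragment" is interpreted as the last path segment before the "#", and the fallback to RDFLib's "ns1", "ns2" prefixes is left to the caller.

```python
from urllib.parse import urlparse

VOWELS = set("aeiouAEIOU")


def generate_prefix(uri: str) -> str:
    """Derive a candidate prefix from a URI (hypothetical helper).

    Returns an empty string when no suitable part exists; the caller would
    then fall back to RDFLib's default "ns1", "ns2", ... prefixes.
    """
    without_fragment, has_fragment, _ = uri.partition("#")
    segments = [s for s in urlparse(without_fragment).path.split("/") if s]
    if has_fragment:
        # The part immediately before the fragment.
        part = segments[-1] if segments else ""
    else:
        # The second to last path segment.
        part = segments[-2] if len(segments) >= 2 else ""
    if len(part) < 6:
        return part
    # Six or more characters: shorten by removing vowels.
    return "".join(ch for ch in part if ch not in VOWELS)


print(generate_prefix("http://example.com/datasets/sandgate/facilities"))
```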
The following SPARQL query can be used as a starting point to check if a namespace prefix is defined for instances of the main classes Prez delivers. NB: this query should NOT be run against SPARQL endpoints for large datasets; offline options should be used instead. NB: for "short" URIs, i.e. a hostname with no fragment and no path, this query will (correctly, but uselessly) return "http://" or "https://". You will need to identify these URIs by other means and provide prefixes for them should you wish.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX vann: <http://purl.org/vocab/vann/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT DISTINCT ?namespace
{?uri a ?type
BIND (REPLACE(STR(?uri), "(.*[/#])[^#/]*$", "$1") AS ?namespace)
VALUES ?type { skos:Collection skos:ConceptScheme skos:Concept dcat:Dataset geo:FeatureCollection geo:Feature dcat:Resource dcat:Catalog }
MINUS {?namespace vann:preferredNamespacePrefix ?prefix .}
} LIMIT 100
Prez recognises the following kinds of annotation properties, and can return RDF, either via SPARQL queries, or the endpoints as annotated RDF.
When an annotated mediatype is requested (e.g. text/anot+turtle), Prez will look for the following predicates for every RDF term in the (initial) response returned by the triplestore. That is, it will expand the response to include the annotations and return the RDF merge of the original response and the annotations.
Additional predicates can be added to the list of predicates Prez looks for in the profiles by adding these predicates to the configuration.
New endpoints can be added to Prez by adding RDF, and minimal addition of FastAPI decorators.
- Add FastAPI decorator.
  - For Listing endpoints, add these to the listings function in prez/routers/ogc_router. An example is:
@router.get(
"/catalogs",
summary="Catalog Listing",
name=OGCE["catalog-listing"],
responses=responses
)
See the references in the code for what should be provided for responses and openapi_extra; these fields are optional but useful for documentation.
The name is required, and should be a URI.
- An endpoint definition.
  1. The endpoint URI must match the name URI in the decorator.
  2. The endpoint must be declared an ont:ListingEndpoint or ont:ObjectEndpoint, as Prez uses different application code to render results for these two types of endpoint.
These are in prez/reference_data/endpoints/endpoint_metadata.ttl. An example is:
ogce:catalog-listing
a ont:ListingEndpoint ;
ont:relevantShapes ex:Catalogs ;
.
- A NodeShape for the endpoint. This describes how nodes should be selected at the given endpoint. An example is:
ex:Catalogs
a sh:NodeShape ;
ont:hierarchyLevel 1 ;
sh:targetClass dcat:Catalog ;
sh:property [
sh:path dcterms:hasPart ;
sh:or (
[ sh:class dcat:Resource ]
[ sh:class geo:FeatureCollection ]
[ sh:class skos:ConceptScheme ]
[ sh:class skos:Collection ]
) ;
] .
This specifies the selection of focus nodes of class dcat:Catalog which have the relationship dcterms:hasPart to one or more of the listed classes.
Prez utilises the sparql-grammar-pydantic library to generate SPARQL queries.
For objects, the focus node is specified in a query path as a curie, or in the case of the /object endpoint, as a query parameter with the key "uri".
For lists of objects, the focus node is a variable, fixed within Prez to ?focus_node.
Usage: The focus node is substituted into the main query.
Prez creates a single main query to describe an object or listing of objects.
The structure of the query is as follows:
CONSTRUCT {
<construct_triples + construct_tss_list>
}
WHERE {
# for listing queries only:
{
SELECT ?focus_node <inner_select_vars>
WHERE {
<inner_select_tssp_list>
<inner_select_gpnt>
}
ORDER BY <order_by_direction>(<order_by>)
LIMIT <limit>
OFFSET <offset>
}
# for all queries:
<profile_triples>
<profile_gpnt>
}
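The structure above can be illustrated by assembling the query from plain strings. Prez builds the query from SPARQL Grammar objects via sparql-grammar-pydantic, so the string-based approach, the function name build_main_query, and its parameters are simplifications assumed for this sketch.

```python
def build_main_query(construct, profile_where, inner_where=None,
                     order_by=None, direction="ASC", limit=10, offset=0):
    """Assemble the main query text (illustrative only).

    `inner_where` is supplied for listing queries only; object queries
    omit the inner SELECT entirely.
    """
    where_parts = []
    if inner_where is not None:
        modifiers = ""
        if order_by:
            modifiers += f" ORDER BY {direction}({order_by})"
        modifiers += f" LIMIT {limit} OFFSET {offset}"
        where_parts.append(
            f"{{ SELECT ?focus_node WHERE {{ {inner_where} }}{modifiers} }}"
        )
    # Profile triples and graph patterns apply to all queries.
    where_parts.append(profile_where)
    return f"CONSTRUCT {{ {construct} }} WHERE {{ {' '.join(where_parts)} }}"


print(build_main_query(
    construct="?focus_node ?p ?o .",
    profile_where="?focus_node ?p ?o .",
    inner_where="?focus_node a dcat:Catalog .",
    order_by="?focus_node",
    limit=20,
    offset=40,
))
```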
The triples to construct. This is taken from the union of:
- Profile_Triples (directly) - i.e. any triple specified in the where clause will be constructed
- Any triples within the Profile_GPNTs object. Prez utilises a convenience function provided by the SPARQL Grammar library which recursively extracts all triples within a given SPARQL Grammar object.
- Additional_Construct_Triples (directly) - these may come from a search query, such as the query result weights, etc.
There is one source of profile triples and profile GPNTs - these are derived from SHACL node and property shapes associated with the selected profile (returned by ConnegP).
At a conceptual level these profile shapes represent the "properties" or "attributes" to be returned for each focus node. At present the following SHACL expressions are covered:
- minCount = 0 (optional property)
- maxCount = 0 (exclude property)
- path
- sequence path
- inverse path
- class
- blank nodes to a specified depth
How to specify these is detailed in the Profile Design section.
Inner Select Triples and Inner Select GPNTs are taken from the union of:
- CQL
- Search queries
- Endpoint NodeShapes
These are detailed in the Focus Node Selection section.
- Where an annotated mediatype is requested, Prez returns any annotations it can find from all available repositories (data, system, and annotations repositories).
- These annotations are then cached against the URI they are for.
- The caching utilises aiocache.
- aiocache is currently set up with in memory caches. It could be extended to utilise Redis.
A sequence diagram is shown for annotation retrieval:
sequenceDiagram
Client ->> FastAPI: Request for data with annotated mediatype
FastAPI ->> Repo: send_queries(object/list query)
Repo -->> FastAPI: initial response graph
FastAPI ->> FastAPI: get URIs in initial response
FastAPI ->> Cache: check cache for annotations
Cache -->> FastAPI: return cached annotations (if any)
FastAPI ->> FastAPI: determine cached and uncached URIs
FastAPI ->> Repo: query for uncached annotations
Repo -->> FastAPI: return annotations for uncached URIs (if any)
FastAPI ->> Cache: cache previously uncached annotations
FastAPI ->> Client: return initial response graph + annotations
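The cache-then-query flow in the diagram can be sketched with a plain dictionary standing in for the cache. Prez uses aiocache and asynchronous repository calls; the synchronous function below, its name get_annotations, and the fetch_uncached callable are hypothetical simplifications.

```python
def get_annotations(uris, cache, fetch_uncached):
    """Return annotations for `uris`, consulting `cache` (a dict) first.

    `fetch_uncached` is a hypothetical callable standing in for the
    repository query; it takes a set of URIs and returns a dict mapping
    each URI it knows about to its annotations.
    """
    cached = {uri: cache[uri] for uri in uris if uri in cache}
    uncached = set(uris) - set(cached)
    # Only query the repository for URIs the cache could not answer.
    fetched = fetch_uncached(uncached) if uncached else {}
    cache.update(fetched)
    return {**cached, **fetched}


def fake_repo(uris):
    # Stand-in for the triplestore: label every URI with its last segment.
    return {uri: {"label": uri.rsplit("/", 1)[-1]} for uri in uris}


cache = {}
first = get_annotations(["http://example.com/a"], cache, fake_repo)
second = get_annotations(["http://example.com/a"], cache, fake_repo)  # cache hit
print(first == second)
```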
Annotations are returned with one of the following mapped prez namespaced URIs.
- prez:label: skos:prefLabel, dcterms:title, rdfs:label, sdo:name
- prez:description: skos:definition, dcterms:description, sdo:description
- prez:provenance: dcterms:provenance
An abstraction over data providers is provided with "Repositories". Three types are supported: Pyoxigraph (in memory), Oxrdflib, and RemoteSparql.
- Data repository - one of Pyoxigraph, Oxrdflib, or RemoteSparql
- System repository - Pyoxigraph
- Annotations repository - Pyoxigraph
- Check the SPARQL endpoints can be reached. A blank query (ASK {}) is used to test this. The SPARQL endpoints are not health checked post startup.
- Create in memory profile, prefix, and endpoint graphs, containing all profiles in the prez/profiles directory, and any additional profiles available in the triplestore (declared as a http://www.w3.org/ns/dx/prof/Profile)
- Look for predefined object counts in the triplestore.
Prez provides an /object endpoint that supplies any information known about a given URI. If an annotated mediatype is requested, Prez will additionally provide all system links for endpoints which can render the object. The high level sequence for this endpoint is as follows:
- Get the URI for the object from the query string
- Get the class(es) of the object from the triplestore
- Use prez's reference data for endpoints to determine which endpoints can render this object, and, a template for these endpoints, specifying any variables that need to be substituted (such as parent URIs).
- Get the object information from the triplestore, using an open profile, and in parallel any system information needed to construct the system links.
- Return the response
The OGC Features endpoints can utilise custom SPARQL templates. At present the queries must be of the form:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sosa: <http://www.w3.org/ns/sosa/>
CONSTRUCT {
...
}
WHERE {
VALUES ?focusNode { UNDEF }
...
}
At present the queries are ONLY substituted with the focus node URI, and only for OGC Features endpoints. It is intended that this functionality will provide the basis for a more general templating system across all OBJECT endpoints in the future. These templates should be declared in the remote repo in this format:
PREFIX prez: <https://prez.dev/ont/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA { GRAPH <https://prez/system> {
[ a prez:TemplateQuery ;
rdf:value """<template_query>
""" ;
prez:forEndpoint "http://www.opengis.net/ogcapi-features-1/1.0/feature" ;
]
}}
Prez will detect these template queries and when a request comes in on the endpoint with the relevant URI, utilise the template query.
Templates can also be specified in prez/reference_data/xxx.rq, and mapped to an endpoint using the endpoint_to_template_query_filename setting (can be set as an environment variable), where xxx.rq is the filename of the template query. For example:
export ENDPOINT_TO_TEMPLATE_QUERY_FILENAME='{"http://www.opengis.net/ogcapi-features-1/1.0/feature": "xxx.rq"}'
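A minimal sketch of how such a mapping could be read and a template bound to a focus node is given below. Both helper names are hypothetical, and replacing the UNDEF placeholder by string substitution is an assumed mechanism, not a description of how Prez performs the substitution.

```python
import json
import os


def load_template_mapping(environ=os.environ):
    """Parse the endpoint -> template filename mapping from the environment."""
    raw = environ.get("ENDPOINT_TO_TEMPLATE_QUERY_FILENAME", "{}")
    return json.loads(raw)


def bind_focus_node(template: str, focus_node_uri: str) -> str:
    """Replace the UNDEF placeholder with the focus node URI (assumed mechanism)."""
    return template.replace(
        "VALUES ?focusNode { UNDEF }",
        f"VALUES ?focusNode {{ <{focus_node_uri}> }}",
    )


mapping = load_template_mapping({
    "ENDPOINT_TO_TEMPLATE_QUERY_FILENAME":
        '{"http://www.opengis.net/ogcapi-features-1/1.0/feature": "xxx.rq"}'
})
print(mapping)
```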
Prez uses the following logic to determine what information to return, based on a profile, and in what mediatype to return it.
1. Determine the URI for an object or listing of objects:
   - For objects:
     - Directly supplied through the /object?uri= query string argument
     - From the URL path the object is requested from, for example /catalogs/. abc is a curie, which is expanded to a URI.
2. Get all classes for the object or object listing
3. Determine the profile and mediatype to use for the object. This is implemented as a SPARQL query and takes into account:
   - The classes of the object
   - Available profiles and mediatypes
   - Requested profiles and mediatypes
   - Default profiles and mediatypes
   The logic used to determine the profile and mediatype is detailed in section x.
4. Build a SPARQL query.
5. Execute the SPARQL query.
6. If the mediatype requested is NOT annotated RDF (text/anot+turtle), return the results of step 5, else retrieve the annotations:
   - Check Prez cache for annotations
   - For terms without annotations in the cache, query the triplestore for annotations
   - Cache any annotations returned from the triplestore
   - Return the annotations merged with the results of the SPARQL query in step 5.