We need to say more about p:document-properties() and its friends

Achim Berndzen edited this page Aug 23, 2018 · 4 revisions

In the specs we define p:document-properties() and a bunch of other functions to access the document properties assigned to an XProc document (being a pair of a representation and its properties). So far, so good, but remember the signature of p:document-properties() is:

p:document-properties($doc as item()) as map(xs:QName,item()*)

The important point is that we do not request the document properties from an XProc document: we can call p:document-properties() on any item, and some of these items might be associated with an XProc document while others might not.

Easy things first:

Let us start with the obvious cases: The pipeline fragment

<p:load href="some-url" name="loader"/>
<p:identity>
   <p:with-input select="p:document-properties-document(.)" />
</p:identity>

is clearly expected to produce an XProc document (of whatever flavour) in the loader-step and then to produce another XProc document (of the XML document flavour) containing the document-properties of the former. On the output port result of p:identity an XProc document like this should appear:

<p:document-properties>
 <base-uri>the base-uri of the document loaded</base-uri>
 <content-type>the content type of the document loaded</content-type>
</p:document-properties>

OK, this is easy. I think it is also easy to say what happens if I do this:

<p:identity>
   <p:with-input select="p:document-properties-document(5)" />
</p:identity>

Given the current specs, which do not mention any error conditions, and as 5 is a legitimate instance of item() I will most likely get an empty p:document-properties-document.

But this (at least to my reading) is the point where the easy cases end. Here are some problems/questions that occurred to me over time:

What about XML documents that never were XProc documents?

Suppose an XProc processor finds this pipeline fragment:

<p:identity>
   <p:with-input select="p:document-properties-document(doc('some-uri'))" />
</p:identity>

The difference between the case above and this one is that the document returned by doc() (if any) is an XML document, but not an XProc document. So it is a representation without document properties attached. It may have some, as we will see soon, but there is no concept of "document properties" defined in XDM. Document properties come from XProc. But surely we could construct a document-properties document for an XDM document loaded with fn:doc(): every XDM document has a base-uri property and, since it is an XML document, we know its content type to be "application/xml". So in this case we could construct a p:document-properties-document for XProc.
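
As a rough sketch of what such a reconstruction could look like, the two properties can be rebuilt by hand from what XDM still knows. The variable names are made up for illustration, the xs prefix is assumed to be bound to the XML Schema namespace on an enclosing element, and nothing in the specs says a processor must do this:

```xml
<!-- Sketch only: rebuilds the two properties that XDM can still supply.
     Assumes xmlns:xs="http://www.w3.org/2001/XMLSchema" is in scope. -->
<p:variable name="doc" select="doc('some-uri')"/>
<p:variable name="reconstructed"
            select="map{
                      xs:QName('base-uri'):     base-uri($doc),
                      xs:QName('content-type'): 'application/xml'
                    }"/>
```

The content type is a pure assumption here: it is hard-wired because fn:doc() only ever returns XML, not because the document carries that information.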

What about JSON documents that never were XProc documents?

Next up:

<p:identity>
   <p:with-input select="p:document-properties-document(parse-json(unparsed-text('some-uri-to-json')))" />
</p:identity>

compared to:

<p:load href="some-uri-to-json" content-type="application/json" />
<p:identity>
  <p:with-input select="p:document-properties-document(.)" />
</p:identity>

In the second case the XProc processor will hopefully return a p:document-properties-document with the resolved URI as the text child of a base-uri element and "application/json" as the text child of a content-type element. But in the first case there is no obvious way to do this, because the XPath function is called with a map, an array etc. resulting from fn:parse-json(), and there is no (non-magical) way for the XPath processor to know the base URI it was loaded from, or that it was parsed with fn:parse-json() and so should probably have the content type "application/json". (It could be any map, array etc., so why assume it is a JSON document?) The document resulting from calling p:document-properties-document() would most probably have just a root element but no child elements, because nothing is known about the map, array etc. provided as argument.
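
To make the asymmetry concrete, here is a sketch of the two results one would plausibly see, in the same notation as the p:document-properties example near the top of this page. The empty element for the fn:parse-json() case is an assumption drawn from the reasoning above; the specs do not currently define it:

```xml
<!-- Loaded with p:load: both properties are known to the processor -->
<p:document-properties>
 <base-uri>the resolved uri of some-uri-to-json</base-uri>
 <content-type>application/json</content-type>
</p:document-properties>

<!-- Built with fn:parse-json(): assumed result, nothing is known -->
<p:document-properties/>
```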

What about text documents that never were XProc documents?

The point just made about the asymmetry between the two ways of dealing with JSON documents, in contrast to the two ways of dealing with XML documents, also applies to the two ways of dealing with text documents. The only difference is that a text document loaded with p:load is a text node, while fn:unparsed-text() produces a string.

What about (former) XProc documents in variables?

One of the great improvements of XProc 3.0 is that variables/options are typed, and so we can assign XProc documents to variables, like so:

<p:load href="some-url" name="loader"/>
<p:variable name="var" select="." />

Here I assign an XProc document (a pair of representation and properties) to a variable using an XPath expression. The type of the value of $var is "document-node()". The important question now is whether an XProc processor has to make sure that p:document-properties-document($var) reveals the document properties that are part of the XProc document loaded in the loader step. (Remember: $var holds an XDM instance, where the concept of "document properties" is not known.)

If we answer this question with "yes", then something magical has to happen: the document properties associated with the XProc document travel magically to XPath and can be revealed by calling some XPath functions. This can be done (I wrote a proof of concept for the most recent version of Saxon), but I think you have to agree that it is (at least some kind of) magic. (And probably not stable as XPath processors evolve.)

If we answer "no", we will get the same asymmetry we had above for the "loading" case: for XML documents we could reconstruct at least two properties, but this would be difficult for JSON and text documents.

And: some document properties will be lost, even for XML documents. The specs currently mention only two instances of document properties: the content-type and the base-uri. But this is not the complete story, as a pipeline author could attach any key/value pair she likes using the step p:set-properties. Without magic there is no way to reconstruct the author-set key/value pairs in p:document-properties-document($var), even for XML documents.
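
A small sketch of the situation, assuming p:set-properties takes its map through an option named properties (as in the published XProc 3.0 step library) and that the xs prefix is bound to the XML Schema namespace. The property name reviewed-by and its value are invented for illustration:

```xml
<p:load href="some-url" name="loader"/>
<p:set-properties>
  <!-- 'reviewed-by' is a hypothetical, author-defined property -->
  <p:with-option name="properties"
                 select="map{ xs:QName('reviewed-by'): 'some-reviewer' }"/>
</p:set-properties>
<!-- $var holds a plain XDM document node; without magic,
     the 'reviewed-by' entry cannot be recovered from it -->
<p:variable name="var" select="."/>
```

Unlike base-uri and content-type, nothing in the XDM node hints at reviewed-by, so no reconstruction strategy can bring it back.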

What to make of this?

  • I think it is obvious that we need to say more in the specs to make users understand p:document-properties() and its friends, and to make sure implementers will do the right thing.
  • We have to decide if we want the document properties associated with an XProc document to travel magically with the (XDM) representation also associated with this document. As I said, this can be implemented (currently). But I find this concept difficult to explain: if they travel, $var in the above example is not really an XDM document-node(), but an XDM document-node() with some magic sparkle on it, so that p:document-properties($var) will return the right thing. There has to be some magic sparkle, because an XML document in a variable loaded with fn:doc() would not behave in this way.

Honestly, I am not sure the magic is worth it. I could live with functions without a parameter:

p:document-properties() as map(xs:QName,item()*)

saying that this function will return the document properties associated with the context item of the expression. In XProc the context item of an XPath expression is an XProc document (representation/document properties), so no magic has to be explained. The only drawback I see so far is that this

<p:load href="some-url" />
<p:variable name="representation" select="."/>
<p:variable name="properties" select="p:document-properties($representation)" />

will not work anymore, because the function call in the second p:variable is not allowed (no parameter any more!). But the workaround is easy:

<p:load href="some-url" name="loader"/>
<p:variable name="representation" select="."/>
<p:variable name="properties" select="p:document-properties()" pipe="@loader" />

will do the expected thing.

Last thought:

Something that occurred to me just last week is that we could express the concept of an XProc document (as a pair of representation and properties) solely using XDM ontology: since the representation for any content type can be an XDM item and the properties are a map (an XDM item again), we could understand an XProc document as an array with two entries: representation and properties. If we followed this line of thought, we could get rid of the p:document-properties() function by just saying that the XProc processor exposes every document as an array (or a map, if you like this better) to XPath:

<p:variable name="representation" select=".(1)" />
<p:variable name="properties" select=".(2)" />
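
For the map flavour of the same idea, a sketch could use the XPath 3.1 lookup operator. The key names representation and properties are purely hypothetical; nothing like this is defined anywhere:

```xml
<!-- Hypothetical: the processor exposes each document as
     map{ 'representation': ..., 'properties': ... } -->
<p:variable name="representation" select=".?representation" />
<p:variable name="properties" select=".?properties" />
```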

But for this kind of change it is IMHO far too late, and it would require all XProc authors coming from version 1.0 to rethink their XPath expressions.

Note: I made the case for p:document-properties-document() here, but I think the arguments given also apply to p:document-properties() (returning a map) and p:document-property() (returning an entry of the map).


Any questions, objections, comments, or cheers welcome.