Should we allow distinction between nulls and unset slots? #1975

cmungall · 2024-03-16T00:16:19Z

cmungall
Mar 16, 2024
Maintainer

Currently in the LinkML instance metamodel there is no distinction
between the JSON objects below:

{"foo": null, "bar": 1}

and

{"bar": 1}

and (in the case of foo being multivalued)

{"foo": {}, "bar": 1}

and

{"foo": [], "bar": 1}

This may seem odd coming from the perspective of a programming
language such as Python or a framework like Pydantic, both of which
distinguish the above. The same applies to object serializations
modeled after programmatic constructs, such as JSON and YAML, where
all of the above are different.
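As a minimal illustration of that difference, the sketch below parses the four forms with Python's standard `json` module and confirms they are pairwise distinct. The `drop_empty` helper is hypothetical (it is not part of LinkML) and only mimics the "no semantic distinction" view by treating `null`, `{}` and `[]` as "no value".

```python
import json

# The four serializations discussed above.
docs = [
    '{"foo": null, "bar": 1}',
    '{"bar": 1}',
    '{"foo": {}, "bar": 1}',
    '{"foo": [], "bar": 1}',
]
parsed = [json.loads(d) for d in docs]

# At the Python/JSON level, every pair of forms compares as different.
all_distinct = all(
    parsed[i] != parsed[j]
    for i in range(len(parsed))
    for j in range(i + 1, len(parsed))
)


def drop_empty(obj: dict) -> dict:
    """Hypothetical normalizer: treat None, {} and [] as 'no value'."""
    return {
        key: value
        for key, value in obj.items()
        if value is not None and value != {} and value != []
    }


normalized = [drop_empty(p) for p in parsed]

print(all_distinct)  # True
print(normalized)    # four copies of {'bar': 1}
```

After normalization, all four forms carry the same information, which is the behavior described for the current instance metamodel.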

The reason for this is that LinkML is primarily concerned with the
semantics of data, and there is no semantic distinction between a null
value and a missing value.

However, LinkML is also intended to be a pragmatic and user-friendly
framework, prizing practicality over ideology. We are open to
extending the framework to allow for a distinction here (with the
current behavior remaining the default).

However, if we were to do this, it would introduce complications in
how LinkML is used in combination with other frameworks, in particular
relational frameworks and RDF.

In relational systems, the first two cannot be distinguished without
introducing some kind of special value to mark "unset values" as
distinct from NULLs. (NULL itself has long been contested in
relational theory; C. J. Date in particular has argued against it.)
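A small sketch of the single-valued case, using Python's built-in `sqlite3` module. The table name `C` and its columns are assumptions chosen to match the examples above, not LinkML-generated DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE C (id INTEGER PRIMARY KEY, foo TEXT, bar INTEGER)")

# Form 1: foo is explicitly null.
conn.execute("INSERT INTO C (id, foo, bar) VALUES (1, NULL, 1)")
# Form 2: foo is not mentioned at all.
conn.execute("INSERT INTO C (id, bar) VALUES (2, 1)")

rows = conn.execute("SELECT foo, bar FROM C ORDER BY id").fetchall()
print(rows)  # [(None, 1), (None, 1)]
```

Both rows read back identically, so the information about which form was used is lost at insert time unless an extra marker column or sentinel value is added.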

In the case where foo is multivalued, a linking table would be
introduced; for example if the parent class is C, then the linking
table might be C_foo. Either this table has rows or it does not, so
there would be no way of distinguishing nulls (form 1) from unset
(form 2) from a zero-length collection (forms 3 or 4), without
introducing some other kind of marker.
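The multivalued case can be sketched the same way with `sqlite3`. The schema and the `store` helper below are illustrative assumptions, not the output of any LinkML generator; the point is only that all four forms leave zero rows in `C_foo`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE C (id INTEGER PRIMARY KEY, bar INTEGER);
    CREATE TABLE C_foo (C_id INTEGER REFERENCES C(id), foo TEXT);
    """
)


def store(obj_id: int, obj: dict) -> None:
    """Hypothetical loader: one row in C, one row in C_foo per foo value."""
    conn.execute("INSERT INTO C (id, bar) VALUES (?, ?)", (obj_id, obj.get("bar")))
    # None, a missing key, {} and [] all yield no values to insert.
    for value in obj.get("foo") or []:
        conn.execute("INSERT INTO C_foo (C_id, foo) VALUES (?, ?)", (obj_id, value))


forms = [
    {"foo": None, "bar": 1},  # form 1: null
    {"bar": 1},               # form 2: unset
    {"foo": {}, "bar": 1},    # form 3: empty collection
    {"foo": [], "bar": 1},    # form 4: empty list
]
for i, form in enumerate(forms, start=1):
    store(i, form)

counts = conn.execute(
    "SELECT C.id, COUNT(C_foo.C_id) FROM C "
    "LEFT JOIN C_foo ON C_foo.C_id = C.id "
    "GROUP BY C.id ORDER BY C.id"
).fetchall()
print(counts)  # [(1, 0), (2, 0), (3, 0), (4, 0)]
```

Every parent row ends up with an empty set of linked rows, so a query against `C_foo` cannot tell the forms apart.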

In RDF systems there is not even a concept of NULL. However, one is
not needed: RDF can be seen as structurally similar to a normalized
RDBMS, where each triple is a distinct assertion, and the absence of a
triple means the same as NULL. So in RDF there is no way to
distinguish the four forms above without introducing some kind of
ad-hoc, non-standard marker.
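To make the RDF point concrete, here is a sketch that models a graph as a plain Python set of `(subject, predicate, object)` tuples rather than using a real RDF library. The `to_triples` function and the `ex:` prefix are assumptions for illustration only.

```python
def to_triples(subject: str, obj: dict) -> set:
    """Hypothetical converter: emit one triple per actual slot value."""
    triples = set()
    for key, value in obj.items():
        if value is None:
            continue  # a null produces no assertion
        if isinstance(value, dict):
            values = list(value.values())
        elif isinstance(value, list):
            values = value
        else:
            values = [value]
        # Empty collections produce no assertions either.
        for v in values:
            triples.add((subject, f"ex:{key}", v))
    return triples


forms = [
    {"foo": None, "bar": 1},
    {"bar": 1},
    {"foo": {}, "bar": 1},
    {"foo": [], "bar": 1},
]
graphs = [to_triples("ex:x", form) for form in forms]

print(graphs[0])                             # {('ex:x', 'ex:bar', 1)}
print(all(g == graphs[0] for g in graphs))   # True
```

All four forms map to the same single-triple graph, so a round trip from RDF cannot recover which form was originally used.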

Note this disconnect has always existed for JSON-LD. JSON-LD
provides a way to model data in a natural JSON form, but with a
mapping to RDF.

The JSON-LD 1.1 specification has this to say in section 1.4
regarding null in JSON:

"A map entry in the body of a JSON-LD document whose value is null has
the same meaning as if the map entry was not defined"

This means that if we were to introduce a distinction into LinkML,
then this could only be used in combination with a subset of
frameworks. If the user tried to generate SQL DDL or JSON-LD Contexts,
then the framework should throw an error by default. Similarly, if
converting from RDF or similar, the default behavior should also be
error-throwing.
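The "error by default" behavior could look roughly like the sketch below. Everything here is hypothetical: the `distinguishes_nulls` flag, the target names, the `allow_lossy` override and the exception class are invented for illustration and do not correspond to any existing LinkML option or API.

```python
class NullDistinctionError(ValueError):
    """Hypothetical error: target cannot represent the null/unset distinction."""


# Hypothetical names for targets that cannot tell the forms apart.
LOSSY_TARGETS = {"sqlddl", "jsonld_context", "rdf"}


def check_target(target: str, distinguishes_nulls: bool, allow_lossy: bool = False) -> None:
    """Raise unless the target can preserve the distinction or the user opts out."""
    if distinguishes_nulls and target in LOSSY_TARGETS and not allow_lossy:
        raise NullDistinctionError(
            f"target {target!r} cannot distinguish null from unset; "
            "pass allow_lossy=True to collapse them"
        )


check_target("sqlddl", distinguishes_nulls=False)  # default behavior: fine
check_target("sqlddl", distinguishes_nulls=True, allow_lossy=True)  # explicit opt-in

try:
    check_target("jsonld_context", distinguishes_nulls=True)
except NullDistinctionError as err:
    message = str(err)
    print(message)
```

Making the failure loud by default keeps users from silently losing the distinction when they move data into a framework that cannot carry it.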

While it might be possible to extend support for RDF and relational
frameworks via the use of special marker slots or values, this
would introduce a lot of complication and potential impedance
mismatches. For example, in SQLAlchemy models the special marker
values would leak through, making the SQLAlchemy models no longer
isomorphic to the Pydantic models.

Distinguishing NULL forms may still be added to a later version of the
metamodel, but the tradeoffs should be understood.

cmungall · 2024-03-20T17:35:24Z

cmungall
Mar 20, 2024
Maintainer Author

An addendum to the above. In these examples, when I use "foo": {} this denotes an empty collection, i.e., foo is multivalued.

In cases where the range of foo is a class and foo is single-valued, it should be possible to use {} to denote an empty object, which is distinguishable from no object.
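A sketch of that single-valued case using standard-library dataclasses. The class names `C` and `Foo` and the `name` attribute are assumptions made for the example; they are not generated LinkML classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Foo:
    # Hypothetical attribute; an "empty object" leaves it unset.
    name: Optional[str] = None


@dataclass
class C:
    bar: int
    foo: Optional[Foo] = None  # single-valued slot whose range is a class


no_object = C(bar=1)                # corresponds to {"bar": 1}
empty_object = C(bar=1, foo=Foo())  # corresponds to {"foo": {}, "bar": 1}

print(no_object.foo is None)      # True
print(empty_object.foo == Foo())  # True
print(no_object == empty_object)  # False
```

Here the empty object is an instance that exists but asserts nothing about its own slots, which is a different statement from there being no object at all.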

1 reply

cmungall Mar 23, 2024
Maintainer Author

Some somewhat relevant discussion about DICOM in RDF and Nulls: https://lists.w3.org/Archives/Public/public-semweb-lifesci/2024Mar/0008.html
