discussion on Datatypes and Units of Measure #152

Open
VladimirAlexiev opened this issue Jan 6, 2025 · 3 comments

@VladimirAlexiev
Collaborator

Moved from 3lbits/CIM4NoUtility#338.
@bartkl provided extensive feedback:

Questions of compliance with the CIM

How do these choices affect questions of CIM compatibility?

The master CIM model is still the UML in Sparx EA, and (whether we like it or not, and we don't) the CIM data types are represented in it as classes, not scalars. Changing this breaks backward compatibility, which can be worthwhile, but we should be aware of that and not take it lightly.

Note

This is not just theoretical. As discussed before, some users adhere to the standard so precisely that they represent CIM data types as classes. In fact, the data products created by my team for Netbeheer Nederland do so too.
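For illustration, here is a minimal Turtle sketch of the two representations of a single ACDCConverter.baseS value. The cim: namespace URI (assumed to be the CGMES 3.0 one), the instance names ex:conv1 and ex:conv2, the value 100, and the cim:ApparentPower.value property are assumptions; cim:ApparentPower.multiplier and cim:ApparentPower.unit are the prop-specific properties mentioned later in this thread, and cim:UnitMultiplier.M is assumed to be the CIM individual for "mega".

```turtle
@prefix cim: <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .          # hypothetical instance namespace

# Datatype as a class: the value hangs off an intermediate node that also carries
# the multiplier and unit. cim:ApparentPower.value is a hypothetical name for the
# UML "value" attribute.
ex:conv1 cim:ACDCConverter.baseS [
    a cim:ApparentPower ;
    cim:ApparentPower.value      "100"^^xsd:float ;
    cim:ApparentPower.multiplier cim:UnitMultiplier.M ;
    cim:ApparentPower.unit       cim:UnitSymbol.VA
] .

# Datatype as a scalar: the property points directly at a typed literal;
# the quantity kind, unit, and multiplier are described once in the vocabulary.
ex:conv2 cim:ACDCConverter.baseS "100"^^xsd:float .
```

The first style is what a switch to datatype properties would break; the second is close to what the CIM XML instance data discussed below already looks like, apart from the literal's datatype.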

Mapping to QUDT or using it directly

Using QUDT quantity kinds directly

Again, I'm happy to see QUDT being leveraged. Representing values as actual scalars of a proper datatype is correct, and simply much, much better than the convoluted CIM data type classes we have now.

I do have a few questions though.

You write:

Keep the CIM-specific quantity kind since it often has a more electricity-specific description than is available in QUDT

Just thinking out loud: can't we use the QUDT quantity kind directly and add extra descriptions to it? These extra statements would be part of our ontology, so their scope would be clear.

For example:

```turtle
qk:ApparentPower rdfs:comment "Product of the RMS value of the voltage and the RMS value of the current." .
```

or perhaps something more semantically rich like:

```turtle
cim:description rdfs:subPropertyOf rdfs:comment .

qk:ApparentPower cim:description "Product of the RMS value of the voltage and the RMS value of the current." .
```

On the other hand, semantically it does seem more explicit to maintain our own URI (cim:ApparentPower) and make statements about that. Happy to learn from your breadth of experience here!
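The own-URI option could look roughly like this sketch: the CIM quantity kind keeps its electricity-specific description and points to QUDT with a SKOS mapping, in the same style as the unit-symbol example further down. The cim: namespace URI is an assumption.

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qk:   <http://qudt.org/vocab/quantitykind/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Our own term carries the CIM-specific description...
cim:ApparentPower
    rdfs:comment "Product of the RMS value of the voltage and the RMS value of the current." ;
    # ...and is aligned with the QUDT quantity kind instead of adding statements to qk:ApparentPower.
    skos:exactMatch qk:ApparentPower .
```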

One last question regarding the alignment between the CIM and QUDT: I take it you deliberately align using SKOS rather than more formally using OWL. Why is that? Would a statement like cim:ApparentPower owl:sameAs qk:ApparentPower be too strong a commitment, because you don't wish to imply that they are the same individual?

Why cim:unitMultiplier and cim:unitSymbol?

You write:

Link to a global QUDT unit, but also give the multiplier and unitSymbol separately, using cims: props

I guess this refers specifically to the new cim:unitMultiplier and cim:unitSymbol properties and their range classes. Why do we want those at all?

Instead of:

```turtle
cim:UnitSymbol a owl:Class ;
  skos:exactMatch qudt:Unit .

cim:UnitSymbol.VA a cim:UnitSymbol ;
  qudt:hasQuantityKind cim:ApparentPower ;
  skos:exactMatch unit:V-A .
```

Why not:

```turtle
cim:UnitSymbol.VA a qudt:Unit ;
  qudt:hasQuantityKind cim:ApparentPower ;
  skos:exactMatch unit:V-A .
```

And frankly, I still wonder why we create our own unit symbol when QUDT has the one we need (as evidenced by the exact match statement).

So why not simply leave out the custom CIM unit symbols and multipliers, since we have them in QUDT anyway:

```turtle
cim:ACDCConverter.baseS a owl:FunctionalProperty , owl:DatatypeProperty ;
  rdfs:domain          cim:ACDCConverter ;
  rdfs:range           xsd:float ;
  qudt:hasQuantityKind cim:ApparentPower ;
  qudt:hasUnit         unit:MegaV-A .
```

Perhaps, as with the quantity kind, you want to retain the possibility of providing extra descriptions. But I doubt that need exists for something like SI symbols and multipliers. Wouldn't we gladly take our hands off of this?

Now, in cases where QUDT does not cover our needs, we could still create our own QUDT units, symbols and multipliers. That's perfectly valid, of course. But for those already available to us, I don't understand why we would create our own layer of indirection (except for quantity kinds, where I understand the need for extra descriptions).

Fixing unit symbols and multipliers and defaults

It's not entirely clear to me what we want to do with the fixing of symbols and multipliers. I guess some profiles might want to fix those for some quantity kind at the profile-level, others might have good reason to be more lenient and for example expect mega volt in one place, but kilo volt in another in the same dataset. Have we decided yet how we want to move forward here? And if so, do we know how to represent this technically?

At the very least it seems to me that the vocabulary should not state any fixed multipliers or symbols, nor should it state defaults. It should define the vocabulary that can be used to express such things, but since all of that is use case dependent - and even as specific as equipment dependent - I think this is properly the responsibility of schemas and validation, i.e. of your LinkML and SHACL models. Would you agree?

Final remarks

  • Any ideas how some of the typos (such as trailing spaces) could have gotten into the models? A decent serializer should be able to solve this, right?
  • If not done already, domain experts should perhaps have a look at the final mapping table. But it looks good to me!
  • LinkML supports QUDT directly in its metamodel. As part of my work describing how to map to LinkML, at some point I would like to write about how to map all of the above properly.
@VladimirAlexiev
Collaborator Author

VladimirAlexiev commented Jan 6, 2025

(in UML) CIM data types are represented as classes and not scalars.
data products created by my team for Netbeheer Nederland do too.

Can you provide instance data examples?
All CIM XML instance data that I have seen uses literals for these props.
If the instances don't conform to the ontology then what backward compatibility problems are created by fixing the ontology?
BTW the instances use strings, which #49 converts to floats.
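A minimal sketch of that conversion for a single value; ex:conv1 and the value are hypothetical, and the cim: namespace URI is assumed. The two lines are alternative renderings of the same fact, so the converted one is shown commented out rather than loaded alongside the original.

```turtle
@prefix cim: <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .          # hypothetical instance namespace

# As parsed from CIM XML (<cim:ACDCConverter.baseS>100</cim:ACDCConverter.baseS>):
# a literal without rdf:datatype, which RDF 1.1 treats as xsd:string.
ex:conv1 cim:ACDCConverter.baseS "100" .

# After conversion: a typed literal matching an rdfs:range of xsd:float.
# ex:conv1 cim:ACDCConverter.baseS "100"^^xsd:float .
```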

use the QUDT quantity kind directly

We could map to QUDT directly: we should take a vote.
Here are some considerations:

The mapping table is in https://github.com/Sveino/Inst4CIM-KG/tree/develop/rdfs-improved#mapping-quantitykinds-and-units . Most mappings are 1:1 but there are a couple of exceptions:

add extra descriptions to it? These extra statements would be part of our ontology

Some say that adding your own statements to terms in a foreign namespace is "namespace hijacking". I'm more relaxed about it, but if we add a second rdfs:comment value, a consumer would not know which one to display.

cim:description rdfs:subPropertyOf rdfs:comment

Here the consumer should know to use cim:description and not the standard prop rdfs:comment

cim:ApparentPower owl:sameAs qk:ApparentPower?

That's too strong, because owl:sameAs merges all statements of the two resources. As you can see above, a couple of CIM qks map to the same QUDT qk, and since owl:sameAs is symmetric and transitive, we would end up saying cim:AngleDegrees owl:sameAs cim:AngleRadians, which we don't want.
skos:exactMatch makes a much weaker commitment.
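To make the difference concrete, here is a sketch assuming both CIM angle kinds map to qk:Angle (the actual target is in the mapping table linked above) and using illustrative rdfs:label values:

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qk:   <http://qudt.org/vocab/quantitykind/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

cim:AngleDegrees rdfs:label "AngleDegrees" ; skos:exactMatch qk:Angle .
cim:AngleRadians rdfs:label "AngleRadians" ; skos:exactMatch qk:Angle .

# If the two skos:exactMatch links were owl:sameAs instead, owl:sameAs being
# symmetric and transitive would entail
#   cim:AngleDegrees owl:sameAs cim:AngleRadians .
# and a reasoner would then give each resource both labels (and all other statements).
# skos:exactMatch is also transitive in SKOS, but it only records a mapping between
# concepts, so the two CIM resources keep their own descriptions.
```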

new cim:unitMultiplier and cim:unitSymbol properties I guess, and their range classes. Why do we want those at all?

CIM does have classes cim:UnitMultiplier, cim:UnitSymbol with a large set of individuals. But they were not used by the datatype props.
I replaced the prop-specific cim:ApparentPower.multiplier, cim:ApparentPower.unit with a pair of universal props, and made them point to those individuals.
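A rough sketch of the universal props attached to a datatype property as annotations. The cim: namespace URI and the individual name cim:UnitMultiplier.M are assumptions; the qudt:hasUnit line mirrors the "link to a global QUDT unit" part of the design quoted earlier.

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .

# Before: one pair of props per datatype (cim:ApparentPower.multiplier, cim:ApparentPower.unit).
# After: the same two universal props serve every datatype property.
cim:ACDCConverter.baseS
    qudt:hasQuantityKind cim:ApparentPower ;
    qudt:hasUnit         unit:MegaV-A ;         # global QUDT unit
    cim:unitMultiplier   cim:UnitMultiplier.M ; # CIM individual for "mega" (assumed name)
    cim:unitSymbol       cim:UnitSymbol.VA .
```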

We could declare the CIM individuals as qudt:Unit, but I think that qudt:Units are supposed to have a rich description including conversionMultiplier, which CIM units don't have.

why we create our own unit symbol when QUDT has the one we need
why we would create our own layer of indirection

I agree with you. We can get rid of CIM units and multipliers and use QUDT units directly: we should take a vote.

However, only a small part of the CIM units are mapped, since only a small part are used by CIM properties, and I didn't research the rest. This means that if we dropped the CIM units, no trace would be left of the unused ones.

Wouldn't we gladly take our hands off of this?

I completely agree!
It's not CIM's business to define units, nor is it CIM's business to define geospatial representations (it should reuse GeoSPARQL).

we could still create our own QUDT units, symbols and multipliers.

QUDT welcomes the addition of such, so we should do it in QUDT namespaces and thus enrich it. We only need to research the requisite rich characteristics. See qudt/qudt-public-repo#970 for the 4 new qk that we need (and 8 units).


A summary of my replies above:
@bartkl proposes a number of further simplifications. I'm in favor of them, but I think we need a vote.


what we want to do with the fixing of symbols and multipliers. I guess some profiles might want to fix those for some quantity kind at the profile-level, others might have good reason to be more lenient and for example expect mega volt in one place, but kilo volt in another in the same dataset.

All instance data I've seen carries pure numbers (actually strings) with no indication of unit. There was some discussion about allowing flexibility, but I haven't seen a list of data props that need it.

do we know how to represent this technically?

Yes.

the vocabulary should not state any fixed multipliers or symbols, nor should it state defaults.
all of that is use case dependent - and even as specific as equipment dependent

Maybe you are right in general, but can you cite specific requirements for such flexibility? All instance data I've seen doesn't carry any units.

this is properly the responsibility of schemas and validation, i.e. of your LinkML and SHACL models.

Yes, the qk and unit annotations should originate in LinkML, but we should also express them in RDF. Attaching them to datatype props amounts to annotations on those props, and is a good approach.
As for SHACL, it cannot validate something that's not present in instance data.

@bartkl

bartkl commented Jan 6, 2025

Can you provide instance data examples?

I've been relying on colleagues who make this point, but to be honest with you, the more I speak with you, Svein and Todd, the more I grow skeptical about what they claim exactly. I will discuss with them once more. Would love to finally resolve this discussion (at least within our team, haha).

Most mappings are 1:1 but there are a couple of exceptions

Yes, I noticed. I don't see a problem with that though, since we can define our own CIM quantity kinds (and such) where needed. Those can then be proposed to become part of the QUDT standard if suitable, and replace our custom defs in a future release.

Some say that adding your own statements to terms in a foreign namespace is "namespace hijacking".

I would imagine that defining new terms within a foreign namespace is a bad and improper idea, but adding statements about existing definitions in one's own ontology - and most notably mere descriptions - I don't see an issue with. But neither do you, if I read you correctly 🙂.

Here the consumer should know to use cim:description and not the standard prop rdfs:comment

Yes, maybe not the best idea. Was just looking for a way to distinguish between one and the other type of descriptions other than introducing a CIM term for the quantity kind. Reification is also an option, but I really dislike its complexity and it's highly specific to RDF.

skos:exactMatch makes a much weaker commitment

Yes, I understand. Was just making sure that's what we want here.

I replaced the prop-specific cim:ApparentPower.multiplier, cim:ApparentPower.unit with a pair of universal props, and made them point to those individuals.

I agree we should make a move from these specific props to universal props, but I don't see why we cannot make use of qudt:hasUnit and qudt:prefix (might not be the exact right ones, but you get the idea) directly instead of introducing cim:unitSymbol and cim:unitMultiplier. I'd like to avoid as much as possible having our own definitions to maintain.
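What this suggestion amounts to, sketched under the assumption that the QUDT release in use records the prefix on prefixed units: the property carries only qudt:hasUnit, and the multiplier is looked up on the QUDT unit rather than restated in CIM. The QUDT triples are shown as comments because they belong to QUDT, not to the CIM ontology, and are approximate.

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .

# CIM side: no cim:unitSymbol / cim:unitMultiplier, only the QUDT unit.
cim:ACDCConverter.baseS qudt:hasUnit unit:MegaV-A .

# QUDT side (roughly what QUDT itself states; not repeated in CIM):
#   unit:MegaV-A qudt:prefix prefix:Mega ;
#                qudt:conversionMultiplier 1000000.0 .
```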

QUDT welcomes the addition of such, so we should do it in QUDT namespaces and thus enrich it. We only need to research the requisite rich characteristics. See qudt/qudt-public-repo#970 for the 4 new qk that we need (and 8 units).

I agree this is the preferred route. But if it takes a long time for a new QUDT release to appear and we don't want to wait, I would propose using the cim namespace rather than hijacking a QUDT one until they've adopted our addition.
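A minimal sketch of such an interim definition in the cim namespace, typed as qudt:Unit and carrying the kind of rich description (symbol, quantity kind, conversion multiplier) QUDT units have. Everything here is hypothetical: cim:ExampleUnit, cim:ExampleQuantityKind, their labels, symbol, and multiplier stand in for whichever of the units and quantity kinds listed in qudt/qudt-public-repo#970 is actually needed.

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical interim unit, kept in the cim namespace until QUDT adopts it.
cim:ExampleUnit a qudt:Unit ;
    rdfs:label "example unit"@en ;
    qudt:symbol "xu" ;
    qudt:hasQuantityKind cim:ExampleQuantityKind ;   # hypothetical quantity kind
    qudt:conversionMultiplier "1.0"^^xsd:decimal .   # factor to the SI coherent unit

# Hypothetical interim quantity kind, likewise awaiting adoption by QUDT.
cim:ExampleQuantityKind a qudt:QuantityKind ;
    rdfs:label "example quantity kind"@en .
```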

Maybe you are right in general, but can you cite specific requirements for such flexibility? All instance data I've seen doesn't carry any units.

I lack the experience to properly address this, although colleagues have expressed some preferences/ideas here. I will ask for their input. @Sveino , I'm pretty sure I've seen you talk about this as well. Could you perhaps weigh in too?

But to be a bit more concrete myself: even if the instance data does not carry units, the vocabulary tells you what the appropriate unit and multiplier are (via cim:isFixed, I believe). I think at the very least we should move this to LinkML/SHACL, which is both a more proper place and a more flexible one, since it would allow different units for a quantity kind that is used in several places in the model.

For example:

```yaml
classes:
  Building:
    attributes:
      height:
        range: decimal
        unit:
          ucum_code: m  # meters in this case
          has_quantity_kind: qk:Height
  Person:
    attributes:
      height:
        range: decimal
        unit:
          ucum_code: cm  # centimeters in this case
          has_quantity_kind: qk:Height
```

I feel that this flexibility, using the same quantity kind with different units within the same profile, should absolutely be possible. And even if instance data does not (necessarily) make use of it, users can still reference it.

Yes, the qk and unit annotations should originate in LinkML, but we should also express them in RDF. Attaching them to datatype props amounts to annotations on those props, and is a good approach.

In the spirit of having one master representation I would envision generating the SHACL from the LinkML here. Do you agree? This could of course involve making changes to the standard LinkML SHACL generator, or creating our own (which shouldn't be too hard).

As for SHACL, it cannot validate something that's not present in instance data.

But one will have the option of including it in instance data, right? As you say, they are just annotations basically, so nobody's hurt (compatibility-wise) if they are added. And people could still refer to the docs to read about the units used.

@VladimirAlexiev
Collaborator Author

CIM props are overspecified (which I don't like), so you'd have Building.height vs Person.height, and you can attach specific units to those RDF props.
(BTW, bSDD captures universal prop definitions that you can override at the class level, including unit, min, max etc).
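In RDF terms, that per-class flexibility could look like the following sketch. ex:Building.height and ex:Person.height are hypothetical properties mirroring the LinkML example above; unit:M and unit:CentiM are QUDT units.

```turtle
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix qk:   <http://qudt.org/vocab/quantitykind/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/> .  # hypothetical names

# Same quantity kind, different units, because each class has its own RDF property.
ex:Building.height a owl:DatatypeProperty ;
    qudt:hasQuantityKind qk:Height ;
    qudt:hasUnit unit:M .

ex:Person.height a owl:DatatypeProperty ;
    qudt:hasQuantityKind qk:Height ;
    qudt:hasUnit unit:CentiM .
```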

AFAIK, you cannot express prop units in SHACL. You can express it as prop annotations, as I've done in the CIM ontologies.
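A sketch of that split, assuming the CGMES 3.0 cim: namespace and a hypothetical shape name: SHACL core can check the literal's datatype and cardinality, while the unit stays an annotation on the RDF property that SHACL does not evaluate.

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .           # hypothetical shape namespace

# What SHACL can validate: datatype and cardinality of the bare number.
ex:ACDCConverterShape a sh:NodeShape ;
    sh:targetClass cim:ACDCConverter ;
    sh:property [
        sh:path cim:ACDCConverter.baseS ;
        sh:datatype xsd:float ;
        sh:maxCount 1
    ] .

# What it cannot: the unit is metadata on the property, not something in the instance data.
cim:ACDCConverter.baseS qudt:hasUnit unit:MegaV-A .
```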

If you want to capture per-prop-instance unit, then you need an intermediate node.

  • for uniformity, you should always use such a node, even if most observations use the default unit (which could then be omitted)
  • or you could split each prop in two, like sosa:hasResult vs sosa:hasSimpleResult
  • or you could use RDF-star instead of an intermediate node, but that's a different story
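The first two options can be sketched as follows. Using a qudt:QuantityValue node with qudt:numericValue and qudt:unit is one way to build the intermediate node; ex:baseS and ex:baseSQuantity are hypothetical property names for the split, the instances are hypothetical, and the cim: namespace URI is assumed. Note that option 1 turns cim:ACDCConverter.baseS into an object property.

```turtle
@prefix cim:  <http://iec.ch/TC57/CIM100#> .  # assumed CGMES 3.0 namespace
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .           # hypothetical names

# Option 1: always go through an intermediate node, even when the default unit applies.
ex:conv1 cim:ACDCConverter.baseS [
    a qudt:QuantityValue ;
    qudt:numericValue "100"^^xsd:float ;
    qudt:unit unit:MegaV-A
] .

# Option 2: split the property in two, by analogy with sosa:hasSimpleResult / sosa:hasResult.
# ex:baseS carries a bare number in the default unit (here assumed to be MVA);
# ex:baseSQuantity carries an explicit unit when it differs.
ex:conv2 ex:baseS "100"^^xsd:float .
ex:conv3 ex:baseSQuantity [
    a qudt:QuantityValue ;
    qudt:numericValue "100000"^^xsd:float ;
    qudt:unit unit:KiloV-A
] .
```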
