-
Variation on the proposal: Instead of requiring the canonical unit symbol to be the "one and only one 1 factor" in `base_units`, I propose an explicit `canonical_unit` keyname.

Real-world example:

    data_types:
      colonial-length:
        derived_from: float
        base_units:
          inch: 1.0
          inches: 1.0
          '"': 1.0
          foot: 12.0
          feet: 12.0
          "'": 12.0
          yard: 36.0
          yards: 36.0
        canonical_unit: '"'

MORE! Note that there is another possibility for grammar that has the advantages of both approaches:

It's a bit more complicated to explain but balances ease of use while still allowing edge cases. If users get it wrong they will get a clear error and can decide how to fix it, either by adding a …
-
The above broadly accords with my understanding of our discussion in the meeting and I can see that it is a reasonable simplification of the syntax. However:
-
Point 1. My main concern is to have a word other than 'unit' for that map, as in some cases the indexes in the map are in fact only a substring of the final unit string. How about calling the map 'multipliers'? I am OK with the concept of canonical_unit_symbol, but it is quite a long keyname and you prefer short names. It should only be allowed where 'prefixes' is not used (my point 5), in which case 'canonical_unit' is enough.
-
When 'prefixes' is used, the unit is derived from the key associated with the multiplier and the prefix key. When 'prefixes' is not used, the unit is equal to the key associated with the multiplier. For example, the unit for kilometers is km, so it is incorrect to call either the k or the m a unit. Therefore the table containing the multipliers must have a name other than unit.
-
I've drafted the required changes in my repo.
-
Yes, I confirm that when we use prefixes there can only be exactly one base-unit.
-
I think having a canonical unit is necessary for any real-world usage. It seems you two are not convinced, but I haven't heard any specific response to the rationales I provided: representation, data storage, data transfer, calculation precision. Consider a specific "transfer" use case of updating a scalar-unit attribute. In what unit should the number be?

I actually proposed 3 different ways to determine the canonical unit: an automatic heuristic, a mandatory keyname, or a combination of both (non-mandatory keyname). I was hoping to hear from you folks what seems best, because I didn't have a strong opinion. However, I think I'm leaning toward the "both" proposal variant. For most uses the heuristic will just work. When it doesn't work, there will be a clear error message and designers can just add the keyname. The only real disadvantage of this proposal variant is that it's a bit harder to explain in the spec.

Somehow I missed the point that prefixes only apply to one single base unit, but it does make sense! I tried to think of a real-world use case in which there might be more than one, but couldn't think of any. (Although I wonder if the fact that I can't think of any is just due to my limited knowledge...)

Here is my new proposal. There would be two ways to define a scalar unit type:
And there is an additional non-mandatory keyname:
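To make the "both" variant described above concrete (heuristic first, keyname as the fallback), here is a minimal Python sketch. It is only an illustration, not spec text: the function name `resolve_canonical_unit` and the sample maps are hypothetical, and it assumes that an explicitly named canonical unit must itself have a factor of 1.

```python
def resolve_canonical_unit(units, canonical_unit=None):
    """Return the canonical unit symbol for a map of unit symbols to factors."""
    # Heuristic candidates: every symbol whose factor is 1 (1.0 == 1 in Python).
    candidates = [symbol for symbol, factor in units.items() if factor == 1]
    if canonical_unit is not None:
        # Assumption: an explicit canonical unit must itself have a factor of 1.
        if canonical_unit not in candidates:
            raise ValueError(
                f"canonical_unit {canonical_unit!r} does not have a factor of 1"
            )
        return canonical_unit
    if len(candidates) == 1:
        # The heuristic "just works": exactly one unit has a factor of 1.
        return candidates[0]
    # Ambiguous or missing: give the designer a clear error to act on.
    raise ValueError(
        f"cannot infer the canonical unit from {len(candidates)} candidates; "
        "add the canonical_unit keyname"
    )


# Hypothetical sample data for illustration.
frequency = {"hz": 1.0, "khz": 1000.0, "mhz": 1000000.0}
colonial_length = {
    "inch": 1.0, "inches": 1.0, '"': 1.0,
    "foot": 12.0, "feet": 12.0, "'": 12.0,
}

print(resolve_canonical_unit(frequency))             # heuristic is enough
print(resolve_canonical_unit(colonial_length, '"'))  # keyname resolves ambiguity
```

Calling `resolve_canonical_unit(colonial_length)` without the keyname raises the error, which is the point at which a designer would add `canonical_unit`.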
-
New proposal based on ad hoc discussion.

A new TOSCA type named `scalar`, written as:

{optional whitespace} + {a number in YAML notation} + {whitespace} + {a unit symbol} + {optional whitespace}

Note that we should support all ways that numbers can be represented in YAML. This includes floating-point and integer representations, scientific notation, hex, octal, etc. So implementations should ideally use their YAML parser to parse the number string.

For maintaining calculation precision, scalar data must be either `float` (the default) or `integer`, as set by `data_type` when you derive from `scalar`.

Simple examples:

    data_types:
      Frequency:
        derived_from: scalar
        # default to data_type: float
        units:
          hz: 1.0
          khz: 1000.0
          mhz: 1000000.0
      DataSize:
        derived_from: scalar
        data_type: integer
        units:
          b: 1
          kib: 1024
          mib: 1048576

At least one of the units must have a factor of 1.0 (or 1), called the canonical unit. This is used for translating raw numbers back to the scalar representation (which might have to happen in TOSCA as well as in cloud and storage systems). For example, if you subtract "1 b" from "1 kib", the result would be represented as "1023 b", and indeed be stored as such.

If there is more than one unit with a factor of 1.0, then the `canonical_unit` keyname selects which one is canonical:

    data_types:
      ColonialLength:
        derived_from: scalar
        units:
          inch: 1.0
          inches: 1.0
          '"': 1.0
          foot: 12.0
          feet: 12.0
          "'": 12.0
          yard: 36.0
          yards: 36.0
        canonical_unit: '"'

It is also possible to auto-generate the units via the addition of prefixes to all units. Note that an empty-string prefix must be explicitly specified if appropriate:

    data_types:
      Frequency:
        derived_from: scalar
        units:
          hz: 1.0
          Hz: 1.0
        prefixes:
          '': 1.0
          μ: 0.0001
          m: 0.001
          k: 1000
          M: 1000000
          g: 1000000000
        canonical_unit: Hz
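For anyone who wants to see the string format and the "1 kib" minus "1 b" arithmetic worked through, here is a rough Python sketch. The names `parse_scalar` and `format_scalar` are hypothetical, and Python's `int()`/`float()` stand in for the YAML number parser, so hex, octal and other YAML-only notations are not covered here. It also assumes unit symbols contain no whitespace.

```python
import re

# Unit tables mirroring the DataSize and Frequency examples above.
DATA_SIZE_UNITS = {"b": 1, "kib": 1024, "mib": 1048576}
FREQUENCY_UNITS = {"hz": 1.0, "khz": 1000.0, "mhz": 1000000.0}

# {optional whitespace}{number}{whitespace}{unit symbol}{optional whitespace}
# Assumption: neither the number nor the unit symbol contains whitespace.
SCALAR_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s*$")


def parse_scalar(text, units, number_type=float):
    """Parse a scalar string and return its value in the canonical unit."""
    match = SCALAR_RE.match(text)
    if match is None:
        raise ValueError(f"malformed scalar: {text!r}")
    number_text, symbol = match.groups()
    if symbol not in units:
        raise ValueError(f"unknown unit symbol: {symbol!r}")
    # Stand-in for the YAML number parser (simplification, see above).
    return number_type(number_text) * units[symbol]


def format_scalar(value, canonical_unit):
    """Represent a raw number using the canonical unit symbol."""
    return f"{value} {canonical_unit}"


difference = (
    parse_scalar("1 kib", DATA_SIZE_UNITS, int)
    - parse_scalar("1 b", DATA_SIZE_UNITS, int)
)
print(format_scalar(difference, "b"))  # 1023 b
```

Keeping `DataSize` in integers throughout is what makes the stored result exact.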
-
I'm in general agreement with the summary proposal but would make the following comments:

- As written it is not clear to the reader why we have the prefixes keyword. I would like to keep an example which re-uses the same prefix map across more than one scalar derivation. The current spec references such an example from the dsl_definitions section, so it serves two purposes.
- The examples provided by Tal look a bit like SI but are not quite right:
- The given example for datasize could use the built-in datatype of bytes.
- The current spec needs to adapt the range example, as it currently uses scalar-unit.frequency, which no longer exists.
- We discussed including a warning that implementors may want to guard against word overflow when performing calculations on multipliers with many digits.
- Multipliers for scalars with a data_type value of float may be given as integers because of the TOSCA rule that integers can be converted to floats (see float section).
- Time will need to be edited to this new syntax. Do we want to reinstate time as an in-built data type but specified using this syntax?

As an aside, I was interested to see the term 'Colonial length', which I've not come across before. In the UK we would call this Imperial length. Still, OASIS uses American English so it can remain :-)
-
I support making all of these changes, but I also would like to get a final draft of the spec done before the weekend. Any chance we can make that happen? |
-
Regarding validation: the working branch currently has a conflict. I disallowed validation of scalars because it seemed too hard to address the complexities at this late stage, but failed to remove one from the example (see line 4722). That has a format rather different to your suggestion but closer to what I think a user would like.
-
I think that we should allow scalar type derivation, but only for canonical_unit, e.g. allow TimeInSeconds and TimeInYears to be derived from Time. That would allow sensible presentation of, say, ProtocolTimeout and PersonAge BUT ALSO allow comparison of any unit derived from Time by reference to the canonical unit of the common parent. More implementation complexity though.
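A rough Python sketch of how that could behave, purely as an illustration: the class `DerivedScalar`, the `TIME_UNITS` table and the 365-day year are all assumptions made for the example, not anything taken from the spec.

```python
# Hypothetical parent type "Time" with seconds as its canonical unit.
# The "y" unit (365 days) is an assumption for illustration only.
TIME_UNITS = {"s": 1, "h": 3600, "d": 86400, "y": 31536000}


class DerivedScalar:
    """A type derived from Time that overrides only the canonical unit."""

    def __init__(self, canonical_unit):
        self.canonical_unit = canonical_unit

    def to_parent(self, number, symbol):
        # Normalize to the parent's canonical unit so values are comparable.
        return number * TIME_UNITS[symbol]

    def present(self, parent_value):
        # Present a normalized value in this type's own canonical unit.
        factor = TIME_UNITS[self.canonical_unit]
        return f"{parent_value / factor} {self.canonical_unit}"


time_in_seconds = DerivedScalar("s")
time_in_years = DerivedScalar("y")

protocol_timeout = time_in_seconds.to_parent(30, "s")
person_age = time_in_years.to_parent(2, "y")

# Comparison works because both values are in the parent's canonical unit.
print(protocol_timeout < person_age)
# Presentation uses the derived type's own canonical unit.
print(time_in_years.present(person_age))
```

The extra implementation complexity shows up in having two conversions per type: one to the parent's canonical unit for comparison and one back to the derived canonical unit for presentation.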
-
Following up on our ad hoc discussion, I propose the following tweaks:

To create a scalar-unit type you need to do two things at minimum:

- Derive from `integer` or `float` or any derivative of them. (Note: does it make any sense to derive from another scalar-unit type? Could that be useful in any way, e.g. to add additional unit symbols? Probably an unnecessary complication, but if we do support it we have to explain how the `units` are "refined" in inheritance.)
- Add a `base_units` keyname. (Proposed rename from `unit_symbol_map`; rationale: shorter, and also the rows are actually both unit symbols and factors, and `unit_symbol_factor_map` would be way too long.)

The `base_units` keyname is a map of unit symbols to factors (multipliers). Depending on the base type, the factors must be either integers or floats. (Note that factors could potentially be negative. Hard to think of real-world use cases, but no reason to forbid it. Zero would also be weird, but possible.)

One (and only one) of these entries must have a 1 factor (or 1.0 for float). Otherwise it will be an error. (Implied: `base_units` can never be an empty map; it needs at the very least a 1 or 1.0 unit symbol.) For other entries, there could be more than one unit symbol mapped to the same factor, allowing for notation variations.

This required 1 (or 1.0) factor unit symbol is called the "canonical unit". It can optionally be used by TOSCA implementations to convert all data to the canonical unit for calculations (preserving precision), uniform storage, representation, etc.

Rules for unit symbols (must be validated by implementations):

- …
- … (Otherwise parsing a scalar-unit string could break in crazy ways. A numeric digit in the middle or the end shouldn't cause any problems. Weird, but unproblematic.)

Also note that unit symbols are case sensitive, if relevant (they don't have to include letters). To support different case variations, just add all the variants to `base_units` and map them to the same factor.

Minimal example:

Note that you can add a `validation` keyname just like with any other type, which is often a good idea for scalar-units for quantities:

AUTO-GENERATING SYMBOLS VIA PREFIXES

One can optionally add a `prefixes` keyname to automatically fill in the units map with many entries. The syntax for `prefixes` is identical to that of `base_units` (but without the requirement of having a canonical symbol).

The way it works is that the actual units will be every combination of a prefix with a base unit (written consecutively, with no whitespace between), with both factors multiplied.

Important! If you want to have entries with no prefix, you explicitly have to add an empty string prefix with a factor of 1 or 1.0!

Simple example:

The actual units for this example would be all the combinations. If we wrote them explicitly without `prefixes` it would be equivalent to this:

Important little implementation note: When `prefixes` is present, the "canonical unit symbol" must be checked for after generating all possible units. This is to make sure the final result indeed has a single entry with a 1 or 1.0 factor.

The current example in the spec of using YAML anchors to reuse prefixes is excellent!
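In case it helps implementers, the prefix expansion and the late canonical check described above could look roughly like this in Python. This is a sketch under assumptions, not a reference implementation: the function name `expand_units` and the sample maps are hypothetical, and treating a colliding generated symbol as an error is my own assumption.

```python
def expand_units(base_units, prefixes=None):
    """Combine every prefix with every base unit, multiplying the factors."""
    if prefixes is None:
        units = dict(base_units)
    else:
        units = {}
        for prefix, prefix_factor in prefixes.items():
            for symbol, factor in base_units.items():
                # Prefix and base unit are written consecutively, no whitespace.
                name = prefix + symbol
                if name in units:
                    # Assumption: a colliding generated symbol is an error.
                    raise ValueError(f"duplicate unit symbol {name!r}")
                units[name] = prefix_factor * factor
    # The canonical unit is checked only after all units are generated.
    canonical = [name for name, factor in units.items() if factor == 1]
    if len(canonical) != 1:
        raise ValueError(
            f"expected exactly one unit with a factor of 1, found {len(canonical)}"
        )
    return units, canonical[0]


# Hypothetical sample data; note the explicit empty-string prefix.
units, canonical_unit = expand_units(
    {"Hz": 1.0},
    {"": 1.0, "k": 1000.0, "M": 1000000.0},
)
print(units)
print(canonical_unit)
```

Leaving out the empty-string prefix in this example means no generated entry has a factor of 1, so the check fails, which matches the "Important!" note above.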