Moving simultaneous comment from: #1961

**Is your feature request related to a problem? Please describe.**

The generators each support some different subset of the metamodel, and it can be tricky to know when you can rely on a generator to give you a faithful representation of a given schema. This also makes it difficult to track the propagation of changes in the metamodel to the generators - the motivating example in this case being array support. Generators should have some way to declare what features they support.

Challenges:
Some related/illustrative issues:
**Describe the solution you'd like**

A start would be to add a classvar to the generator classes, e.g.:

```python
from typing import Any, ClassVar, Literal, Optional

from pydantic import BaseModel


class Condition(BaseModel):
    type: Literal['parameter', 'etc']
    key: str
    value: Any


class Feature(BaseModel):
    when: Optional[Condition] = None


class ArrayFeature(Feature):
    anyshape: bool = False
    labeled: bool = False


class GeneratorSupports(BaseModel):
    arrays: bool | ArrayFeature | list[ArrayFeature] = False
```

where we might allow something like

```python
class PydanticGenerator:
    supports: ClassVar[GeneratorSupports] = GeneratorSupports(
        arrays=[
            ArrayFeature(
                when={'type': 'parameter', 'key': 'array_representation', 'value': 'Numpydantic'},
                anyshape=True,
                # ...
            ),
            # ...
        ]
    )
```
or we could just flatten the whole thing out - might be easier to start with that since it would be simpler (a rough sketch of the flattened version is at the end of this comment). Then we would be able to simplify all the special casing in the generators.

**How important is this feature?**

Let's call this "would make our lives easier, but would require a decent amount of refactoring".
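A minimal sketch of what the flattened alternative might look like - all field names here are invented for illustration and are not an existing API:

```python
from typing import ClassVar

from pydantic import BaseModel


class GeneratorSupports(BaseModel):
    # Flattened variant: one flag per feature, no nested Feature objects
    arrays: bool = False
    anyshape_arrays: bool = False
    labeled_arrays: bool = False
    any_of: bool = False


class PydanticGenerator:
    # Each generator declares its support up front so tooling can introspect it
    supports: ClassVar[GeneratorSupports] = GeneratorSupports(
        arrays=True,
        anyshape_arrays=True,
    )
```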
---
The concepts of metamodel profiles and monotonicity are linked, but first some general background. This is intended for people thinking deeply about the framework.
Currently the LinkML metamodel has a semi-formal notion of profiles; these are listed on the metamodel site, e.g. https://linkml.io/linkml-model/latest/docs/BasicSubset/

These are currently represented as subsets in the metamodel, and their uses are primarily for documentation. The goal has always been to use these programmatically: to formally define profiles that can be used to declaratively answer, for a piece of linkml tooling T, the question "does T support profile P?", and by extension "are there features of my schema that are not supported by tool T?".
For example, if I am using LinkML to describe a standard that is used in a stack that also uses protobuf, I would like to know if there are parts of my schema that cannot be represented in protobuf, so I can act accordingly (e.g. limit the expressivity of my schema, or modularize things such that the additional expressivity is an add-on and not required).
When thinking about target frameworks it's not a simple binary yes/no answer for whether a feature is supported. Broadly there is a spectrum:
For example, if you use `any_of` to express a variety of ranges, this cannot be directly expressed in the pure relational model. If you use `is_a`, it cannot be directly expressed in many target frameworks (strict relational, jsonschema, ...).

However, in some cases there may be an entailment-preserving transform `T` that maps the schema to something that is expressible. This is exemplified by relmodel_transformer, which performs well-understood rewrites to something that is expressible; e.g. `multivalued` into additional classes with backlinks (see the sketch after this paragraph). These transforms are not restricted to any one generator such as sqlddl. logical_model_transformer implements standard logic rewrites to obtain a "normal form" schema, where hierarchies are translated to `all_of` and simplification rules are applied to reach normal form (unsatisfiable classes are also detected this way).

So each of the options 1-3 above has sub-options for direct vs indirect.
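To make the `multivalued` rewrite concrete, here is a hedged sketch of the general idea; the class and slot names are invented, and the actual output of relmodel_transformer may differ in detail:

```yaml
# Hypothetical source schema: a multivalued slot
classes:
  Person:
    attributes:
      aliases:
        range: string
        multivalued: true
```

which might be rewritten, in the spirit of relmodel_transformer, to something like:

```yaml
# The multivalued slot becomes its own class with a backlink
classes:
  Person: {}            # unchanged apart from losing the multivalued slot
  PersonAlias:
    attributes:
      person:           # backlink to the owning Person
        range: Person
      alias:
        range: string
```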
It gets more nuanced. In some cases a transformation may be possible, but still undesirable as it may introduce non-isomorphism between frameworks. Consider again the case of mapping to the pure relational model. The rewrite for mapping `multivalued` to a backreference on a linking table is transparent to a user; they can use the generated sqlalchemy classes as if they were (largely) the same as classes generated by pydanticgen (because SQLA has a mechanism for equivalent rewrites). However, in contrast, if our schema has a slot `s` with a range `int | string`, this could be rewritten as two slots (`s_int` and `s_string`); however, AFAIK this mapping "leaks out" and the object model is no longer isomorphic with the pydanticgen one.

This brings us back to monotonicity. Most frameworks can only express - directly or indirectly - a subset of the LinkML metamodel. It is acceptable for a generator to implement an incomplete mapping, but it should never implement an invalid mapping; it should produce no new entailments.
A consequence of this is that there should be no overrides. Being aware of this helps when mentally reasoning about some LinkML behavior.

For example, people coming to the language may end up advertently or inadvertently doing something like this:
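(A hypothetical illustration; the slot and class names are invented:)

```yaml
slots:
  subject:
    range: string          # e.g. a default or inherited range
    any_of:                # intended - wrongly - to *replace* the range
      - range: Person
      - range: Organization
```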
The interpretation they want is that the `any_of` takes precedence. However, this is non-monotonic, i.e. it introduces overrides. To see why ruling out overrides is desirable, consider a generator that can't directly translate the `any_of` construct. It would still interpret the `range` construct, but the result would be more restricted than the intended model.

The correct interpretation of the above is that all constraints are applied, and as a result the slot is unsatisfiable, and should be reported as such.
Instead, what the user should do here is layer on constraints, e.g.:
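(Again a hypothetical illustration with invented names - here the `range` is a common parent class, and `any_of` narrows it rather than contradicting it:)

```yaml
slots:
  subject:
    range: NamedThing      # a common ancestor of the allowed classes
    any_of:
      - range: Person
      - range: Organization
```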
In this case a generator that does not translate `any_of` still produces a target representation that is valid; it is just less complete.

Ultimately we want a more declarative way to represent the profiles supported by generators, such that this is more transparent. The simplest way to do this is via a feature matrix of language construct x generator, where the values are from an enum such as DirectSupport, Mappable, Coerced, Ignored, ...
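A minimal sketch of what such a matrix could look like in code; the enum members follow the names above, but the matrix entries are placeholders, not actual generator capabilities:

```python
from enum import Enum


class SupportLevel(str, Enum):
    DIRECT_SUPPORT = "DirectSupport"  # construct maps directly to the target
    MAPPABLE = "Mappable"             # needs an entailment-preserving rewrite
    COERCED = "Coerced"               # mapped, but with some loss or leakage
    IGNORED = "Ignored"               # silently dropped


# Hypothetical feature matrix: construct x generator (values are placeholders)
FEATURE_MATRIX: dict[str, dict[str, SupportLevel]] = {
    "any_of": {
        "pydanticgen": SupportLevel.DIRECT_SUPPORT,
        "sqlddlgen": SupportLevel.IGNORED,
    },
    "multivalued": {
        "pydanticgen": SupportLevel.DIRECT_SUPPORT,
        "sqlddlgen": SupportLevel.MAPPABLE,
    },
}
```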
The feature matrix approach is essentially what https://github.com/orgs/linkml/discussions/1549 does. However, it uses a strategy of pytest combinatorics plus programmatic X-ing out of the matrix, with results written in a format that could be used to generate a website showing the matrix. This has some advantages but is ultimately a bit unsatisfying, since really we want the generators themselves to be introspectable as to what they support. I think we can move towards this strategy incrementally.
This is what I am thinking, but this may be overly complicating things (a rough sketch is below): generators would reference a baseline profile and declare only their deviations from it. This maximizes DRY - no need to maintain large feature matrix profiles for each generator, just maintain where support deviates from the profile.
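A hedged sketch of that shape, reusing the pydantic classvar idea from the first comment; the class names, profile name, and field names are all invented for illustration:

```python
from typing import ClassVar

from pydantic import BaseModel


class SupportDeviation(BaseModel):
    construct: str   # e.g. "any_of", or a combination such as "multivalued arrays"
    level: str       # e.g. "Mappable", "Ignored"
    note: str = ""


class GeneratorProfile(BaseModel):
    # Baseline metamodel profile the generator targets, plus only its deltas
    baseline: str                            # e.g. "BasicSubset"
    deviations: list[SupportDeviation] = []


class SomeGenerator:
    supports: ClassVar[GeneratorProfile] = GeneratorProfile(
        baseline="BasicSubset",
        deviations=[
            SupportDeviation(
                construct="any_of",
                level="Ignored",
                note="not translated; output is valid but less complete",
            ),
        ],
    )
```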
The fact that some generators are parametrizable complicates the picture a bit. E.g. sqlddlgen supports more features when pg is the target database than when sqlite is. OWL-DL can't support `Any` without punning unless the user chooses `type-objects`. But I think here we assume the maximal profile and allow individual options to document their own exceptions.

Finally, when I say "constructs" I don't just mean individual metamodel elements. If you look at the existing compliance tests, there are combinations of elements that constitute an individual feature, so we have to account for that.