Namespacing support within LinkML #1739

cmungall · 2023-11-16T18:06:17Z

cmungall
Nov 16, 2023
Maintainer

LinkML uses namespacing/prefix aware identifiers when referencing anything outside the model (e.g. class_uri: schema:Person). It also optionally uses namespacing/prefixes to refer to other schemas that can be imported.

However, for any particular schema, the namespace is flat. Furthermore, imports work like import * and everything is imported into the same namespace.

This has a lot of advantages in terms of simplicity. However, it can confound users:

- those from a semweb background assume they can reference classes etc via CURIEs
- those from a programming background expect to be able to:
  - selectively import, analogous to import Person from schema_org
  - locally alias, analogous to import schema.Person as SchemaPerson
  - import a module leaving the current namespace intact (e.g. import schema_org), and refer to elements using namespaced identifiers (e.g. schema_org.Person)

Having one or more of these mechanisms would make it easier to reuse schemas, without worrying about name clashes
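
To make the name-clash concern concrete, here is a minimal sketch that models schemas as plain dicts of class names. It does not use the LinkML API; `flat_import` and `namespaced_import` are hypothetical helpers, and raising on a clash is an assumption of this sketch rather than a statement about what LinkML does today.

```python
def flat_import(local, *imported):
    """Merge imported schemas into one flat namespace, like `import *`.

    This sketch raises on a name clash instead of silently overriding.
    """
    merged = dict(local)
    for schema in imported:
        for name, definition in schema.items():
            if name in merged and merged[name] is not definition:
                raise ValueError(f"name clash: {name!r} is defined more than once")
            merged[name] = definition
    return merged


def namespaced_import(local, **imported):
    """Keep local names intact and expose imports as `namespace.Name`."""
    merged = dict(local)
    for namespace, schema in imported.items():
        for name, definition in schema.items():
            merged[f"{namespace}.{name}"] = definition
    return merged


# Hypothetical schemas: both define a class called Person.
mine = {"Person": {"description": "my person"}}
schema_org = {"Person": {"description": "schema.org person"}, "Place": {}}

combined = namespaced_import(mine, schema_org=schema_org)
```

With the flat merge the two `Person` definitions collide, while the namespaced variant keeps both reachable under different names.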

We should discuss ways of enabling one or more of these. We should also reason through the implications. For example, slot aliasing or namespacing could be problematic when working with json, which doesn't support namespacing.

Note also that some of the renaming use cases may be better served by profiling / linkml-transformer.

Some existing issues:

Clarify semantics of imports #29

saxomoose · 2023-11-17T08:50:06Z

saxomoose
Nov 17, 2023

those from a semweb background assume they can reference classes etc via CURIEs

Related issues:

0 replies

saxomoose · 2023-11-18T14:06:58Z

saxomoose
Nov 18, 2023

I believe this discussion is quickly going to point towards package managers and registries + versioning. Managing dependencies between LinkML models like code libraries would be great. The idea has already been discussed elsewhere, but this discussion may be a good place to reopen it. If so, it might be worth restating the title of the discussion in those terms.

A few pointers:

LinkML registry: https://linkml.io/linkml-registry/registry/
plow: https://registry.field33.com/
Chris' 2014 blog entry: https://douroucouli.wordpress.com/2014/03/30/the-perils-of-managing-owl-in-a-version-control-system/

I'm curious about using established tools such as pip and PyPI for that purpose. Is it sensible? What would be missing?

1 reply

saxomoose Nov 19, 2023

docassemble using PyPI for similar use case: https://docassemble.org/docs/packages.html

cmungall · 2023-11-20T16:00:06Z

cmungall
Nov 20, 2023
Maintainer Author

I like the idea of piggybacking off an existing system, but I'm not sure Python package management is the best model. FHIR makes use of npm tooling, which might be worth looking into. I have been meaning to look more into plow.


2 replies

saxomoose Nov 20, 2023

Thanks to branches, tags and releases, using git and github is also an option. For a minimal package management system, it would cut it.

bartkl Mar 12, 2024

Although very interesting, if I'm understanding correctly this will not solve the namespacing issue originally discussed here, right? It's not going to solve the issue of potential name clashes and inflexible importing for instance.

Certainly tangentially related and useful though. Plow seems interesting in particular!

sneakers-the-rat · 2024-01-24T00:21:10Z

sneakers-the-rat
Jan 24, 2024
Collaborator

Fwiw, I don't address namespacing within a schema, but I did write a schema provider system that can handle multiple versions of a schema:
https://github.com/p2p-ld/nwb-linkml/blob/main/nwb_linkml/src/nwb_linkml/providers/schema.py

Including providing them from a git repo: https://github.com/p2p-ld/nwb-linkml/blob/main/nwb_linkml/src/nwb_linkml/providers/git.py

That was specific to a particular format (NWB), but I found that to link together multiple versions of a schema (which in turn link to multiple versions of another schema) I needed something like an NPM-like structure, except that instead of recursive directories with symlinks I handled that in the provider class, mostly because I didn't want to rewrite npm lol.

I think one reason this is confusing is that similar semantics are used for linkML imports as for LD prefixes - with linkML imports we can assume a like-kinded schema that schemaview knows about, but doing something similar like being able to eg. Subclass from a FOAF model would be awesome but impossible most of the time. LD made a sort of a mess of actually using ontologies, unfortunately.

As far as import syntax goes I think it would be nice to be able to do something like this:

imports:
  - mySchema:
      from:
        git:
          repo: https://git.example.com
          path: dir/subdir/mySchema.yaml
          ref: v0.1.0 # (or any tag or commit hash)
      include:
        - myClass # regular kind
        - class: myOtherClass
          as: myOtherClassWSuffix

(pretend I formatted that right, im on mobile)

I think that would get sorta bonkers to implement with the current flat schemaview, but if we made it recursive so each schema was resolved in a contained way then it would be v possible. See: #1839

0 replies

kervel · 2024-02-02T18:44:03Z

kervel
Feb 2, 2024
Collaborator

Hello,

What would be the steps necessary to introduce proper namespacing?

1. Adapt schemaview so that you can use a uri/curie in is_a / range. This doesn't seem to work yet.
2. Make the schemaview collections of fields/classes have a schema in the key. Still prevent 2 classes or slots with the same name even in different schema. Make sure that if we use a curie in a range or in a is_a we enforce the correctness.
3. Introduce new schema-aware API in schemaview
4. Make schemaview opt-in for namespacing so that only generators that are namespace-aware can accept schema definitions where you have multiple types with the same name in different schemas.
5. Start changing the generators?

I guess this is a gross oversimplification. I don't know what a schema aware API would even look like. But trying to get the discussion started.
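
As one possible starting point for the "schema in the key" and opt-in ideas, here is a rough sketch of an element collection keyed by schema and name. `ElementIndex`, its methods, and the `allow_duplicate_names` flag are all hypothetical; none of this is existing SchemaView API.

```python
class ElementIndex:
    """Hypothetical element collection keyed by (schema name, element name)."""

    def __init__(self, allow_duplicate_names=False):
        # Opt-in switch: only namespace-aware consumers would set this to True.
        self.allow_duplicate_names = allow_duplicate_names
        self._elements = {}

    def add(self, schema, name, definition):
        if not self.allow_duplicate_names:
            owners = [s for (s, n) in self._elements if n == name and s != schema]
            if owners:
                raise ValueError(f"{name!r} is already defined in schema {owners[0]!r}")
        self._elements[(schema, name)] = definition

    def get(self, name, schema=None):
        """Look up by qualified key, or by bare name when it is unambiguous."""
        if schema is not None:
            return self._elements[(schema, name)]
        matches = [d for (s, n), d in self._elements.items() if n == name]
        if len(matches) != 1:
            raise KeyError(f"{name!r} is missing or ambiguous; pass schema=")
        return matches[0]
```

Existing generators could keep calling `get("Person")` as long as names stay unique, while namespace-aware ones pass `schema=` explicitly.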

greetings,
Frank

2 replies

sneakers-the-rat Feb 27, 2024
Collaborator

imo the easiest path to implementation is to make schemaview imports recursive (ie. rather than one class doing everything for a whole import tree, have each schema in the tree be its own SchemaView object), and then adapt the imports_closure method to control imports, namespacing, etc. that would keep the combinatorics of imports needing to be computed for every imported schema from exploding

kervel Mar 12, 2024
Collaborator

i don't think i fully agree (or i don't understand you completely). the purpose of schemaview is to expose a complete schema as a whole. There are already classes (used by schemaview) that represent the contents of one single file. they are lower level; schemaview is supposed to be high level and directly usable by the generators.

I don't think in the end there is an alternative to having qualified slot, enum and class names in schemas. while the "as" syntax you propose could be used as a workaround, i think it's not ideal when you import schemas that are not under your control: they might add a slot or class colliding with one of yours in a later version without you noticing. Also, in case of transitive imports it becomes a mess (you can easily get the same thing aliased multiple times)

sneakers-the-rat · 2024-02-27T04:27:29Z

sneakers-the-rat
Feb 27, 2024
Collaborator

Here's a proposal for a syntax, what u think @cmungall :

Requirements

Current behavior remains unchanged - import all into current namespace
Selective import: from x import y
Namespace aliasing: import x as name - doesn't inject all objects into current schema namespace, instead becomes name.Object
Selective import & aliasing: from x import y as name

Nice to have

Exclusion: from x import * except name
Providers: import from local file, git repository, or URL.

Implementation

make schemaview recursive - https://github.com/orgs/linkml/discussions/1739#discussioncomment-8599940
optional __init__ parameter imported for the Imports object created by the metamodel
when top-level schema instantiates imported schemaview, pass Imports object
imported schemaview uses self.imported to filter/rename what is yielded in imports_closure or get_dict
- aliasing: rename yielded objects
- namespacing: prepend namespace to yielded objects names like namespace.ObjectName

Choices

Recursive Schemaview - Making schemaview recursive avoids substantial complexity in handling import trees - each schemaview only needs to know about itself and the specification for its immediate imports. this could also be done within a flat schemaview, but it would be a recursive function anyway, and it would fix a bunch of problems imo aside from imports to make the whole class recursive, eg. previous problems in handling relative paths across multiple import layers. This would, i think, match intuition as well: each schema is "resolved" on its own and is only modified by how it is imported.
Object renaming - To avoid special casing handling imports eg. behind a dictionary or something, we just literally give them that name, so they behave like any other object in the schema. We might need some additional field like "source_name" and I'm pretty sure there is some notion of source_schema already. The alternative would be to not change the actual object representations themselves and just having schemaview handle that internally - that would involve displacing that job onto the generators, which is potentially good in case different target formats can/can't represent some of the import syntax, but would be a ton more work and potentially introduce inconsistent behavior.
Include/exclude syntax - I think it makes sense to use include as the term for selecting, because then we also naturally get exclude, and the from term can be used to specify a source for the schema.
namespace prop vs aliasing - do we want a special property like namespace: true for inducing imports within a namespace, or do we want to just have a single as property that can be reused for aliases? I prefer the latter, even if it seems repetitive, because it is obvious, simple to implement, and concise. If we were to add a prop, i think it should be to the top-level schema definition like import_mode: namespace to default all imports to namespacing or not.
nth-layer imports - related to recursive schemaview, isolate imported schemas, so eg if i do a namespace import of schema_a and it has a namespace import of schema_b, then it would be schema_a.schema_b.ObjectName. If i wanted schema_b in my current schema i could import it directly, and that won't interfere with schema_a's use and potential modification. by default, generators would render the same class imported in multiple ways multiple times, but addition of a source_class or some other parameter would allow generators to make aliases if they support them.
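
A toy sketch of the recursive idea, assuming each view only knows its own classes and the specification of its direct imports. `MiniSchemaView` and `ImportSpec` are hypothetical stand-ins, not the real SchemaView, and only namespacing is modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImportSpec:
    """How one schema is imported; alias=None means a flat import of everything."""
    schema: "MiniSchemaView"
    alias: Optional[str] = None


@dataclass
class MiniSchemaView:
    """Hypothetical recursive view over one schema and its direct imports."""
    classes: List[str]
    imports: List[ImportSpec] = field(default_factory=list)

    def all_names(self) -> List[str]:
        names = list(self.classes)
        for spec in self.imports:
            # Each imported view resolves itself first; we only rename the result.
            for name in spec.schema.all_names():
                names.append(f"{spec.alias}.{name}" if spec.alias else name)
        return names


schema_b = MiniSchemaView(classes=["ObjectName"])
schema_a = MiniSchemaView(classes=["Thing"], imports=[ImportSpec(schema_b, alias="schema_b")])
top = MiniSchemaView(classes=["Mine"], imports=[ImportSpec(schema_a, alias="schema_a")])
```

Here `top.all_names()` yields the nested form `schema_a.schema_b.ObjectName`, because `schema_a` is resolved on its own before the top-level schema applies its alias.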

Syntax

Imports

Extending the example above ( https://github.com/orgs/linkml/discussions/1739#discussioncomment-8226761 ):

Regular - import all objects into current schema:

imports:
- mySchema

Namespaced - import all objects as schemaName.ObjectName . Same syntax as aliasing - import all objects as alias.ObjectName

imports:
- mySchema:
    as: alias

Include - only import the listed objects (implicit exclude all)

imports:
- mySchema:
    include:
      - myClass
      - my_slot

Exclude - import all objects except excluded

imports:
- mySchema:
    exclude:
      - myClass
      - ...

Include & Exclude: only import listed objects except excluded objects (apply include and then exclude) (not sure why you would want to do this, but for the sake of completeness...)

imports:
- mySchema:
    include:
      - ClassA
      - ClassB
      - slotA
      - slotB
    exclude:
      - slotB
      - ClassA  

# imports ClassB and slotA
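
The "apply include and then exclude" rule can be sketched as a small filter. `select_imports` is a hypothetical helper, and raising on an unknown `include` name is an assumption of this sketch, not something decided in the proposal.

```python
def select_imports(available, include=None, exclude=None):
    """Apply `include` first (default: everything), then drop `exclude`.

    `available` is the list of names defined by the imported schema.
    """
    if include is not None:
        unknown = [name for name in include if name not in available]
        if unknown:
            raise KeyError(f"not defined in imported schema: {unknown}")
    wanted = set(available) if include is None else set(include)
    dropped = set(exclude or ())
    # Preserve the imported schema's own ordering.
    return [name for name in available if name in wanted and name not in dropped]


available = ["ClassA", "ClassB", "slotA", "slotB", "slotC"]
selected = select_imports(
    available,
    include=["ClassA", "ClassB", "slotA", "slotB"],
    exclude=["slotB", "ClassA"],
)
```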

Aliasing and Include/Exclude:

only import listed objects, with alias prefixed.
same aliasing syntax for included objects

imports:
- mySchema:
    as: schemaAlias
    include:
      - ClassA:
          as: ClassAlias
      - slotB

# alternatively...
imports:
- mySchema:
    as: schemaAlias
    include:
      - name: ClassA
        as: ClassAlias
      - slotB

# imports schemaAlias.ClassAlias and schemaAlias.slotB
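
Both spellings of an aliased `include` entry could be normalized to the same thing before renaming. The sketch below assumes the entries arrive as the plain Python structures a YAML loader would produce; `normalize_include` and `imported_names` are hypothetical names.

```python
def normalize_include(entry):
    """Return (source_name, local_name) for one `include` entry."""
    if isinstance(entry, str):
        return entry, entry
    if isinstance(entry.get("name"), str):
        # Explicit form: {"name": "ClassA", "as": "ClassAlias"}
        return entry["name"], entry.get("as", entry["name"])
    # Nested form: {"ClassA": {"as": "ClassAlias"}}
    (source, options), = entry.items()
    return source, (options or {}).get("as", source)


def imported_names(include, schema_alias=None):
    """Map each name visible in the importing schema to its source name."""
    names = {}
    for entry in include:
        source, local = normalize_include(entry)
        visible = f"{schema_alias}.{local}" if schema_alias else local
        names[visible] = source
    return names


nested = imported_names([{"ClassA": {"as": "ClassAlias"}}, "slotB"], "schemaAlias")
explicit = imported_names([{"name": "ClassA", "as": "ClassAlias"}, "slotB"], "schemaAlias")
```

Keeping the source name alongside the visible name is one way a generator could later recognise that two aliases point at the same object.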

Providers

Currently import schemaName is "import from schemaName.yaml in the same directory as current schema," and also supports relative/absolute paths.

Support ability to specify where a schema comes from using a from property. This makes for a neat place for plugins - other packages could define their own providers if they want to (or pull them to main) that declare they provide schema when using a specific type of from.

from is mutually exclusive to implicit path syntax - if the from specification fails, throw exception rather than implicitly loading from local path.
Don't try and parse different types of from as strings, even if it's a little more verbose, require one specify from: url: https://example.com - otherwise expectation might be that we expect all kinds of URIs like from: git@https://example.com etc.
Allow multiple from in list form to allow for future extensibility, eg. to allow for importers to choose a preferred form of provider, allow the schema to be hosted on multiple protocols.
if dictionary, first key indicates the type of provider (corresponding to a plugin that advertises it implements eg. url import). If list, for each entry, first key (ibid.). Could also have explicit type field, idk.
plugins can allow a single positional argument that is passed when called like url: arg,
otherwise, for a single-layer dict, all entries after the first are parameters. This is to avoid having to do url: url: https://example.com while also supporting the dict form of the git import
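
The parsing rules above could look roughly like this. It is a sketch under the assumption that the YAML loader preserves key order (Python dicts do), so that "first key" is well defined; `parse_from` is a hypothetical helper and no plugin mechanism is implemented.

```python
def parse_from(spec):
    """Return a list of (provider_type, positional_arg, params) tuples.

    - a dict is one source; a list is several sources in order of preference
    - the first key names the provider type
    - a scalar under the first key is the provider's positional argument
    - remaining keys of a single-layer dict are extra parameters
    """
    entries = spec if isinstance(spec, list) else [spec]
    parsed = []
    for entry in entries:
        if not isinstance(entry, dict) or not entry:
            # Bare strings are rejected on purpose: no guessing at URI schemes.
            raise TypeError("each `from` entry must be a non-empty mapping")
        items = iter(entry.items())
        provider, first_value = next(items)
        if isinstance(first_value, dict):
            positional, params = None, dict(first_value)
        else:
            positional, params = first_value, {}
        params.update(items)  # whatever follows the first key
        parsed.append((provider, positional, params))
    return parsed
```

For example, `parse_from({"url": "https://example.com/v1.0.1/mySchema.yaml", "headers": {"Accept": "text/yaml"}})` gives one `url` source with the URL as positional argument and `headers` as a parameter.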

First pass implementation:

Import from URL
Import from git repository

Import from local path: either use current syntax, or

imports:
- mySchema:
    from:
      path: ../../mySchema.yaml

Import from URL

imports:
- mySchema:
    from:
      url: https://example.com/v1.0.1/mySchema.yaml

Import from URL with hash validation

imports:
- mySchema:
    from:
      url: https://example.com/v1.0.1/mySchema.yaml
      hash:
        sha256: (long sha string)
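
Hash validation itself is small. A sketch using only the standard library, where `verify_sha256` is a hypothetical helper that a URL provider might call after downloading:

```python
import hashlib


def verify_sha256(content: bytes, expected_hex: str) -> bytes:
    """Return `content` unchanged if its SHA-256 digest matches, else raise."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_hex.strip().lower():
        raise ValueError(f"sha256 mismatch: expected {expected_hex}, got {actual}")
    return content


# Stand-in for the bytes fetched from the URL in the import above.
downloaded = b"id: https://example.com/mySchema\nname: mySchema\n"
pinned = hashlib.sha256(downloaded).hexdigest()
checked = verify_sha256(downloaded, pinned)
```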

Pass other parameters to the provider plugin (eg. setting request headers)

imports:
- mySchema:
    from:
      url: https://example.com/v1.0.1/mySchema.yaml
      headers:
        "Accept": "text/yaml"

Import from HEAD in default branch of git repository (dictionary form, plugin does not allow for positional argument)

imports:
  - mySchema:
      from:
        git:
          repo: https://git.example.com
          path: dir/subdir/mySchema.yaml
Import from specific ref of git repository (including branches, tags, hashes. we can make aliases for branch etc. that are just handled by ref if we want)

imports:
  - mySchema:
      from:
        git:
          repo: https://git.example.com
          path: dir/subdir/mySchema.yaml
          ref: v1.0.1

Multiple import locations - try in order:

imports:
- mySchema:
    from:
    - url: https://example.com/v1.0.1/mySchema.yaml
    - git: 
        repo: https://git.example.com
        path: dir/subdir/mySchema.yaml
        ref: v1.0.1
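
"Try in order" could be a simple fallback loop over provider callables. Everything here is hypothetical: `load_first_available`, the provider registry, and the two fake providers, which stand in for real network and git access.

```python
def load_first_available(sources, providers):
    """Try each (provider_type, params) source in order; return the first success."""
    errors = []
    for provider_type, params in sources:
        try:
            return providers[provider_type](**params)
        except Exception as exc:  # a real version would catch narrower errors
            errors.append(f"{provider_type}: {exc!r}")
    raise RuntimeError("no source could provide the schema: " + "; ".join(errors))


def failing_url(url):
    """Fake URL provider that simulates being offline."""
    raise ConnectionError("offline")


def fake_git(repo, path, ref="HEAD"):
    """Fake git provider that just describes what it would fetch."""
    return f"{repo}/{path}@{ref}"


providers = {"url": failing_url, "git": fake_git}
sources = [
    ("url", {"url": "https://example.com/v1.0.1/mySchema.yaml"}),
    ("git", {"repo": "https://git.example.com", "path": "dir/subdir/mySchema.yaml", "ref": "v1.0.1"}),
]
result = load_first_available(sources, providers)
```

Collecting the per-source errors keeps the final exception informative when every location fails.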

Caveats

Let's not do a whole version and dependency resolution system rn. this can be made totally orthogonal, here are some examples showing how we might do that in the future:

- mySchema:
    version: v1.0.1
    from:
    - url: https://example.com/{version}/mySchema.yaml

- mySchema:
    version: v1.0.1
    from:
      git: 
        repo: https://git.example.com
        path: dir/subdir/mySchema.yaml
        ref: "{version}"
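
The `{version}` placeholder in these examples could be filled with plain string substitution before the `from` block reaches a provider. A sketch, with `substitute_version` and `resolve_import` as hypothetical helpers; it uses `str.replace` rather than `str.format` so that other braces in a URL are left alone.

```python
def substitute_version(value, version):
    """Recursively replace the literal `{version}` in strings, dicts and lists."""
    if isinstance(value, str):
        return value.replace("{version}", version)
    if isinstance(value, dict):
        return {key: substitute_version(item, version) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_version(item, version) for item in value]
    return value


def resolve_import(spec):
    """Return a copy of one import spec with `{version}` filled into `from`."""
    version = spec.get("version")
    if version is None:
        return dict(spec)
    return {**spec, "from": substitute_version(spec.get("from"), version)}


git_import = resolve_import({
    "version": "v1.0.1",
    "from": {"git": {"repo": "https://git.example.com",
                     "path": "dir/subdir/mySchema.yaml",
                     "ref": "{version}"}},
})
```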

We might also want to be able to override how another schema is sourcing a given schema. Say we have a vendored copy and don't want a schema we import to source it from HTTP. we could also do something like:

provides:
  - localSchema:
      from: ./schema/localSchema.yaml

0 replies

jsnyder-csdisco · 2024-03-12T02:52:33Z

jsnyder-csdisco
Mar 12, 2024

I am a bit new here, but I think I understand the metadata model enough to suggest that a SchemaView might want to support a namespace declaration that in turn specifies the resolution mechanism for the namespace imported schema (i.e. use the namespace as an indirection). Using this approach, it would be the responsibility of the importing schema to resolve the namespace collisions that may occur.
If this idea is worth pursuing, I could work up a few examples.

Actually, as I read the above description, I think it's very similar. However, I do think it is worth considering NOT making the schemaview recursive (e.g. schema1.schema2.schema3.MyClass). At least not directly recursive. I think allowing the namespace to exist as a first-class construct, and even allowing more than one "imported schema" to map to the same namespace name, would be very powerful and align more closely with code generation (e.g. Java or Python package names, etc).

I will admit I am very "ignorant" of the implementation costs, but it seems like having a "relationship slot" like "binds_to_namespace" and Namespace as a formal metamodel element will smooth over a lot of modeling complexity (but maybe not implementation complexity).

2 replies

sneakers-the-rat Mar 12, 2024
Collaborator

Maybe I'm not getting what you're saying, but I don't see how they're in conflict really? Recursive schemaview gets you both: ability to declare whatever mappings you want in current schema and ability for imported schemas to retain predictable behavior (ie. current schema doesn't override behavior of imported schemas except in the current schema's namespace).

Can you do a pseudocode/pseudoschema example of what you mean?

jsnyder-csdisco Mar 13, 2024

I will try to work something up that should make it clear whether we are saying the same thing. Thanks for the reply.

bartkl · 2024-03-12T10:49:27Z

bartkl
Mar 12, 2024

This topic is very important to our company for adopting LinkML, so I'd like to weigh in. Before getting more acquainted with the technical side of things, I'd also like to discuss a potential different direction than those proposed here.

LinkML already supports LD URI mappings and CURIEs, and promotes itself as LD modeling language, so why can't we leverage that for native/local namespacing? This way we rely on only one namespacing mechanism in the entire schema (and a battle tested and well-documented, famous one), as opposed to introducing an entire custom construction with its own syntax (e.g. ns.Object) that might cause more confusion as well as extra tables in the header (for those namespaces).

For example, this might look like:

prefixes:
  ex: http://example.com#
  schema: https://schema.org/
imports:
  - ./other_schema.yaml  # Contains `other:occupation`
default_prefix: ex

classes:
  ex:Person:
    class_uri: schema:Person
    name: Bart  # ex:name
    other:occupation: Data Architect

I think this greatly simplifies everything, but this may be wishful thinking without carefully considering the technical implications/difficulties. There may be YAML parsers who don't like the colon for example.
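
For reference, the CURIE mechanics this would rely on are simple. A minimal sketch with plain dicts, not the LinkML API: `expand_curie` is a hypothetical helper, and only `scheme://` style identifiers are recognised as already being full URIs.

```python
def expand_curie(identifier, prefixes, default_prefix=None):
    """Expand `prefix:local` (or a bare name) into a full URI string."""
    if "://" in identifier:
        return identifier  # assumed to be a full URI already
    prefix, sep, local = identifier.partition(":")
    if not sep:
        # Bare name: fall back to the schema's default prefix.
        if default_prefix is None:
            raise ValueError(f"{identifier!r} has no prefix and no default_prefix is set")
        prefix, local = default_prefix, identifier
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r} in {identifier!r}")
    return prefixes[prefix] + local


prefixes = {"ex": "http://example.com#", "schema": "https://schema.org/"}
person_uri = expand_curie("ex:Person", prefixes)
name_uri = expand_curie("name", prefixes, default_prefix="ex")
```

An identifier whose prefix is not declared in `prefixes` fails loudly here instead of being resolved implicitly.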

Also, it's very well possible that my familiarity with Semantic Web tech makes me blind towards how this might make LinkML less accessible again. For example, it may trip people up that the class has the URI ex:Person, but also has a URI mapping to schema:Person. This involves understanding the difference between the referenced vocabulary term (schema:Person here) and the logically constraining class (ex:Person here). Then again, in a way that distinction exists already anyways...

Anyways, I think this is a worthwhile way forward to look at. Please let me know if this has any merit to it or simply won't work in your opinion. I'll read up on the other proposals in more depth soon as well.

Thanks!

PS I'm aware this doesn't solve everything about the inflexibility of the import mechanism, but it solves the identification and namespacing issue neatly.

4 replies

kervel Mar 12, 2024
Collaborator

i think this is the (only) way to go ... the colons should not be a problem, that's valid yaml. your example is not valid linkml, but i guess that's because you are still evaluating linkml :-)

bartkl Mar 12, 2024

Oh the schema was a quick sketch to make the point, so kind of deliberately invalid ;). Thanks for your feedback :)

sneakers-the-rat Mar 12, 2024
Collaborator

I think there are several different things that are separable and worth considering independently here - the colon syntax is good and quasi-universal for prefixes which means literally "{prefix}{suffix}" as a URI in LD world (as u know). So as a shorthand for URIs it is the right (only) call, but I think its worth keeping separate from an import and aliasing syntax from within linkml, though I dont see why we shouldnt use it when it makes sense, if we can balance potential ambiguity. The problem is that imports within linkml are a qualitatively different thing than prefixes and URI references - unfortunately the "URIs are things and dereferencing the things should get you something useful" part of LD is, uh, inconsistent, so the safest implementation bet is to treat them effectively as strings, where linkml imports (that can also be from URIs) should be expected to carry a full schema. That schema could have one or many prefix-URI annotations for a given object that take various forms in the different generated versions of a schema, etc.

So it could be the case that if one imports schema.yaml with the prefix pre then one could refer to the objects in that schema like pre:objectName and that would probably be the right call, as long as we enforced uniqueness across imports and prefixes, and leave it to the author to be clear on which they were using where, but I think that the implicit import of other:occupation in your example is to be avoided at the level of a linkML schema. implicit names like that sound like hell for both implementation and authorship. How do we handle conflicting prefixes in imported schemas? What if we accidentally redefine one of the prefixes in our schema? So while the prefixes should be resolved and emitted in the generated forms of the schema, I think its worth keeping the notion of "linkml object" separate from "general prefixed URI linked data object." The dot syntax isn't exactly a custom construction, linkML being made in Python, so if one wanted to "reach into" the prefixes of an imported schema one could do something like pre.other:occupation, which would be unambiguous, or pre:other:occupation, which is not the standard way to use colons as delimiters in this context but could also work if we prefer it; it just would require a little more smarts in the parser.

Whatever we do, I think importing and prefixing should be something that is only n=1 deep, in the same sense that if in python one imports module a, and it imports module b, then one can refer to n>1 "hops" while still having the complexity of the namespace in the importing schema be n=1 depth (the whole idea of namespaces!). The delimiter syntax matters less to me than that - colons or dots, or both.

Another strategy, if one wants to eg. Have some schema file that lists all the prefixes used across a bunch of schema, might be to allow one to explicitly import a whole top-level object from a schema, like from schema import prefixes (but with some differentiating syntax in case there is a class/slot named prefixes, which is allowed) which is less explicit but might be convenient.

Anyway tl;dr I am not any kind of authority, or anything, but I think overloading the colon delimiter to be for URI prefixes and imported namespaces could be fine as long as we have some clear implementable rules for being able to differentiate them. There are multiple constraints here, and good reasons to not treat linkML as being a concrete RDF syntax, and instead to give it syntactic constructs that can compile into typical LD constructs without directly embodying them.

bartkl Mar 13, 2024

The problem is that imports within linkml are a qualitatively different thing than prefixes and URI references

I agree. My suggestion regards only the identification and syntax parts, not the import mechanism. I think LinkML should indeed have its own import semantics, but I think that's perfectly compatible with having LD identifiers for schema elements.

so the safest implementation bet is to treat them effectively as strings, where linkml imports (that can also be from URIs) should be expected to carry a full schema

Precisely :). Again, my point would be that I don't see use for a second syntax of identifiers, but I do agree LinkML should have its own (great) import semantics.

but I think that the implicit import of other:occupation in your example is to be avoided at the level of a linkML schema

Oh that was pure sloppiness in my example! I definitely agree. The other prefix should have been explicitly defined in the prefixes section in this example file. Sorry for causing quite some confusion as to what I meant!

I think its worth keeping the notion of "linkml object" separate from "general prefixed URI linked data object."

To emphasize: I agree, I just feel like even the LinkML object can use LD URIs as identifiers (with CURIE syntax) as well. I still don't quite see why we would go for two different namespacing solutions and notations (and btw, that's a fair point regarding dot notation being commonplace as well).

Have some schema file that lists all the prefixes used across a bunch of schema

Honestly I think it's best to keep prefixes local to files, and not have the ability to import them. However, I see now that isn't trivial. I mean, in my mind it was like how RDF does it, but of course there everything is identified by a URI, and omitting a prefix means you simply reference the full URI. That could get messy and confusing here. I would have to think longer about that.

In response to your TL;DR: I wasn't suggesting to just use the colon syntax, but URIs and CURIEs to solve identification and namespacing in total. Note: I'm not talking about the import system or dereference semantics, just the namespacing and URIs as IDs.

I definitely agree with you however that this shouldn't turn into too much of an LD situation, since the very power of LinkML is to be accessible where RDF 'failed' to be.

Anyways, thanks for your insights. You gave me a lot to think about. I wasn't aware of certain of the complexities as soon as you start importing ;).

kervel · 2024-03-15T08:26:58Z

kervel
Mar 15, 2024
Collaborator

I had a closer look at the metamodel.

In the linkml metamodel, slots and classes are an Element, for instance the "range" slot

an "element" is defined here and has a name which now is the identifier

the Element also has a definitionURI which is not the identifier, but which is filled in by the schemaloader/schemaview. It can be a CURIE as well as a URI, and would be the right identifier to support namespacing. If, internally, all CURIEs are expanded to URIs, then aliasing (as suggested above) would be easy to support.

I think the most difficult question remains how to make those changes without breaking the whole schemaview/schemaloader API.

9 replies

sneakers-the-rat Mar 18, 2024
Collaborator

hi, in your example i would assume that just using "Person" as range would refer to a schema_b:Person. But for that to work, the schema_b prefix needs to be known.

even if schema_a also defines Person, as in the example? i would expect the local class to override an imported class

but a prefix is just a short hand, so if the prefix is not known everywhere, and the only "universal" way to refer to the class Person in schema_a would be to use the URI.

this is the purpose of namespacing - that there isn't necessarily a universal way to refer to things, but we give them names that are local to a specific scope. RDF's answer has historically been that every object should have a single, universal identifier (ie. a URI), but that doesn't really work in practice for the simple reason that most people can't maintain a system of URIs, and for other reasons including the complexities of versioning in URIs, etc. so again I think requiring all objects having a URI at the level of the linkml schema definition is to be avoided - URIs can come downstream for particular formats that need them.

so let me check to see if i'm following what you're saying here:

import schema_a as a relative path, but that file could be named anything: schema_xyz whatever, the only thing that matters is its URI set in id
set prefix schema_a: {the id set in schema_a.yaml} in schema_b
resolve schema_a:Person by converting to URI https://example.com/schema_a/Person and then dereferencing
since we have a schema imported with a matching prefix, schemaview knows to use that instead of actually dereferencing via HTTP.
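
If I'm reading those steps right, the lookup would be something like the sketch below. It assumes imported schemas are registered under the namespace URI their prefix expands to (whether that should be `id` or `default_prefix` is exactly what is open here); `resolve_reference` and the dict shapes are hypothetical.

```python
def resolve_reference(curie, prefixes, imported_schemas):
    """Resolve `prefix:Name` against already-imported schemas, without HTTP.

    `imported_schemas` maps a namespace URI to {element name: definition}.
    Returns None when the prefix does not point at an imported schema, so the
    caller can keep treating the expanded URI as an opaque string.
    """
    prefix, sep, local_name = curie.partition(":")
    if not sep:
        raise ValueError(f"{curie!r} is not a CURIE")
    namespace = prefixes[prefix]  # KeyError for an undeclared prefix
    schema = imported_schemas.get(namespace)
    if schema is None:
        return None
    return schema[local_name]  # KeyError if the schema lacks that element


prefixes = {
    "schema_a": "https://example.com/schema_a/",
    "schema": "https://schema.org/",
}
imported_schemas = {
    "https://example.com/schema_a/": {"Person": {"description": "a person"}},
}
local_hit = resolve_reference("schema_a:Person", prefixes, imported_schemas)
external = resolve_reference("schema:Person", prefixes, imported_schemas)
```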

So then questions

what happens if we import schema_a and set schema_a's prefix to something that doesn't match its id? ie schema_b.prefixes.schema_a != schema_a.id ?
what happens if schema_a's ID changes, does our schema break?
how do we differentiate prefixes that are to be treated as LinkML schemas vs. prefixes that are to be treated as mere strings because they don't dereference to something linkml understands?
how do we refer to the other items in schema_a from schema_b? do we break the current behavior and make it so we always refer to schema_a:Item?
how do we refer to the other items in schema_a that might have set a specific URI that doesn't match the id? ie if one of our classes has a definition_uri set? (and see the rule for derived URIs)
what about the ability for objects to have multiple URIs? ie. if we also have a class_uri set - do we have to have a prefix for every prefix in the imported schema?
what about the same URI used for multiple objects in different linkML schemas, or in the meaning slot, or the other ways that objects can be associated with URIs?
how does this work for the many schemas that just use placeholder ID's because they don't intend to be used as linked data models, do we have to ensure that every schema actually has a unique URI then, even if it's fake? how do we coordinate that among different schema authors?

I also wonder what advantage doing it this way, indirectly through prefixes:

flowchart LR
    subgraph schema_b
    subgraph prefixes
    sch_b[schema_a: URI]
    end
    subgraph imports
    sch_b_yaml[schema_a.yaml]
    end
    schema_a:Person
    end

    subgraph schema_a
    direction TB
    id
    Person
    id -- "expand" --> Person
    end

    sch_b -- "match" --> Person
    schema_a:Person -- "expand" --> sch_b
    Person -- "retrieve" --> sch_b_yaml

has over just using the import directly?

flowchart LR
    subgraph schema_b
    direction LR

    subgraph imports
    sch_b_yaml[schema_a.yaml]
    end

    schema_a:Person -- "retrieve" --> sch_b_yaml
    end
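
To make the comparison concrete, here is a minimal sketch of the two lookup paths in plain Python. It is not how SchemaView resolves anything; the dictionaries, `resolve_via_prefix`, and `resolve_via_import` are hypothetical stand-ins, and the indirect path assumes matching is done against the imported schema's `default_prefix` expansion.

```python
from typing import Optional

# Hypothetical in-memory stand-in for an imported schema_a.yaml.
SCHEMA_A = {
    "prefixes": {"schema_a": "https://example.org/a/"},
    "default_prefix": "schema_a",
    "classes": {"Person": {"description": "a person"}},
}

# Keyed by the name used in schema_b's `imports` list.
IMPORTS = {"schema_a": SCHEMA_A}


def resolve_via_prefix(curie: str, local_prefixes: dict) -> Optional[dict]:
    """Indirect path: expand the CURIE, then match the URI to an import."""
    prefix, _, local_name = curie.partition(":")
    base = local_prefixes.get(prefix)
    if base is None:
        return None
    uri = base + local_name
    for schema in IMPORTS.values():
        # Assumption: the import is identified by its default_prefix expansion.
        schema_base = schema["prefixes"][schema["default_prefix"]]
        if uri.startswith(schema_base):
            return schema["classes"].get(uri[len(schema_base):])
    return None


def resolve_via_import(reference: str) -> Optional[dict]:
    """Direct path: treat the part before ':' as the import name."""
    import_name, _, local_name = reference.partition(":")
    schema = IMPORTS.get(import_name)
    return None if schema is None else schema["classes"].get(local_name)


matching = {"schema_a": "https://example.org/a/"}
mismatched = {"schema_a": "https://example.org/something-else/"}

found = resolve_via_prefix("schema_a:Person", matching)
missed = resolve_via_prefix("schema_a:Person", mismatched)
direct = resolve_via_import("schema_a:Person")
```

In this sketch the indirect path only finds `Person` when schema_b's prefix expansion agrees with schema_a's own, which is the first question in the list above; the direct path does not depend on that agreement.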

kervel Mar 19, 2024
Collaborator

Hi,

nice chart! didn't know you could do that with github. Let me try to answer some of the questions.

what happens if we import schema_a and set schema_a's prefix to something that doesn't match its id? ie schema_b.prefixes.schema_a != schema_a.id ?

I think it should refer to default_prefix and not to id. This is indeed a case where we would break backwards compat: today, without namespaces, all classes from the imported schema become available in the same namespace, whereas with namespacing they would not. We could fix that by having all non-conflicting class names come into the current namespace, but i don't think that's very clean (though it may be needed in order to support existing schemas). Since there is no namespacing right now, i assume that all schemas currently work carefully around naming conflicts.

what happens if schema_a's ID changes, does our schema break?

if the default_prefix changes yes. And if it does, lots of other things break too (everything that uses URIs like generated owl, jsonld, but also python support if you use type designators)

how do we differentiate prefixes that are to be treated as LinkML schemas vs. prefixes that are to be treated as mere strings because they don't dereference to something linkml understands?

wherever you have an ElementName (eg in the "range" or in a "name") i would assume that there would never be a mere string.

how do we refer to the other items in schema_a from schema_b? do we break the current behavior and make it so we always refer to schema_a:Item?

this could be an issue indeed. there is a fix (see above) which i don't like very much.

how do we refer to the other items in schema_a that might have set a specific URI that doesn't match the id? ie if one of our classes has a definition_uri set? (and see the rule for derived URIs)

the derived_uri (i think) cannot be set as it is set when the schema is loaded. the class_uri can. in the API this is the "native" vs "non native" uri. We can choose whether we also want to support non-native uris. In case we want, there are multiple ways to refer to a class or slot, which doesn't have to be a problem.

what about the ability for objects to have multiple URIs? ie. if we also have a class_uri set - do we have to have a prefix for every prefix in the imported schema?

if you don't have a prefix you can still refer to a class using its full uri, i would think, just like how the "uriorcurie" type works.

what about the same URI used for multiple objects in different linkML schemas, or in the meaning slot, or the other ways that objects can be associated with URIs?

i think that if you have URI conflicts you already have a problem right now.

how does this work for the many schemas that just use placeholder ID's because they don't intend to be used as linked data models, do we have to ensure that every schema actually has a unique URI then, even if it's fake? how do we coordinate that among different schema authors?

i think the schemas that use placeholder ids are probably not published as schemas that are referenced from a lot of places. other than that, you can for instance use the domain name of your org in the URI, which is guaranteed to be unique (and common practice).

sneakers-the-rat Mar 19, 2024
Collaborator

I guess the question that remains to me, given these answers, is what is the advantage of an indirect lookup through URIs vs. handling linkml imports at the level of the linkml schema and resolving URIs out of those definitions at serialization/generation time?

ie. why is this:

id: http://example.com/schema_b
prefixes:
  schema_a: https://example.com/schema_a/
  schema_b: https://example.com/schema_b/
default_prefix: schema_b

imports:
  - schema_a

slots:
  my_slot:
    range: schema_a:Person

better than this:

id: http://example.com/schema_b

imports:
  - schema_a

slots:
  my_slot:
    range: schema_a:Person

edit for tone, more aggro than i intended on second reading, my bad

kervel Mar 19, 2024
Collaborator

i think it boils down to how linkml already uses prefixes and the default_prefix and being consistent with the rest of linkml. But i guess i'd better let the core linkml team answer this question.

sierra-moxon Mar 19, 2024
Maintainer

Thanks @sneakers-the-rat @kervel @bartkl - core dev team is listening here and would like to give this topic some time at our community call. I'll reach out on slack to the three of you to coordinate.

turbomam · 2024-03-18T17:50:10Z

turbomam
Mar 18, 2024
Collaborator

@pkalita-lbl are you monitoring this discussion? Do you think namespacing would make it easier to manage the relationships between the nmdc-schema and the submission-schema?

0 replies

cmungall · 2024-03-19T16:22:42Z

cmungall
Mar 19, 2024
Maintainer Author

Sorry to answer out of thread, a lot of great discussion here.

Regarding schemaload/schemaview: the former will gradually be eclipsed by the latter, and the latter can always be extended in a backwards compatible way e.g. new options on calls.

LinkML already has a mechanism for mapping local names in an individual schema file to global names. It so happens that these global names are IRIs but this can be hidden from the user.
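
As a rough illustration of that local-to-global mapping, the sketch below derives a URI for each class from `default_prefix`, unless the class sets its own `class_uri`. This is a simplification written for this thread, not LinkML's implementation: `expand`, `local_to_global`, and the `Agent` class are hypothetical, and slots, name normalisation, and imports are ignored.

```python
def expand(curie: str, prefixes: dict) -> str:
    """Expand prefix:local to a full URI; pass anything else through."""
    prefix, sep, local_name = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local_name
    return curie


def local_to_global(schema: dict) -> dict:
    """Map each local class name to a global URI (simplified rule)."""
    base = schema["prefixes"][schema["default_prefix"]]
    table = {}
    for name, definition in schema["classes"].items():
        # A class with no body is loaded from YAML as None.
        class_uri = (definition or {}).get("class_uri")
        if class_uri:
            table[name] = expand(class_uri, schema["prefixes"])
        else:
            table[name] = base + name
    return table


schema_b = {
    "prefixes": {
        "schema_a": "https://example.org/a/",
        "schema_b": "https://example.org/b/",
    },
    "default_prefix": "schema_b",
    "classes": {
        "Person": None,
        "Agent": {"class_uri": "schema_a:Person"},  # hypothetical override
    },
}

table = local_to_global(schema_b)
```

Here `Person` takes its URI from the schema's default prefix, while `Agent` is pointed at a URI in another namespace through `class_uri`.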

A convenient way to see the mapping table for any schema file is to run gen-jsonld-context over it

There are ways to alter the mappings using elements such as:

See also the compliance tests introduced in #1987 (still incomplete) that test for various combinations of these in combination with import.

(unfortunately the outputs of these tests are not yet visible unless you run them but the goal is to publish these on a separate compliance suite site)

This gives the schema author fine grained control over the names in the schema but less over imported names. But this could be addressed by

structured_imports (marked as status:testing and not yet implemented)
allowing elements to be referenced by curie/uri rather than only by local name

As far as I can tell the only impedance mismatch with package namespaces in programming languages is the case where in linkml you can have two schema files with the same default_prefix whereas AFAIK there is no way in Python or other languages to have two separate files/modules sharing the same namespace.
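
A small sketch of what that mismatch can look like in practice, assuming the simplified rule that a class URI is the `default_prefix` expansion plus the class name. The file names, schema contents, and `find_uri_clashes` are made up for illustration.

```python
from collections import defaultdict


def find_uri_clashes(schemas: dict) -> dict:
    """Return derived class URIs that are claimed by more than one file."""
    owners = defaultdict(list)
    for file_name, schema in schemas.items():
        base = schema["prefixes"][schema["default_prefix"]]
        for class_name in schema["classes"]:
            owners[base + class_name].append(file_name)
    return {uri: files for uri, files in owners.items() if len(files) > 1}


# Two hypothetical schema files that share the same default_prefix.
schemas = {
    "core.yaml": {
        "prefixes": {"ex": "https://example.org/"},
        "default_prefix": "ex",
        "classes": ["Person", "Organization"],
    },
    "extras.yaml": {
        "prefixes": {"ex": "https://example.org/"},
        "default_prefix": "ex",
        "classes": ["Person", "Dataset"],
    },
}

clashes = find_uri_clashes(schemas)
```

Under that assumption both files derive the same URI for `Person`, which has no direct analogue in a language where each module owns its namespace.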

Note that the average developer doesn't need to know much about URIs. They of course need to provide a stable URI for their own namespace but that is already the case.

This would be the canonical way of handling a local name clash:

schema_a.yaml:

id: https://example.org/a
prefixes:
  linkml: https://w3id.org/linkml/
  schema_a: https://example.org/a/

default_prefix: schema_a

classes:
  Person:

schema_b.yaml:

id: https://example.org/b
prefixes:
  linkml: https://w3id.org/linkml/
  schema_b: https://example.org/b/
  schema_a: https://example.org/a/

default_prefix: schema_b
  
# not yet supported, and we can also explore some syntactic sugar for this
# unlike `imports` this will not automatically bring everything into the namespace
structured_imports:
  - import_from: schema_a
  
classes:
  Person:
    is_a: schema_a:Person  # we could explore using `.` as syntactic sugar instead

the pydantic would look like:

schema_a.py

from pydantic import BaseModel

class Person(BaseModel):
    ...

schema_b.py

import schema_a

class Person(schema_a.Person):
    ...

the json-ld-context would be like

schema_b.context.yaml

"@context":
  schema_a: https://example.org/a/
  schema_b: https://example.org/b/
  Person: https://example.org/b/Person

This has always been the intent, though not well documented. I will review all the discussion here before Thursday's community call

5 replies

cmungall Mar 21, 2024
Maintainer Author

Slides for tomorrow https://docs.google.com/presentation/d/1cRsS0R1H1J71rl6YaMsk-yylJo_BTFkNTUuwmGAoP2c/edit#slide=id.g2c50965836e_0_223

sneakers-the-rat Mar 21, 2024
Collaborator

when is the call/how do i join?

cmungall Mar 21, 2024
Maintainer Author

https://bit.ly/linkml-community-agenda -- sorry I thought you were on the list for these!

sneakers-the-rat Mar 21, 2024
Collaborator

aha it's at 8am, that explains why i had banished it from my mind ;). i'll see how i'm feeling around then

sneakers-the-rat Mar 21, 2024
Collaborator

not gonna make it, but here's my asynchronous input on namespacing after checking out the slides: as usual, we are pretty much on the same page, and i am especially reassured at the emphasis on abstraction away from being a concrete rdf syntax, and like the structured_imports draft.

the thing i would add with reference to my previous proposal before seeing structured_imports is that I see the need to both do programmatic things and also to be able to refer to things via URIs (ideally extending to a best-effort "remote import" of non-linkml ontologies from OWL/etc. with some work on parsers and crawlers), but think that extending the import metaphor to a providers metaphor is a better fit than trying to come at it from prefixes and the RDF POV, and most of my reasoning for that is the historical and syntactical baggage of prefixing - too much is coupled in the syntax of prefix:entity meaning literally f"{prefix}{entity}", the simplest example being versioning. that said, having support for prefixes is obviously necessary and also useful, including deduplicating linkml entities by matching resolved URIs! and that's why i think it's good to keep that as a dedicated interface rather than overloading it - having a non-ambiguous, single use way to be backwards compatible with standard linked data prefixing and syntax seems better to me than losing separation of concerns.
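
One way to read the versioning point, as a minimal sketch: if prefix:entity expands by plain concatenation, the only place a version can live is inside the prefix URI itself. The versioned URIs and the `expand` helper below are invented for illustration.

```python
def expand(curie: str, prefixes: dict) -> str:
    """Expand a CURIE by literal concatenation of prefix URI and entity."""
    prefix, _, entity = curie.partition(":")
    return f"{prefixes[prefix]}{entity}"


# Hypothetical: two releases of schema_a, distinguished only by prefix URI.
v1_prefixes = {"schema_a": "https://example.org/a/1.0/"}
v2_prefixes = {"schema_a": "https://example.org/a/2.0/"}

uri_v1 = expand("schema_a:Person", v1_prefixes)
uri_v2 = expand("schema_a:Person", v2_prefixes)
```

The same reference yields two unrelated strings, and nothing in the syntax records that they are two versions of one class.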

a providers metaphor gives both, and also allows for extension into more interesting directions. providing from a git repo is a straightforward example, but providing from content addresses and other more flexible resolution systems (via plugins) i think will open up a lot of doors down the line. tying URIs (in practice URLs) too deeply into the syntax rather than the parsing, generation, and interpretation i think limits that in the reality of the implementation. as u note in the slides, uris have an unambiguous derivation (i think?) from schemas, so by importing one gets export/import to/from LD by design.

I think the structured_imports draft is as compatible with that idea as my draft is. I am gathering that I am more of a fan of structural typing than you are, so my draft extends imports with different forms rather than new terms, but they are ultimately i think getting at the same thing - ability to do schema+alias/entity+alias imports. I am still unclear how you are thinking about sourcing the schema, current behavior being through treating them like a path/URL, but ya as above and along the lines of my feelings re: arrays, I feel like the sourcing should be contained within/annotated alongside the import, but as with arrays i think our disagreements pretty much boil down to style and aesthetics rather than substance, which is fine and cool and good :).

ttys and glhf with the daylight, wish my best to ben and that wonderful dashboard

rly · 2024-03-21T15:56:19Z

rly
Mar 21, 2024
Collaborator

Just to add - namespace support would also be useful for adoption of LinkML as the schema language in NWB (neurophysiology) because the current NWB system allows users to create their own schema/namespaces, and there could be occasional name clashes when using multiple extensions.

0 replies

nlharris · 2024-03-22T01:34:19Z

nlharris
Mar 22, 2024
Maintainer

aha it's at 8am, that explains why i had banished it from my mind ;). i'll see how i'm feeling around then

@sneakers-the-rat, I just wanted to say that as a fellow night owl, I feel your pain. We had to choose a time that works for people in North America but also in other parts of the world, and 8am PT is a good compromise.

Thanks for all your contributions to LinkML!

1 reply

sneakers-the-rat Mar 22, 2024
Collaborator

totally understood, ah the things we do for global collaboration <3

sneakers-the-rat · 2024-04-16T21:35:46Z

sneakers-the-rat
Apr 16, 2024
Collaborator

Just jotting this down while I remember: linkml already does use the dot notation for selecting props/slots within models - https://linkml.io/linkml-model/latest/docs/specification/02instances/#instance-accessor-syntax - so we already are in a case where we might be mixing delimiters (schema:Class.prop.param), which seems good to me.
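
A minimal sketch of how such a mixed-delimiter reference could be split, assuming `:` separates the schema from the element and `.` separates the element from an accessor path. `Reference` and `parse_reference` are hypothetical names, this is not the accessor grammar from the specification, and full URIs (which also contain `:`) are deliberately not handled.

```python
from typing import NamedTuple, Optional, Tuple


class Reference(NamedTuple):
    schema: Optional[str]   # part before ':', if any
    element: str            # class or slot name
    path: Tuple[str, ...]   # dotted accessors after the element


def parse_reference(ref: str) -> Reference:
    """Split 'schema:Class.prop.param' into its three parts."""
    schema, sep, rest = ref.partition(":")
    if not sep:
        # No ':' present, so the whole string is a local reference.
        schema, rest = None, ref
    element, *path = rest.split(".")
    return Reference(schema, element, tuple(path))


full = parse_reference("schema:Class.prop.param")
local = parse_reference("Person")
```

Because the two delimiters never overlap in this sketch, the namespace part and the accessor part can be parsed independently.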

0 replies

giovannidegani · 2024-06-08T18:38:58Z

giovannidegani
Jun 8, 2024

I’m evaluating LinkML at my company to use it for building an internal federated ontology as the base for a Knowledge Graph. We see having proper namespace support as a must. We would love to contribute, but are quite new to LinkML.

0 replies

vincentkelleher · 2024-06-20T06:59:27Z

vincentkelleher
Jun 20, 2024

Just subscribed to this issue as this is still a feature we would greatly appreciate for our Gaia-X Ontology. I previously opened an issue about this which I managed to circumvent, but another similar case appeared recently.

We are willing to contribute to this if you need some help 😉

0 replies

sneakers-the-rat · 2024-06-20T10:45:25Z

sneakers-the-rat
Jun 20, 2024
Collaborator

A few months ago I made an incrementally implementable, backwards compatible proposal that, as far as I can tell, addresses the needs articulated here with a bunch of room for additional features, but I'm not really sure where this decisionmaking process stands since there was no comment on it:
https://github.com/orgs/linkml/discussions/1739#discussioncomment-8600477

0 replies
