Replies: 4 comments
-
Separation of concerns is IMHO a good idea. In my software architecture training I use a classical example architecture: there you have cabinets per class/entity and cards per record/instance. The main interfaces are:
If these interfaces can be made compatible with your local storage technology, we are all set. For queries I am a fan of named queries: they avoid specifying the actual native query text and again aim for compatibility. Otherwise a simple find_by_id() and find_by_key_value() interface already goes a long way.
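To make the cabinet/card idea concrete, here is a minimal sketch of what such an interface might look like. Only find_by_id() and find_by_key_value() are named in the comment above; `Cabinet`, `InMemoryCabinet`, `register_query()` and `run_named_query()` are hypothetical names chosen for illustration, not an existing API.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Protocol

Record = Dict[str, Any]


class Cabinet(Protocol):
    """Hypothetical interface: one cabinet per class, one card per record."""

    def find_by_id(self, record_id: str) -> Optional[Record]: ...

    def find_by_key_value(self, key: str, value: Any) -> List[Record]: ...

    def run_named_query(self, name: str, **params: Any) -> List[Record]: ...


class InMemoryCabinet:
    """Dictionary-backed cabinet; other backends would translate the same calls."""

    def __init__(self, id_key: str = "id") -> None:
        self.id_key = id_key
        self.cards: Dict[str, Record] = {}
        # A named query maps a stable name to a backend-specific implementation,
        # so callers never see native query text.
        self.named_queries: Dict[str, Callable[..., Iterable[Record]]] = {}

    def add(self, record: Record) -> None:
        self.cards[record[self.id_key]] = record

    def find_by_id(self, record_id: str) -> Optional[Record]:
        return self.cards.get(record_id)

    def find_by_key_value(self, key: str, value: Any) -> List[Record]:
        return [card for card in self.cards.values() if card.get(key) == value]

    def register_query(self, name: str, query: Callable[..., Iterable[Record]]) -> None:
        self.named_queries[name] = query

    def run_named_query(self, name: str, **params: Any) -> List[Record]:
        return list(self.named_queries[name](self.cards.values(), **params))


people = InMemoryCabinet()
people.add({"id": "p1", "name": "Ada", "city": "London"})
people.add({"id": "p2", "name": "Grace", "city": "New York"})
people.register_query(
    "people_in_city",
    lambda cards, city: [card for card in cards if card["city"] == city],
)
```

A SQL or SPARQL backend would register a different implementation under the same query name, so calling code stays unchanged.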
-
I'm arguably writing something like this right now, and the idea of a generalized tool seems nice. Maybe as a scoping/feasibility exercise it would be worth comparing the existing SQLAlchemy generator to, e.g., something like this for triple stores (I haven't had much time to work on this yet, so it's WIP). This also seems related to https://github.com/orgs/linkml/discussions/1820
Another example from a previous life: I didn't manage to get this on its feet, but see how I was encoding extra metadata for how a given model field should behave with a particular serialization here:
Currently all of these ORM tools do something that is unique to their underlying DB: SQLAlchemy and SQLModel use specific base classes, types, and fields, and need extra fields for, e.g., relations. But if one were to make an abstract LinkML Field class (or, in pydantic 2, probably a special annotated type) that held all that metadata, then we would be able to do that virtualization to the various DB backends at runtime. Another strategy: most ORM tools have some notion of an engine or a session, so we could put the logic for translating to different DB backends there.
I think it's partly a question of "where" these models live. The current strategy with the generators is to have several different types of models for different applications. That is nice, but it's a lot of maintenance to keep feature parity between them. This seems like a way to consolidate some of those generators and the dumper/loader functionality into a single package, and maybe reduce the sprawl a bit. It could also help with documentation, because the current docs on how to actually use data with LinkML are sort of scattered (understandably! There's a lot there!), and that is a real hindrance to LinkML being seen as a useful runtime tool rather than just a schema modeling tool (which it is! Once I made perf adjustments to schemaview and pydanticgen it became quite tractable to generate models on demand, #1604).
This also lines up well with my proposed hackashop project. I'd be down to be on the team or take the initial lead on this if you don't already have someone working on it.
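As a rough illustration of the "annotated type" idea, the sketch below attaches backend-neutral metadata to fields with `typing.Annotated` and lets two toy backends read it at runtime. It uses standard-library dataclasses as a stand-in for pydantic models, and `LinkMLField`, `to_sql_columns()` and `to_triples()` are hypothetical names, not part of LinkML or pydantic.

```python
from dataclasses import dataclass
from typing import Annotated, Dict, List, Optional, get_type_hints


@dataclass(frozen=True)
class LinkMLField:
    """Hypothetical backend-neutral field metadata (not an existing LinkML class)."""

    slot_uri: Optional[str] = None
    identifier: bool = False
    multivalued: bool = False


@dataclass
class Person:
    id: Annotated[str, LinkMLField(slot_uri="schema:identifier", identifier=True)]
    name: Annotated[str, LinkMLField(slot_uri="schema:name")]
    aliases: Annotated[List[str], LinkMLField(multivalued=True)]


def field_metadata(model: type) -> Dict[str, LinkMLField]:
    """Collect the LinkMLField annotation of every field that has one."""
    result: Dict[str, LinkMLField] = {}
    for name, hint in get_type_hints(model, include_extras=True).items():
        for extra in getattr(hint, "__metadata__", ()):
            if isinstance(extra, LinkMLField):
                result[name] = extra
    return result


def to_sql_columns(model: type) -> List[str]:
    """Toy SQL backend: render column definitions from the shared metadata."""
    columns = []
    for name, meta in field_metadata(model).items():
        if meta.multivalued:
            continue  # a real backend would create a join table here
        suffix = " PRIMARY KEY" if meta.identifier else ""
        columns.append(f"{name} TEXT{suffix}")
    return columns


def to_triples(instance: object) -> List[tuple]:
    """Toy triple-store backend: emit (subject, predicate, object) tuples."""
    meta = field_metadata(type(instance))
    id_field = next(name for name, m in meta.items() if m.identifier)
    subject = getattr(instance, id_field)
    return [
        (subject, m.slot_uri, getattr(instance, name))
        for name, m in meta.items()
        if m.slot_uri and not m.identifier
    ]
```

The model is declared once and each backend decides at runtime how to interpret the metadata. The alternative mentioned above, putting the translation logic into an engine or session object, would keep the model classes free of annotations entirely.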
-
I am not too happy with the generated Python classes and declarations along technical dependencies. In https://github.com/WolfgangFahl/pyLoDStorage/blob/master/lodstorage/sample2.py I am experimenting with a more "pythonic" approach. It allows the use of standard Python declarations and annotations. I'd love to be able to add the RDF/SPARQL or Wikibase-specific mapping information in a non-invasive way. Also note the "specification by example" style:

```python
"""
Created on 2024-01-21

@author: wf
"""
import json
from dataclasses import field
from datetime import date, datetime
from typing import List, Optional

from lodstorage.yamlable import DateConvert, lod_storable


@lod_storable
class Royal:
    """
    Represents a member of the royal family, with various personal details.

    Attributes:
        name (str): The full name of the royal member.
        wikidata_id (str): The Wikidata identifier associated with the royal member.
        number_in_line (Optional[int]): The number in line to succession, if applicable.
        born_iso_date (Optional[str]): The ISO date of birth.
        died_iso_date (Optional[str]): The ISO date of death, if deceased.
        last_modified_iso (str): ISO timestamp of the last modification.
        age (Optional[int]): The age of the royal member.
        of_age (Optional[bool]): Indicates whether the member is of legal age.
        wikidata_url (Optional[str]): URL to the Wikidata page of the member.
    """

    name: str
    wikidata_id: str
    number_in_line: Optional[int] = None
    born_iso_date: Optional[str] = None
    died_iso_date: Optional[str] = None
    # calculated fields are excluded from __init__ and set in __post_init__
    last_modified_iso: str = field(init=False)
    age: Optional[int] = field(init=False)
    of_age: Optional[bool] = field(init=False)
    wikidata_url: Optional[str] = field(init=False)

    def __post_init__(self):
        """
        init calculated fields
        """
        self.lastmodified = datetime.utcnow()
        self.last_modified_iso = self.lastmodified.strftime("%Y-%m-%dT%H:%M:%SZ")
        end_date = self.died if self.died else date.today()
        self.age = int((end_date - self.born).days / 365.2425)
        self.of_age = self.age >= 18
        if self.wikidata_id:
            self.wikidata_url = f"https://www.wikidata.org/wiki/{self.wikidata_id}"

    @property
    def born(self) -> date:
        """Return the date of birth from the ISO date string."""
        born_date = DateConvert.iso_date_to_datetime(self.born_iso_date)
        return born_date

    @property
    def died(self) -> Optional[date]:
        """Return the date of death from the ISO date string, if available."""
        died_date = DateConvert.iso_date_to_datetime(self.died_iso_date)
        return died_date


@lod_storable
class Royals:
    """
    Represents a collection of Royal family members.

    Attributes:
        members (List[Royal]): A list of Royal family members.
    """

    members: List[Royal] = field(default_factory=list)

    @classmethod
    def get_samples(cls) -> dict[str, "Royals"]:
        """
        Returns a dictionary of named samples
        for 'specification by example' style
        requirements management.

        Returns:
            dict: A dictionary with keys as sample names and values as `Royals` instances.
        """
        samples = {
            "QE2 heirs up to number in line 5": Royals(
                members=[
                    Royal(
                        name="Elizabeth Alexandra Mary Windsor",
                        born_iso_date="1926-04-21",
                        died_iso_date="2022-09-08",
                        wikidata_id="Q9682",
                    ),
                    Royal(
                        name="Charles III of the United Kingdom",
                        born_iso_date="1948-11-14",
                        number_in_line=0,
                        wikidata_id="Q43274",
                    ),
                    Royal(
                        name="William, Duke of Cambridge",
                        born_iso_date="1982-06-21",
                        number_in_line=1,
                        wikidata_id="Q36812",
                    ),
                    Royal(
                        name="Prince George of Wales",
                        born_iso_date="2013-07-22",
                        number_in_line=2,
                        wikidata_id="Q13590412",
                    ),
                    Royal(
                        name="Princess Charlotte of Wales",
                        born_iso_date="2015-05-02",
                        number_in_line=3,
                        wikidata_id="Q18002970",
                    ),
                    Royal(
                        name="Prince Louis of Wales",
                        born_iso_date="2018-04-23",
                        number_in_line=4,
                        wikidata_id="Q38668629",
                    ),
                    Royal(
                        name="Harry Duke of Sussex",
                        born_iso_date="1984-09-15",
                        number_in_line=5,
                        wikidata_id="Q152316",
                    ),
                ]
            )
        }
        return samples


@lod_storable
class Country:
    """
    Represents a country with its details.

    Attributes:
        name (str): The name of the country.
        country_code (str): The country code.
        capital (Optional[str]): The capital city of the country.
        timezones (List[str]): List of timezones in the country.
        latlng (List[float]): Latitude and longitude of the country.
    """

    name: str
    country_code: str
    capital: Optional[str] = None
    timezones: List[str] = field(default_factory=list)
    latlng: List[float] = field(default_factory=list)


@lod_storable
class Countries:
    """
    Represents a collection of country instances.

    Attributes:
        countries (List[Country]): A list of Country instances.
    """

    countries: List[Country]

    @classmethod
    def get_countries_erdem(cls) -> "Countries":
        """
        get Erdem Ozkol's country list
        """
        countries_json_url = "https://gist.githubusercontent.com/erdem/8c7d26765831d0f9a8c62f02782ae00d/raw/248037cd701af0a4957cce340dabb0fd04e38f4c/countries.json"
        json_str = cls.read_from_url(countries_json_url)
        countries_list = json.loads(json_str)
        countries_dict = {"countries": countries_list}
        instance = cls.from_dict(countries_dict)
        return instance

    @classmethod
    def get_samples(cls) -> dict[str, "Countries"]:
        """
        Returns a dictionary of named samples
        for 'specification by example' style
        requirements management.

        Returns:
            dict: A dictionary with keys as sample names
            and values as `Countries` instances.
        """
        samples = {
            "country list provided by Erdem Ozkol": cls.get_countries_erdem()
        }
        return samples


class Sample:
    """
    Sample dataset provider
    """

    @staticmethod
    def get(dataset_name: str):
        """
        Get the given sample dataset name
        """
        samples = None
        if dataset_name == "royals":
            samples = Royals.get_samples()
        elif dataset_name == "countries":
            samples = Countries.get_samples()
        else:
            raise ValueError("Unknown dataset name")
        return samples
```

I am in the process of writing a LinkML generator based on this: see https://github.com/WolfgangFahl/pyLoDStorage/blob/master/lodstorage/linkml_gen.py. If you are interested in the details, give me positive feedback on this comment and I'll open a new discussion.
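One way the "non-invasive" mapping wish could be approached is to keep the backend-specific information in a registry outside the class. The sketch below is only an assumption about how that might look: `Royal` is a simplified plain-dataclass stand-in for the class above, and `WIKIDATA_MAPPING` and `to_wikidata_statements()` are hypothetical names that are not part of pyLoDStorage. P569 and P570 are the Wikidata properties for date of birth and date of death.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional, Tuple, Type


@dataclass
class Royal:
    """Simplified stand-in for the Royal class above (no lod_storable)."""

    name: str
    wikidata_id: str
    born_iso_date: Optional[str] = None
    died_iso_date: Optional[str] = None


# The mapping lives outside the class, so the declaration stays plain Python.
WIKIDATA_MAPPING: Dict[Type, Dict[str, str]] = {
    Royal: {
        "born_iso_date": "P569",  # date of birth
        "died_iso_date": "P570",  # date of death
    },
}


def to_wikidata_statements(obj: object) -> List[Tuple[str, str, str]]:
    """Return (item, property, value) tuples for every mapped, non-empty field."""
    mapping = WIKIDATA_MAPPING.get(type(obj), {})
    statements = []
    for f in fields(obj):
        prop = mapping.get(f.name)
        value = getattr(obj, f.name)
        if prop and value is not None:
            statements.append((obj.wikidata_id, prop, value))
    return statements


queen = Royal(
    name="Elizabeth Alexandra Mary Windsor",
    wikidata_id="Q9682",
    born_iso_date="1926-04-21",
    died_iso_date="2022-09-08",
)
statements = to_wikidata_statements(queen)
```

An RDF/SPARQL mapping could be a second registry of the same shape, so one class can carry several independent mappings without any change to its declaration.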
-
LinkML provides a way of specifying the structure and semantics of data without committing to a particular technology or serialization format. It has been successfully integrated into different architectures that variously involved MongoDB, Neo4j, PostgreSQL, triplestores, etc. However, many of those projects still contain backend-specific plumbing.
Would it make sense to generalize this into a common CRUD abstraction layer that would support
With bindings for different backends (including a simple in-memory one)
This would not sit in the linkml core but would be a purely optional additional layer, e.g. linkml-store
In fact, curategpt already does this, with a virtual store layer and a chromadb implementation - https://github.com/monarch-initiative/curate-gpt/tree/main/src/curate_gpt/store - it would be easy to make this a separate module. There would need to be thought given to scope (curategpt needs vector embeddings but this could be generalized to a general index structure)
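As a starting point for the scoping discussion, a CRUD layer with pluggable backends might be sketched as follows. This is not the curategpt store API; `Collection`, `InMemoryCollection`, `BACKENDS` and `open_collection()` are hypothetical names, and only the simple in-memory binding is shown.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, Optional, Type

Obj = Dict[str, Any]


class Collection(ABC):
    """Hypothetical backend-neutral CRUD interface."""

    @abstractmethod
    def insert(self, obj_id: str, obj: Obj) -> None: ...

    @abstractmethod
    def get(self, obj_id: str) -> Optional[Obj]: ...

    @abstractmethod
    def update(self, obj_id: str, changes: Obj) -> None: ...

    @abstractmethod
    def delete(self, obj_id: str) -> None: ...

    @abstractmethod
    def find(self, **where: Any) -> Iterator[Obj]: ...


class InMemoryCollection(Collection):
    """Simple in-memory binding; stores copies so callers cannot mutate state."""

    def __init__(self) -> None:
        self._objects: Dict[str, Obj] = {}

    def insert(self, obj_id: str, obj: Obj) -> None:
        if obj_id in self._objects:
            raise KeyError(f"duplicate id: {obj_id}")
        self._objects[obj_id] = dict(obj)

    def get(self, obj_id: str) -> Optional[Obj]:
        obj = self._objects.get(obj_id)
        return dict(obj) if obj is not None else None

    def update(self, obj_id: str, changes: Obj) -> None:
        self._objects[obj_id].update(changes)

    def delete(self, obj_id: str) -> None:
        self._objects.pop(obj_id, None)

    def find(self, **where: Any) -> Iterator[Obj]:
        for obj in self._objects.values():
            if all(obj.get(key) == value for key, value in where.items()):
                yield dict(obj)


# Other bindings (MongoDB, Neo4j, triplestores, ...) would register here.
BACKENDS: Dict[str, Type[Collection]] = {"memory": InMemoryCollection}


def open_collection(backend: str = "memory") -> Collection:
    return BACKENDS[backend]()


persons = open_collection("memory")
persons.insert("ex:1", {"name": "Ada", "city": "London"})
persons.update("ex:1", {"city": "Cambridge"})
```

Index structures such as vector embeddings could be layered on top of `find()` as an optional capability, which would keep the core interface small.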