DM-41878: Implement RemoteButler.get() backed by a single FileDatastore #912
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,13 +27,14 @@ | |
|
||
from __future__ import annotations | ||
|
||
__all__ = ("StoredDatastoreItemInfo", "StoredFileInfo") | ||
__all__ = ("StoredDatastoreItemInfo", "StoredFileInfo", "SerializedStoredFileInfo") | ||
|
||
import inspect | ||
from collections.abc import Iterable, Mapping | ||
from dataclasses import dataclass | ||
from typing import TYPE_CHECKING, Any | ||
|
||
from lsst.daf.butler._compat import _BaseModelCompat | ||
from lsst.resources import ResourcePath | ||
from lsst.utils import doImportType | ||
from lsst.utils.introspection import get_full_type_name | ||
|
@@ -214,7 +215,7 @@ def __init__( | |
"""StorageClass associated with Dataset.""" | ||
|
||
component: str | None | ||
"""Component associated with this file. Can be None if the file does | ||
"""Component associated with this file. Can be `None` if the file does | ||
not refer to a component of a composite.""" | ||
|
||
checksum: str | None | ||
|
@@ -260,6 +261,13 @@ def to_record(self, **kwargs: Any) -> dict[str, Any]: | |
**kwargs, | ||
) | ||
|
||
def to_simple(self) -> SerializedStoredFileInfo: | ||
record = self.to_record() | ||
# We allow None on the model but the record contains a "null string" | ||
# instead | ||
record["component"] = self.component | ||
return SerializedStoredFileInfo.model_validate(record) | ||
|
||
def file_location(self, factory: LocationFactory) -> Location: | ||
"""Return the location of artifact. | ||
|
||
|
@@ -307,6 +315,10 @@ def from_record(cls: type[StoredFileInfo], record: Mapping[str, Any]) -> StoredF | |
) | ||
return info | ||
|
||
@classmethod | ||
def from_simple(cls: type[StoredFileInfo], model: SerializedStoredFileInfo) -> StoredFileInfo: | ||
return cls.from_record(dict(model)) | ||
|
||
def update(self, **kwargs: Any) -> StoredFileInfo: | ||
new_args = {} | ||
for k in self.__slots__: | ||
|
@@ -320,3 +332,26 @@ def update(self, **kwargs: Any) -> StoredFileInfo: | |
|
||
def __reduce__(self) -> str | tuple[Any, ...]: | ||
return (self.from_record, (self.to_record(),)) | ||
|
||
|
||
class SerializedStoredFileInfo(_BaseModelCompat): | ||
I still think it's a bit odd that we now have a dataclass defining the datastore record content, a pydantic model defining the datastore record content, and a method in FileDatastore for creating the database table that must match the other two but uses neither to define it. If

StoredFileInfo isn't actually a dataclass; it's a subclass of the abstract base class StoredDatastoreItemInfo, with a property that can't be instantiated without injecting configuration and several business logic methods. And the representation of some of the fields varies between the various cases (e.g. component is

I had considered making StoredFileInfo have-a SerializedStoredFileInfo and forwarding the properties to it, but it didn't seem to buy much because you still have to have the duplicate properties to do the forwarding. I also think some of the representations might need to vary here (e.g.

I kinda think SerializedStoredFileInfo shouldn't have a

I also considered making StoredFileInfo inherit-from SerializedStoredFileInfo, or adding a shared pydantic model for the duplicated fields, but inheritance seems harder to understand than duplicating 5 string properties. And if we add more properties, they will likely be non-nullable on StoredFileInfo but will have to be nullable on the serialized one to handle backwards compatibility.

I do think that the methods on StoredFileInfo could be moved to FileDatastore and StoredFileInfo could become an actual dataclass. Or, if we were using the SQLAlchemy ORM layer, StoredFileInfo could just be the ORM object.

But I know you guys had plans for exposing these in more places, which is why you added the abstract base class, so to my mind that kind of change is way outside the scope of this ticket. |
||
"""Serialized representation of `StoredFileInfo` properties""" | ||
|
||
formatter: str | ||
"""Fully-qualified name of Formatter.""" | ||
|
||
path: str | ||
"""Path to dataset within Datastore.""" | ||
|
||
storage_class: str | ||
"""Name of the StorageClass associated with Dataset.""" | ||
|
||
component: str | None | ||
"""Component associated with this file. Can be `None` if the file does | ||
not refer to a component of a composite.""" | ||
|
||
checksum: str | None | ||
"""Checksum of the serialized dataset.""" | ||
|
||
file_size: int | ||
"""Size of the serialized dataset in bytes.""" |
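For readers following the `to_simple` / `from_simple` pair in the diff above, here is a minimal stdlib-only sketch of the round trip. It is not the code from this PR: `InfoSketch`, `SerializedInfoSketch`, and the `NULL_COMPONENT` sentinel value are hypothetical stand-ins, and a frozen dataclass is used in place of the pydantic model so the example runs without extra dependencies. The one thing it tries to show is the point made in the diff's comment: the record form carries a "null string" for a missing component, while the serialized form carries a real `None`.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import asdict, dataclass
from typing import Any

# Hypothetical sentinel standing in for the "null string" used in records.
# The actual value is not shown in the diff.
NULL_COMPONENT = "__NULL__"


@dataclass(frozen=True)
class SerializedInfoSketch:
    """Stand-in for the serialized model; the real one is a pydantic model."""

    formatter: str
    path: str
    storage_class: str
    component: str | None
    checksum: str | None
    file_size: int


class InfoSketch:
    """Stand-in for the in-memory info class (deliberately not a dataclass)."""

    __slots__ = ("formatter", "path", "storage_class", "component", "checksum", "file_size")

    def __init__(
        self,
        formatter: str,
        path: str,
        storage_class: str,
        component: str | None,
        checksum: str | None,
        file_size: int,
    ) -> None:
        self.formatter = formatter
        self.path = path
        self.storage_class = storage_class
        self.component = component
        self.checksum = checksum
        self.file_size = file_size

    def to_record(self) -> dict[str, Any]:
        # Record form: a missing component is stored as the sentinel string.
        record = {name: getattr(self, name) for name in self.__slots__}
        if record["component"] is None:
            record["component"] = NULL_COMPONENT
        return record

    def to_simple(self) -> SerializedInfoSketch:
        # Serialized form: put the real None back before building the model.
        record = self.to_record()
        record["component"] = self.component
        return SerializedInfoSketch(**record)

    @classmethod
    def from_record(cls, record: Mapping[str, Any]) -> InfoSketch:
        # Accept either the sentinel or None for a missing component.
        component = record["component"]
        if component == NULL_COMPONENT:
            component = None
        return cls(
            formatter=record["formatter"],
            path=record["path"],
            storage_class=record["storage_class"],
            component=component,
            checksum=record["checksum"],
            file_size=record["file_size"],
        )

    @classmethod
    def from_simple(cls, model: SerializedInfoSketch) -> InfoSketch:
        return cls.from_record(asdict(model))


info = InfoSketch("pkg.SomeFormatter", "a/b/c.fits", "Exposure", None, None, 1024)
restored = InfoSketch.from_simple(info.to_simple())
```

In this sketch `from_record` tolerates both representations of a missing component, which is what lets `from_simple` reuse it unchanged; whether the real `from_record` does the same is an assumption here.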
Maybe we mark this method as requiring keyword args after the `id`?
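The method this comment refers to is not visible in this part of the diff, so the following is only a sketch of the suggestion itself. The function name `get_sketch` and its parameters `storage_class` and `parameters` are hypothetical. A bare `*` after `id` makes every later argument keyword-only, so call sites stay readable and the argument order can change later without silently breaking callers.

```python
from __future__ import annotations

import uuid
from typing import Any


def get_sketch(
    id: uuid.UUID,
    *,  # everything after this marker must be passed by keyword
    storage_class: str | None = None,
    parameters: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical stand-in for a get-style method taking a dataset ID."""
    return {
        "id": str(id),
        "storage_class": storage_class,
        "parameters": parameters or {},
    }


dataset_id = uuid.uuid4()
result = get_sketch(dataset_id, storage_class="Exposure")

# Passing the second argument positionally is rejected with a TypeError.
try:
    get_sketch(dataset_id, "Exposure")  # type: ignore[misc]
    positional_rejected = False
except TypeError:
    positional_rejected = True
```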