otk/utils: introduce Annotated* objects #280

schuellerf · 2024-10-14T14:47:32Z

The Objects (in src/otk/annotation.py) represent all basic types
but with "hidden" annotations to be able to
track the source of the data from the originating
yaml file.

Some of the annotations are already used in error messages, other error messages can be extended in the future with this information.

This is replacing the initial approach #271

schuellerf · 2024-10-14T14:51:55Z

one of the examples of a "new" error message is in
test/data/error/12-two-lonely-keys.err

otk file contains top-level keys without a target:
 * "targetless" in test/data/error/12-two-lonely-keys.yaml:6
 * "targetless2" in test/data/error/12-two-lonely-keys.yaml:9

supakeen

I'm still really happy about the possibilities that having this metadata unlocks in the way of error messages and user friendliness but also still very on the fence of the current approach of pretending-to-be-builtin-objects-but-not really.

Did you consider any other approaches and if so; why does the current approach fit best?

You've looked into this a lot more than I have but let me say very naively, would something like this:

class Annotated[T]:
    value: T

and then passing Annotated around and adjusting our code to refer to Annotated.value fit? If we need to we can still implement all the dunders on Annotated but I'd prefer to avoid that.

bcl · 2024-10-14T21:27:26Z

src/otk/document.py

-        targetless_keys = [key for key in deserialized_data
-                           if not key.startswith(PREFIX)]
+        targetless_keys = [
+            f" * \"{key}\" in {deserialized_data[key].get_annotation('src')}"


If I drop in a key, goobers: into the top level I get a traceback from this:

File "/root/otk/src/otk/document.py", line 38, in __init__ Omnifest.ensure(tree) File "/root/otk/src/otk/document.py", line 53, in ensure f" * \"{key}\" in {deserialized_data[key].get_annotation('src')}" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'NoneType' object has no attribute 'get_annotation'

so it probably also needs a None check :)

I ran all possible tests and examples - how did you do that, so I can reproduce?
Not sure if the error message makes sense in this case then :-/

schuellerf · 2024-10-15T08:18:15Z

@supakeen

You've looked into this a lot more than I have but let me say very naively, would something like this:
class Annotated[T]:
    value: T

I'm pretty sure that with this approach you would have to re-implement even more, including all handling of the tree, write-to-file operations and json serialization to pass to externals etc. 🤔

In my current implementation it mainly was changing the types and some special cases like list comprehension

supakeen · 2024-10-15T08:38:39Z

@supakeen
You've looked into this a lot more than I have but let me say very naively, would something like this:
class Annotated[T]:
    value: T
I'm pretty sure that with this approach you would have to re-implement even more, including all handling of the tree, write-to-file operations and json serialization to pass to externals etc. 🤔

What would change concretely in the handling of the tree?

The write-to-file and JSON serialization would be be a 5-line thing (unless I'm missing something it'd need a custom encoder class that typechecks and returns .value) and adjusting callsites of json.dump(s) so I don't see those as major problems.

In my current implementation it mainly was changing the types and some special cases like list comprehension

schuellerf · 2024-10-15T11:02:32Z

@supakeen I did some small unit-test-like things, your approach could also work.
The code (when using the value/annotations) might be less readable but
the typesystem is for sure easier with your approach as it's not trying to be transparent

supakeen · 2024-10-15T14:10:48Z

@schuellerf I do like explicitness. Let's wait for @mvo5 to chime in as well if he feels that that might be a better approach.

schuellerf · 2024-10-18T08:17:17Z

@supakeen @mvo5 I put up the "other" version as a "wrapper class" as Simon suggested to try.
I like the type override (this PR) better and there are less changes while using (unless you need the annotations of course)

e.g. context.py
https://github.com/osbuild/otk/pull/280/files#diff-d3985204642fa97498214ff309912ab52685064b67bbe032f3d67950e8cbed1a
here v.s
https://github.com/osbuild/otk/pull/282/files#diff-d3985204642fa97498214ff309912ab52685064b67bbe032f3d67950e8cbed1a

pre-commit does not seem to support PYTHONPATH which is needed to find things like "otk.command" so we'll workaround by adding `src` with an "init-hook" of pylint

The Objects represent all basic types but with "hidden" annotations to be able to track the source of the data from the originating yaml file.

to be able to test older versions of python all directories with `venv.*` are ignored

as we explicitly support python 3.9 and the implementations of pathlib are different we need to have special support as we want to inherit from pathlib.

achilleas-k

I went through this a bit quickly so I only have some surface-level comments for now.

achilleas-k · 2024-11-05T11:14:51Z

src/otk/annotation.py

+    def squash_annotations(cls, annotation_list: list) -> dict:
+        """ just aggregates all annotations in the list.
+
+        new annotation keys get appended to the resulting dict
+
+        duplicate annotations will get converted to string
+        (if you need an exception for a key please implement here)
+
+        for "src" there is an exception
+        this is expected to be in the form "filename:line_number"
+        and will be sanely unified
+        """


This is a rather poor docstring for a method that isn't very obvious or easy to read.

The first line starting with "just" doesn't help. "Aggregates all annotations in the list" would be a good first line for it, the "just" makes it sound like things will be simple, but the function body suggests otherwise. But I would also add "into a dictionary keyed by [something]".

"New annotation keys get appended to the dict". Isn't every key here new? How can there be old annotation keys? This looks like a pure function (it's annotated as a classmethod but it's actually static right? cls isn't used).

"Duplicate annotations will get converted to a string". Does this mean unique annotations are different types? What does that mean for the result?

"(if you need an exception for a key please implement here)". I would change this to:
"Exceptions for specific keys are implemented here."

and then follow it with:

Key exceptions: - src: ...

But also, I'm not entirely clear what the standard behaviour is like so I don't really understand what the exception is excepting.

achilleas-k · 2024-11-05T11:32:04Z

src/otk/annotation.py

@@ -176,29 +177,25 @@ def squash_annotations(cls, annotation_list: list) -> dict:


 class AnnotatedList(Generic[AnnotatedListT], AnnotatedBase, list):
-    def __init__(self, seq: list[Any] | None = None) -> None:
+    def __init__(self, seq: Optional[list[Any]] = None) -> None:


This change (and the other type annotation changes in this commit) doesn't seem relevant to the commit and its message.

achilleas-k · 2024-11-05T11:35:56Z

src/otk/annotation.py

+if sys.version_info >= (3, 10):
+    class AnnotatedPath(AnnotatedPathBase, pathlib.Path):
+        def __init__(self, *args: Any, **kwargs: Any) -> None:
+            pathlib.Path.__init__(self, *args, **kwargs)
+            AnnotatedPathBase.__init__(self)
+else:
+    class AnnotatedPath(AnnotatedPathBase, pathlib.PosixPath):  # pylint: disable=abstract-method
+        def __init__(self, *args: Any) -> None:  # pylint: disable=W0613
+            pathlib.PosixPath.__init__(self)
+            AnnotatedPathBase.__init__(self)


Is there a common behaviour we can use here instead of having potentially different behaviour when switching Python versions? Does using PosixPath everywhere cause issues?
I'm looking at this and thinking it'll be a pain to troubleshoot AnnotatedPath behaviour if it does slightly different things based on interpreter version.

The implementation of pathlib changed in those python versions and only has to be handled explicitly by code that directly calls the "init" method which you usually don't do but in this case it's needed as we inherit here and the object doesn't seem to be properly instantiated otherwise.

mvo5

Thank you for the PR and thank you for improving the errors and the path information.

I spend a lot of time looking at this, I have many comments but I don't have the time to write them up in a nice and proper way (sorry!). I wrote down a bunch though and I hope some are helpful.

Overall I would like that we try to simplify it more, there is a lot going on and I see the risk that the extra complexity will make our life in the future harder when we work with transform.py. The fact that we have to subclass every basic type shows the cost for the feature is high and we run into the issues like bools or None that cannot be subclassed (and OtkBool(True) is True will be false therefore) and we now need to be conscious about the annotations in many places make me uneasy. Even though I spend a good deal of time I don't think I spend nearly enough. I pushed a bunch of ideas into https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1 which is a variant of your code/ideas with (hopefully but this is of course subjective) reduces some of the complexity by (trying) to hide some details. But that branch does not have enough tests (I think test_annotate(tmp_path) needs the various joins/merges as well for completeness) so it's hard to say if it reasonable or not (or what is missing). I added some inline comments, some/a bunch are "addressed" (or tried to get addressed) in the branch I linked to.

Sorry that I'm so difficult and I hope this is useful nevertheless. I understand that my comments are probably quite terse and maybe hard to understand, we can have a hangout or other sync to talk in person if you prefer that and if there are questions.

mvo5 · 2024-10-31T10:05:26Z

src/otk/annotation.py

+            for key, value in self.items():
+                self[key] = self.deep_convert(value)
+
+    def get_annotation(self, key: str, default: Any = None) -> Any:


I understand the desire to write a nice generic way to do annotations. At the same time, going more into YAGNI/KISS land might be beneficial here, AFAICT we only have a single annotation that we care about ("src") and the rest is there because we might need them later. We could probably archive the same via a single otk annotation that becomes just a dataclass so we can still expand it.

My strawman would be to have Otk*classes with just otk_src as property. So Otk{Dict,List} all simple subclasses with just a mixin.

I played around with this branch a bit in a version and I think just using a otk_src property in a base mix is probably all we nee, somethng like https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-7a465d7d1dc04950d6fb0e7f8e2a314b952542824dd3ef0145b7dc00947f67d8R9

mvo5 · 2024-10-31T10:21:19Z

src/otk/annotation.py

+
+        return ret
+
+    def add_source_attributes(self, key_data, prefix=""):


This add a tight coupling of AnnotatedBase to an yaml node, i.e. Annotated node now needs to know details of yaml nodes, having the helper outside would reduce this coupling.

mvo5 · 2024-10-31T19:23:16Z

src/otk/annotation.py

+
+    @classmethod
+    def deep_convert(cls, data: Any) -> Any:
+        """Recursively convert nested dicts and lists into AnnotatedDict and HiddenAttrList."""


AFAICT this is only needed when we we load data from the externals where it comes in via json. might be worth a comment. For the yaml the values are constructed from the ground up, so if we hook into the constructoin we always have fully Annoated* things.

Right, I think this might be obsolete when also all tests use annotated objects when interacting with internals. I wanted to collect feedback first before changing all tests.

mvo5 · 2024-10-31T19:24:18Z

src/otk/annotation.py

+
+    @classmethod
+    def deep_dump(cls, data: Any) -> Any:
+        """ Converting to builtin types is usually not necessary.


AFAICT this can be done in a slightly simpler way with a custom json encoder, c.f. https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-7a465d7d1dc04950d6fb0e7f8e2a314b952542824dd3ef0145b7dc00947f67d8R65 - it is only needed for custom bools (which is a bit sad)

mvo5 · 2024-10-31T19:41:02Z

src/otk/annotation.py

+        self.set_annotation(f"{prefix}column", column)
+        self.set_annotation(f"{prefix}column_end", column_end)
+
+    def __add__(self, o):


afaict the various *add is only used in a test, could we YAGNI it?

I think it will also be used when implementing more otk operations?

mvo5 · 2024-10-31T20:12:00Z

src/otk/transform.py


-    return [resolve(ctx, state, val) for val in tree]
+    # list comprehension need a separate copy of all annotations
+    new_state.set_annotations(tree.get_annotations())


Same comment as above about that the user of the list now needs to remember to get/set, in https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-7e2cad722099400fcad150a92cd04d88d586940ecd51139b26037b58365ccc34R137 it is:

- return [resolve(ctx, state, val) for val in tree] + return tree.otk_dup([resolve(ctx, state, val) for val in tree])

mvo5 · 2024-10-31T20:13:41Z

src/otk/transform.py

+]
+
+
+def save_yaml_constructors():


We already have a custom loader by using it instead of adding to SafeLoader we can avoid the save/restore helpers here and also make this (IMHO) slightly more uniform, see https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-6c473e728657cf3b050b5222403efc9dcbb4a91902e878a476b5ead8d8ea7db2R13

mvo5 · 2024-10-31T20:15:49Z

src/otk/transform.py

+
+    # callbacks to store information about the source of all data in the yaml files
+    saved_constructors = save_yaml_constructors()
+    yaml.SafeLoader.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, annotated_dict_constructor)


See my suggestion above about how the loader could be slightly more uniform by extending the already existing custom class

mvo5 · 2024-10-31T21:30:02Z

src/otk/context.py

        validate_var_name(name)

        cur_var_scope = self._variables
        parts = name.split(".")
        for i, part in enumerate(parts[:-1]):
            if not isinstance(cur_var_scope.get(part), dict):
                self._maybe_log_var_override(cur_var_scope, parts[:i+1], {".".join(parts[i+1:]): value})
-                cur_var_scope[part] = {}
+                cur_var_scope[part] = AnnotatedDict()


I suspect this area needs more work (and a test), I played with it in https://github.com/osbuild/otk/compare/main...mvo5:annotations-2?expand=1#diff-ad6f996bbd08d84913b570b67a441454f57432e4129c7b8601e0454652501e0fR1 and I think we will need to tweak the way we define dicts in process_defines to make this fully work (and my branch is full of hacks)

mvo5 · 2024-11-05T16:12:36Z

.pre-commit-config.yaml

@@ -33,3 +33,4 @@ repos:
    hooks:
      - id: pylint
        additional_dependencies: ["PyYAML", "types-PyYAML", "pytest"]
+        args: ["--init-hook", "import sys; sys.path.insert(0, './src')"]


This seems unrelated to the PR(?) and could be it's own?

in my setup this is needed to run pytest, it could be split but I need to merge this first because otherwise it fails to commit.
I'll to a separate PR, merge and rebase :-D

schuellerf force-pushed the implement-annotated-classes branch 2 times, most recently from 4fae95d to 9d8acdf Compare October 14, 2024 14:51

schuellerf force-pushed the implement-annotated-classes branch from 9d8acdf to ed674bf Compare October 14, 2024 14:54

schuellerf requested review from supakeen and mvo5 October 14, 2024 17:47

supakeen reviewed Oct 14, 2024

View reviewed changes

bcl reviewed Oct 14, 2024

View reviewed changes

schuellerf mentioned this pull request Oct 18, 2024

[WIP] annotation wrapper class #282

Draft

schuellerf force-pushed the implement-annotated-classes branch from 33c167f to 7e9bf68 Compare October 18, 2024 08:19

schuellerf force-pushed the implement-annotated-classes branch from 3d368d0 to 898b131 Compare October 28, 2024 14:41

schuellerf force-pushed the implement-annotated-classes branch from 898b131 to 580e4da Compare November 5, 2024 08:38

schuellerf added 5 commits November 5, 2024 10:25

pre-commit-config: add PYTHONPATH to pylint

2499d72

pre-commit does not seem to support PYTHONPATH which is needed to find things like "otk.command" so we'll workaround by adding `src` with an "init-hook" of pylint

otk/utils: introduce Annotated* objects

5fb3563

The Objects represent all basic types but with "hidden" annotations to be able to track the source of the data from the originating yaml file.

spellcheck: support multiple venv directories

dc2afea

to be able to test older versions of python all directories with `venv.*` are ignored

src/otk/annotation: implement support for python 3.9

95798b8

as we explicitly support python 3.9 and the implementations of pathlib are different we need to have special support as we want to inherit from pathlib.

otk/command: fix linter problem

5acf377

schuellerf force-pushed the implement-annotated-classes branch from 580e4da to 5acf377 Compare November 5, 2024 09:27

achilleas-k reviewed Nov 5, 2024

View reviewed changes

mvo5 reviewed Nov 5, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

otk/utils: introduce Annotated* objects #280

otk/utils: introduce Annotated* objects #280

schuellerf commented Oct 14, 2024 •

edited

Loading

schuellerf commented Oct 14, 2024 •

edited

Loading

supakeen left a comment •

edited

Loading

bcl Oct 14, 2024

schuellerf Oct 15, 2024 •

edited

Loading

schuellerf commented Oct 15, 2024 •

edited

Loading

supakeen commented Oct 15, 2024 •

edited

Loading

schuellerf commented Oct 15, 2024

supakeen commented Oct 15, 2024

schuellerf commented Oct 18, 2024

achilleas-k left a comment

achilleas-k Nov 5, 2024 •

edited

Loading

achilleas-k Nov 5, 2024 •

edited

Loading

achilleas-k Nov 5, 2024 •

edited

Loading

schuellerf Nov 7, 2024

mvo5 left a comment

mvo5 Oct 31, 2024

mvo5 Oct 31, 2024

mvo5 Oct 31, 2024

schuellerf Nov 7, 2024

mvo5 Oct 31, 2024

mvo5 Oct 31, 2024

schuellerf Nov 7, 2024

mvo5 Oct 31, 2024

mvo5 Oct 31, 2024

mvo5 Oct 31, 2024

mvo5 Oct 31, 2024

mvo5 Nov 5, 2024

schuellerf Nov 7, 2024


		return ret

		def add_source_attributes(self, key_data, prefix=""):

otk/utils: introduce Annotated* objects #280

Are you sure you want to change the base?

otk/utils: introduce Annotated* objects #280

Conversation

schuellerf commented Oct 14, 2024 • edited Loading

schuellerf commented Oct 14, 2024 • edited Loading

supakeen left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

schuellerf Oct 15, 2024 • edited Loading

Choose a reason for hiding this comment

schuellerf commented Oct 15, 2024 • edited Loading

supakeen commented Oct 15, 2024 • edited Loading

schuellerf commented Oct 15, 2024

supakeen commented Oct 15, 2024

schuellerf commented Oct 18, 2024

achilleas-k left a comment

Choose a reason for hiding this comment

achilleas-k Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

achilleas-k Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

achilleas-k Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mvo5 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

schuellerf commented Oct 14, 2024 •

edited

Loading

schuellerf commented Oct 14, 2024 •

edited

Loading

supakeen left a comment •

edited

Loading

schuellerf Oct 15, 2024 •

edited

Loading

schuellerf commented Oct 15, 2024 •

edited

Loading

supakeen commented Oct 15, 2024 •

edited

Loading

achilleas-k Nov 5, 2024 •

edited

Loading

achilleas-k Nov 5, 2024 •

edited

Loading

achilleas-k Nov 5, 2024 •

edited

Loading