Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: jsonasobj2 vendoring #361

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ticapix
Copy link

@ticapix ticapix commented Jan 13, 2025

As proposed here linkml/linkml#1966

@ticapix
Copy link
Author

ticapix commented Jan 13, 2025

Any idea how to run the CI locally ?

@dalito
Copy link
Contributor

dalito commented Jan 13, 2025

Tox is great for testing different Python versions locally. Setting up a local CI for full cross-platform tests is probably not worth the effort.

I updated the outdated tox.ini in #357 to work with current tox (4.23.2), see this commit: 822eb5e

@sneakers-the-rat
Copy link
Contributor

Can we add deprecation warnings to all the jsonasobj2 behavior while we do this? We want to say "hey if your code relies on using the special jsonasobj stuff, these will soon become just regular dataclasses, see xyz page in the docs for an upgrade guide" with plenty of lead time. That should be visible even if people don't regenerate models during the deprecation window if they don't have an upper bound on linkml-runtime in CI.

I think we would want that on the dunder methods in the jsonobj class, as well as in every function like as_dict etc. We dont have the nice tidy deprecation warning system from main linkml package here so we will be emitting redundant warnings and will have to just deal with the noise until we finish removing jsonasobj, but imo it's better for us to deal with some noise than downstream ppl being caught by surprise

@ticapix
Copy link
Author

ticapix commented Jan 13, 2025

Tox is great for testing different Python versions locally. Setting up a local CI for full cross-platform tests is probably not worth the effort.

I updated the outdated tox.ini in #357 to work with current tox (4.23.2), see this commit: 822eb5e

image

@ticapix
Copy link
Author

ticapix commented Jan 13, 2025

@sneakers-the-rat like that 185e6ed ?

I used typing_extensions.deprecated - which is already a dependency - on all methods not starting with an _, and yes ... running the tests, it's bit verbose :)

============================================= 384 passed, 278 skipped, 1 xfailed, 301698 warnings in 52.26s ==============================================

@sneakers-the-rat
Copy link
Contributor

Haha oh boy, ya that might be too verbose. maybe we emit one deprecation warning when importing that module?

@ticapix
Copy link
Author

ticapix commented Jan 14, 2025

@sneakers-the-rat

(few hours of sleep later) fixed in 73effc9 by using warning.warn at the top of the module and switch to PendingDeprecationWarning

====== 384 passed, 278 skipped, 1 xfailed, 46 warnings in 52.32s ===========

Copy link
Contributor

@sneakers-the-rat sneakers-the-rat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all this is great and it is exactly what needed to be done to move us forward here, so thank you for that!

I AM SORRY for my hassle here, but it appears this would break existing generated modules, see comments for details. I proposed a few options but we can discuss further and I'll help out implementing them since this is proving to be a lot. we may want to pull into a develop branch and do this in a few chunks :)

@@ -8,6 +8,7 @@ MODEL_DIR = ../linkml-model/linkml_model/

update_model:
cp -pr $(MODEL_DIR)/* linkml_runtime/linkml_model
sed -i 's/from jsonasobj2/from linkml_runtime.utils.jsonasobj2/g' linkml_runtime/linkml_model/*.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha i am assuming this is a "temporary fix until we update PythonGen" thing right? we should remember to write that down somewhere or else merge this at the same time as we make that small change in pythongen upstream? wdyt?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it's "temporary".
I'll add a comment that it should be deleted after PythonGen is updated.
Worst case, the command has no effect if we forget to remove it immediately.

@@ -29,3 +29,13 @@ It also includes the [SchemaView](https://linkml.io/linkml/developers/manipulati
## Notebooks

See the [notebooks](https://github.com/linkml/linkml-runtime/tree/main/notebooks) folder for examples

## Development
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotta love turning issue discussion into ambient docs fixes, thanks for this :)

hide = ['_if_missing', '_root']

# https://github.com/linkml/linkml/issues/1966
warnings.warn("This module is going to be replaced by native Python dataclasses.", PendingDeprecationWarning)
Copy link
Contributor

@sneakers-the-rat sneakers-the-rat Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I AM SORRY to keep belaboring this, and i know this was literally my proposal and you've done it twice already, but just to do a final double check - does this do what we want it do?

So the squirrely case that we want to protect with lots of people using generated code that might stay static for a long time: say someone has ancient dataclass models that import jsonasobj2 but they update linkml-runtime (or don't have an upper bound & just get latest). That generated code isn't really subject to updates the same way normal dependencies are (although working on project config that will make that possible). It's made a bit more complicated by how the generated code is a mix of behavior from the parent classes and from code in the module (ie. the as_dict calls are in the generated model code, not in YamlRoot.__init__()).

What this does correctly: the package is imported, and the module will definitely be imported, so the warning would still be emitted though even if one was using old models, so it accomplishes the goal of letting people know they need to update their models (i think we need a little more information here but i'll put that in separate comment).

Using the metamodel as an example, JsonObj is imported but not used, but as_dict is used commonly. jsonasobj2.as_dict checks for being an instance of jsonasobj2.JsonObj, so it will just return the object unchanged - i think this is what Chris was talking about in the issue. i think it would be pretty broken - in both directions, our linkml_runtime.utils.jsonasobj2 functions being used on jsonasobj2 objects and vice versa wouldn't work.

We need a specific regression test for this in any case: validating that we can use some frozen kitchen sink models with our functions and etc. Honestly what we might want to do is just make a test data directory that has frozen copies of all the generated models and have a global autouse parameterized fixture that runs the tests twice against current and old models - the linkml-runtime tests are pretty quick iirc.

maybe that's overkill, but what i'd like to get a sense of is
a) is it actually broken or am i wrong (i haven't tested locally, will do) and
b) how broken is it.

If it's mostly fine, we can fix it with smaller tweaks, but if it completely breaks everything, we would want to actively handle it being in the environment. some initial ideas:

  • We could not immediately remove the jsonasobj2 dependency, but use this module as a passthrough and do stuff like have our JsonObj inherit from jsonasobj2, or override the __instancecheck__ method and swap in the matching function/etc. from jsonasobj2 with a warning or smth. edit: actually would it work if we just overrode __instancecheck__ and just returned True if we were checking against jsonasobj2.JsonObj? i would do it but it's getting late and i still have some other work to do

  • Since we have control of the jsonasobj2 repo, we could also update the instance checks there to check if we're a linkml object, emit a warning, and proceed as if it was a jsonasobj2 object.

  • One last idea would be to do some aggressive monkeypatching and force replace jsonasobj2. This is pretty cursed, but also might be the most complete (as in would allow the versions between now and removal to be used with most existing generated models) one while moving fastest on deprecation? we can't guarantee that linkml-runtime is imported before jsonasobj2. in fact if there isn't some code that imports linkml-runtime for other reasons before importing a generated module, we are guaranteed to be imported after it. If i'm not mistaken we will need to go and find every other namespace that could have imported jsonasobj2 and patch it there. from the importlib docs:

    Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.

    For generated modules, that's straightforward, we just need to ascend into the containing scope and patch it there, but say someone has some other module in a downstream package that imports jsonasobj2 and uses its functions/objects but not linkml-runtime, and a third module then imports the second module along with a generated module - either before or after. Then jsonasobj2 wouldn't be in our call tree anymore, so we would need to inspect the scope of all the other imported modules too which would be either way too expensive or impossible to do. anyway there are a few more things we'd need to do but you get my point it would be hard to do and probably fragile.

anyway first thing's first, let's write some tests to see if there even is a problem here, but i am thinking we might need to hold onto jsonasobj2 as a dependency and patch it in here in some way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirmed this does break generated modules, e.g. I replaced the from linkml_runtime.utils.jsonasobj2 import with from jsonasobj2 and i'm getting the expected error, getting returned the original JsonObj, and this does prevent the __init__ method from running.

FAILED tests/test_issues/test_linkml_runtime_issue_1317.py::TestRemoteModularSchemaView::test_imported_classes_present - TypeError: linkml_runtime.linkml_model.meta.ReachabilityQuery() argument after ** must be a mapping, not JsonObj

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental issue I see, which will appear again, is the cycling dependency:
linkml-runtime -> linkml-model -> linkml -> linkml-runtime.

Unfortunately, I can't help much more on my side as I'm not able to reproduce the Github CI of the 3 repos locally.

About breaking isinstanceof call, maybe that's an opportunity for a major release v2.0.0 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes yes that is the problem for within linkml packages, we could have some better sync tooling it is true, but it is resolvable by doing synchronized PRs which isn't too bad. This problem is more about what happens to all the python dataclass modules that have been generated in other projects, which are both independent code but also entangled with linkml-runtime, so we have to make sure to give lots of runway when making changes in linkml-runtime so people notice and regenerate their models without them suddenly breaking.

The bug can be reproduced by swapping this line https://github.com/ticapix/linkml-runtime/blob/73effc99c2a2d76cd9c6270ccfc04f64e20c9e80/linkml_runtime/linkml_model/meta.py#L35

to the previous

from jsonasobj2 import JsonObj, as_dict

which simulates what we would expect an old (or current) pythongen output to be, and how it would behave if someone were to install an updated version of linkml-runtime with this PR merged. I used the metamodel there because it's used in like every test but it could be any generated module.

Unfortunately, I can't help much more on my side

Totally fine, I think i know how to fix this (thanks to you having vendored the module, we have new flexibility we didn't before!) and will add a specific regression test for it too.

I'm not able to reproduce the Github CI of the 3 repos locally.

that's a good point, this is pretty hard to do, we do need some metacode for development, regenerating the metamodel and handling sync'd prs is pretty inaccessible

hide = ['_if_missing', '_root']

# https://github.com/linkml/linkml/issues/1966
warnings.warn("This module is going to be replaced by native Python dataclasses.", PendingDeprecationWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate note on this, I think we will want to give some instructions on what to do here - we are warning someone that something will happen, but don't tell them what to do about it (so then the only way they can silence the warning is to filter it, but then they might miss if it gets updated with instructions). Are the instructions we want to give here just "replace jsonasobj2 stuff with equivalent dataclasses stuff"? if so i think we'll probably want to write an update guide for the docs.

@cmungall we might want to give a version warning too, like "jsonasobj2 will be removed in x version" so people can either set version caps or know when they need to update by, any input on what that version should be?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants