v4 issues and design ideas #629

thodkatz · 2024-09-03T09:43:15Z

Hey hey!

bioimageio.spec==0.5.3.2
bioimageio.core==0.6.8.2

I am not sure if we still want to support v4, but I was trying to use it, for backwards compatibility for testing tiktorch, since there are some models defined as v4 still.

I have experienced the following issues.

My project structure is:

├── myspec
│   ├── dummy_network.py
│   ├── __init__.py
│   ├── model.py
│   └── src
│       ├── __init__.py
│       └── main.py
└── setup.py

And in case you want to reproduce this setup, here is the repo https://github.com/thodkatz/spec-playground.

All the issues are based usually when I want to create a v4 object from scratch, not reading a yaml file.

It is worth mentioning that v5, I don't experience these issues.

Networks's file path can't be absolute

# myspec/model.py

from myspec.dummy_model import DummyNetwork # 'DummyNetwork' is my nn.Module

dummy_model_module = /home/username/repos/myspec/myspec/dummy_model.py

v0_4.WeightsDescr(
            pytorch_state_dict=v0_4.PytorchStateDictWeightsDescr(
                source=Path(weights_file.name),
                architecture=f"{dummy_model_module}:{DummyNetwork.__name__}",
                architecture_sha256="cb1e8501cca44a63f3acd91cf35da121063156d4bd8213d466614f6a84bbfbfa"
            )
        )

The validation will fail because absolute paths are not accepted. This can be not so convenient, if the network is defined not in a relative convenient path.

But let's try to make it relative...

Network's relative path isn't realized in model descriptor's definition

Having as reference the file structure mentioning in the beginning, expected relative path of my network's definition should be dummy_model.py:{DummyNetwork.__name__}

# myspec/model.py

v0_4.WeightsDescr(
            pytorch_state_dict=v0_4.PytorchStateDictWeightsDescr(
                source=Path(weights_file.name),
                architecture=f"dummy_model.py:{DummyNetwork.__name__}",
                architecture_sha256="cb1e8501cca44a63f3acd91cf35da121063156d4bd8213d466614f6a84bbfbfa"
            )
        )

# myspec/src/main.py

my_model = get_my_descriptor() # `get_my_descriptor` returns the v4 object defined in `myspec/model.py`

then the validation will fail with /path/to/myspec/myspec/src/dummy_network.py does't point to an existing file, meaning that the relative path was resolved based on where the module was called, and not defined.

But even if this is desired behavior, we can fix this by using ../dummy_model.py, but then the next issue will come.

Serialized object keeps relative path

# myspec/model.py

v0_4.WeightsDescr(
            pytorch_state_dict=v0_4.PytorchStateDictWeightsDescr(
                source=Path(weights_file.name),
                architecture=f"../dummy_model.py:{DummyNetwork.__name__}",
                architecture_sha256="cb1e8501cca44a63f3acd91cf35da121063156d4bd8213d466614f6a84bbfbfa"
            )
        )

# myspec/src/main.py

from myspec.model import get_bioimage_model_v4
from bioimageio.spec import save_bioimageio_package, load_description

model = get_bioimage_model_v4()
zip_file = save_bioimageio_package(model)
ret_model = load_description(zip_file)

, the load_description will fail with /tmp/tmpdkby6kms.bioimageio.zip.unzip/../dummy_network.py, meaning that the relative path of ../dummy_network was retained when we have attempted to save the package with save_bioimageio_package. I think it should be updated based on the structure that the save_bioimageio_package is crafting.

What I currently use and works nicely

providing modules in the architecture, works nicely :) It is imported as expected.

weights=v0_4.WeightsDescr(
    pytorch_state_dict=v0_4.PytorchStateDictWeightsDescr(
        source=Path(weights_file.name),
        architecture="myspec.dummy_network.DummyNetwork",
    )
)

Idea

Also, regarding the design of the v5 and v4 models, I think that they are very much coupled with files, and an expected yaml file as a reference point. Maybe an object ModelDescr should be assumed as the ground truth, and instead of using an object to describe a file, the object should just be expressed as a file.

Also instead of relying on a path to get the data, the object should be content dependent. For example, if we have as a dependency, a numpy array, then we need a numpy array and not a file that points to a numpy array. That way the dependency is decoupled, and there could be many ways to get a numpy array.

I am trying to say that maybe creating a ModelDescr object should be possible without any file? We could have a nice decoupling, and testing won't rely on disk operations as well.

From a design perspective, a builder pattern could be also very useful, since these objects have many arguments, and some could be optional, making the process of creating objects like that quite sophisticated. A builder pattern could be very handy. I am not sure if pydantic supports this.

edit: the builder pattern isn't really valid, essentialy the features of optional, keyword, required arguments with potential classmethods provides the functionality of a builder pattern. Python supports all of the above. Nice read: https://python-patterns.guide/gang-of-four/builder/

The text was updated successfully, but these errors were encountered:

FynnBe · 2024-09-04T12:18:21Z

for time constraint reasons here is just a brief comment after skimming this issue.

The ModelDescr object is supposed to be the in-code represenation of the rdf.yaml content.
It may have some added @propertyies and convenience functions, but should not deviate too much from the war yaml content.

we used to have this separated into RawNode and Node, where Node was defined in core and RawNode. (The implementation was based on marshmallow). It was dropped mostly to simplify the code and avoid a lot of duplication -- at the cost of issues like this one.

I think the builder pattern is the answer here.
It offers easy use in code while staying transparent with what metadata is saved in the rdf.yaml.

I would not focus on model 0.4, but rather start implementing builder methods for model 0.5 parts etc.

thodkatz mentioned this issue Sep 3, 2024

Update submodules to the latest changes ilastik/tiktorch#212

Merged

thodkatz changed the title ~~v4 issues~~ v4 issues and design ideas Sep 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v4 issues and design ideas #629

v4 issues and design ideas #629

thodkatz commented Sep 3, 2024 •

edited

Loading

FynnBe commented Sep 4, 2024

v4 issues and design ideas #629

v4 issues and design ideas #629

Comments

thodkatz commented Sep 3, 2024 • edited Loading

Networks's file path can't be absolute

Network's relative path isn't realized in model descriptor's definition

Serialized object keeps relative path

What I currently use and works nicely

Idea

FynnBe commented Sep 4, 2024

thodkatz commented Sep 3, 2024 •

edited

Loading