-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When using Windows containers in Containerd the windows layerFolder is null and the root is blank #1185
Comments
As noted in the meeting today, a PR clarifying/fixing this would definitely be useful (even if only to have something slightly more concrete to discuss around 🙇). Is there a specific reason to prefer |
I'm separately curious, how is It's possible I'm misunderstanding something, as I'm not recently-familiar with host process containers so I am unaware of any particular mounting-behaviour differences. Edit: Trying to educate myself through GitHub's web interface, I found in containerd/containerd#6618 (comment) that So is this containerd doesn't appear to have populated this field since the runhcs runtime was dropped in favour of the v2 runtime. So right now, it appears containerd sends an out-of-spec OCI container config, and always has done in runtime v2. The hcsshim v2 shim is a willing participant in this, as it doesn't try to use that field. My mild concern with making that field nullable or empty-list-okay is that such a container config will be rejected by the rest of the hcsshim machinery, if you pass it by something other than runtime v2 shim's Create; conversely, there's no way for containerd to populate that field usefully because the runtime shim's processing of the So perhaps a workaround is that the field is made nullable, but the spec notes that in this case, the receiver may reject the spec if it cannot populate that field from pre-existing/extrinsic knowledge. There's already a few cases in this same flow where the spec and containerd can agree, but hcsshim then rejects or locally-fixes the provided configuration for non-spec-visible reasons. Anyway, it doesn't look like this is actually related to host-process containers to me, but it might be easier to trigger there? Note that moby/moby does mount and populate layerFolders, which is also how the runhcs runtime in containerd used to work. It's possible (but not certain) that when the containerd image store work comes to Windows (which would make the containerd backend mandatory), then this will stop being the case, if the containerd backend even still supports this caller-managed-mount flow. |
🤣 Thanks as always for your attention to details
I created a process isolated container via cri to double check, and found that it is also writing a
What do you think about fixing this in containerd? Would that be the right approach? I am working on a different shim and I can certainly make it work this way but it feels wrong to work around it. |
I'm not sure we can fix this in containerd, because the correct values don't actually exist when containerd passes the container config down to the shim, if it's using the We'd either need to revert containerd to do all the layer folder mounting itself before calling the runtime shim, or change the runtime shim protocol so that layerFolders contains dummy paths that are overwritten by the Create message's paths. This would require changes in the shim implementation in hcsshim (an ABI break!) since it currently (incorrectly?? Semantically that makes no sense) appends to layerFolders. Very late edit: hcsshim 0.12 fixes this to reject a create request with both a RootFS in the containerd request, and I think making Longer-term, it might be feasible to subsequently deprecate layerFolders entirely, as apart from the existing calls in Docker (which will hopefully be replaced by containerd anyway), it's currently only ever populated internally by hcsshim long enough to flow into other parts of hcsshim and be converted out again. I noticed while poking at this that the spec file in the original post has another invalid aspect: It has "root": {
"path": ""
}, and for a process-isolated Windows container, that string is a mandatory volume path, which again, containerd doesn't have before it calls the shim as it's not doing the mounting itself. This does provide an example of a 'formally-optional' field that has semantically-define requiredness which is ignored when passing from containerd into the runtime shim. A final option is to say that this config.json is not a valid OCI spec, but merely a part of one being passed around as part of the containerd runtime v2 shim API. Since you're implementing a shim, this seems like the worst possible outcome for you, by punting a bunch of extra parsing/handling logic for the potential differences. Given the redundancy between the OCI container spec and the runtime v2 API it probably should have been originally documented that way. |
thanks for the analysis. Took me awhile to digest it all as this area is new for me. I did a deep dive and see what you are saying. This issue of the runtime config not being correct is the same for both HostProcess and process isolated. A few comments.
The root path is generated by hcsshim during the "create" and then passed when the container is system call that creates the container. task create call and spec set during create. This happens since the file system needs to be turned in to a volume mount which containerd isn't doing at this time as you point out. I am not to sure what to do about this one. It seems like it is no longer really required in the containerd flow and I am not sure what scenarios it would be useful in. I am likely not going to need it either (due to the same layering situation).
I was looking into this and ctr generated these by calling get snapshot and generating mounts. The shim then just re-orders these into the correct order for the layers field. We could do the same when generating the runtime config and the result would be the same. But as you point out hcsshim appends the layers from the rootfs request. So It would require changes in hcsshim either way. I am not sure I understand why changing hcsshim to not append would be a breaking change though... I believe the result would be the same. I.E. if layersFolder present then skip otherwise if empty take from Request field and do current logic. What am I missing?
It seems that if we were able to fill in the layerFields this would be make the runtime config.json valid. In other words another runtime could use that config.json to do all the work required (since mounting the root and getting it's path could be optional field). |
True, but then we're just duplicating work (and implementation) on both sides, and introducing more WIndows-specific code into containerd cross-platform code (rather than hiding it in the WCOW snapshotter as we do now with the parentLayerFolders JSON sidechannel). It also duplicates the information in the containerd runtime V2 shim use-case, which is likely to be the only use-case once Docker migrates to containerd on Windows. Since there's only currently one consumer of this field (
Because newer containerd with older hcsshim would end up with a double-list of layerFolders, and fail elsewhere (I suspect the failure would only happen inside hcs, as nothing in hcsshim is going to check the layerFolders are a valid chain of layers from base to scratch, it just translates from OCI format to HCS format). I personally think a forced hcsshim/containerd lockstep upgrade needs more justification than "sending extra information to hcsshim that is just going to be immediately overwritten with the existing information". Anyway, I think the approach of changing containerd and hcsshim's behaviour here would be better discussed on one of their bug-trackers. As I mentioned before, I do agree that appending the Mounts to the LayerFolders (rather than overwriting or validating that they are the same) is probably a mistake (I can't think of a situation where this is actually going to be valid... prepending maybe), but the maintainers of those stacks are going to be better-placed to look into the blast-radius and potential upgrade problems for users of changing that. And fixing the spec to match the real-world behaviour (or simply declaring that this config.json is not required to be a valid OCI container) seems like a simpler thing to change in the meantime. Honestly, I'd love to get rid of layerFolders entirely, and make |
Is the intent of the config.json and runtime spec that a runtime should be able to configure a container from the config.json? The layers that make up a container are required to properly configure a container. Using Containerd's V2 flow means that other runtimes would need know and adapt to Containerd's runtime does and that the config.json doesn't have enough info to create a container. Even though this is windows specific code, it is similar to what containerd has to do to generate a compliant runtime config.json for Linux specific config.
Agree, though what is decided in the spec should drive that behavior.
This does seem a good approach but I don't know If I have enough context to know where this should go. |
First up: I checked, and per the README, the bundle directory is documented to be an OCI bundle. That said,
Yes, that is the intent of the OCI Bundle and config.json, but
we're talking about the config.json generated by containerd to pass to implementations of the containerd v2 runtime shim API, this is already the core requirement of those implementations. They can't ignore the RPCs from containerd and just take the bundle directory as a ready-to-start OCI Bundle, particularly because they'd have no rootfs. They can (and do) ignore pieces of config.json that either don't apply or conflict with the RPCs from containerd, such as the rootfs path. We're not talking about random config.json found lying around in the world.
I think this is backwards. The spec should follow implementation experience, which is why I suggest discussing this with the implementors. My current feeling is that requiring that the CWD of the runtime shim be a valid OCI Bundle is probably a misspecification, as apart from the runc case where the shim just passes config.json through unchanged, nothing else cares if this is a valid and complete OCI Bundle, as not all fields are relevant to how all runtimes function. (Also, as a nit-pick, if rootfs.path is set, then it's not a valid OCI Bundle, as that would require the rootfs to be already mounted, but the runtime v2 shim spec explicitly requires the shim to take care of any mounting.) Event for the runc shim, I feel like this is risky design, as there's two paths carrying putatively the same information ( kata-containers, for example, loads the config.json and then proceeds to overwrite pieces, and cherry-pick the things it needs from it, ignoring other pieces of favour of I'd suggest that maybe the config.json in the containerd v2 runtime shim be respecified as "OCI container spec except the following fields are optional because they also exist in the On the other hand, maybe the containerd maintainers will feel it's worth trying to get more of the config.json populated with information that also appears in the containerd v2 runtime shim protocol, sufficient to let a spec-compliant syntax parser work, even if the semantics and platform-varying requirements are not checked (in some cases because they cannot be validly populated, and in some cases because very few JSON parsers can actually validate fields whose valid states depend on other fields), and they do not actually match the behaviour of the shim (i.e. config.json is unreliable for introspection), and we need to require a specific minimum version of the Windows v2 shim because the existing back/forwards ABI compatibility is broken because the Windows shim (faultily) depends on receiving a spec-invalid config.json. Perhaps another, better solution will be discovered? I can't see any further value being derived from just poking at the spec doc at this point, myself, since at this point any changes made to the spec for this use-case are simply carving out exceptions for how containerd uses config.json in its own API, but in turn make OCI Bundles less-reliably-runnable for their actual use-case, being a complete and self-contained runnable container spec. (Making layerFolders optional feels mostly like this , although there's an independent case to be made that the current root.path spec for Windows means layerFolder should have been optional when the root is passed in as a volume name, as layerFolders data has already been done by the caller to create that volume. But I'd have to check hcsshim to be sure how it resolves that conflict today, in the non-containerd case.) An option would be to take the time to review the config.json spec in the face of the world having gotten more complicated than runc, crun, and (briefly) runhcs, which is a much wider task and could lead to cleanups as I mentioned above around removing LayerFolders and generalising |
Thanks for the details, I've learned a ton from this thread. That all makes sense to me. It seems as though there has been some drift in the implementation and the spec (specifically the runtime bundle) and how it is used.
I don't know what the community's appetite for this would be or where to best begin the conversation. In the meantime, for my case, I can use similar approaches where I can build the layers I need from the Containerd V2 Is it worth opening a PR here to make those fields we discussed ( |
Opening such a PR might be worth it, as it'll make the proposed changes concretely visible, which is probably a good basis for discussion. I wouldn't wait for the resolution though; getting the Rust spec reader to not panic when a non-optional field is missing seems reasonable. The outcome might be the weakening of the field from "minimum 2 entries" to "minimum empty list" and then fixing the places (like |
If windows layerFolders in null string then loading of the runtime spec fails. This happens when containerd serializes Windows{} portion of the spec due to the way golang serializes lists. Although the runtime spec says this is a required field, containerd 1.6 and 1.7 do not fill this field resulting in a null value when serialized to disk. See opencontainers/runtime-spec#1185 for discusion of this field in the runtime spec. Signed-off-by: James Sturtevant <[email protected]>
If windows layerFolders in null string then loading of the runtime spec fails. This happens when containerd serializes Windows{} portion of the spec due to the way golang serializes lists. Although the runtime spec says this is a required field, containerd 1.6 and 1.7 do not fill this field resulting in a null value when serialized to disk. See opencontainers/runtime-spec#1185 for discusion of this field in the runtime spec. Signed-off-by: James Sturtevant <[email protected]>
If windows layerFolders in null string then loading of the runtime spec fails. This happens when containerd serializes Windows{} portion of the spec due to the way golang serializes lists. Although the runtime spec says this is a required field, containerd 1.6 and 1.7 do not fill this field resulting in a null value when serialized to disk. See opencontainers/runtime-spec#1185 for discusion of this field in the runtime spec. Signed-off-by: James Sturtevant <[email protected]>
Propose to update the Runtime spec to allow for null or empty in the
Windows.layerFolder
field. I found an issue where this wasn't being parsed correctly: youki-dev/oci-spec-rs#126. It works for HostProcess Containers in the go implementations due the way go serializes lists.The schema and spec for windows state the
layerFolder
should be a min of 1 item:There was recently work to enable Host Process containers for Windows and a scratch image was created for it. When running that image the runtime config doesn't have a layer folder:
The text was updated successfully, but these errors were encountered: