Add support to use public IP for the pod VM in Azure #2035
Conversation
This change allows CAA to use the public IP of the pod VM to connect to the kata-agent. A static public IP is created and attached to the VM NIC. A dynamic public IP doesn't work because the address is not available immediately after creation. The Network Security Group (NSG) should be adjusted to allow connectivity to port 15150 from the specific IP range of the systems running cloud-api-adaptor (CAA). Note that the communication between CAA and the pod VM uses TLS. Signed-off-by: Pradipta Banerjee <[email protected]>
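For illustration, here is a minimal sketch of what creating such a static public IP looks like with the Azure SDK for Go (`armnetwork`). This is not the actual code in this PR; the function and parameter names are placeholders:

```go
package azure

import (
	"context"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore/to"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v4"
)

// createStaticPublicIP provisions a Standard-SKU static public IP that can
// then be attached to the pod VM's NIC. Static allocation means the address
// is known as soon as the resource is created; a dynamic IP is only assigned
// once it is attached to a running resource, which is why it doesn't work here.
func createStaticPublicIP(ctx context.Context, subscriptionID, resourceGroup, ipName, location string) (*armnetwork.PublicIPAddress, error) {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return nil, err
	}
	client, err := armnetwork.NewPublicIPAddressesClient(subscriptionID, cred, nil)
	if err != nil {
		return nil, err
	}
	poller, err := client.BeginCreateOrUpdate(ctx, resourceGroup, ipName, armnetwork.PublicIPAddress{
		Location: to.Ptr(location),
		SKU: &armnetwork.PublicIPAddressSKU{
			Name: to.Ptr(armnetwork.PublicIPAddressSKUNameStandard),
		},
		Properties: &armnetwork.PublicIPAddressPropertiesFormat{
			PublicIPAllocationMethod: to.Ptr(armnetwork.IPAllocationMethodStatic),
		},
	}, nil)
	if err != nil {
		return nil, err
	}
	resp, err := poller.PollUntilDone(ctx, nil)
	if err != nil {
		return nil, err
	}
	return &resp.PublicIPAddress, nil
}
```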
The static public IP doesn't get deleted automatically, so it must be deleted explicitly after the VM is deleted. Signed-off-by: Pradipta Banerjee <[email protected]>
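Continuing the sketch above, cleanup has to be an explicit, ordered step after the VM is gone, since Azure rejects deleting an address that is still attached to a NIC:

```go
// deletePublicIP removes the static public IP once the VM (and its NIC) have
// been deleted. Skipping this step would leak the IP resource.
func deletePublicIP(ctx context.Context, client *armnetwork.PublicIPAddressesClient, resourceGroup, ipName string) error {
	poller, err := client.BeginDelete(ctx, resourceGroup, ipName, nil)
	if err != nil {
		return err
	}
	_, err = poller.PollUntilDone(ctx, nil)
	return err
}
```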
I was wondering whether we could have those resources created (and deleted) implicitly in a createVM call instead of explicitly creating and deleting them (using a VirtualMachinePublicIPAddressConfiguration).
See this code sample: https://github.com/mkulke/mkosi-playground/blob/e7bdeed71f8a3820fa265bf1ca74c7fec2e0e6cb/launch-vm/main.go#L158-L176
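For reference, here is a sketch of the suggested implicit variant with `armcompute`, loosely adapted from the linked sample (names like `podvm-nic` are illustrative, and this is not the PR's code). The NIC and its public IP are declared inline in the VM spec, so Azure creates and deletes them together with the VM:

```go
import (
	"github.com/Azure/azure-sdk-for-go/sdk/azcore/to"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v5"
)

// networkProfileWithImplicitNIC builds a NetworkProfile that declares the NIC
// and its public IP inline, tying their lifecycle to the VM itself.
func networkProfileWithImplicitNIC(subnetID string) *armcompute.NetworkProfile {
	return &armcompute.NetworkProfile{
		// Required whenever NICs are configured inline on the VM.
		NetworkAPIVersion: to.Ptr(armcompute.NetworkAPIVersionTwoThousandTwenty1101),
		NetworkInterfaceConfigurations: []*armcompute.VirtualMachineNetworkInterfaceConfiguration{{
			Name: to.Ptr("podvm-nic"),
			Properties: &armcompute.VirtualMachineNetworkInterfaceConfigurationProperties{
				IPConfigurations: []*armcompute.VirtualMachineNetworkInterfaceIPConfiguration{{
					Name: to.Ptr("podvm-ipcfg"),
					Properties: &armcompute.VirtualMachineNetworkInterfaceIPConfigurationProperties{
						Subnet: &armcompute.SubResource{ID: to.Ptr(subnetID)},
						// The public IP is created and garbage-collected with
						// the VM, so no manual cleanup is needed.
						PublicIPAddressConfiguration: &armcompute.VirtualMachinePublicIPAddressConfiguration{
							Name: to.Ptr("podvm-ip"),
							Properties: &armcompute.VirtualMachinePublicIPAddressConfigurationProperties{
								PublicIPAllocationMethod: to.Ptr(armcompute.PublicIPAllocationMethodStatic),
							},
						},
					},
				}},
			},
		}},
	}
}
```

The returned profile would be assigned to the VM's `VirtualMachineProperties.NetworkProfile` in the createVM call instead of referencing a pre-created NIC.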
I'm not sure why we weren't doing that for the NIC in the first place, so I need to test that. If it works like this, we wouldn't have to manage the lifecycle of public IPs and NICs manually, which can be prone to race conditions.
Makes sense. On the public IP front I tried to do something similar, but it didn't work for me.
I see, let me try that. It shouldn't be too hard to convert the code to implicit NIC creation.
I have played around with implicit creation of NICs; it seems to work for me, at least I didn't encounter problems after some casual testing: https://github.com/confidential-containers/cloud-api-adaptor/compare/main...mkulke:cloud-api-adaptor:mkulke/az-use-implicit-nic-creation?expand=1
@mkulke thanks, let me try with your changes.
Yeah, that would be interesting. I'm currently observing network problems after more thorough testing. I can't really explain that yet, since the infra looks similar when created implicitly. That's pretty curious, and it would be good to get to the bottom of this problem. But that might not be trivial, and if it's urgent we could consider merging the explicit management of IP addresses in this PR. There is a similar resource-leak risk as exists for NICs currently, but since this setting is off by default and should not be turned on casually, it might be tolerable.
@mkulke my initial test using your code resulted in the following error. I cherry-picked the changes on top of 0.9.0 to work on my current setup.
Is this what you are seeing? I've yet to debug it, though.
Hmm, no, that's not what I am seeing. My VM is created successfully, but there are network connectivity issues afterwards (it works initially, but fails during image pull). The error is interesting, though.
I tried to reproduce that error. If the vnet we are attaching the VM to does not have a NAT gateway, it will refuse to start the VM. This is expected behaviour; see this link.
I found the origin of my issues. It turned out to be specific to the network I was testing in: the implicitly created NICs were subject to outbound traffic restrictions, while the explicitly created NICs were not.
@bpradipt I pushed some changes to that branch; there's a commit that (always) adds a public IP (the above error should be gone, even if you have no NAT gateway on your subnet). Please test if you have time. If that works for you, I'd open a discrete PR with the implicit NIC creation and we can base this PR on top of it.
I dove a bit more into this, and it turns out this is actually expected, albeit a bit surprising. The reason we have outbound connectivity at the moment is that when we create a NIC and a VM separately, the NIC gets "default outbound access" via a transparent public IP. This is not the case if we create the NIC as part of a VM. Implicitly assigning a public IP is not great security-wise, and hence this behaviour is being retired. So, either way, we have to make sure that pod VMs will be able to pull images from the internet (or not, depending on whether a user would want that in their deployment) by using explicit network configuration.
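One possible form such explicit configuration could take (an illustration on my part, not something this PR implements) is associating a NAT gateway with the pod VM subnet, so implicitly created NICs keep outbound access for image pulls. A sketch with `armnetwork`, assuming the NAT gateway itself already exists:

```go
// attachNATGateway associates an existing NAT gateway with the pod VM subnet
// so that NICs created implicitly with the VM get explicit outbound access.
func attachNATGateway(ctx context.Context, client *armnetwork.SubnetsClient, resourceGroup, vnetName, subnetName, natGatewayID string) error {
	// Fetch the current subnet first so the update keeps its other settings.
	get, err := client.Get(ctx, resourceGroup, vnetName, subnetName, nil)
	if err != nil {
		return err
	}
	subnet := get.Subnet
	subnet.Properties.NatGateway = &armnetwork.SubResource{ID: to.Ptr(natGatewayID)}
	poller, err := client.BeginCreateOrUpdate(ctx, resourceGroup, vnetName, subnetName, subnet, nil)
	if err != nil {
		return err
	}
	_, err = poller.PollUntilDone(ctx, nil)
	return err
}
```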
Superseded by #2056
This is useful for environments where the K8s cluster runs on a developer workstation and the peer pod is created in Azure, e.g. for testing or for working on AI models that require large VMs.
Similar functionality already exists for the AWS provider, so this PR also brings functional parity.