Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ content library support proposal #1839

Closed

Conversation

adam-jian-zhang
Copy link
Contributor

What this PR does / why we need it:

Proposal for content library support.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1838

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


Support content library as the source for VM clone.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 27, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @adam-jian-zhang. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 27, 2023
@adam-jian-zhang
Copy link
Contributor Author

@srm09 please have a review.

Copy link
Contributor

@srm09 srm09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
@yastij what do you think?

docs/proposal/content-library-support.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2023
@srm09
Copy link
Contributor

srm09 commented Apr 6, 2023

/assign @yastij
/ok-to-test

The verify markdown job is failing, could you please fix that.

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 6, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 6, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 13, 2023
@adam-jian-zhang
Copy link
Contributor Author

@srm09, @yastij , @rikatz, @vrabbi I have addressed the comments, and fixed lint, please have a review.

@rikatz
Copy link
Contributor

rikatz commented Apr 13, 2023

/lgtm
Thank you!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 13, 2023
@randomvariable
Copy link
Member

/assign @vrabbi
/assign @randomvariable

Copy link
Member

@chrischdi chrischdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a reasonable feature and implementation looks pretty straight forward.

Should we additionally take care via validation, that if the Template refers a library, that the CloneMode is set to fullClone?

From testing perspective: I think we could just switch over a single e2e test to clone from a local content library in the test environment, which then uses the same template as before but just from a different source.

// It is optional since enabling access control for the content library is optional. It's up to content library admin to decide if
// it is required to enable access control.
// +optional
LibraryCredentials string `json:"library_credentials"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this of type corev1.TypedLocalObjectReference?

This way we would be able to in future support other object types too without having to adjust the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are credentials for the library needed? A sequence diagram of interactions between VA-API and CLS that go from library item to VM creation should make this clear.

The content library will already have been subscribed to in vCenter with whatever http credentials it needs, we should only need to pass it's UUID to CreateTemplate or CreateOVF

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-reading the doc, as the overall design states that For the content library management itself users can reference vSphere documentation, I agree the credential should not be required.

Also, we should probably add that the "Management of content library items and subscriptions" is a nongoal, so it is clear that the VI admin should first subscribe/create a content library and CAPV will only use it.

Copy link
Member

@randomvariable randomvariable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to see quite a bit more detail before approving this one. Please respond to comments.

// It is optional since enabling access control for the content library is optional. It's up to content library admin to decide if
// it is required to enable access control.
// +optional
LibraryCredentials string `json:"library_credentials"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are credentials for the library needed? A sequence diagram of interactions between VA-API and CLS that go from library item to VM creation should make this clear.

The content library will already have been subscribed to in vCenter with whatever http credentials it needs, we should only need to pass it's UUID to CreateTemplate or CreateOVF


##### VSphereVM clone changes

We need to enhance the [clone](https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/19c5deb79e97a8cd9d546ec39db3f29231bbcb91/pkg/services/govmomi/vcenter/clone.go#L75) function to recogize the scheme we defined, and choose appropriate API to clone the VM.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it needs to download the image content from Content Library when deploying.
We need to be careful here as it may run multiple times and create a heavy workload to vCenter if we need multiple VM instances.
I think it would be better to import the image as a separate template and do the clone as what we have today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When creating a subscribed content library, there are two options for dealing with the content, download immediately/when needed.
I think we should advise user to use eager option(download immediately).
If user choose lazy option(when needed), I think the work round here is to deploy a test vm before creating clusters, or create a small test cluster to fetch the library items to local.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with download immediately is that we may end up subscribing to CL with a lot of images that may be problematic for long term storage space.

Anyway, can we agree this is not an enhancement question but more an implementation detail?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a content library is on-demand subscribed, an OVF template in it can be still deployed, it will take longer as it needs to download the image from the publisher first and may delay the cluster deployment, what other problems are we trying to avoid?

@randomvariable
Copy link
Member

/assign @rikatz

@randomvariable
Copy link
Member

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 15, 2023
Content library can help template management in this case, it supports subscriptions among different content libraries and provides sync capability to help distribute OVA/templates to downstream content libraries. For example, we can set up a content library on the primary vCenter that hosts the management cluster, put the OVA/templates in the content library which serves as the version of truth for the OVA/templates we intended to use. And then for other vCenters that
hosts workload clusters, we set up a content library and subscribe to the primary content library, and sync the content library items(templates/OVAs) to it. In this way it is easy to manage templates/OVAs and have confidence in the content of the templates/OVAs.

Using content library as the source of templates has another little benefit that enables using local datastore as destination for the VMs, currently we are limited to shared datastore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a problem solved by content library, but mostly because of a problem on the way we used to clone our templates: #2134 as we always set the datastore as the same from the template (while VC does support cloning between different datastores).


### Goals

* Use ContentLibrary items as the VM image template.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to explore a bit more on goals and non goals, are we going to support at this time "OVA and OVF templates" and "VM templates"?

The "OVA and OVF templates" deploy should be the one simpler, where we just "Import the item" on the content library, while the 2nd one is when we deploy a VM, customize it on vSphere and clone it back as a template to CL.

It would be good to make it clear what we want to support right now :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: do we want to provide an official subscribable content library for OVAs? If not, probably this should be a non goal to make it explicit.

@rikatz
Copy link
Contributor

rikatz commented Aug 21, 2023

hey @adam-jian-zhang , just took a look into the proposal again and left some comments.

Additionally, out of the proposal but IMHO should be part of the implementation, I think we need to consider that:

  • This workflow should be as easy and transparent as possible to the user. This means that from an API change, I think it is fine to have just the scheme/itemname support and nothing else.
  • While not part of the direct implementation, I think that lowering the adoption bar of this feature would be great for users, so I would expect that there's a quick start documentation on using Content Libraries, some real next/next/finish like "Create content library, import OVA item, use it on your CAPV cluster".

I'm happy and also interested to help you move this proposal forward, let me know if you need something from me to keep this moving!

@chrischdi
Copy link
Member

@rikatz maybe worth talking directly to @adam-jian-zhang to figure out when and who will continue on this one or if it makes sense to take this over :-)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from randomvariable. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link

codecov bot commented Sep 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.89%. Comparing base (d37d6f9) to head (76f9abb).
Report is 640 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1839      +/-   ##
==========================================
- Coverage   60.96%   60.89%   -0.08%     
==========================================
  Files         164      164              
  Lines        9469     9469              
==========================================
- Hits         5773     5766       -7     
- Misses       3285     3290       +5     
- Partials      411      413       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +137 to +138
* Parse the field as a URI. If just a "Host" is present, it means user is requesting an item name, so it should be queried by item name
* If both host and item are present, then host is the content library name (or uuid) and path is the item name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get it right that it would be possible to use it via (noob content library question):

  • not specify the library name or id: library://ubuntu-22.04-k8s-1.28.0
  • specify the library name or id: library://foo/ubuntu-22.04-k8s-1.28.0

Adding direct examples here would make it a better read :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you do! but if there are two items named ubuntu-22.04-k8s-1.28.0 on different libraries it should be an error, as CAPV should not select for the user one of two IMHO

Same for 2 content libraries with same names on VC.

Let me add some examples here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good, thx

Copy link
Contributor Author

@adam-jian-zhang adam-jian-zhang Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, we need a way to unique identify the library item user want, the proposed scheme library://library_name_or_id/itemname should be able to address that.
library_id is what we actually needed here to resolve ambiguity here, allowing user to use library_name instead or even omitting the library_id/library_name when the item name is unique is just for enhancing user experience.

Personally I think the full path is better( although it is harder to use), as it can avoid us problem down the road. I remember we have several similar issues for templates, as we allow user to just specify the template name not the full path of the template, and use tkr-resolver to update it to the full path later, which may break some time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've discussed with @adam-jian-zhang and I'm also adding to the status field of vspherevm what was the "resolved" library id / item name

I was even thinking about adding the item ID, but I think in this case is too much, as we are not allowing users to add the item ID anyway, wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: what would the status field get used for? If it is just for visualization: I think having a log line would also be okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, IIRC vspherevm is not mutable after cloned right? So it would just be a "I'm based on this image" field, which can be a log line yes (unless I'm forgetting some other condition)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't actually use the field I slightly tend to only log it yes 👍

I assume the image could get moved in the content library, so the status field may get outdated.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Content Library, the name can change, but the ID of a library or library item does not, so showing the resolved ID in the status can be useful in certain scenarios.

// ERROR deploying the content library item
}

finder := find.NewFinder(vim25Client)
Copy link
Member

@chrischdi chrischdi Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume DeployLibraryItem is not instantly ready and would result in a task on vsphere? so finding the object is after the task did run right? And the logic below here should then be the same as for the normal clone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it appears to not return a task, I'm checking here

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not show the task in the deploy result though it does create a vCenter task, you can find it by query vCenter task API, and may check how VM operator handle it.


func deployContentLibraryAsVM(itemID string, additionalargs...) (*object.VirtualMachine, error) {
m := vcenter.NewManager(restclient)
ref, err := m.DeployLibraryItem(ctx, itemID string, vcenter.Deploy{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function call is basically the equivalent to calling tpl.Clone right? (

tpl, err := template.FindTemplate(ctx, ctx.VSphereVM.Spec.Template)
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct :) And the customization will need to happen differently, as we know what exists on a template, but the OVF deployment is different (I need to figure out actually if there's a way to get the spec and customize accordingly, I didn't wanted to go that down on the spec right now).

So, for template we call lastly tpl.Clone that generates a new task.

And for library we call DeployLibraryItem that does that directly (I'm checking if there's a way to generate a task as well)

@rikatz
Copy link
Contributor

rikatz commented Sep 25, 2023

Question: should we add a featureflag for it?

@chrischdi
Copy link
Member

Question: should we add a featureflag for it?

Good question: I tend to say no, because you have to explicitly use the library:// prefix to make use of it. There's no magical "gets used".

When adding a feature gate for it I think we would also have to verify it on creation (if the feature gate is disabled) that no library:// prefix gets set.

@rikatz
Copy link
Contributor

rikatz commented Nov 20, 2023

@chrischdi 👋

I'm coming back to this (after almost 2 months of hiatus, sorry!)

Can you take a new look just to see if I'm missing something?

I think the most important piece right now is that the "cloning" from a Content Library should happen on a different workflow, as it does not use a template/object reference but instead https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1839/files#diff-05d6df7ed3848d35aa732cfdeb480d8a394f42809bc3489bc87a9bcdcc0fcc6dR219 and also as you pointed, it is "sync" (does not return a task).

Thanks!

@chrischdi
Copy link
Member

Hi @rikatz , I think this looks good to continue.

The only question coming up for me after reading all again is this part here:

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1839/files#r1335335234

About "do we have to pre download it in some way" but I'd let it up to you if there is something to think about here. It would be more obvious to me when we have code :-)

// the virtual machine. It can be a vSphere template, or a content library item. For example:
// vsphere template: [vsphere://location of the vsphere template], vsphere:// can be omitted for backward compatibility
// content library: [library://library_name_or_id/content_library_item_name]
// note that this has impact on cloneMode field, since it is only possible to do full clone from content library items,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full clone holds true for OVF templates in Content Library, Content Library has the checkin/checkout feature for VMTX templates (supported within single VC) that can do linked clone ([checkin checkout])(https://developer.vmware.com/apis/vsphere-automation/latest/vcenter/api/vcenter/vm-template/library-items/template_library_item/check-outsactioncheck-out/post/), if this VirtualMachineCloneSpec may support VMTX template from Content Library, better to mention this fullclone is for OVF template only.

Comment on lines +137 to +138
* Parse the field as a URI. If just a "Host" is present, it means user is requesting an item name, so it should be queried by item name
* If both host and item are present, then host is the content library name (or uuid) and path is the item name
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Content Library, the name can change, but the ID of a library or library item does not, so showing the resolved ID in the status can be useful in certain scenarios.


##### VSphereVM clone changes

We need to enhance the [clone](https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/19c5deb79e97a8cd9d546ec39db3f29231bbcb91/pkg/services/govmomi/vcenter/clone.go#L75) function to recogize the scheme we defined, and choose appropriate API to clone the VM.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a content library is on-demand subscribed, an OVF template in it can be still deployed, it will take longer as it needs to download the image from the publisher first and may delay the cluster deployment, what other problems are we trying to avoid?

// ERROR deploying the content library item
}

finder := find.NewFinder(vim25Client)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not show the task in the deploy result though it does create a vCenter task, you can find it by query vCenter task API, and may check how VM operator handle it.

@k8s-ci-robot
Copy link
Contributor

@adam-jian-zhang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-vsphere-e2e-main 30e3b1d link true /test pull-cluster-api-provider-vsphere-e2e-main
pull-cluster-api-provider-vsphere-e2e-blocking-main 76f9abb link true /test pull-cluster-api-provider-vsphere-e2e-blocking-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sbueringer
Copy link
Member

/close

Let's revive if there is interest in this again

@k8s-ci-robot
Copy link
Contributor

@sbueringer: Closed this PR.

In response to this:

/close

Let's revive if there is interest in this again

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Content Library Support