What happened:
While testing Azure File cloning in OpenShift, we noticed that if a controller pod running an azcopy job is killed, any clone PVC still being copied gets stuck in the Pending phase.
This seems to be an effect of azcopy relying on job plan files (stored in AZCOPY_JOB_PLAN_LOCATION, or ~/.azcopy by default) to track jobs; these files are lost if they are stored on an ephemeral volume.
Is this a known limitation, or is there a recommended solution?
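To make the lookup described above concrete, here is a minimal Python sketch of how the job plan directory could be resolved and checked for leftover state. It assumes only what this report states (the `AZCOPY_JOB_PLAN_LOCATION` override and the `~/.azcopy` default); the helper names `job_plan_dir` and `has_plan_files` are hypothetical, and the exact layout azcopy uses inside that directory is not modeled.

```python
import os
from pathlib import Path


def job_plan_dir(env=None):
    """Return the directory where job plan files are expected to live.

    Assumption taken from this report: AZCOPY_JOB_PLAN_LOCATION overrides
    the location, otherwise ~/.azcopy is used.
    """
    env = os.environ if env is None else env
    override = env.get("AZCOPY_JOB_PLAN_LOCATION")
    if override:
        return Path(override)
    return Path.home() / ".azcopy"


def has_plan_files(plan_dir):
    """Return True if the directory exists and contains at least one entry.

    After a restart on an emptyDir volume the directory starts out empty,
    so there would be nothing left to resume from.
    """
    plan_dir = Path(plan_dir)
    return plan_dir.is_dir() and any(plan_dir.iterdir())


if __name__ == "__main__":
    location = job_plan_dir()
    print(f"plan dir: {location}, resumable state present: {has_plan_files(location)}")
```

A check like this, run when the controller starts, would show whether any job state survived the restart.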
What you expected to happen:
Clone job surviving a lost controller pod.
How to reproduce it:
1. Create an Azure File PVC/PV.
2. Create a new clone PVC that references the original PVC as its source.
3. Kill the Azure File leader controller pod.
4. The clone PVC is stuck in the Pending state.
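For anyone reproducing the steps above, the clone PVC from step 2 could be generated roughly as follows. This is a sketch, not the manifest used in the original test: the PVC names, storage class, access mode, and size are placeholder assumptions. The `spec.dataSource` field is the standard Kubernetes mechanism for PVC cloning, and the source PVC must be in the same namespace as the clone.

```python
import json


def clone_pvc_manifest(clone_name, source_pvc, storage_class, size):
    """Build a PVC manifest that clones an existing PVC via spec.dataSource."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": clone_name},
        "spec": {
            "accessModes": ["ReadWriteMany"],  # assumption: typical for file shares
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
            # The existing PVC to copy from (same namespace as the clone).
            "dataSource": {"kind": "PersistentVolumeClaim", "name": source_pvc},
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, not taken from the original report.
    manifest = clone_pvc_manifest("clone-pvc", "origin-pvc", "azurefile-csi", "100Gi")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Killing the leader controller pod while the copy is still running should then reproduce the Pending state.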
Anything else we need to know?:
Checking the Helm charts in this repo, the destination for those job plan files appears to be an ephemeral emptyDir volume, so the issue would occur with this deployment as well.
Yes, that's a limitation: the job status can only be stored locally in ~/.azcopy, so once the controller pod is restarted, the copy jobs cannot be continued.
Can Kubernetes Job objects be used for scheduling these azcopy operations, so that the cloning operations can be tracked separately? If the controller pod dies, cloning can continue, and when the controller pod restarts, it can find the existing cloning Jobs (via a label or similar) and then continue with creating the PV, etc.
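One possible shape for this proposal, sketched in Python with plain dictionaries rather than a Kubernetes client. Everything named here is an assumption for illustration: the label key `example.com/clone-volume`, the container image, the Job naming scheme, and the helper functions are hypothetical and not part of the driver. The sketch only shows that a label on the Job is enough for a restarted controller to find the copy again and read its outcome from the Job's status conditions.

```python
CLONE_LABEL = "example.com/clone-volume"  # hypothetical label key


def clone_job_manifest(volume_id, src_url, dst_url, image="example.com/azcopy:latest"):
    """Build a batch/v1 Job that runs one azcopy copy for a clone volume.

    volume_id is assumed to be a valid DNS label (lowercase, short).
    The image is a placeholder; credentials handling is not modeled.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"azcopy-clone-{volume_id}",
            "labels": {CLONE_LABEL: volume_id},
        },
        "spec": {
            "backoffLimit": 3,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "azcopy",
                        "image": image,
                        "command": ["azcopy", "copy", src_url, dst_url, "--recursive"],
                    }],
                }
            },
        },
    }


def find_clone_jobs(jobs, volume_id):
    """Return the Jobs labeled for the given clone volume."""
    return [
        job for job in jobs
        if job.get("metadata", {}).get("labels", {}).get(CLONE_LABEL) == volume_id
    ]


def job_state(job):
    """Map a Job's status conditions to 'succeeded', 'failed', or 'running'."""
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("status") == "True":
            if cond.get("type") == "Complete":
                return "succeeded"
            if cond.get("type") == "Failed":
                return "failed"
    return "running"
```

On restart, the controller would list Jobs with that label, call something like `job_state` on each, and create the PV only for those that succeeded. Because the copy runs in its own pod, it would no longer depend on the controller's local job plan files.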
Chart template referenced in the issue (emptyDir volume for the job plan files):
azurefile-csi-driver/charts/v1.30.2/azurefile-csi-driver/templates/csi-azurefile-controller.yaml
Line 229 in eefed91