The mutex is not released after the pod completes. #14002

waring92 · 2024-12-16T05:07:04Z

Pre-requisites

I have double-checked my configuration
I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
I have searched existing issues and could not find a match for this bug
I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

I have a WorkflowTemplate that defines sequential workflows using a DAG.
The DAG is responsible solely for managing the sequence of workflows, and each step references a workflow using templateRef.

Some of these steps (workflows) implement their own mutex synchronization mechanisms, but these are not applied to the entire WorkflowTemplate. In other words, certain steps need synchronization, while others do not.

The issue arises after a workflow with its own mutex completes execution: even though its pod has finished, the mutex is not released.
As a result, another parallel workflow becomes stuck with the message:

Waiting for … lock. Lock status: 0/1

From my understanding, once the pod with the mutex has completed execution, the mutex should be released, allowing the next workflow to acquire the lock.
However, it appears that mutexes for all workflows within the template are only released after the entire WorkflowTemplate has completed execution.

Am I misunderstanding how the mutex synchronization works in this context?
Or is there a configuration or behavior I may have overlooked that ensures the mutex is released immediately after the specific workflow (or pod) finishes?

I am filing this issue against version 3.5.5 because there have been no updates regarding this feature.

Version(s)

v3.5.5

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

The following is the WorkflowTemplate

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: ...
  generateName: ...
  namespace: argo
spec:
  entrypoint: ...
  serviceAccountName: argo
  templates:
  - name: ...
    inputs:
      parameters:
      - ...
    dag:
      tasks:
      ...
      - name: get-workflow
        templateRef:
          name: specific-workflow
          template: specific-workflow-template
        arguments:
          parameters:
          - name: mutex_key
            value: "mutex/key"
            ...

And the following is the referred workflow.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: specific-workflow
  namespace: argo
spec:
  serviceAccountName: argo
  templates:
  - name: specific-workflow-template
    inputs:
      parameters:
      ...
      - name: mutex_key
        value: "default"
    synchronization:
      mutex:
        name: "{{inputs.parameters.mutex_key}}"
    script:
    ...
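
Because the templates above are elided, here is a self-contained sketch that could be used to try to reproduce the situation. It is an assumption-laden simplification, not the original setup: it uses a single Workflow with two parallel DAG tasks instead of separate workflows referenced through templateRef, and the names `mutex-slash-repro-`, `main`, `first`, `second`, and `locked-step`, as well as the `alpine:3.19` image and the `sleep 10` command, are hypothetical placeholders. It has not been confirmed to trigger the problem.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: mutex-slash-repro-   # hypothetical name
  namespace: argo
spec:
  entrypoint: main
  serviceAccountName: argo
  templates:
  - name: main
    dag:
      tasks:
      # Two tasks with no dependencies, so they start in parallel
      # and compete for the same template-level mutex.
      - name: first
        template: locked-step
        arguments:
          parameters:
          - name: mutex_key
            value: "mutex/key"
      - name: second
        template: locked-step
        arguments:
          parameters:
          - name: mutex_key
            value: "mutex/key"
  - name: locked-step
    inputs:
      parameters:
      - name: mutex_key
    synchronization:
      mutex:
        name: "{{inputs.parameters.mutex_key}}"
    container:
      image: alpine:3.19               # public image, placeholder
      command: [sh, -c]
      args: ["sleep 10"]
```

If the mutex is released when each step finishes, the second task should start once the first one completes rather than staying in the waiting state.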

Logs from the workflow controller

time="2024-12-16T04:54:59.763Z" level=info msg="Could not acquire lock named: &{argo mutex-key  Mutex}" namespace=argo workflow=...

Logs from your workflow's wait container

Waiting for argo/Mutex/mutex-key lock. Lock status: 0/1

isubasinghe · 2024-12-16T05:11:55Z

@waring92 can you check whether this is still an issue in 3.5.11, please? I suspect it is fixed there.
The fix from #13553 is not in 3.5.5.

waring92 · 2024-12-16T05:37:25Z

> @waring92 can you check if this is an issue in 3.5.11 please? I suspect that this issue is fixed there. #13553 this fix is not in 3.5.5

Thank you for your reply.
However, the same issue occurs in v3.5.11.

waring92 · 2024-12-20T01:29:20Z

After numerous attempts, I discovered that a mutex whose name contains the "/" character remains locked.
Is this the expected behavior?
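
To compare the two cases, the only change needed is the argument passed to the task in the calling WorkflowTemplate. A sketch based on the `get-workflow` task shown earlier in this issue; the slash-free value `mutex-key` is an illustrative choice, and the rest of the task is unchanged:

```yaml
      - name: get-workflow
        templateRef:
          name: specific-workflow
          template: specific-workflow-template
        arguments:
          parameters:
          - name: mutex_key
            # value: "mutex/key"   # name containing "/": lock reported as staying held
            value: "mutex-key"     # same name without "/", for comparison
```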

waring92 added the type/bug label Dec 16, 2024

shuangkun added the area/mutex-semaphore label Dec 19, 2024
