Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deleteTasksOnCompletion to Azure Batch configuration #4114

Conversation

adamrtalbot
Copy link
Collaborator

@adamrtalbot adamrtalbot commented Jul 19, 2023

Deleting Azure Tasks was checking the configuration object deleteJobsOnCompletion which was incorrect since a task belongs to a job. This adds the equivalent configuration for tasks which is checked before deleting the tasks.

@netlify
Copy link

netlify bot commented Jul 19, 2023

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit e363418
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/64be9db8e8f3d50008a0a0b7
😎 Deploy Preview https://deploy-preview-4114--nextflow-docs-staging.netlify.app/config
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Deleting Azure Tasks was checking the configuration object
deleteJobsOnCompletion which was incorrect since a task belongs to a
job. This adds the equivalent configuration for tasks which is checked
before deleting the tasks.

Signed-off-by: Adam Talbot <[email protected]>
@adamrtalbot adamrtalbot force-pushed the azure_deleteTasksOnCompletion_configuration_added branch from 3ac14a2 to 778d0f0 Compare July 19, 2023 15:47
@adamrtalbot adamrtalbot changed the title Add deleteTasksOnCompletion to Azure task configuration Add deleteTasksOnCompletion to Azure Batch configuration Jul 19, 2023
@nextflow-io nextflow-io deleted a comment from sonatype-lift bot Jul 20, 2023
Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, tho cleanup logic is becoming a bit convoluted.

modules/nf-commons/src/main/nextflow/Const.groovy Outdated Show resolved Hide resolved
@adamrtalbot
Copy link
Collaborator Author

tho cleanup logic is becoming a bit convoluted.

We could just...remove this bit? Deleting Azure Tasks isn't really necessary because it can be performed at the Job (queue) level. I don't believe deleting tasks achieves much other than clearing them out of the Azure Batch interface.

@adamrtalbot adamrtalbot force-pushed the azure_deleteTasksOnCompletion_configuration_added branch from c7ec817 to 32abc3c Compare July 20, 2023 17:10
@pditommaso
Copy link
Member

Tagging @bentsherman for reviewing

Signed-off-by: Ben Sherman <[email protected]>
docs/config.md Outdated
`azure.batch.deleteJobsOnCompletion`
: Enable the automatic deletion of jobs created by the pipeline execution (default: `true`).

`azure.batch.deletePoolsOnCompletion`
: Enable the automatic deletion of compute node pools upon pipeline completion (default: `false`).

`azure.batch.deleteTasksOnCompletion`
: Delete tasks after successful completion. This deletes them on the Azure Batch service but files and resources may persist on the compute nodes (default: `true`).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick thought - does this default to true? I think it actually defaults to false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right, all of these delete flags are false by default.

Copy link
Member

@bentsherman bentsherman Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually deleteJobsOnCompletion is true by default because of this line:

if( config.batch().deleteJobsOnCompletion!=Boolean.FALSE ) {

But it defaults to false in shouldDelete()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it defaults to false there too:

if( !taskKey || shouldDelete()==Boolean.FALSE )
return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this logic should happen in AzBatchOpts

Copy link
Member

@bentsherman bentsherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adam, do you think it would be better to remove the task deletion logic? If they are already deleted when the job is deleted (for which there is already a config option), then it seems like that would be better by reducing the number of API calls to Azure Batch. On the other hand, I suppose the tasks would be deleted only when the entire job queue is done, rather than immediately. I'm not sure whether that matters.

@adamrtalbot
Copy link
Collaborator Author

Adam, do you think it would be better to remove the task deletion logic? If they are already deleted when the job is deleted (for which there is already a config option), then it seems like that would be better by reducing the number of API calls to Azure Batch. On the other hand, I suppose the tasks would be deleted only when the entire job queue is done, rather than immediately. I'm not sure whether that matters.

I was thinking the same, I'm not sure what deleting the task adds. The only possibility is if it clears out data from the worker nodes but I don't think it actually does, I think it just removes them from the batch interface.

@bentsherman
Copy link
Member

From the documentation:

When a Task is deleted, all of the files in its directory on the Compute Node where it ran are also deleted (regardless of the retention time).

That makes me think there is an advantage to deleting tasks as soon as possible. Unless someone wants to be able to control task deletion and job deletion, I am fine with the current behavior, which controls both via deleteJobsOnCompletion.

@bentsherman
Copy link
Member

On the other hand, Dave on StackOverflow says:

You don't need to delete tasks or their jobs at all and there's no real downside of leaving them in the system. You do however need to terminate jobs on completion as there is a quota on the number of active jobs within a Batch account.

I think I will go with the Microsoft docs 😆

@adamrtalbot
Copy link
Collaborator Author

On the other hand, Dave on StackOverflow says:

You don't need to delete tasks or their jobs at all and there's no real downside of leaving them in the system. You do however need to terminate jobs on completion as there is a quota on the number of active jobs within a Batch account.

I think I will go with the Microsoft docs 😆

Yep, currently on stable Nextflow leaves jobs in an active state which fills quota. My recent PR switches them to terminate if Nextflow gracefully finishes (still leaves them hanging around if it blows up).

With this behaviour it's adding some task delete calls before running a job delete call, shall we just remove it? It doesn't really add anything as long as the job is deleted cleanly. And we know we frequently hit 429 errors on Azure.

@bentsherman
Copy link
Member

But my point is, if the tasks aren't deleted until the entire job is done, will that cause the task files on the VM to accumulate and run out of storage? Or are the task files already deleted some other way?

@adamrtalbot
Copy link
Collaborator Author

But my point is, if the tasks aren't deleted until the entire job is done, will that cause the task files on the VM to accumulate and run out of storage? Or are the task files already deleted some other way?

Yes, this happens frequently and is a regularly common complaint on Azure. I was wrong, I thought it only ran deleteTask() after completing the workflow but it does it when running handler.checkIfCompleted(). So I think there's enough justification to keep a deleteTasks config item and separate from deleteJobs?

I will definitely clean up the logic.

@bentsherman
Copy link
Member

Okay, then let's add your config option making it true by default, and if a user has rate limit issues then they have that option as a lever. Let me push a few minor changes first.

@adamrtalbot
Copy link
Collaborator Author

Another quick point, according to the code it retains successful runs but not failed ones for debugging reasons? I think we should just remove that block and make it a blanket true or false to simplify the logic.

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Copy link
Member

The way it works currently is that deleteTasksOnCompletion can be true, false, or null. It is true by default, but failed tasks are deleted only if it is explicitly true. It's like a hidden third state. So by default, successful tasks will be deleted and failed tasks will be preserved in case you want to inspect it. I think that comment is backwards, so I changed it to "preserve failed tasks for debugging purposes".

If you think that isn't useful, I would be fine with removing it, thereby deleting all tasks by default.

@adamrtalbot
Copy link
Collaborator Author

OK that makes sense. Although in general the worker machine is autoscaled down after a failure and you lose the logs anyway.

The way it works currently is that deleteTasksOnCompletion can be true, false, or null. It is true by default, but failed tasks are deleted only if it is explicitly true. It's like a hidden third state. So by default, successful tasks will be deleted and failed tasks will be preserved in case you want to inspect it. I think that comment is backwards, so I changed it to "preserve failed tasks for debugging purposes".

If you think that isn't useful, I would be fine with removing it, thereby deleting all tasks by default.

That makes sense. The reason to remove it might be to make it simpler, i.e. a simple true or false.

@adamrtalbot
Copy link
Collaborator Author

One small change - let's default deleteJobsOnCompletion to false. We can use job termination to remove jobs from the Azure Batch quota with deleteJobsOnCompletion being for people who really want to keep things clean (e.g. me).

Signed-off-by: Adam Talbot <[email protected]>
docs/config.md Outdated Show resolved Hide resolved
docs/config.md Outdated Show resolved Hide resolved
adamrtalbot and others added 3 commits July 21, 2023 19:14
Co-authored-by: Ben Sherman <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Copy link
Member

Okay I updated the tests and docs to reflect that deleteJobsOnCompletion is now false by default. But we should still make deleteTasksOnCompletion true by default right? So that we are still deleting the task files from the VMs as soon as possible?

Copy link
Member

@bentsherman bentsherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pditommaso summary of changes:

  • change terminateJobsOnCompletion to true by default
  • change deleteJobsOnCompletion to false by default
  • add deleteTasksOnCompletion (true by default) which controls deletion of tasks as soon as they are complete

So now the default behavior is to terminate jobs but not delete them, and to delete tasks as soon as they complete. Failed tasks are still preserved for debugging purposes, unless the option is explicitly enabled.

@pditommaso pditommaso merged commit b14674d into nextflow-io:master Jul 25, 2023
19 checks passed
@adamrtalbot adamrtalbot deleted the azure_deleteTasksOnCompletion_configuration_added branch July 25, 2023 08:39
@adamrtalbot
Copy link
Collaborator Author

Thanks both.

abhi18av pushed a commit to abhi18av/nextflow that referenced this pull request Oct 28, 2023
…#4114)


Deleting Azure Tasks was checking the configuration object
deleteJobsOnCompletion which was incorrect since a task belongs to a
job. This adds the equivalent configuration for tasks which is checked
before deleting the tasks.



Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants