intermittent "failed to run exec_wrapper action module_powershell_wrapper: Failed to compile C# code" errors #657

Open
Yannik opened this issue Sep 12, 2024 · 8 comments

Comments

@Yannik

Yannik commented Sep 12, 2024

SUMMARY

With an ever-growing host count (currently 180 Ansible-managed Windows 2019/2022 servers), I am seeing more and more of these errors, which break our deployment CI/CD pipeline:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: at <ScriptBlock>, <No file>: line 11
fatal: [xxxx]: FAILED! => changed=false 
  msg: |-
    internal error: failed to run exec_wrapper action module_powershell_wrapper: Failed to compile C# code:
    error CS0016: Could not write to output file 'c:\Users\svc_ansible_admin\AppData\Local\Temp\9ad9c096-1bd3-4845-8e20-84ea3f018fd8\bsze5fqf.dll' -- 'The process cannot access the file because it is being used by another process. '

I had already reported this here, in an issue with a similar problem that was successfully resolved thanks to @jborean93.

Would be great if it was possible to solve this one too.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

Unsure

ANSIBLE VERSION
ansible [core 2.16.11]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /app/lib/python3.12/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /app/bin/ansible
  python version = 3.12.6 (main, Sep  9 2024, 18:09:49) [GCC 13.2.1 20240309] (/usr/local/bin/python)
  jinja version = 3.1.4
  libyaml = True

STEPS TO REPRODUCE

Execute any Windows task on enough hosts and you will run into this.
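
Why "enough hosts" matters can be estimated with a little arithmetic. The sketch below is only an illustration: it assumes failures are independent, reads the rate mentioned later in this thread ("every couple thousand task executions") as 1 in 2000, and uses a hypothetical playbook of 20 tasks per host. None of these numbers are measurements.

```python
def pipeline_failure_probability(hosts, tasks_per_host, per_task_failure_rate):
    """Chance that at least one task execution in a run hits the error."""
    executions = hosts * tasks_per_host
    # P(at least one failure) = 1 - P(every execution succeeds)
    return 1.0 - (1.0 - per_task_failure_rate) ** executions


ASSUMED_RATE = 1 / 2000        # assumption: "every couple thousand" read as 1 in 2000
ASSUMED_TASKS_PER_HOST = 20    # assumption: hypothetical playbook size

for hosts in (20, 60, 180):
    p = pipeline_failure_probability(hosts, ASSUMED_TASKS_PER_HOST, ASSUMED_RATE)
    print(f"{hosts:>3} hosts: {p:.0%} chance that a run hits the error at least once")
```

Under these assumptions a run across 180 hosts would hit the error more than 80% of the time, which is consistent with a rare per-task failure still breaking most pipeline runs.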

@jborean93
Collaborator

Unfortunately there is not much we can do here. The code is compiled by csc.exe (called by the C# compiler methods), and the error you see comes from csc.exe itself, not from any code we control. The typical reason for this error is that an AV or other scanning tool is either deleting the file or, as in your case, holding an exclusive lock on it. As we don't control how csc.exe works, we have little sway over the outcome here.

We do provide a way to change the temporary directory used here through the remote_tmp option on the shell plugin. This could potentially be changed to a location that is either trusted by the AV or maybe less likely for it to be scanned and locked during the run.

@Yannik
Author

Yannik commented Sep 13, 2024

> We do provide a way to change the temporary directory used here through the remote_tmp option on the shell plugin. This could potentially be changed to a location that is either trusted by the AV or maybe less likely for it to be scanned and locked during the run.

As far as I can see, this directory could simply be used by an attacker as well, creating an attack vector? (Unless the code is signed, which I'm sure it isn't. That said, signing of the temporary code by the Ansible controller DOES sound like an interesting idea!)

Anyway, wouldn't a retry/backoff mechanism pretty much solve this problem? Since this only occurs once every couple thousand task executions, it seems very much like unlucky timing.
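
For illustration, a retry/backoff wrapper of the kind suggested here could look roughly like the sketch below. It is written in Python rather than the PowerShell used by the module wrapper, it is not Ansible's code, and every name in it (`retry_with_backoff`, `is_sharing_violation`, `flaky_compile`) is hypothetical. The transient-error check simply matches the message text from the error above.

```python
import random
import time


def retry_with_backoff(operation, is_transient, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a transient error wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt, or on errors not known to be transient.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            # Exponential backoff plus jitter so that hosts do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


def is_sharing_violation(exc):
    # Hypothetical check based on the message text reported in this issue.
    return "being used by another process" in str(exc)


calls = {"count": 0}


def flaky_compile():
    # Stand-in for the compile step: fails twice with a lock error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("The process cannot access the file because it is being used by another process.")
    return "compiled"


# sleep is replaced with a no-op so the demo finishes instantly.
result = retry_with_backoff(flaky_compile, is_sharing_violation, sleep=lambda s: None)
print(result, calls["count"])  # prints "compiled 3"
```

A wrapper like this only helps when the lock is short-lived. An error that is not classified as transient, or one that persists through every attempt, is still raised to the caller.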

@jborean93
Collaborator

> As far as I can see, this directory could simply be used by an attacker as well, creating an attack vector?

It's certainly not ideal, but just changing it to another var rather than the default $env:TEMP might be enough to stop the AV from picking it up.

> That said - signing of the temporary code done by the ansible controller DOES sound like an interesting idea!

It's certainly something we are looking into, but it raises a lot of questions that make it hard to achieve.

> Anyway - wouldn't a retry/backoff mechanism pretty much solve this problem? Since this is only occurring every couple thousand task executions, it seems very much like unlucky timing.

Not necessarily. In some cases it might, but in others it could just fail every time. There could also be code outside our control that uses Add-Type rather than our custom Add-CSharpType. I prefer not to add a retry mechanism for such a scenario, but I could be convinced otherwise.

One area I want to also look into for the next Ansible version if I have time is to officially support PowerShell 7.x. This version uses a different compiler mechanism that doesn't require temporary files as the compilation happens in process. This could be the solution to this particular problem. I cannot guarantee that it'll be done in the next release though, just something that's on my mind.

@Yannik
Author

Yannik commented Sep 17, 2024

I am experimenting with remote_tmp now, but I suspect that the AV simply has a look at all new files, no matter which directory they are in.

Seeing that async_dir is set to %USERPROFILE%\.ansible_async, I configured remote_tmp to %USERPROFILE%\.ansible_tmp, kind of expecting the directory to be hidden. That is not the case: Windows does not treat dot-prefixed items as hidden and requires the hidden attribute instead. Any reason for still using the dot-prefix on async_dir? Or are you additionally setting the hidden attr on that one?

The remote_tmp dir is actually not even getting deleted after task/playbook execution, is that on purpose?

I have not rolled this out to prod just yet, so I cannot report any results on the effectiveness of fixing the errors.

> One area I want to also look into for the next Ansible version if I have time is to officially support PowerShell 7.x. This version uses a different compiler mechanism that doesn't require temporary files as the compilation happens in process. This could be the solution to this particular problem. I cannot guarantee that it'll be done in the next release though, just something that's on my mind.

Sounds interesting to have that option! (Even though I don't see us rolling out PowerShell 7.x to all servers in the near future.)

@jborean93
Collaborator

> Any reason for still using the dot-prefix on async_dir?

It's to replicate the same behaviour on the Linux side where the dir is ~/.ansible_async and . means hidden there. We are not explicitly setting the hidden attribute.

> The remote_tmp dir is actually not even getting deleted after task/playbook execution, is that on purpose?

The actual dir isn't. The value is meant to be a base location inside which each module creates its own temp directory. The default is %TEMP%, which means that when a temp directory is needed it is created inside that dir, and that one is what should be cleaned up.
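
The behaviour described here (a persistent base directory with a short-lived per-module directory inside it) can be sketched as follows. This is a Python illustration of the pattern only, not Ansible's implementation. The function name and the demo path are hypothetical, and creating the base directory when it is missing is a choice made by the sketch.

```python
import os
import shutil
import tempfile


def run_with_module_tmp(base_dir, work):
    """Create a per-run directory inside base_dir, run work(path), then remove only that directory."""
    os.makedirs(base_dir, exist_ok=True)  # sketch-only choice: create the base if missing
    module_tmp = tempfile.mkdtemp(dir=base_dir)
    try:
        return work(module_tmp)
    finally:
        # Only the per-run directory is cleaned up; base_dir is left in place.
        shutil.rmtree(module_tmp, ignore_errors=True)


# Hypothetical stand-in for the configured remote_tmp location.
base = os.path.join(tempfile.gettempdir(), "ansible_tmp_demo")
seen = run_with_module_tmp(base, lambda path: path)
print(os.path.isdir(base), os.path.exists(seen))  # prints "True False"
```

After the call the base directory still exists while the per-run directory is gone, which matches the observation that the configured remote_tmp directory itself is not deleted.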

@Yannik
Author

Yannik commented Oct 2, 2024

@jborean93 Unfortunately changing the temp directory did not help with this issue.

If you would consider adding a retry mechanism for this, I would greatly appreciate it.

@vlenoci

vlenoci commented Dec 14, 2024

Hello there, I'm not sure whether this is the same problem or just related, but I see similar behavior roughly once every 100 executions:

exception: "Failed to cleanup temporary directory 'C:\Users\ansible\AppData\Local\Temp\777468be-7403-43a1-8ad6-03939f2e943f' used for compiling C# code. Error: Exception calling "Delete" with "2" argument(s): "Access to the path 'CSC5AD4D805CC1142D293354511B46DB19D.TMP' is denied."

I don't think it's related to async, because it also happens on tasks without the async option (though it is probably more common on async tasks, where there is more activity in the temp folder).
The same probably applies to the PowerShell remote_tmp folder. I hit the same error with an ansible.windows.win_file task.
I tried to work around the issue by pointing the ansible_remote_tmp parameter at a different folder, but it didn't help.
I also checked whether the AV could be interfering, but I didn't find anything related in the AV log.

ANSIBLE VERSION

ansible [core 2.18.0]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.12/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.12.8 (main, Dec  4 2024, 08:54:12) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.1.4
  libyaml = True

@Yannik
Author

Yannik commented Dec 18, 2024

@jborean93 Any chance to get a retry mechanism added here? :-)
