Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make WindowsServiceLifetime gracefully stop #83892

Merged
merged 11 commits into from
Apr 6, 2023

Conversation

ericstj
Copy link
Member

@ericstj ericstj commented Mar 24, 2023

WindowsServiceLifetime was not waiting for ServiceBase to stop the service. As a result we would sometimes end the process before notifying service control manager that the service had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through public API. We must not call SetServiceStatus again once the service is marked as stopped.

Fixes: #62579

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.
@ghost
Copy link

ghost commented Mar 24, 2023

Tagging subscribers to this area: @dotnet/area-system-serviceprocess
See info in area-owners.md if you want to be subscribed.

Issue Details

WindowsServiceLifetime was not waiting for ServiceBase to stop the service. As a result we would sometimes end the process before notifying service control manager that the service had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through public API. We must not call SetServiceStatus again once the service is marked as stopped.

Fixes: #62579

Author: ericstj
Assignees: -
Labels:

area-System.ServiceProcess

Milestone: -

@ericstj ericstj requested a review from a team March 24, 2023 17:17
Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any kind of test that can be added here?

@ericstj
Copy link
Member Author

ericstj commented Mar 24, 2023

Is there any kind of test that can be added here?

Yeah but needs some work as we need a service project. Plan to do so before moving out of draft.

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.
These tests leverage RemoteExecutor to avoid creating a separate service
assembly.
This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.
@ericstj
Copy link
Member Author

ericstj commented Mar 31, 2023

These tests are failing in CI but passing locally:
https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-83892-merge-4425fb85bd7d4689a3/Microsoft.Extensions.Hosting.WindowsServices.Tests/1/console.f3be89b1.log?helixlogtype=result
I pulled the test payload and it's missing bindingRedirects for System.Runtime.CompilerServices.Unsafe 😕 I have those when building locally - emitted by RAR. Might need to pull the binlogs from the test build to see what's going wrong on the build machine.

@ericstj
Copy link
Member Author

ericstj commented Mar 31, 2023

So the netfx tests are failing because they aren't getting bindingRedirects in AzDo build. Binlog shows the following parameters are missing from call to RAR:

-    AppConfigFile = C:\src\dotnet\runtime\eng\testing\netfx.exe.config
-    SupportsBindingRedirectGeneration = True

Seems significant.

@ericstj
Copy link
Member Author

ericstj commented Mar 31, 2023

So it looks like this difference is due to the following.

In CI builds we don't reference Microsoft.NET.Test.Sdk:

<PackageReference Include="Microsoft.NET.Test.Sdk" Version="$(MicrosoftNETTestSdkVersion)" />

The Microsoft.NET.Test.Sdk sets GenerateBindingRedirectsOutputType to true.
https://github.com/microsoft/vstest/blob/681849f4dced5de25b330cf9d20fe643de9e6924/src/package/Microsoft.NET.Test.Sdk/netfx/Microsoft.NET.Test.Sdk.targets#L22

Arcade sets the property, but does so after RAR, so it doesn't help (I have no idea why it's done after RAR):
https://github.com/dotnet/arcade/blob/747f53d751983dd062f39f4654bff074536e0012/src/Microsoft.DotNet.Arcade.Sdk/tools/Workarounds.targets#L6

So nothing is telling RAR to find and generate bindingRedirects in our test projects.

So that raises the question -- how are these loads succeeding for our test projects in helix today? The answer is they aren't, initially:

LOG: Assembly download was successful. Attempting setup of file: C:\src\dotnet\runtime\artifacts\bin\Microsoft.Extensions.Hosting.WindowsServices.Tests\Debug\net462\System.Runtime.CompilerServices.Unsafe.dll
LOG: Entering run-from-source setup phase.
LOG: Assembly Name is: System.Runtime.CompilerServices.Unsafe, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
WRN: Comparing the assembly name resulted in the mismatch: Major Version
ERR: The assembly reference did not match the assembly definition found.

What happens is Xunit has installed an AssemblyResolve handler that probes the test directory and ignores assembly version completely:
https://github.com/xunit/xunit/blob/1892224c60f8499b42ee1d8a521d65995ef32aea/src/common/AssemblyResolution/AssemblyHelper_Desktop.cs#L53

So this will break anything in remote-executor scenarios that requires bindingRedirects, since they won't be using that runner to set up the AssemblyResolve handler -- they need the bindingRedirects to be correct.

Copy link
Member

@buyaa-n buyaa-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the broad testing!

@ericstj
Copy link
Member Author

ericstj commented Apr 6, 2023

JIT test failure is #84398.

@ericstj ericstj merged commit f5895b3 into dotnet:main Apr 6, 2023
@ericstj
Copy link
Member Author

ericstj commented Apr 6, 2023

/backport to release/7.0-staging

@ericstj
Copy link
Member Author

ericstj commented Apr 6, 2023

/backport to release/6.0-staging

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/4625519941

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/4625520553

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

@ericstj backporting to release/7.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Make WindowsServiceLifetime gracefully stop
Applying: Alternate approach to ensuring we only ever set STATE_STOPPED once.
Applying: Avoid calling ServiceBase.Stop on stopped service
error: sha1 information is lacking or useless (src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Avoid calling ServiceBase.Stop on stopped service
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

@ericstj an error occurred while backporting to release/7.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

@ericstj backporting to release/6.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Make WindowsServiceLifetime gracefully stop
Using index info to reconstruct a base tree...
M	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs
M	src/libraries/System.ServiceProcess.ServiceController/src/System/ServiceProcess/ServiceBase.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.ServiceProcess.ServiceController/src/System/ServiceProcess/ServiceBase.cs
Auto-merging src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs
CONFLICT (content): Merge conflict in src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Make WindowsServiceLifetime gracefully stop
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

@ericstj an error occurred while backporting to release/6.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

ericstj added a commit to ericstj/runtime that referenced this pull request May 2, 2023
* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout
ericstj added a commit to ericstj/runtime that referenced this pull request May 2, 2023
* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout
ericstj added a commit that referenced this pull request May 4, 2023
…5661)

* Make WindowsServiceLifetime gracefully stop (#83892)

* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout

* Fix test condition

# Conflicts:
#	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs

* Enable M.E.H.WindowsServices and S.SP.ServiceController for servicing

* Make service wait on its state before stopping (#84447)

* Fix WindowsService Tests where RemoteExecutor is unsupported

# Conflicts:
#	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs

* Port changes to 6.0 codebase

* Version Microsoft.Windows.Compatibility

---------

Co-authored-by: Vladimir Sadov <[email protected]>
ericstj added a commit that referenced this pull request May 4, 2023
…5656)

* Make WindowsServiceLifetime gracefully stop (#83892)

* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout

* Fix test condition

* Enable M.E.H.WindowsServices and S.SP.ServiceController for servicing

* Make service wait on its state before stopping (#84447)

* Fix WindowsService Tests where RemoteExecutor is unsupported

* Enable MS.W.C for servicing

* Reference latest Microsoft.Extensions.Logging.Abstractions

This package has been serviced and we compile against the serviced
version of its assemblies.

None of the directly referenced projects have been serviced so our
package doesn't restore the serviced versions.

Lift up the dependency on Logging.Abstractions to ensure we reference
the serviced package.

---------

Co-authored-by: Vladimir Sadov <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows service infrequently results in error 1067 on stop operation
3 participants