[metricbeat/system][windows] - Metricbeat reports DEGRADED while running in privileged mode #40484
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
What data stream was this observed for? Are there logs you can attach to the issue? |
@cmacknz the errors reported are similar to #40542 (comment):
- id: system/metrics-default
state:
message: 'Healthy: communicating with pid ''1556'''
pid: 0
state: 2
units:
input-system/metrics-default-system/metrics-system-5f5e65eb-2fd6-41e1-8c29-f24d57e66509:
state: DEGRADED
message: |-
Error fetching data for metricset system.process_summary: Not enough privileges to fetch information: Not enough privileges to fetch information: GetInfoForPid: could not get all information for PID 0: error fetching name: OpenProcess failed for pid=0: The parameter is incorrect.
error fetching status: OpenProcess failed for pid=0: The parameter is incorrect.
GetInfoForPid: could not get all information for PID 4: error fetching name: GetProcessImageFileName failed for pid=4: GetProcessImageFileName failed: invalid argument
payload:
streams:
system/metrics-system.process.summary-5f5e65eb-2fd6-41e1-8c29-f24d57e66509:
error: |-
Error fetching data for metricset system.process_summary: Not enough privileges to fetch information: Not enough privileges to fetch information: GetInfoForPid: could not get all information for PID 0: error fetching name: OpenProcess failed for pid=0: The parameter is incorrect.
error fetching status: OpenProcess failed for pid=0: The parameter is incorrect.
GetInfoForPid: could not get all information for PID 4: error fetching name: GetProcessImageFileName failed for pid=4: GetProcessImageFileName failed: invalid argument
status: DEGRADED
There's a coincidence: all of these PIDs refer to SYSTEM processes. This seems to be related to #17314 |
@pierrehilbert @cmacknz I'll raise a POC PR for this and see how we can fix this permanently. Last week, I considered a solution and here’s what I came up with: Accessing PID 0 and PID 4 on Windows is unnecessary, as these are protected processes.
cc: @elastic/elastic-agent-data-plane |
@VihasMakwana I think this makes sense. I only have a comment about this:
In this case we need to make sure we have a debug-level log that we failed to fetch additional metrics from a protected process. Error logs would inevitably flood the logs and it would not be a good user experience. If this behavior is expected, it should not be considered an error. |
Yes. We should log it at a debug level. |
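A minimal sketch of the logging rule discussed above, assuming expected failures from protected processes are wrapped in a marker type. The names `nonFatalError`, `logLevelFor`, `report`, and `errAccessDenied` are hypothetical and are not taken from the beats or elastic-agent-system-metrics code:

```go
package main

import (
	"errors"
	"fmt"
)

// errAccessDenied is a stand-in for the Windows "Access is denied" failure.
var errAccessDenied = errors.New("Access is denied")

// nonFatalError is a hypothetical wrapper that marks failures which are
// expected for protected processes and should not be treated as errors.
type nonFatalError struct {
	err error
}

func (e nonFatalError) Error() string { return "non fatal error: " + e.err.Error() }

// Unwrap lets errors.Is and errors.As see the underlying cause.
func (e nonFatalError) Unwrap() error { return e.err }

// logLevelFor returns "debug" for expected, non-fatal failures and
// "error" for everything else.
func logLevelFor(err error) string {
	var nf nonFatalError
	if errors.As(err, &nf) {
		return "debug"
	}
	return "error"
}

// report formats the line that would be written at the chosen level.
func report(pid int, err error) string {
	return fmt.Sprintf("[%s] pid=%d: %v", logLevelFor(err), pid, err)
}
```

With this shape, `report(4, nonFatalError{err: errAccessDenied})` produces a debug-level line, while an unwrapped error is still reported at error level.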
Unnecessary isn't exactly how I'd phrase it. Our endpoint-security service runs as a protected process on Windows, and we currently support collecting a limited set of important metrics for it. In particular, we can get the CPU and memory usage for inclusion in the calculation of the agent's CPU and memory usage on the host. This support was added in elastic/elastic-agent-system-metrics#104, which originally introduced the concept of a non-fatal error and might help you implement the logic you propose. Can we get the CPU and memory usage for these PIDs as well, like we can for endpoint?
👍 this overall makes sense to me. As mentioned above, make sure you test this logic against endpoint-security on Windows as it is a protected service that will hit this case. This potentially gives you a way to write automated tests as well since we have integration tests that install endpoint on Windows. |
I'll confirm and get back to you. |
@cmacknz Here's a summary after running many tests on my Windows machine:
|
That's an awesome summary, thank you! I'd suggest adding this summary to the system module permissions section (in a new Windows sub-section) https://github.com/elastic/beats/blob/main/metricbeat/docs/modules/system.asciidoc#required-permissions. Adding that same section to the system integration is also a good idea https://github.com/elastic/integrations/tree/main/packages/system |
@VihasMakwana great investigation. This looks consistent with what users would see with PowerShell, Process Explorer, and Task Manager (see below), which is where I think we need to be.
PowerShell
|
- Enhancement

We can ignore the error in two cases:
- While reading the process executable name.
  - For PID 4, this call fails because we can't access the executable name via the system call. The same applies to other kernel-level processes.
- While finding the owner of a particular process.
  - We try to open the process token via `syscall.OpenProcessToken`, and we can't access the token for protected processes, even as an administrator.

It's okay to ignore these errors and move forward, as we can still access a few other metrics (memory, CPU).

More context [here](elastic/beats#40484 (comment))

Relates elastic/beats#40484
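The two cases above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked PR: the `procInfo`, `fetchers`, and `collect` names are hypothetical, the lookups are passed in as plain functions so the sketch runs on any platform, and treating a memory-lookup failure as fatal is an assumption made for the example.

```go
package main

import "errors"

// errAccessDenied is a stand-in for the Windows "Access is denied" failure.
var errAccessDenied = errors.New("Access is denied")

// procInfo holds whatever could be collected for one process.
type procInfo struct {
	PID         int
	Name        string
	Username    string
	MemoryBytes uint64
}

// fetchers groups the per-process lookups. On Windows these would wrap the
// real system calls; here they are plain functions (hypothetical).
type fetchers struct {
	name   func(pid int) (string, error)
	owner  func(pid int) (string, error)
	memory func(pid int) (uint64, error)
}

// collect gathers process info. Failures to read the executable name or the
// owner are recorded in the ignored slice and do not abort collection.
// A failure to read memory is returned as a fatal error (an assumption).
func collect(pid int, f fetchers) (procInfo, []error, error) {
	info := procInfo{PID: pid}
	var ignored []error

	name, err := f.name(pid)
	if err != nil {
		ignored = append(ignored, err) // case 1: executable name unavailable
	} else {
		info.Name = name
	}

	owner, err := f.owner(pid)
	if err != nil {
		ignored = append(ignored, err) // case 2: process token not accessible
	} else {
		info.Username = owner
	}

	mem, err := f.memory(pid)
	if err != nil {
		return info, ignored, err
	}
	info.MemoryBytes = mem
	return info, ignored, nil
}
```

A caller could log the `ignored` errors at debug level and still publish the memory and CPU values, so a protected process such as PID 4 would no longer push the unit into DEGRADED.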
Access is denied
errors, which results in DEGRADED mode.