This repository has been archived by the owner on Sep 16, 2022. It is now read-only.

Investigate CPU usage spike on Ubuntu #293

Open
vpetersson opened this issue Mar 31, 2020 · 6 comments

Comments

@vpetersson
Contributor

I'm having a hard time installing/upgrading the agent on Ubuntu 18.04.

Both the installation (I tried purging first) and the upgrade hang, with the agent at 100% CPU usage:

[Image: Screen Shot 2020-03-31 at 12 27 31 PM]

@a-martynovich
Contributor

Hangs forever or for how long?

@vpetersson
Contributor Author

I left it for a good 10 minutes before sending a SIGKILL.

@a-martynovich
Contributor

a-martynovich commented Apr 1, 2020

@vpetersson Did it happen once or does it always happen?
Also, logs? The stack trace should show up in systemd logs after you kill the service.
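
For anyone reproducing this, a minimal sketch of pulling the service's recent journal entries from Python might look like the following. The helper names `journal_command` and `read_unit_logs` are made up for illustration, and it assumes `journalctl` is on the PATH and the unit is named `wott-agent.service` as in the install output.

```python
import subprocess
from typing import List


def journal_command(unit: str, lines: int = 200) -> List[str]:
    """Build a journalctl invocation for the last `lines` entries of a unit."""
    return ["journalctl", "--no-pager", "-u", unit, "-n", str(lines)]


def read_unit_logs(unit: str = "wott-agent.service", lines: int = 200) -> str:
    """Return the recent journal output for `unit` as text (hypothetical helper)."""
    result = subprocess.run(
        journal_command(unit, lines),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling `print(read_unit_logs())` right after stopping the service should show whatever the agent logged before it was killed.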

@vpetersson
Contributor Author

vpetersson commented Apr 1, 2020

> Did it happen once or does it always happen?

Yes, I've been able to reproduce it a number of times on this box.

Here's where it freezes:

Setting up wott-agent (0.1.5.820~f64d919) ...
Created symlink /etc/systemd/system/timers.target.wants/wott-agent-self-update.timer → /lib/systemd/system/wott-agent-self-update.timer.
Created symlink /etc/systemd/system/wott-agent → /lib/systemd/system/wott-agent.service.
Created symlink /etc/systemd/system/multi-user.target.wants/wott-agent.service → /lib/systemd/system/wott-agent.service.
wott-agent-self-update.service is a disabled or a static unit, not starting it.

Logs:

Apr  1 05:52:36 us systemd[1]: Started WoTT Agent.
Apr  1 05:52:38 us wott-agent[27065]: 2020-04-01 05:52:38,053 - agent - MainThread - INFO - start in daemon mode...
Apr  1 05:52:38 us wott-agent[27065]: 2020-04-01 05:52:38,084 - agent - ThreadPoolExecutor-0_2 - INFO - Fetching node metadata...
Apr  1 05:52:38 us wott-agent[27065]: 2020-04-01 05:52:38,087 - agent - ThreadPoolExecutor-0_1 - INFO - Fetching credentials...
Apr  1 05:52:38 us wott-agent[27065]: 2020-04-01 05:52:38,366 - agent - ThreadPoolExecutor-0_1 - INFO - Credentials retrieved.
Apr  1 05:52:38 us wott-agent[27065]: 2020-04-01 05:52:38,379 - agent - ThreadPoolExecutor-0_2 - INFO - metadata retrieved.
Apr  1 05:52:38 us wott-agent[27065]: 2020-04-01 05:52:38,381 - agent - ThreadPoolExecutor-0_2 - INFO - metadata stored.
Apr  1 06:00:03 us systemd[1]: wott-agent.service: Service hold-off time over, scheduling restart.
Apr  1 06:00:03 us systemd[1]: wott-agent.service: Scheduled restart job, restart counter is at 1.
Apr  1 06:00:03 us systemd[1]: Stopped WoTT Agent.
Apr  1 06:00:03 us systemd[1]: Started WoTT Agent.
Apr  1 06:00:04 us wott-agent[27211]: 2020-04-01 06:00:04,481 - agent - MainThread - INFO - start in daemon mode...
Apr  1 06:00:04 us wott-agent[27211]: 2020-04-01 06:00:04,521 - agent - ThreadPoolExecutor-0_2 - INFO - Fetching node metadata...
Apr  1 06:00:04 us wott-agent[27211]: 2020-04-01 06:00:04,514 - agent - ThreadPoolExecutor-0_1 - INFO - Fetching credentials...
Apr  1 06:00:04 us wott-agent[27211]: 2020-04-01 06:00:04,782 - agent - ThreadPoolExecutor-0_1 - INFO - Credentials retrieved.
Apr  1 06:00:04 us wott-agent[27211]: 2020-04-01 06:00:04,833 - agent - ThreadPoolExecutor-0_2 - INFO - metadata retrieved.
Apr  1 06:00:04 us wott-agent[27211]: 2020-04-01 06:00:04,834 - agent - ThreadPoolExecutor-0_2 - INFO - metadata stored.

@a-martynovich
Contributor

a-martynovich commented Apr 1, 2020

The agent spins in the patched version of find_matches. This happens when some firewall rules are applied on the backend.
If I remove the patching, everything works just fine. This proves I was right in #289 (comment). Now we just need two things:

  1. Make sure we know exactly from which version of iptables onward we need to patch.
  2. Detect the iptables version and decide what to do in each case. We may need to revert the patch if running python-iptables > 0.14.0 on iptables <= 1.6.2, because this change will be in mainline python-iptables as of 0.15.0.

I will also file an issue on https://github.com/ldx/python-iptables/issues and see what they have to say.
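
The detection step could be sketched roughly as below. This is only an illustration, not the agent's actual code: `parse_iptables_version` and `detect_iptables_version` are hypothetical names, and it assumes the `iptables` binary is on the PATH and prints its version in the usual `iptables v1.6.1` or `iptables v1.8.4 (nf_tables)` form.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches e.g. "iptables v1.6.1" or "iptables v1.8.4 (nf_tables)".
_VERSION_RE = re.compile(r"v(\d+)\.(\d+)(?:\.(\d+))?")


def parse_iptables_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `iptables --version` output."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def detect_iptables_version() -> Optional[Tuple[int, int, int]]:
    """Run `iptables --version`; return None if it is missing or fails."""
    try:
        result = subprocess.run(
            ["iptables", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_iptables_version(result.stdout)
```

Returning a tuple keeps later comparisons simple, since Python compares tuples element by element.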

@a-martynovich
Contributor

a-martynovich commented Apr 1, 2020

The commit that causes our error was added to iptables after the 1.6.2 release; the next release is 1.8.0.
Therefore we need to patch anything newer than 1.6.2.
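
Expressed as code, that rule might look like this. It is a sketch only: `needs_find_matches_patch` and `LAST_UNAFFECTED` are hypothetical names, the version is assumed to be already available as a `(major, minor, patch)` tuple, and the python-iptables version is deliberately not considered here.

```python
from typing import Tuple

# Last iptables release before the commit that causes the error, per this thread.
LAST_UNAFFECTED = (1, 6, 2)


def needs_find_matches_patch(iptables_version: Tuple[int, int, int]) -> bool:
    """Return True when iptables is newer than 1.6.2 and should be patched."""
    # Tuples compare element by element, so (1, 8, 0) > (1, 6, 2) holds.
    return iptables_version > LAST_UNAFFECTED
```

With this check, 1.6.1 and 1.6.2 are left unpatched, while 1.8.0 and later get the patch.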
