Metricbeat: test_start_stop failure on Win 10 #37841

Open
sharbuz opened this issue Feb 2, 2024 · 3 comments
Labels
failed-test, Metricbeat, Team:Elastic-Agent-Data-Plane

Comments

@sharbuz
Contributor

sharbuz commented Feb 2, 2024

Flaky Test

  • Test Name: tests\system\test_reload.py::Test::test_start_stop

  • Link:

    @unittest.skipUnless(re.match("(?i)linux|darwin|freebsd|openbsd", sys.platform), "os")
    def test_start_stop(self):
        """
        Test if module is properly started and stopped
        """
        self.render_config_template(
            reload=True,
            reload_path=self.working_dir + "/configs/*.yml",
            flush_min_events=1,
        )
        os.mkdir(self.working_dir + "/configs/")
        config_path = self.working_dir + "/configs/system.yml"
        proc = self.start_beat()
        # Ensure no modules are loaded
        self.wait_until(
            lambda: self.log_contains("Start list: 0, Stop list: 0"),
            max_timeout=10)
        systemConfig = """
    - module: system
      metricsets: ["cpu"]
      period: 1s
    """
        with open(config_path, 'w') as f:
            f.write(systemConfig)
        # Ensure the module is started
        self.wait_until(
            lambda: self.log_contains("Start list: 1, Stop list: 0"),
            max_timeout=10)
        # Remove config again
        os.remove(config_path)
        # Ensure the module is stopped
        self.wait_until(
            lambda: self.log_contains("Start list: 0, Stop list: 1"),
            max_timeout=10)
        time.sleep(1)
        proc.check_kill_and_wait()

  • Branch: main (related PR: Enable tests that were muted during migration #40387)

  • Artifact Link: beats-metricbeat_build_8357_windows-metricbeat-win-10-unit-tests.log
    Buildkite build: https://buildkite.com/elastic/beats-metricbeat/builds/8357#01914798-a4da-4530-87c5-b427cc93b988

  • Notes: The test was muted on main during the migration. To run it on CI, apply the changes from the related PR mentioned above.
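
For reference, here is a minimal sketch of how the `skipUnless` guard mutes the test on Windows. The two patterns are copied from the decorator on main (in the Link snippet) and from the version in the stack trace below; the helper `runs_on` and the constant names are hypothetical and only illustrate the matching. On Windows, `sys.platform` is `"win32"`, so the pattern without `win` skips the test there.

```python
import re

# Pattern from the decorator on main (test is muted on Windows).
MUTED_PATTERN = "(?i)linux|darwin|freebsd|openbsd"
# Pattern from the stack trace in this issue (test is enabled on Windows).
ENABLED_PATTERN = "(?i)win|linux|darwin|freebsd|openbsd"


def runs_on(pattern: str, platform: str) -> bool:
    """Hypothetical helper: True if skipUnless(re.match(pattern, platform), ...)
    would let the test run on the given sys.platform value."""
    return re.match(pattern, platform) is not None


if __name__ == "__main__":
    # "win32" is the value of sys.platform on Windows.
    for platform in ("win32", "linux", "darwin"):
        print(platform, runs_on(MUTED_PATTERN, platform), runs_on(ENABLED_PATTERN, platform))
```

Because `re.match` anchors at the start of the string, adding `win` to the alternation only affects platform names that begin with `win`.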

Stack Trace

================================== FAILURES ===================================
____________________________ Test.test_start_stop _____________________________
self = <test_reload.Test testMethod=test_start_stop>
    @unittest.skipUnless(re.match("(?i)win|linux|darwin|freebsd|openbsd", sys.platform), "os")
    def test_start_stop(self):
        """
        Test if module is properly started and stopped
        """
        self.render_config_template(
            reload=True,
            reload_path=self.working_dir + "/configs/*.yml",
            flush_min_events=1,
        )
        os.mkdir(self.working_dir + "/configs/")
        config_path = self.working_dir + "/configs/system.yml"
        proc = self.start_beat()
        # Ensure no modules are loaded
>       self.wait_until(
            lambda: self.log_contains("Start list: 0, Stop list: 0"),
            max_timeout=10)
tests\system\test_reload.py:61:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <test_reload.Test testMethod=test_start_stop>
cond = <function Test.test_start_stop.<locals>.<lambda> at 0x0000011B4ACCFCA0>
max_timeout = 10, poll_interval = 0.1, name = 'cond', err_msg = ''
    def wait_until(self, cond, max_timeout=20, poll_interval=0.1, name="cond", err_msg=""):
        """
        TODO: this can probably be a "wait_until_output_count", among other things, since that could actually use `self`, and this can become an internal function
        Waits until the cond function returns true,
        or until the max_timeout is reached. Calls the cond
        function every poll_interval seconds.
        If the max_timeout is reached before cond() returns
        true, an exception is raised.
        """
        start = datetime.now()
        while not cond():
            if datetime.now() - start > timedelta(seconds=max_timeout):
                print("Test has failed, here are the Beat logs")
                for l in self.get_log_lines():
                    print(l)
>               raise WaitTimeoutError(
                    f"Timeout waiting for condition '{name}'. Waited {max_timeout} seconds: {err_msg}")
E               beat.beat.WaitTimeoutError: Timeout waiting for condition 'cond'. Waited 10 seconds:
..\libbeat\tests\system\beat\beat.py:449: WaitTimeoutError
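
The timeout message above uses the default `name='cond'` and an empty `err_msg`, so it does not say which of the test's waits timed out. A standalone sketch of the same polling pattern, assuming only the signature shown in the trace, illustrates how passing `name` and `err_msg` could make the failure easier to read. The `WaitTimeoutError` class and the module-level `wait_until` here are local stand-ins, not the libbeat implementation, and the example strings passed to them are hypothetical.

```python
import time


class WaitTimeoutError(Exception):
    """Local stand-in for beat.beat.WaitTimeoutError (assumption for this sketch)."""


def wait_until(cond, max_timeout=20, poll_interval=0.1, name="cond", err_msg=""):
    """Poll cond() every poll_interval seconds until it returns a truthy value.

    Raises WaitTimeoutError once max_timeout seconds have elapsed.
    """
    deadline = time.monotonic() + max_timeout
    while not cond():
        if time.monotonic() > deadline:
            raise WaitTimeoutError(
                f"Timeout waiting for condition '{name}'. "
                f"Waited {max_timeout} seconds: {err_msg}")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated log content; in the real test this would come from the Beat log.
    log_lines = ["metricbeat start running."]
    try:
        wait_until(
            lambda: any("Start list: 0, Stop list: 0" in line for line in log_lines),
            max_timeout=1,
            name="no modules loaded",
            err_msg="'Start list: 0, Stop list: 0' not found in the log")
    except WaitTimeoutError as exc:
        print(exc)
```

With a descriptive `name`, the error identifies the wait that failed without needing the line number from the traceback.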
@sharbuz added the Metricbeat and failed-test labels Feb 2, 2024
@botelastic bot added the needs_team label Feb 2, 2024
@sharbuz
Contributor Author

sharbuz commented Feb 2, 2024

CC @pazone

@oakrizan reopened this Aug 13, 2024
@oakrizan changed the title from "Fail of Windows 10 Unit tests" to "Metricbeat: test_start_stop failure on Win 10" Aug 13, 2024
@ycombinator added the Team:Elastic-Agent-Data-Plane label Aug 13, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic bot removed the needs_team label Aug 13, 2024
@rowlandgeoff
Contributor

During the migration of beats-ci from Jenkins to Buildkite, a number of tests were failing consistently due to issues unrelated to the migration. Those tests were disabled to stabilize the CI, with the intent to revisit them post-migration. @oakrizan has reviewed them all in her draft PRs linked above in the description, and has opened tickets such as this one to highlight to the product teams the tests that are currently still disabled and could use some attention.


5 participants