Metricbeat: test_start_stop failure on Win 10 #37841

sharbuz · 2024-02-02T14:46:41Z

Flaky Test

Test Name: tests\system\test_reload.py::Test::test_start_stop

Link:

beats/metricbeat/tests/system/test_reload.py

Lines 46 to 90 in 5e5f5d4

    
    @unittest.skipUnless(re.match("(?i)linux|darwin|freebsd|openbsd", sys.platform), "os")
    def test_start_stop(self):
        """
        Test if module is properly started and stopped
        """
        self.render_config_template(
            reload=True,
            reload_path=self.working_dir + "/configs/*.yml",
            flush_min_events=1,
        )
        os.mkdir(self.working_dir + "/configs/")
        config_path = self.working_dir + "/configs/system.yml"
        proc = self.start_beat()
        # Ensure no modules are loaded
        self.wait_until(
            lambda: self.log_contains("Start list: 0, Stop list: 0"),
            max_timeout=10)
        systemConfig = """
- module: system
  metricsets: ["cpu"]
  period: 1s
"""
        with open(config_path, 'w') as f:
            f.write(systemConfig)
        # Ensure the module is started
        self.wait_until(
            lambda: self.log_contains("Start list: 1, Stop list: 0"),
            max_timeout=10)
        # Remove config again
        os.remove(config_path)
        # Ensure the module is stopped
        self.wait_until(
            lambda: self.log_contains("Start list: 0, Stop list: 1"),
            max_timeout=10)
        time.sleep(1)
        proc.check_kill_and_wait()

Branch: main (related PR: Enable tests that were muted during migration #40387)
Artifact Link: beats-metricbeat_build_8357_windows-metricbeat-win-10-unit-tests.log
Buildkite build: https://buildkite.com/elastic/beats-metricbeat/builds/8357#01914798-a4da-4530-87c5-b427cc93b988
Notes: The test was muted on main during the migration. To run it on CI, apply the changes from the related PR mentioned above.

Stack Trace

================================== FAILURES ===================================
____________________________ Test.test_start_stop _____________________________
self = <test_reload.Test testMethod=test_start_stop>
    @unittest.skipUnless(re.match("(?i)win|linux|darwin|freebsd|openbsd", sys.platform), "os")
    def test_start_stop(self):
        """
        Test if module is properly started and stopped
        """
        self.render_config_template(
            reload=True,
            reload_path=self.working_dir + "/configs/*.yml",
            flush_min_events=1,
        )
        os.mkdir(self.working_dir + "/configs/")
        config_path = self.working_dir + "/configs/system.yml"
        proc = self.start_beat()
        # Ensure no modules are loaded
>       self.wait_until(
            lambda: self.log_contains("Start list: 0, Stop list: 0"),
            max_timeout=10)
tests\system\test_reload.py:61:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <test_reload.Test testMethod=test_start_stop>
cond = <function Test.test_start_stop.<locals>.<lambda> at 0x0000011B4ACCFCA0>
max_timeout = 10, poll_interval = 0.1, name = 'cond', err_msg = ''
    def wait_until(self, cond, max_timeout=20, poll_interval=0.1, name="cond", err_msg=""):
        """
        TODO: this can probably be a "wait_until_output_count", among other things, since that could actually use `self`, and this can become an internal function
        Waits until the cond function returns true,
        or until the max_timeout is reached. Calls the cond
        function every poll_interval seconds.
        If the max_timeout is reached before cond() returns
        true, an exception is raised.
        """
        start = datetime.now()
        while not cond():
            if datetime.now() - start > timedelta(seconds=max_timeout):
                print("Test has failed, here are the Beat logs")
                for l in self.get_log_lines():
                    print(l)
>               raise WaitTimeoutError(
                    f"Timeout waiting for condition '{name}'. Waited {max_timeout} seconds: {err_msg}")
E               beat.beat.WaitTimeoutError: Timeout waiting for condition 'cond'. Waited 10 seconds:
..\libbeat\tests\system\beat\beat.py:449: WaitTimeoutError
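
The timeout message in the trace ends with a bare colon because `wait_until` was called with the default `name='cond'` and `err_msg=''`. The following is a minimal standalone sketch of that polling helper, not the libbeat implementation: `WaitTimeoutError` is redefined locally, the `time.sleep(poll_interval)` step is assumed from the docstring, and `describe_timeout` plus the example `name`/`err_msg` strings are hypothetical, added only to show how the message changes when those arguments are supplied.

```python
from datetime import datetime, timedelta
import time


class WaitTimeoutError(Exception):
    """Local stand-in for beat.beat.WaitTimeoutError (assumption for this sketch)."""


def wait_until(cond, max_timeout=20, poll_interval=0.1, name="cond", err_msg=""):
    """Poll cond() until it returns true or max_timeout seconds have elapsed."""
    start = datetime.now()
    while not cond():
        if datetime.now() - start > timedelta(seconds=max_timeout):
            raise WaitTimeoutError(
                f"Timeout waiting for condition '{name}'. Waited {max_timeout} seconds: {err_msg}")
        # Assumed from the docstring: wait poll_interval seconds between checks.
        time.sleep(poll_interval)


def describe_timeout(**kwargs):
    """Hypothetical helper: run wait_until on a condition that never holds
    and return the resulting error text."""
    try:
        wait_until(lambda: False, max_timeout=0.2, poll_interval=0.05, **kwargs)
    except WaitTimeoutError as exc:
        return str(exc)
    return None


# Default arguments, as in the failing call: nothing follows the colon.
print(describe_timeout())
# With a name and err_msg, the timeout says which log line was never seen.
print(describe_timeout(
    name="no modules loaded",
    err_msg="'Start list: 0, Stop list: 0' not found in log"))
```

Passing `name` and `err_msg` at the three `wait_until` call sites in the test would make it clear from the exception alone which of the three log checks timed out.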


sharbuz · 2024-02-02T15:12:02Z

CC @pazone

elasticmachine · 2024-08-13T20:40:57Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

rowlandgeoff · 2024-08-13T21:15:46Z

During the migration of beats-ci from Jenkins to Buildkite, a number of tests were failing consistently due to issues unrelated to the migration. Those tests were disabled to stabilize the CI, with the intent to revisit them post-migration. @oakrizan has reviewed them all in her draft PRs linked above in the description, and has opened tickets such as this one to highlight to the product teams the tests that are currently still disabled and could use some attention.

sharbuz added the Metricbeat and failed-test labels Feb 2, 2024

botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) Feb 2, 2024

sharbuz mentioned this issue Feb 2, 2024

migrate metricbeat pipeline to Buildkite #37592

Merged

sharbuz mentioned this issue Feb 6, 2024

temporary disable the failed windows test #37880

Merged

oakrizan closed this as completed May 29, 2024

oakrizan reopened this Aug 13, 2024

oakrizan changed the title ~~Fail of Windows 10 Unit tests~~ Metricbeat: test_start_stop failure on Win 10 Aug 13, 2024

ycombinator added the Team:Elastic-Agent-Data-Plane label (for the Agent Data Plane team) Aug 13, 2024

botelastic bot removed the needs_team label (indicates that the issue/PR needs a Team:* label) Aug 13, 2024
