Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent SEGFAULTs on consecutive exec_command() invocations #658

Open
wants to merge 12 commits into
base: devel
Choose a base branch
from

Conversation

Jakuje
Copy link
Contributor

@Jakuje Jakuje commented Nov 14, 2024

SUMMARY

The function exec_command() keeps the callbacks as a local variable before assigning them to the created channel. The channel is not guaranteed to be completely freed when ssh_channel_free() is called because there might be some leftover messages or responses to process (close confirmation, exit code ...).

Calling the exec_command() as done previously in the test from the same function without anything in between (except assert) will likely map the second function call to the same memory on the call stack so it was working most of the time. But calling it from different functions or contexts will likely change the call stack and processing of outstanding callbacks is more likely to result in addressing wrong memory location.

Likely fixes #57, #645 and #657

ISSUE TYPE
  • Bugfix Pull Request
ADDITIONAL INFORMATION

I was not able to reproduce the issue locally so pushing to see if the CI will be able to crash.

This is also introducing memory leaks as the callback structure is never freed. We should probably store it somewhere in the python code before returning to make sure it is not garbage collected (or can the python GC track the callback pointer is still stored on the libssh side?).

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Nov 14, 2024
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/ansible-pylibssh-658
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@Jakuje Jakuje force-pushed the crash-exec branch 3 times, most recently from 1de5bc0 to 251bb6d Compare November 19, 2024 10:12
@Jakuje Jakuje marked this pull request as ready for review November 19, 2024 10:12
@webknjaz webknjaz changed the title Prevent crash on consecutive exec_command() Prevent SEGFAULTs on consecutive exec_command() invocations Nov 19, 2024
@webknjaz
Copy link
Member

@Jakuje it looks like this is making some CI jobs get stuck: https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210235668?pr=658 / https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210009382?pr=658.
(I cancelled the 3.9 one after 39 minutes, and the 3.10 one was killed by xdist after a 30-second timeout)

@webknjaz
Copy link
Member

I restarted said jobs, but this is something to look into, as it'll probably make the CI flakier if merged.

@webknjaz
Copy link
Member

@webknjaz
Copy link
Member

OTOH, it's also unstable on devel: https://github.com/ansible/pylibssh/actions/runs/11917417576/job/33212843276

@Jakuje Jakuje force-pushed the crash-exec branch 4 times, most recently from 9bc197c to 9a6e7d6 Compare November 20, 2024 14:05
@webknjaz
Copy link
Member

So the failure @ https://github.com/ansible/pylibssh/actions/runs/11935006968/job/33265619653?pr=658#step:15:181 is referring to https://github.com/ansible/pylibssh/blob/6cbf2dd/src/pylibsshext/channel.pyx#L45.
And for some reason, the result variable in that context seems to contain an instance of a pytest-internal object — ReprTraceback (https://github.com/pytest-dev/pytest/blob/72f17d1/src/_pytest/_code/code.py#L1156-L1180).

This is rather bizzare. How did it get in there?

============================= test session starts ==============================
platform linux -- Python 3.8.18, pytest-8.3.3, pluggy-1.5.0
cachedir: .tox/test-source-dists/.pytest_cache
rootdir: /home/runner/work/pylibssh/pylibssh
configfile: pytest.ini
testpaths: tests/
plugins: cov-5.0.0, forked-1.6.0, xdist-3.6.1
created: 4/4 workers
4 workers [25 items]

Timeout (0:00:30)!
Thread 0x00007fd72445b700 (most recent call first):
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 534 in read
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 567 in from_io
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1160 in _thread_receiver
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 341 in run
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn

Thread 0x00007fd7255ff740 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1780 in _try_wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1822 in _wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1083 in wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 342 in call
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 359 in check_call
  File "/home/runner/work/pylibssh/pylibssh/tests/conftest.py", line 103 in ssh_clientkey_path
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 898 in call_fixture_func
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1140 in pytest_fixture_setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1091 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 532 in getfixturevalue
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 697 in _fillfixtures
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/python.py", line 1630 in setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 514 in setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 160 in pytest_runtest_setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 195 in run_one_test
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 174 in pytest_runtestloop
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 337 in _main
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 393 in <module>
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1291 in executetask
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 341 in run
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 389 in integrate_as_primary_thread
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1273 in serve
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1806 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
..FE......................                                               [100%]
==================================== ERRORS ====================================
____________________ ERROR at teardown of test_exec_command ____________________
[gw0] linux -- Python 3.8.18 /home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/bin/python

>   result.stdout += data_b
E   AttributeError: 'ReprTraceback' object has no attribute 'stdout'

BytesIO    = <class '_io.BytesIO'>
Channel    = <class 'pylibsshext.channel.Channel'>
ChannelCallback = <class 'pylibsshext.channel.ChannelCallback'>
CompletedProcess = <class 'subprocess.CompletedProcess'>
LibsshChannelReadFailure = <class 'pylibsshext.errors.LibsshChannelReadFailure'>
__builtins__ = <builtins>
__doc__    = None
__file__   = '/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pylibsshext/channel.cpython-38-x86_64-linux-gnu.so'
__loader__ = <_frozen_importlib_external.ExtensionFileLoader object at 0x7f1714238b50>
__name__   = 'pylibsshext.channel'
__package__ = 'pylibsshext'
__reduce_cython__ = <cyfunction Channel.__reduce_cython__ at 0x7f17141352b0>
__setstate_cython__ = <cyfunction Channel.__setstate_cython__ at 0x7f1714133450>
__spec__   = ModuleSpec(name='pylibsshext.channel', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7f1714238b50...libssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pylibsshext/channel.cpython-38-x86_64-linux-gnu.so')
__test__   = {}
signal     = <module 'signal' from '/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/signal.py'>
time       = <module 'time' (built-in)>

src/pylibsshext/channel.pyx:45: AttributeError

The above exception was the direct cause of the following exception:

ssh_client_session = <pylibsshext.session.Session object at 0x7f170fbccca0>

    @pytest.fixture
    def ssh_channel(ssh_client_session):
        """Initialize a channel and tear it down after testing.
    
        :param ssh_client_session: A pre-authenticated SSH session.
        :type ssh_client_session: pylibsshext.session.Session
    
        :yields: A libssh channel instance.
        :ytype: pylibsshext.channel.Channel
        """
        chan = ssh_client_session.new_channel()
        try:  # noqa: WPS501
            yield chan
        finally:
>           chan.close()
E           SystemError: <cyfunction Channel.close at 0x7f17141351e0> returned a result with an error set

chan       = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
ssh_client_session = <pylibsshext.session.Session object at 0x7f170fbccca0>

tests/unit/channel_test.py:33: SystemError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:Us8tQjYuzYhTfcblu4lb5wrgGe2ldOtFQ++NgSVUrug ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|            o..  |
|       . . + .   |
|      . * + o +  |
|     o O B o B . |
|    o + S B B + .|
|     . + O * = =.|
|        o E = + o|
|           = +   |
|          . o..  |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw runner@fv-az524-896
The key's randomart image is:
+---[RSA 3072]----+
|==o..            |
|+.+. o   .       |
| +  +   o +      |
|  oo ..+ o =     |
| .  o +oS *     E|
|  . oo.o.= +   + |
|   o .ooo.=..   +|
|    . =oo.o  . . |
|    .+.=..    .  |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_8.2, OpenSSL 1.1.1f  31 Mar 2020
debug1: private host key #0: ssh-rsa SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=43559'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:43559' (RSA) to the list of known hosts.
=================================== FAILURES ===================================
______________________________ test_exec_command _______________________________
[gw0] linux -- Python 3.8.18 /home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/bin/python

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>

    def test_exec_command(ssh_channel):
        """Test getting the output of a remotely executed command."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello World').stdout.decode()
        assert u_cmd_out == u'Hello World'  # noqa: WPS302
        # Test that repeated calls to exec_command do not segfault.
    
        # NOTE: Call `exec_command()` once again from another function to
        # NOTE: force it to happen in another place of the call stack,
        # NOTE: making sure that the context is different from one in this
        # NOTE: this test function. The resulting call stack will end up
        # NOTE: being more random.
>       exec_second_command(ssh_channel)

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
u_cmd_out  = 'Hello World'

tests/unit/channel_test.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>

    def exec_second_command(ssh_channel):
        """Call ``exec_command()`` and checks the output."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello Again').stdout.decode()
>       assert u_cmd_out == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         
E         - Hello Again

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
u_cmd_out  = ''

tests/unit/channel_test.py:39: AssertionError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:Us8tQjYuzYhTfcblu4lb5wrgGe2ldOtFQ++NgSVUrug ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|            o..  |
|       . . + .   |
|      . * + o +  |
|     o O B o B . |
|    o + S B B + .|
|     . + O * = =.|
|        o E = + o|
|           = +   |
|          . o..  |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw runner@fv-az524-896
The key's randomart image is:
+---[RSA 3072]----+
|==o..            |
|+.+. o   .       |
| +  +   o +      |
|  oo ..+ o =     |
| .  o +oS *     E|
|  . oo.o.= +   + |
|   o .ooo.=..   +|
|    . =oo.o  . . |
|    .+.=..    .  |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_8.2, OpenSSL 1.1.1f  31 Mar 2020
debug1: private host key #0: ssh-rsa SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=43559'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:43559' (RSA) to the list of known hosts.
- generated xml file: /home/runner/work/pylibssh/pylibssh/.test-results/pytest/results.xml -
============================= slowest 10 durations =============================
33.40s setup    tests/unit/scp_test.py::test_get
29.28s setup    tests/unit/scp_test.py::test_get_large
22.48s setup    tests/unit/channel_test.py::test_recv_eof
22.26s setup    tests/unit/scp_test.py::test_get_existing_local
16.19s setup    tests/unit/channel_test.py::test_channel_exit_status
12.88s setup    tests/unit/channel_test.py::test_send_eof
12.46s setup    tests/unit/scp_test.py::test_copy_from_non_existent_remote_path
12.35s setup    tests/unit/channel_test.py::test_request_exec
11.33s setup    tests/unit/channel_test.py::test_send_signal
11.08s setup    tests/unit/channel_test.py::test_is_eof
=========================== short test summary info ============================
ERROR tests/unit/channel_test.py::test_exec_command - SystemError: <cyfunctio...
FAILED tests/unit/channel_test.py::test_exec_command - AssertionError: assert...
=============== 1 failed, 24 passed, 1 error in 67.99s (0:01:07) ===============

@Jakuje Jakuje force-pushed the crash-exec branch 2 times, most recently from a2fcf3c to 9a007ea Compare November 25, 2024 12:47
@Jakuje
Copy link
Contributor Author

Jakuje commented Nov 25, 2024

I have the feeling there is some issue in how the references are counted or some timing/synchronization issue. The current code works for me locally as well as in manylinux, but fails in rpm build of Fedora 40+ fails already with the first command:

>       assert u_cmd.stdout.decode() == u'Hello World'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello World'
E         - Hello World

but on Fedora 39 fails on the second one:

>       assert u_cmd.stdout.decode() == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         - Hello Again

@Jakuje
Copy link
Contributor Author

Jakuje commented Dec 22, 2024

Seems like I can see the issue now. The problem is that the Chanel::exec_command() sends the command, but does not wait for the server reply and basically immediately sends the EOF. The call to the ssh_channel_get_exit_status() add some synchronization, meaning that it tries to read the outstanding packets, but does not guarantee that all of the data is processed. In the last run, I can see that the messages from the OpenSSH server are received in this order:

(first exit status)
[2024/12/22 18:51:16.267580, 3] channel_rcv_request:  received exit-status 0
(second data we look for)
[2024/12/22 18:51:16.267604, 3] channel_default_bufferize:  placing 11 bytes into channel buffer (stdout)
(eof, close)
[2024/12/22 18:51:16.267656, 3] channel_rcv_eof:  Received eof on channel (44:1)
[...]
[2024/12/22 18:51:16.267670, 3] channel_rcv_close:  Received close on channel (44:1)

While naively, I would expect the first two messages should come in reversed direction. But I do not see this mandated anywhere in the specs. The openssh sends this message basically async, when the child process dies:

https://github.com/openssh/openssh-portable/blob/826483d51a9fee60703298bbf839d9ce37943474/session.c#L2369

My suspicion is that this is a timing/scheduling issue and when openssh sends the messages in this order, we will not process both output and exit code and close the channel earlier than we could read the output.

Pushed a change that should address this issue, I hope.

@Jakuje
Copy link
Contributor Author

Jakuje commented Dec 23, 2024

This looks like it did not help. Reading through the code and through the logs, another data point might be that the ssh_channel_request_exec might already invoke some callbacks and if they are not registered at that time, we might lose some data. Trying with setting the callback earlier.

@Jakuje
Copy link
Contributor Author

Jakuje commented Dec 23, 2024

This sounds like it finally made it through. Now, there is just the broken ubi, which I likely broke with the removal of the pytest-forked.

@Jakuje
Copy link
Contributor Author

Jakuje commented Dec 23, 2024

The error in ubi looked like this:

+ /usr/bin/python3 -m build --wheel --skip-dependencies --no-isolation .
[1/1] Cythonizing /tmp/.tmp-ansible-pylibssh-pep517-84wt4gfm/src/src/pylibsshext/errors.pyx
/github/home/rpmbuild/BUILD/ansible-pylibssh-1.2.3.dev87+ga7eb975/bin/setuptools_scm/_integration/setuptools.py:31: RuntimeWarning: 
ERROR: setuptools==56.0.0 is used in combination with setuptools_scm>=8.x

Your build configuration is incomplete and previously worked by accident!
setuptools_scm requires setuptools>=61

Suggested workaround if applicable:
 - migrating from the deprecated setup_requires mechanism to pep517/518
   and using a pyproject.toml to declare build dependencies
   which are reliably pre-installed before running the build tools

(not sure if this is the relevant part though)

It worked last month, but I do not see any logs from the ubi runs. lets try with the pytest-forked (with some cleanup of the commit series)

@Jakuje
Copy link
Contributor Author

Jakuje commented Dec 23, 2024

now, rpmfind is down causing the ubi build failure ... probably giving up for today, but it should be in the shape for a reviews

The Protocol=2 is default and the current implementation does not even
support any other protocol version. Using this just makes it confusing
to readers

Signed-off-by: Jakub Jelen <[email protected]>
When the OpenSSH server runs in daemon mode, the logs are written into
syslog. This is not suitable for tests as we can not see what is going
on there. The captured logs are only from the main sshd process and
does not show any useful information about the connection and what we
actually test.

The trick to get debug logs is either to not run in deamon mode, but it
would accept only one connection, or use the `-E` to write logs from all
connections to the given file.

Signed-off-by: Jakub Jelen <[email protected]>
Even though the PAM is required for the server to correctly work and
integrate well to the OS, for testing, PAM introduces needless
complexity and it is much better to not to use it.

The tests are running under non-root user so the OpenSSH won't use PAM
anyway.

Signed-off-by: Jakub Jelen <[email protected]>
This simplifies debugging of the failed tests. With previous changes, we
have already debug logs from the server. This prints also debug logs
from the client and allows us to align sent and received protocol
messages.

Signed-off-by: Jakub Jelen <[email protected]>
Previously, stack-local variables held the callback structure.
This worked as long as the call stack was not overridden by other
function calls or as long as the server closed the channel
very early. In other cases, the delayed close in libssh
could result in calling the callbacks even after we believed
the channel was freed, causing invalid memory access and crashes.

The accompanying change to libssh was merged to avoid calling
callbacks on channels the caller considers freed here:

https://gitlab.com/libssh/libssh-mirror/-/merge_requests/549/

But we will be using older libssh versions for some time so we
need a workaround in pylibssh too.

Fixes ansible#57

Signed-off-by: Jakub Jelen <[email protected]>
Surprisingly, the SSH protocol does not define if the exit-code
notification can come before or after the data. Depending on the server,
scheduler or current mood, the exit-code might arrive earlier than the
data and if we close the channel right away, we will lose some output.

This changes the code to correctly wait for the EOF before closing the
channel and returning.

Signed-off-by: Jakub Jelen <[email protected]>
Depending on scheduler, the ssh_channel_request_exec() might already
process some of the data from the peer and we could lose some data from
peer.

By setting the callback before calaling this function, we make sure
callbacks are correctly triggered on all the data.

Signed-off-by: Jakub Jelen <[email protected]>
Removes the failed marker as it works now.

Signed-off-by: Jakub Jelen <[email protected]>
Signed-off-by: Jakub Jelen <[email protected]>
Signed-off-by: Jakub Jelen <[email protected]>
Copy link

@Jakuje
Copy link
Contributor Author

Jakuje commented Dec 23, 2024

Ok, so now only the ubi is broken (and not because of the pytest-forked, which can be now safely removed). This talks about deprecated setuptools, which I believe should be addressed outside of this PR as it is not related to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Flaky SEGFAULT while testing exec_command
2 participants