Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Something is opening file descriptors and not closing them #51

Open
brutog opened this issue Nov 18, 2020 · 4 comments · May be fixed by #72
Open

Something is opening file descriptors and not closing them #51

brutog opened this issue Nov 18, 2020 · 4 comments · May be fixed by #72

Comments

@brutog
Copy link

brutog commented Nov 18, 2020

Easy to reproduce. Just run a fabric task across 1024+ hosts on. The per process file descriptor will trigger and the first place fab-classic breaks is on this select:

!!! Parallel execution exception under host 'test.host:
Process test.host:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/repo/python-pp-fabric/fabric/tasks.py", line 217, in _parallel_wrap
    queue.put({'name': name, 'result': task.run(*args, **kwargs)})
  File "/home/ubuntu/repo/python-pp-fabric/fabric/tasks.py", line 168, in run
    return self.wrapped(*args, **kwargs)
  File "/home/ubuntu/repo/python-pp-fabtools/pp_fabtools/system_cmds.py", line 20, in runcmd_as_root
    cmd = sudo(cmd)
  File "/home/ubuntu/repo/python-pp-fabric/fabric/network.py", line 700, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/repo/python-pp-fabric/fabric/operations.py", line 1130, in sudo
    capture_buffer_size=capture_buffer_size,
  File "/home/ubuntu/repo/python-pp-fabric/fabric/operations.py", line 923, in _run_command
    timeout=timeout, capture_buffer_size=capture_buffer_size)
  File "/home/ubuntu/repo/python-pp-fabric/fabric/operations.py", line 807, in _execute
    worker.raise_if_needed()
  File "/home/ubuntu/repo/python-pp-fabric/fabric/thread_handling.py", line 26, in raise_if_needed
    six.reraise(e[0], e[1], e[2])
  File "/home/ubuntu/.virtualenvs/python-pp-fabric/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/ubuntu/repo/python-pp-fabric/fabric/thread_handling.py", line 13, in wrapper
    callable(*args, **kwargs)
  File "/home/ubuntu/repo/python-pp-fabric/fabric/io.py", line 269, in input_loop
    r, w, x = select([f], [], [], 0.0)
ValueError: filedescriptor out of range in select()

As far as where this is happening, the best read I have on it seems like sys.stdout and/or other sys based output streams are being duplicated through the threadable call to io.py and output/input_looper functions.

I can say that this doesn't happen at all in the original fabric package. It also doesn't seem to happen under fab-classic using python2.7 - I've tested 3.6.9 and 3.7.5 and they both have this problem.

Doing something like this at the bottom of the tasks.execeute():

os.close(sys.stdin.fileno())
os.close(sys.stdout.fileno())
os.close(sys.stderr.fileno())

While silly, this stops the leak. It breaks all the other output of the program. There is a noted difference between fd.close() and os.close(fd).

The best I can tell is it seems like fd.close() leaves the C level file descriptor open in python 3.x but I'm not sure why yet.

@ploxiln
Copy link
Owner

ploxiln commented Nov 18, 2020

I've had some idea of re-implementing the parallel mode with threads, because the multiprocessing strategy doesn't work with python-3.8 (can't seem to pass references to functions decorated with @task to the forked process), and I think it never worked on windows.

Separately, there is the idea of replacing select() with poll(), on unix systems, to avoid the max-fd-value problem. Here's the idea on the paramiko side: ploxiln/paramiko-ng#30

unfortunately I've been neglecting these two projects recently, with the usual excuse, busy at work on other stuff ...

@brutog
Copy link
Author

brutog commented Nov 18, 2020

I did think about poll as well. Maybe I'll give something a shot if you'll accept a PR.

@brutog
Copy link
Author

brutog commented Nov 18, 2020

Well, something as simple as this works in io.py:

        elif waitable:
            poller = poll()
            poller.register(f)
            if poller.poll(f.fileno()):
                byte = f.read(1)

File descriptors for the parent fab process still go over 1024 but fabric no longer bails. Not sure if something more robust than that is needed.

Looks context_managers.py uses select as well and could probably use the same treatment.

@ploxiln
Copy link
Owner

ploxiln commented Nov 19, 2020

Yup, feel free to open a PR. But notice that the paramiko PR checks whether poll() is available and falls back to select() (primarily for Windows compatibility).

Stealthii added a commit to Stealthii/fab-classic that referenced this issue Apr 12, 2022
Using select() here causes a file descriptor leak. We've replaced this
with a poll() implemenetation to avoid this.

The polling object needs to be opened with an eventmask that stops it
from blocking paramiko.

Fixes ploxiln#51

Co-authored-by: Dan Porter <[email protected]>
Stealthii added a commit to Stealthii/fab-classic that referenced this issue Apr 12, 2022
Using select() here causes a file descriptor leak. We've replaced this
with a poll() implementation to avoid this.

The polling object needs to be opened with an eventmask that stops it
from blocking paramiko.

Fixes ploxiln#51

Co-authored-by: Dan Porter <[email protected]>
@Stealthii Stealthii linked a pull request Apr 12, 2022 that will close this issue
Stealthii added a commit to Stealthii/fab-classic that referenced this issue Apr 12, 2022
Using select() here causes a file descriptor leak. We've replaced this
with a poll() implementation to avoid this.

The polling object needs to be opened with an eventmask that stops it
from blocking paramiko.

Fixes ploxiln#51

Co-authored-by: Dan Porter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants