[BUG] psycopg-binary breaks TLS connections to Postgres #53

Closed
1 task done
colemannugent opened this issue Mar 7, 2024 · 6 comments

Comments

colemannugent commented Mar 7, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Once an rqworker has started, the uWSGI worker process crashes with a segmentation fault while processing requests:

[uwsgi-daemons] spawning "python3 ./manage.py rqworker" (uid: 1000 gid: 1000)
No queues have been specified. This process will service the following queues by default: high, default, low
17:27:13 Worker rq:worker:db836afd502f4f86a965691bd9586f58 started with PID 193, version 1.16.0
17:27:13 Subscribing to channel rq:pubsub:db836afd502f4f86a965691bd9586f58
17:27:13 *** Listening on high, default, low...
17:27:13 Scheduler for low, default, high started with PID 195
!!! uWSGI process 192 got Segmentation Fault !!!
DAMN ! worker 1 (pid: 192) died :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 196)
!!! uWSGI process 196 got Segmentation Fault !!!
DAMN ! worker 1 (pid: 196) died :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 212)

Expected Behavior

Requests should be served without the worker segfaulting 😆

Steps To Reproduce

  1. Configure a standalone Postgres server to accept TLS connections only (TLS 1.3, certificates from an internal PKI)
  2. Make sure the certificates are present in the container, typically by bind-mounting the host's CA store
  3. Set HOME=. as an environment variable. By default libpq looks for CA certificates in ~/.postgresql/root.crt, and the lookup fails because /root/.postgresql/root.crt is not accessible to the abc user that uWSGI and its children run as.
  4. Alter configuration.py to connect to Postgres securely, like so:
DATABASE = {
    # Other DB settings elided for clarity
    'OPTIONS': {
        'sslmode': 'verify-full',  # Require TLS and verify both the certificate and the hostname
        'sslrootcert': '/path/to/your/root',  # You must enter this path explicitly; the magic 'system' value doesn't work here
    }
}
  5. Start the container
  6. Make a request once an rqworker has started
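Step 3 hinges on where libpq looks for its default root certificate. The sketch below assumes, as the report implies, that libpq expands `~` from the `HOME` environment variable; it only mirrors that lookup to check readability for the current user and does not call libpq. The helper names `default_root_cert` and `check_root_cert` are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def default_root_cert(env: Optional[dict] = None) -> Path:
    """Return the path libpq would try for sslrootcert when none is configured.

    Assumption: ~ is taken from HOME (as the report implies); a relative
    HOME such as "." is resolved against the current working directory.
    """
    env = os.environ if env is None else env
    home = Path(env.get("HOME", "~")).expanduser()
    return (home / ".postgresql" / "root.crt").resolve()


def check_root_cert(env: Optional[dict] = None) -> tuple:
    """Report the candidate path and whether the current user can read it."""
    path = default_root_cert(env)
    return path, path.is_file() and os.access(path, os.R_OK)


if __name__ == "__main__":
    path, readable = check_root_cert()
    print(f"{path}: {'readable' if readable else 'missing or unreadable'}")
```

Running it as the abc user, with and without HOME=., should show whether the default lookup lands somewhere that user can read. With HOME=. the path depends on the working directory (/app/netbox/netbox in the log below).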

Environment

- OS: Ubuntu 22.04.3 LTS
- Docker: 25.0.3
- Docker Compose: v2.24.5

CPU architecture

x86-64

Docker creation

Snippet from docker-compose.yml:

  netbox: &netbox
    image: lscr.io/linuxserver/netbox:latest
    container_name: netbox
    depends_on:
    - redis
    env_file:
    - netbox.env
    - netbox.secret
    volumes:
    - ./configuration.py:/defaults/configuration.py
    - /etc/ssl/certs:/etc/ssl/certs:ro
    - /usr/local/share/ca-certificates:/usr/local/share/ca-certificates:ro

Excerpt from configuration.py:

DATABASE = {
    # Other DB settings elided for clarity
    'OPTIONS': {
        'sslmode': 'verify-full',  # Require TLS and verify both the certificate and the hostname
        'sslrootcert': '/path/to/your/root',  # You must enter this path explicitly; the magic 'system' value doesn't work here
    }
}

netbox.env:

PUID=1000
PGID=1000
TZ=America/Los_Angeles
[email protected]
ALLOWED_HOST=netbox.internal.com
DB_NAME=netbox
DB_USER=netbox
DB_HOST=db.internal.com
DB_PORT=5432
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB_TASK=1
REDIS_DB_CACHE=2
HOME=.

Container logs

[migrations] started
[migrations] no migrations found
───────────────────────────────────────

      ██╗     ███████╗██╗ ██████╗
      ██║     ██╔════╝██║██╔═══██╗
      ██║     ███████╗██║██║   ██║
      ██║     ╚════██║██║██║   ██║
      ███████╗███████║██║╚██████╔╝
      ╚══════╝╚══════╝╚═╝ ╚═════╝

   Brought to you by linuxserver.io
───────────────────────────────────────

To support LSIO projects visit:
https://www.linuxserver.io/donate/

───────────────────────────────────────
GID/UID
───────────────────────────────────────

User UID:    1000
User GID:    1000
───────────────────────────────────────

Building local documentation
INFO    -  Cleaning site directory
INFO    -  Building documentation to directory: /app/netbox/netbox/project-static/docs
INFO    -  The following pages exist in the docs directory, but are not included in the "nav" configuration:
  - index.md
INFO    -  Documentation built in 19.59 seconds
Operations to perform:
  Apply all migrations: account, admin, auth, circuits, contenttypes, core, dcim, django_rq, extras, ipam, sessions, social_django, taggit, tenancy, users, virtualization, vpn, wireless
Running migrations:
  Applying circuits.0043_circuittype_color... OK
  Applying core.0006_datasource_type_remove_choices... OK
  Applying core.0007_job_add_error_field... OK
  Applying core.0008_contenttype_proxy... OK
  Applying core.0009_configrevision... OK
  Applying core.0010_gfk_indexes... OK
  Applying dcim.0183_devicetype_exclude_from_utilization... OK
  Applying dcim.0184_protect_child_interfaces... OK
  Applying dcim.0185_gfk_indexes... OK
  Applying extras.0099_cachedvalue_ordering... OK
  Applying extras.0100_customfield_ui_attrs... OK
  Applying extras.0101_eventrule... OK
  Applying extras.0102_move_configrevision... OK
  Applying extras.0103_gfk_indexes... OK
  Applying extras.0104_stagedchange_remove_change_logging... OK
  Applying extras.0105_customfield_min_max_values... OK
  Applying extras.0106_bookmark_user_cascade_deletion... OK
  Applying extras.0107_cachedvalue_extras_cachedvalue_object... OK
  Applying ipam.0068_move_l2vpn... OK
  Applying ipam.0069_gfk_indexes... OK
  Applying taggit.0006_rename_taggeditem_content_type_object_id_taggit_tagg_content_8fc721_idx... OK
  Applying tenancy.0012_contactassignment_custom_fields... OK
  Applying tenancy.0013_gfk_indexes... OK
  Applying tenancy.0014_contactassignment_ordering... OK
  Applying virtualization.0037_protect_child_interfaces... OK
  Applying virtualization.0038_virtualdisk... OK
  Applying vpn.0001_initial... OK
  Applying vpn.0002_move_l2vpn... OK
  Applying vpn.0003_ipaddress_multiple_tunnel_terminations... OK
  Applying vpn.0004_alter_ikepolicy_mode... OK
Superuser creation skipped. Already exists.
[custom-init] No custom files found, skipping...
[uWSGI] getting INI configuration from uwsgi.ini
[uwsgi-static] added mapping for /static => static
*** Starting uWSGI 2.0.23 (64bit) on [Thu Mar  7 09:26:48 2024] ***
compiled with version: 13.2.1 20231014 on 30 November 2023 14:34:33
os: Linux-5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024
nodename: 9adf44f9c41f
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 12
current working directory: /app/netbox/netbox
detected binary path: /usr/sbin/uwsgi
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
building mime-types dictionary from file /etc/mime.types...1390 entry found
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :8000 fd 3
Python version: 3.11.8 (main, Feb 19 2024, 17:01:17) [GCC 13.2.1 20231014]
PEP 405 virtualenv detected: /lsiopy
Set PythonHome to /lsiopy
Python main interpreter initialized at 0x7f9b079a52d8
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 203184 bytes (198 KB) for 1 cores
*** Operational MODE: single process ***
running "exec:python3 ./manage.py collectstatic --noinput" (pre app)...
Connection to localhost (127.0.0.1) 8000 port [tcp/*] succeeded!
[ls.io-init] done.

535 static files copied to '/app/netbox/netbox/static'.
running "exec:python3 ./manage.py remove_stale_contenttypes --no-input" (pre app)...
running "exec:python3 ./manage.py clearsessions" (pre app)...
WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x7f9b079a52d8 pid: 156 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 156)
spawned uWSGI worker 1 (pid: 192, cores: 1)
[uwsgi-daemons] spawning "python3 ./manage.py rqworker" (uid: 1000 gid: 1000)
No queues have been specified. This process will service the following queues by default: high, default, low
17:27:13 Worker rq:worker:db836afd502f4f86a965691bd9586f58 started with PID 193, version 1.16.0
17:27:13 Subscribing to channel rq:pubsub:db836afd502f4f86a965691bd9586f58
17:27:13 *** Listening on high, default, low...
17:27:13 Scheduler for low, default, high started with PID 195
!!! uWSGI process 192 got Segmentation Fault !!!
DAMN ! worker 1 (pid: 192) died :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 196)
!!! uWSGI process 196 got Segmentation Fault !!!
DAMN ! worker 1 (pid: 196) died :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 212)
[uwsgi-daemons] stopping daemon (pid: 193): python3 ./manage.py rqworker
17:27:53 Worker db836afd502f4f86a965691bd9586f58 [PID 193]: warm shut down requested
17:27:53 Scheduler stopping, releasing locks for low, default, high...
17:27:53 Scheduler with PID 195 has stopped
17:27:53 Unsubscribing from channel rq:pubsub:db836afd502f4f86a965691bd9586f58
...brutally killing workers...

github-actions bot commented Mar 7, 2024

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

colemannugent (author) commented

I've also identified a possible fix: removing psycopg-binary.

After uninstalling it with pip uninstall psycopg-binary, requests succeed and everything appears to work normally.
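Before and after removing the package, it can help to confirm what is actually installed. A minimal standard-library sketch, assuming psycopg 3 (whose psycopg.pq.__impl__ reports which libpq wrapper was loaded: "binary", "c" or "python"); the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def psycopg_impl() -> Optional[str]:
    """Return psycopg's active libpq wrapper, or None if it cannot load."""
    try:
        import psycopg  # raises ImportError if no libpq wrapper is usable
    except ImportError:
        return None
    return psycopg.pq.__impl__


if __name__ == "__main__":
    for dist in ("psycopg", "psycopg-binary", "psycopg-c"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
    print(f"psycopg.pq.__impl__: {psycopg_impl()}")
```

After pip uninstall psycopg-binary, the reported implementation would be expected to change from "binary" to "python" (or "c" if psycopg-c is present).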

This appears to be a side effect of psycopg-binary being built with bundled SSL libraries. From the Psycopg installation docs:

Warning: The psycopg2 wheel package comes packaged, among the others, with its own libssl binary. This may create conflicts with other extension modules binding with libssl as well, for instance with the Python ssl module: in some cases, under concurrency, the interaction between the two libraries may result in a segfault. In case of doubts you are advised to use a package built from source.
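To see the two-libssl situation the warning describes, one can compare the OpenSSL that Python's ssl module reports with any libssl/libcrypto copies the wheel vendors. This sketch assumes psycopg-binary vendors its libraries the way the quoted psycopg2 warning describes, and that the wheel lists them in its RECORD file; it only shows that two copies exist and does not prove a crash will occur. The helper name is hypothetical.

```python
import ssl
from importlib import metadata


def bundled_ssl_libs(dist: str) -> list:
    """List libssl/libcrypto files shipped inside an installed distribution."""
    try:
        files = metadata.files(dist) or []  # None when RECORD is missing
    except metadata.PackageNotFoundError:
        return []
    return sorted(
        str(f) for f in files
        if f.name.startswith(("libssl", "libcrypto"))
    )


if __name__ == "__main__":
    print("Python ssl module:", ssl.OPENSSL_VERSION)
    for lib in bundled_ssl_libs("psycopg-binary"):
        print("bundled by psycopg-binary:", lib)
```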

To support the fairly common use case of secure TLS connections, we may have to remove psycopg-binary, but that might have other implications, such as having to install libpq-dev from APK.
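Without the binary wheel, psycopg 3 falls back to its pure-Python implementation, which needs a libpq shared library on the system (building the C implementation additionally needs headers such as those from libpq-dev). A quick, hedged check for whether ctypes can see a system libpq; note that ctypes.util.find_library searches differently per platform and may miss an installed library on musl-based Alpine, so treat None as a hint rather than proof.

```python
import ctypes.util
from typing import Optional


def system_libpq() -> Optional[str]:
    """Return the libpq name ctypes can locate, or None if none is visible."""
    return ctypes.util.find_library("pq")


if __name__ == "__main__":
    found = system_libpq()
    print(found or "no system libpq found; the package name varies by distro")
```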

colemannugent (author) commented

After some testing, it looks like uninstalling psycopg-binary is all it takes.

I added this to /etc/s6-overlay/s6-rc.d/init-netbox-config/run to make sure it's uninstalled before NetBox starts:

pip uninstall -y psycopg-binary
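The same step could be expressed in Python if the init hook should only act when the package is present. This is a hypothetical helper, not part of the LinuxServer image; it assumes the script runs under the container's Python (the log shows a virtualenv at /lsiopy), so sys.executable -m pip targets the same environment NetBox uses.

```python
import subprocess
import sys
from importlib import metadata


def uninstall_if_present(dist: str, run=subprocess.run) -> bool:
    """Uninstall dist with this interpreter's pip; return True if attempted."""
    try:
        metadata.version(dist)
    except metadata.PackageNotFoundError:
        return False  # already gone, nothing to do
    # "python -m pip" acts on the environment of the running interpreter.
    run([sys.executable, "-m", "pip", "uninstall", "-y", dist], check=True)
    return True


if __name__ == "__main__":
    uninstall_if_present("psycopg-binary")
```

The run parameter exists only so the subprocess call can be swapped out when testing the helper.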

LinuxServer-CI (contributor) commented

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.


thespad commented May 18, 2024

Looks like they've removed the binary package from the requirements in netbox-community/netbox@93c9f8c.

Is this still an issue that you're seeing?

colemannugent (author) commented

That change seems to have resolved the issue. We've been running the stock run script on NetBox 4 for a while now.

LinuxServer-CI moved this from Issues to Done in Issue & PR Tracker on May 22, 2024