requires system-wide runit on Debian #53

Open
matthiasr opened this issue Mar 27, 2017 · 18 comments

Comments

@matthiasr

Break-out from #52.

We use system-wide runit for various other daemons on our nodes. chef-server-ctl reconfigure works correctly, but to use chef-server-ctl restart or chef-server-ctl status we need to explicitly set SVDIR (which #52 will prevent).

$ sudo env -i chef-server-ctl status
fail: bookshelf: unable to change to service directory: file does not exist
fail: nginx: unable to change to service directory: file does not exist
fail: oc_bifrost: unable to change to service directory: file does not exist
fail: oc_id: unable to change to service directory: file does not exist
fail: opscode-erchef: unable to change to service directory: file does not exist
fail: opscode-expander: unable to change to service directory: file does not exist
fail: opscode-solr4: unable to change to service directory: file does not exist
fail: postgresql: unable to change to service directory: file does not exist
fail: rabbitmq: unable to change to service directory: file does not exist
fail: redis_lb: unable to change to service directory: file does not exist
$ sudo env -i SVDIR=/opt/opscode/service chef-server-ctl status
run: bookshelf: (pid 4816) 1724702s; run: log: (pid 4869) 1724702s
run: nginx: (pid 4723) 1724706s; run: log: (pid 4980) 1724698s
run: oc_bifrost: (pid 4656) 1724708s; run: log: (pid 4693) 1724707s
run: oc_id: (pid 4708) 1724706s; run: log: (pid 4713) 1724706s
run: opscode-erchef: (pid 4936) 1724700s; run: log: (pid 4917) 1724701s
run: opscode-expander: (pid 4777) 1724703s; run: log: (pid 4807) 1724703s
run: opscode-solr4: (pid 4737) 1724704s; run: log: (pid 4767) 1724704s
run: postgresql: (pid 4635) 1724708s; run: log: (pid 4647) 1724708s
run: rabbitmq: (pid 4538) 1724709s; run: log: (pid 4530) 1724710s
run: redis_lb: (pid 4473) 1724773s; run: log: (pid 4976) 1724698s

This is on Chef server 12.7 and 12.13 (same effect on both).
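Until the lookup is fixed, the workaround from the transcript can be wrapped in a small helper. This is only a sketch: `with_omnibus_svdir` and `OMNIBUS_SVDIR` are hypothetical names, and `/opt/opscode/service` is assumed to be the service directory because that is the value that worked above.

```shell
#!/bin/sh
# Assumption: the omnibus service directory is /opt/opscode/service, as in the
# transcript above. Override OMNIBUS_SVDIR if your install differs.
OMNIBUS_SVDIR=${OMNIBUS_SVDIR:-/opt/opscode/service}

# Hypothetical helper: run the given command with SVDIR set for that command
# only, leaving the calling shell's environment untouched.
with_omnibus_svdir() {
    env SVDIR="$OMNIBUS_SVDIR" "$@"
}
```

From a root shell this would be used as `with_omnibus_svdir chef-server-ctl status`. Under sudo it is probably safer to keep the `sudo env SVDIR=... chef-server-ctl status` form shown above, since sudo may reset the environment depending on its configuration.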

Digging into the code, I found that the shell-out happens here. It does not call the embedded/bin/sv binary but wrapper scripts in init/. These contain RUNIT=/usr/bin/sv, which is the global binary and looks for the services in /etc/service.

I don't know why these wrapper scripts are necessary, how they are created, or whether they should be used at all.
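To see which kind of entries a given install has, something like the following could be run on the node. It is a rough diagnostic sketch, not part of omnibus-ctl: `classify_init_entry` is a hypothetical helper, and `/opt/opscode/init` is assumed to be the init directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether one init/ entry is a symlink or a
# wrapper script, and which sv binary a wrapper script hardcodes.
classify_init_entry() {
    entry=$1
    if [ -L "$entry" ]; then
        # Symlink: print its target.
        printf 'symlink -> %s\n' "$(readlink "$entry")"
    elif [ -f "$entry" ]; then
        # Regular file: print the first RUNIT= assignment, if any.
        runit_line=$(sed -n '/^RUNIT=/{p;q;}' "$entry")
        [ -n "$runit_line" ] || runit_line='no RUNIT= line'
        printf 'script (%s)\n' "$runit_line"
    else
        printf 'missing\n'
    fi
}

# Assumption: the omnibus init directory is /opt/opscode/init.
for entry in /opt/opscode/init/*; do
    # Skip the literal pattern when the directory is absent or empty.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    printf '%s: ' "${entry##*/}"
    classify_init_entry "$entry"
done
```

A line such as `nginx: script (RUNIT=/usr/bin/sv)` would indicate the wrapper-script case described here.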

@matthiasr
Author

cc @srenatus

@stevendanna
Contributor

@matthiasr Thanks for reporting this! The files in init/ shouldn't be wrapper scripts at all, but rather symlinks to the internal sv:

vagrant@api:~$ ls -al /opt/opscode/init/
total 8
drwxrwxr-x 2 root root 4096 Mar 27 15:08 .
drwxrwxr-x 8 root root 4096 Mar 27 15:05 ..
lrwxrwxrwx 1 root root   28 Mar 27 15:07 bookshelf -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:08 nginx -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 oc_bifrost -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 oc_id -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:08 opscode-chef-mover -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 opscode-erchef -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 opscode-expander -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 opscode-solr4 -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 postgresql -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:06 rabbitmq -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:08 redis_lb -> /opt/opscode/embedded/bin/sv

If you are seeing those files as scripts, would you mind sharing:

  1. The content of the script
  2. The content of your chef-server.rb
  3. The version of chef-server you are running

There must be another piece to this puzzle.

@matthiasr
Author

The nginx one is

#!/bin/sh
### BEGIN INIT INFO
# Provides:          nginx
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop:
# Short-Description: initscript for runit-managed nginx service
### END INIT INFO

# Author: Chef Software, Inc. <[email protected]>

PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="runit-managed nginx"
NAME=nginx
RUNIT=/usr/bin/sv
SCRIPTNAME=/etc/init.d/$NAME

# Exit if runit is not installed
[ -x $RUNIT ] || exit 0

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions


case "$1" in
  start)
        [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC " "$NAME"
        $RUNIT start $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  stop)
        [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
        $RUNIT stop $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  status)
        $RUNIT status $NAME && exit 0 || exit $?
        ;;
  reload)
        [ "$VERBOSE" != no ] && log_daemon_msg "Reloading $DESC" "$NAME"
        $RUNIT reload $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  force-reload)
        [ "$VERBOSE" != no ] && log_daemon_msg "Force reloading $DESC" "$NAME"
        $RUNIT force-reload $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  restart)
        [ "$VERBOSE" != no ] && log_daemon_msg "Restarting $DESC" "$NAME"
        $RUNIT restart $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  *)
        echo "Usage: $SCRIPTNAME {start|stop|status|reload|force-reload|restart}" >&2
        exit 3
        ;;
esac

:

It's possible we did / are doing something wrong while upgrading between Chef Server versions …

@matthiasr
Author

The test server I'm investigating on is on package version 12.7.0-1, on Debian Jessie. I'm upgrading it to 12.13 now. I'll also try with a fresh install.

@matthiasr
Author

Upgrading from 12.7 to 12.13 using the proper procedure (stop, upgrade, start, cleanup) doesn't change anything about these files.

@matthiasr
Author

… and if I move aside one of the init scripts, then a chef-server-ctl reconfigure faithfully recreates it.

@matthiasr
Author

This seems to come from here. I understand why that is: we use an old hacked-up version of the runit cookbook ourselves, and Debian is very unhappy about symlinks to binaries in /etc/init.d. It's just problematic that omnibus-ctl assumes otherwise …

So this is only a problem on Debian, but it is a problem on Debian.

@matthiasr
Author

Is there a reason omnibus-ctl needs to execute init/<service> <command> rather than embedded/bin/sv <service> <command>?

@stevendanna
Contributor

We are currently on this version of the runit cookbook in chef-server, where the path to runit is hardcoded:

https://github.com/chef-cookbooks/runit/blob/v1.6.0/templates/debian/init.d.erb#L16

However, on newer versions, this appears to be configurable:

https://github.com/chef-cookbooks/runit/blob/master/templates/debian/init.d.erb#L16

So one option might be to upgrade the version of runit we are using.

Is there a reason omnibus-ctl needs to execute init/<service> <command> rather than embedded/bin/sv <service> <command>?

As long as we provide compatible output (so as not to break any monitoring), I don't see a reason we couldn't use sv directly.
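A rough sketch of what calling the embedded binary directly might look like from the shell. `omnibus_sv` and `OMNIBUS_BASE` are hypothetical names, and `/opt/opscode` is assumed as the install prefix. Two details worth noting: sv takes the command before the service, as in `$RUNIT restart $NAME` in the init script quoted earlier, and the sketch assumes that sv accepts a full path to a service directory, which would sidestep SVDIR entirely.

```shell
#!/bin/sh
# Assumption: the omnibus install prefix is /opt/opscode.
OMNIBUS_BASE=${OMNIBUS_BASE:-/opt/opscode}

# Hypothetical helper: invoke the embedded sv on one service, passing the
# service directory by absolute path so neither SVDIR nor the system-wide
# /etc/service is consulted.
# Usage: omnibus_sv <command> <service>
omnibus_sv() {
    command_name=$1
    service_name=$2
    "$OMNIBUS_BASE/embedded/bin/sv" "$command_name" "$OMNIBUS_BASE/service/$service_name"
}
```

For example, `omnibus_sv status nginx` would run `/opt/opscode/embedded/bin/sv status /opt/opscode/service/nginx`. Whether the output stays compatible with existing monitoring would still need to be checked.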

@stevendanna
Contributor

In the short term, maybe the easiest thing is to have omnibus-ctl set SVDIR to the "correct" value.

@matthiasr
Author

I'm pretty sure with the symlink sv itself behaves the same way, and produces the same output.

Yes, setting SVDIR "correctly" would work, but it still relies on the system runit actually being sane then. It probably will be, but it's still leaking in.

@matthiasr changed the title from "System sv leaks in" to "System sv leaks into chef-server on Debian" on Mar 27, 2017
@matthiasr
Author

As expected from the runit_service provider, the same happens on a clean install, so it's not related to any upgrade issues.

@matthiasr
Author

I'm wondering, but have no quick way to check, whether this works at all on Debian unless a system-wide runit package is installed?

@stevendanna
Contributor

stevendanna commented Mar 27, 2017

@matthiasr My assumption from the code is that it currently does not work on Debian without runit installed on the system (outside of the chef-server package).
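One reason this can go unnoticed: the init script quoted earlier starts with `[ -x $RUNIT ] || exit 0`, so when `/usr/bin/sv` is absent the wrapper exits successfully without doing anything. The sketch below mirrors that guard for illustration; `runit_guard` is a hypothetical name, and the messages are made up for this example rather than taken from any real tool.

```shell
#!/bin/sh
# Hypothetical helper mirroring the "[ -x $RUNIT ] || exit 0" guard from the
# Debian init script template: report what the wrapper would do for a given
# sv path. It always returns 0, just as the wrapper's guard exits 0.
runit_guard() {
    runit_bin=$1
    if [ -x "$runit_bin" ]; then
        echo "sv found at $runit_bin"
    else
        echo "sv not found at $runit_bin; wrapper would exit 0 without doing anything"
    fi
    return 0
}

# The path hardcoded in the quoted script.
runit_guard /usr/bin/sv
```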

@matthiasr
Author

Quickly testing this in a minimal VM supports this. I think changing this to use the internal sv binary directly is easy enough; I'll try to make a PR for that. It's not the cleanest solution: the broken init files still exist, but they would no longer actually be used.

@matthiasr
Author

Done now: #54

@stevendanna
Contributor

We've merged #55, which will hopefully keep Debian working provided you have runit available on the system. I'm going to leave this open, however, since we still depend on system-installed runit.

@matthiasr changed the title from "System sv leaks into chef-server on Debian" to "requires system-wide runit on Debian" on Mar 28, 2017
@matthiasr
Author

I updated the title to reflect that the leaking in is what makes it work in the first place.
