use of populateEtcDynamically=1 generates failures to start the sshd #272

Open
dmjacobsen opened this issue Dec 28, 2019 · 2 comments

@dmjacobsen
Contributor

I am evaluating using:

allowLibcPwdCalls=1
populateEtcDynamically=1

in udiRoot.conf in order to get away from the centralized password file which is now starting to cause problems of another sort.

When populateEtcDynamically is enabled, a passwd and a group file are generated, and they include only the current user and that user's primary group. The lack of auxiliary groups is problematic and should also be fixed. The lack of an "sshd" user in /etc/passwd causes the integrated sshd to fail with:

[2019-12-28T01:43:59.257] error: setupRoot stdout: Generating public/private dsa key pair.
[2019-12-28T01:43:59.485] error: setupRoot stderr: Privilege separation user sshd does not exist^M
[2019-12-28T01:43:59.485] error: setupRoot stdout: Your identification has been saved in /var/udiMount/opt/udiImage/etc/ssh_host_dsa_key.
[2019-12-28T01:43:59.485] error: setupRoot stderr: FAILED to start sshd
[2019-12-28T01:43:59.485] error: setupRoot stdout: Your public key has been saved in /var/udiMount/opt/udiImage/etc/ssh_host_dsa_key.pub.
[2019-12-28T01:43:59.485] error: waiting on setupRoot
[2019-12-28T01:43:59.485] error: FAILED to run setupRoot
[2019-12-28T01:43:59.485] error: after setupRoot, exit code: 1

(from a slurmd log)
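
To make the failure mode concrete, here is a minimal Python sketch (not shifter code) that checks whether a passwd file contains the accounts a service needs before that service is started. The function name `missing_accounts` and the sample entry for `alice` are hypothetical and only serve the illustration.

```python
def missing_accounts(passwd_text, required=("sshd",)):
    """Return the required account names that have no entry in passwd_text."""
    present = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        present.add(line.split(":", 1)[0])  # first field is the login name
    return [name for name in required if name not in present]


# A dynamically generated file holding only the job's user (hypothetical entry).
generated = "alice:x:1000:1000::/home/alice:/bin/bash\n"
print(missing_accounts(generated))  # ['sshd']
```

A check like this would report the missing "sshd" entry up front instead of leaving it to surface as a privilege separation error.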

It might be good if populateEtcDynamically could augment an existing skeleton passwd/group file with the current user and all (up to maxGroupCount) groups for that user.
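
The proposed behavior could look roughly like the following Python sketch. It is not shifter's implementation: the function names, the sample skeleton contents, and the reading of maxGroupCount as a simple cap on how many of the user's groups get added are all assumptions made for illustration.

```python
def augment_passwd(skeleton, user_entry):
    """Append user_entry to the skeleton passwd text unless that login name exists."""
    lines = [l for l in skeleton.splitlines() if l.strip()]
    name = user_entry.split(":", 1)[0]
    if not any(l.split(":", 1)[0] == name for l in lines):
        lines.append(user_entry)
    return "\n".join(lines) + "\n"


def augment_group(skeleton, user, groups, max_group_count):
    """Add the user's groups to the skeleton group text.

    groups is a list of (name, gid) pairs with the primary group first;
    only the first max_group_count pairs are used (assumed semantics).
    """
    lines = [l for l in skeleton.splitlines() if l.strip()]
    index = {l.split(":", 1)[0]: i for i, l in enumerate(lines)}
    for gname, gid in groups[:max_group_count]:
        if gname in index:
            # Group already in the skeleton: add the user to its member list.
            fields = lines[index[gname]].split(":")
            fields += [""] * (4 - len(fields))
            members = [m for m in fields[3].split(",") if m]
            if user not in members:
                members.append(user)
            fields[3] = ",".join(members)
            lines[index[gname]] = ":".join(fields)
        else:
            index[gname] = len(lines)
            lines.append(f"{gname}:x:{gid}:{user}")
    return "\n".join(lines) + "\n"


# Hypothetical skeleton files that already carry the system accounts.
skeleton_passwd = "root:x:0:0:root:/root:/bin/bash\nsshd:x:74:74::/var/empty/sshd:/sbin/nologin\n"
skeleton_group = "root:x:0:\nsshd:x:74:\nusers:x:100:\n"

new_passwd = augment_passwd(skeleton_passwd, "alice:x:1000:1000::/home/alice:/bin/bash")
new_group = augment_group(
    skeleton_group, "alice", [("alice", 1000), ("users", 100), ("proj", 2000)], 2
)
print(new_passwd, new_group, sep="")
```

On a real system the `groups` list could be built with `os.getgrouplist()` and `grp.getgrgid()`; the sketch takes it as an argument so that it does not depend on the host's user database.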

@scanon
Member

scanon commented Jan 3, 2020

Can you elaborate on what you mean by "which is now starting to cause problems of another sort."?

@dmjacobsen
Contributor Author

Sure, there have been two issues with using a passwd and group file in etcFiles:

  1. In the NERSC deployment a cron job generates the files, so new users might not appear until the cron job reruns. From time to time that cron job has also broken for one reason or another, which has been a source of additional maintenance.

  2. To generate the passwd/group files easily via the cron job, the NERSC deployment has had to enable sssd enumeration. With a large number of users, enumeration has caused major performance problems for some sssd lookups, and this has impacted slurmctld performance rather badly. It is therefore preferable to disable sssd enumeration. To do that, we have to either move generation of the passwd/group files for shifter to another node in the system, or move to this configuration.

In light of Slurm's recent nss_slurm, it is now possible to look up users scalably with allowLibcPwdCalls, which was not the case during the original development of shifter. Ironically, it is also the use of nss_slurm that is potentially driving some of the sssd enumeration performance impacts on the Slurm controller.
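
The difference between a targeted lookup and enumeration can be illustrated with Python's standard `pwd` module, which goes through the same libc/NSS calls. This is only a sketch of the general idea, not of how shifter or nss_slurm work internally; `lookup_user` is a hypothetical helper name.

```python
import os
import pwd


def lookup_user(uid, getpwuid=pwd.getpwuid):
    """Resolve a single uid via libc/NSS and return a passwd-format line, or None."""
    try:
        p = getpwuid(uid)
    except KeyError:
        return None  # no such uid in any configured NSS source
    return ":".join(
        [p.pw_name, "x", str(p.pw_uid), str(p.pw_gid), p.pw_gecos, p.pw_dir, p.pw_shell]
    )


# One targeted query for the current user. By contrast, pwd.getpwall() walks
# every entry the NSS sources will enumerate, which is the expensive pattern.
print(lookup_user(os.getuid()))
```

A targeted lookup like this asks the name service for one record, so it does not require enumeration to be enabled.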
