subprocess.call interrupted by system call may cause deployment operation failed #7

Open
hpttlook opened this issue Jan 2, 2014 · 5 comments

@hpttlook

hpttlook commented Jan 2, 2014

In deployment/rpcinterface.py, commands are executed by calling subprocess.call(). Sometimes the bootstrap operation fails on different hosts with no apparent pattern. According to supervisord.log, the operation most commonly fails while executing the following command:
tar zxf xxx.tar.gz -C root_dir
I checked that the package (xxx.tar.gz) did exist, and when I executed this command manually on the target machine it worked with no error. I am using Python 2.6.
I did some searching on Google and found that somebody else encountered a similar problem:
https://mail.python.org/pipermail/pythonmac-sig/2006-September/018095.html
I don't know why, but it seems some signal from supervisord causes Popen.wait to exit. I replaced all subprocess.call calls with os.system, and so far it works great.
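
As an alternative to switching to os.system, one possible workaround is to retry the wait when it is interrupted. The following is only a minimal sketch, not the code in rpcinterface.py: the helper name call_retrying_eintr is hypothetical, and it assumes the interruption surfaces as an OSError with errno EINTR, as it does on Python 2.6. Newer Python versions (3.5 and later) retry interrupted system calls on their own, so there the loop simply runs once.

```python
import errno
import subprocess
import sys


def call_retrying_eintr(cmd, **kwargs):
    """Run cmd like subprocess.call(), but retry wait() after EINTR.

    Hypothetical helper for illustration; not part of minos or supervisor.
    """
    proc = subprocess.Popen(cmd, **kwargs)
    while True:
        try:
            # wait() only records the return code once waitpid succeeds,
            # so calling it again after an interruption is safe.
            return proc.wait()
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise
            # A signal arrived while waiting; the child is still running.


# Example: the child's exit status is returned unchanged.
status = call_retrying_eintr([sys.executable, "-c", "import sys; sys.exit(3)"])
```

The design choice here is to keep the child process untouched and only repeat the wait, so a signal delivered to the parent does not get reported as a failed command.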

@wuzesheng
Contributor

Can you post a stack trace of the error?

@ghost ghost assigned wuzesheng Jan 2, 2014
@hpttlook
Author

hpttlook commented Jan 3, 2014

Here is the stack trace:
2014-01-02 10:06:15,506 CRIT Traceback (most recent call last):
  File "/home/work/app/supervisor/supervisor/xmlrpc.py", line 345, in continue_request
    value = self.call(method, params)
  File "/home/work/app/supervisor/supervisor/xmlrpc.py", line 388, in call
    return traverse(self.rpcinterface, method, params)
  File "/home/work/app/supervisor/supervisor/xmlrpc.py", line 402, in traverse
    return ob(*params)
  File "/home/work/app/supervisor/deployment/rpcinterface.py", line 221, in bootstrap
    return self._do_bootstrap(service, cluster, job, instance_id, **config_dict)
  File "/home/work/app/supervisor/deployment/rpcinterface.py", line 525, in _do_bootstrap
    message = self._prepare_run_env(service, cluster, job, instance_id, **config_dict)
  File "/home/work/app/supervisor/deployment/rpcinterface.py", line 508, in _prepare_run_env
    instance_id, revision, timestamp, package_name)
  File "/home/work/app/supervisor/deployment/rpcinterface.py", line 420, in _make_package_dir
    subprocess.call(cmd):
  File "/usr/lib64/python2.6/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.6/subprocess.py", line 1137, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 4] Interrupted system call

BTW: In my minos repo, I changed check_call() to call(), as you can see from the stack trace.
For now, I have redeployed the supervisor module and restarted it. I cannot reproduce this problem on any particular machine; it just appears from time to time.
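
For readers unsure what the check_call() to call() change affects, here is a small sketch of the difference between the two standard-library functions. It uses a throwaway child process rather than the tar command from this issue. Neither function handles an interrupted wait on Python 2.6, so the change alters how a non-zero exit status is reported, not whether the OSError above can occur.

```python
import subprocess
import sys

# A child process that exits with a non-zero status.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]

# call() returns the exit status and leaves error handling to the caller.
status = subprocess.call(failing)

# check_call() raises CalledProcessError for any non-zero exit status.
try:
    subprocess.check_call(failing)
    raised_status = None
except subprocess.CalledProcessError as exc:
    raised_status = exc.returncode
```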

@wuzesheng
Contributor

In my experience with minos, decompression errors are mainly caused by insufficient disk space, so check your disk usage whenever you see a decompression failure.
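
A rough way to rule out disk space before extracting is sketched below. This is an illustration only: the helper names free_bytes and has_room_for are hypothetical, and expansion_factor is an assumed guess at how much larger the extracted files are than the archive, not a value taken from minos. os.statvfs is available on Unix only.

```python
import os


def free_bytes(path):
    """Return the bytes available to unprivileged users on path's filesystem."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def has_room_for(archive_path, target_dir, expansion_factor=5):
    """Guess whether target_dir can hold the extracted archive.

    expansion_factor is an assumption; tune it for the packages involved.
    """
    needed = os.path.getsize(archive_path) * expansion_factor
    return free_bytes(target_dir) >= needed
```

Logging the result of such a check next to each tar invocation would make it easier to tell a full disk apart from an interrupted wait.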

@hpttlook
Author

hpttlook commented Jan 3, 2014

Thanks. But when I executed this command manually, it succeeded with no error.

@wuzesheng
Contributor

Got it. Let's keep this issue open to track the problem; post more evidence if you find any.
