Problem when using owl to monitor YARN #6
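This stack consists entirely of low-level Django and Python frames. Is there any stack information from minos itself?
I could not find anything related to minos itself in server.log, and the other log files do not seem to be related to this problem.
It looks like it is trying to connect to an SMTP server that is not running, but I suspect that is not the root cause.
Please post the link you are trying to open, and also check the HTTP status code for that request in the backend Django log.
The link is an intranet address, in a format like http://10.10.65.13:8080/monitor/cluster/6/task/. I think it is probably the path you described: "owl has a problem -> Django sends mail to the administrator -> sending the mail fails". I will check the logs again.
OK. Look at the HTTP status code Django returned for the failing request; it may help.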
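The failure path described above (owl raises an error, Django tries to mail the administrators, the SMTP connection is refused) hides the original exception behind the mail failure. A minimal sketch of a settings fragment that sends `django.request` errors to the console instead, assuming a Django version that reads a `LOGGING` dict from the settings module; the handler name `console` is an arbitrary choice, and this is not taken from owl's own settings:

```python
import logging
import logging.config

# Sketch of a Django settings fragment (an assumption, not owl's real config).
# It routes unhandled-request errors to a stream handler so the original
# traceback is printed instead of being mailed to the admins.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # "console" is just a label chosen for this example.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.request": {
            "handlers": ["console"],
            "level": "ERROR",
            "propagate": False,
        },
    },
}

# Alternative: keep mailing enabled but print messages instead of using SMTP.
EMAIL_BACKEND = "django.core.mail.backends.console.EmailBackend"

# Django applies LOGGING itself; this call only makes the sketch runnable alone.
logging.config.dictConfig(LOGGING)
```

With this in place the log should show the exception that triggered the error page, which is the information needed to find the root cause.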
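WARNING 2014-01-03 15:07:01,349 collect 16994 140143162611456 <Task: yarn/hadoop-crete/nodemanager/26> failed to update metric: OperationalError(2006, 'MySQL server has gone away') I suspect the database connection was dropped.
When owl updates the monitoring data in MySQL, does it first open the MySQL connection, then fetch the JSON data via JMX, and then update the MySQL table?
The MySQL connection is maintained by Django underneath; it should be a long-lived connection.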
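One common way to cope with a long-lived connection that the server has dropped is to discard it and retry the operation once. The sketch below is generic and does not reflect how owl's collector is written: `run_with_reconnect`, `is_mysql_gone_away`, and their parameters are hypothetical names. In a Django process the `reset_connection` callback could be `django.db.connection.close`, after which Django opens a fresh connection on the next query.

```python
def is_mysql_gone_away(exc):
    """Return True if exc looks like MySQL client error 2006."""
    # Error 2006 is "MySQL server has gone away", as in the log line above.
    args = getattr(exc, "args", ())
    return bool(args) and args[0] == 2006


def run_with_reconnect(operation, reset_connection, is_stale_error, max_retries=1):
    """Run operation(); on a stale-connection error, reset and retry.

    All arguments are callables supplied by the caller (hypothetical API):
      operation        -- does the database work and returns its result
      reset_connection -- drops the current connection
      is_stale_error   -- decides whether an exception means a dead connection
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            # Give up on unrelated errors or once the retry budget is spent.
            if attempt >= max_retries or not is_stale_error(exc):
                raise
            attempt += 1
            reset_connection()
```

Retrying only on the specific error code keeps genuine failures (bad SQL, missing tables) visible instead of masking them.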
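Could you post the CPU usage of the owl collect process and of MySQL in your deployment?
20182 minos 20 0 101m 23m 3388 S 8 0.0 0:01.99 python2.7 Neither is high. After I set the data collection interval to 30, the MySQL problem no longer appears. There is a new problem:
Check whether the JMX page of your HBase has this entry: "name" : "hadoop:service=Master,name=Master"
There are two possible causes for this problem: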
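It is probably the HBase version. We use 0.96, and the entries on the JMX page have changed: "name" : "Hadoop:service=HBase,name=MetricsSystem,sub=Stats"
Oh, I see. It looks like there is still quite a lot of work to do on compatibility across versions.
Yes, this is hard-coded at the moment. It would be good to turn it into a configuration item, ideally with ready-made configurations for the commonly used HBase versions (if JMX differs between them).
I made a mistake just now: in 0.96 the JMX entry you mentioned corresponds to Hadoop:service=HBase,name=Master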
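The suggestion above, making the bean name configurable rather than hard-coded, could look roughly like the following. This is only a sketch: `MASTER_BEAN_CANDIDATES` and `find_bean` are hypothetical names, not part of owl, and it assumes the JMX page returns JSON with a top-level `beans` list whose items carry a `name` field. The two bean names are the ones quoted in this thread.

```python
import json

# Candidate names for the HBase master bean, tried in order.
# Both strings come from this thread; the list itself is hypothetical config.
MASTER_BEAN_CANDIDATES = [
    "hadoop:service=Master,name=Master",  # name checked for earlier in the thread
    "Hadoop:service=HBase,name=Master",   # name reported for HBase 0.96
]


def find_bean(jmx_document, candidate_names):
    """Return the first bean whose name matches a candidate, or None.

    jmx_document is the parsed JSON of a JMX page, assumed to look like
    {"beans": [{"name": "...", ...}, ...]}.
    """
    by_name = {bean.get("name"): bean for bean in jmx_document.get("beans", [])}
    for name in candidate_names:
        if name in by_name:
            return by_name[name]
    return None


# Usage with a made-up document; "exampleMetric" is purely illustrative.
sample = json.loads(
    '{"beans": [{"name": "Hadoop:service=HBase,name=Master", "exampleMetric": 3}]}'
)
master_bean = find_bean(sample, MASTER_BEAN_CANDIDATES)
```

Keeping the candidates in a list means supporting another HBase version is a configuration change rather than a code change.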
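OK, understood, thanks for the feedback. Your suggestion is a good one and we will consider it. However, we are short on people at the moment and will not get to it soon, so for now please modify it yourself on your side.
Created a new issue to track this: #18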
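When I click a YARN task id on the owl web page, the page made up of opentsdb monitoring views does not load. Instead I get the error: "A server error occurred. Please contact the administrator."
Looking at the log file server.log, I found the following: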
[02/Jan/2014 15:49:15] "GET /monitor/task/225 HTTP/1.1" 301 0
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/wsgiref/handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python2.7/site-packages/django/contrib/staticfiles/handlers.py", line 67, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 209, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 200, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 230, in handle_uncaught_exception
    'request': request
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1154, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1246, in _log
    self.handle(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1256, in handle
    self.callHandlers(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 1293, in callHandlers
    hdlr.handle(record)
  File "/usr/local/lib/python2.7/logging/__init__.py", line 740, in handle
    self.emit(record)
  File "/usr/local/lib/python2.7/site-packages/django/utils/log.py", line 106, in emit
    connection=self.connection())
  File "/usr/local/lib/python2.7/site-packages/django/core/mail/__init__.py", line 98, in mail_admins
    mail.send(fail_silently=fail_silently)
  File "/usr/local/lib/python2.7/site-packages/django/core/mail/message.py", line 284, in send
    return self.get_connection(fail_silently).send_messages([self])
  File "/usr/local/lib/python2.7/site-packages/django/core/mail/backends/smtp.py", line 92, in send_messages
    new_conn_created = self.open()
  File "/usr/local/lib/python2.7/site-packages/django/core/mail/backends/smtp.py", line 51, in open
    self.connection = connection_class(self.host, self.port, **connection_params)
  File "/usr/local/lib/python2.7/smtplib.py", line 239, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/local/lib/python2.7/smtplib.py", line 295, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/local/lib/python2.7/smtplib.py", line 273, in _get_socket
    return socket.create_connection((port, host), timeout)
  File "/usr/local/lib/python2.7/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
What could be causing this problem?