job manager crashed while processing running jobs #193

Open
boegel opened this issue Jul 3, 2023 · 3 comments

Comments

@boegel
Contributor

boegel commented Jul 3, 2023

/usr/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.16) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
job manager just started, logging to '/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.log', processing job ids ''
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 640, in <module>
    main()
  File "/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 609, in main
    job_manager.process_running_jobs(current_jobs[rj])
  File "/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 373, in process_running_jobs
    repo = gh.get_repo(repo_name)
  File "/mnt/shared/home/bot/.local/lib/python3.6/site-packages/github/MainClass.py", line 330, in get_repo
    headers, data = self.__requester.requestJsonAndCheck("GET", url)
  File "/mnt/shared/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 355, in requestJsonAndCheck
    verb, url, parameters, headers, input, self.__customConnection(url)
  File "/mnt/shared/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 502 {"message": "Server Error"}

Last bit of log:

[20230703-T12:22:41] job manager main loop: iteration 8971
[20230703-T12:22:41] job manager main loop: known_jobs='5705,5706'
[20230703-T12:22:41] run_subprocess(): 'get_current_jobs(): squeue command' by running '/usr/bin/squeue --long --user=bot' in directory '/mnt/shared/home/bot/eessi-bot-software-layer'
[20230703-T12:22:41] run_cmd(): Result for running '/usr/bin/squeue --long --user=bot' in 'None
           stdout 'Mon Jul 03 12:22:41 2023
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
              5705   compute bot-buil      bot  RUNNING       8:27 1-00:00:00      1 fair-mastodon-c7g-4xlarge-0001
              5706   compute bot-buil      bot  RUNNING       7:57 1-00:00:00      1 fair-mastodon-c6g-4xlarge-0002
'
           stderr ''
           exit code 0
[20230703-T12:22:41] job manager main loop: current_jobs='5705,5706'
[20230703-T12:22:41] job manager main loop: new_jobs=''
[20230703-T12:22:41] job manager main loop: running_jobs='5705,5706'
[20230703-T12:22:41] Found metadata file at /mnt/shared/home/bot/eessi-bot-software-layer/jobs/submitted/5705/_bot_job5705.metadata
@boegel
Contributor Author

boegel commented Jul 3, 2023

It looks like there was a problem with the connection to GitHub (see also #20).

Simply restarting the bot worked fine, and the finished jobs were processed.

@boegel
Contributor Author

boegel commented Jul 4, 2023

Same crash happened again. Bot restarted a couple of minutes after the crash.

@boegel
Contributor Author

boegel commented Sep 20, 2023

Another crash, but not exactly the same (GitHub returned a 500 this time instead of a 502):

Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 640, in <module>
    main()
  File "/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 609, in main
    job_manager.process_running_jobs(current_jobs[rj])
  File "/mnt/shared/home/bot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 373, in process_running_jobs
    repo = gh.get_repo(repo_name)
  File "/mnt/shared/home/bot/.local/lib/python3.6/site-packages/github/MainClass.py", line 330, in get_repo
    headers, data = self.__requester.requestJsonAndCheck("GET", url)
  File "/mnt/shared/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 355, in requestJsonAndCheck
    verb, url, parameters, headers, input, self.__customConnection(url)
  File "/mnt/shared/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 500 null
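
Both tracebacks end at the same call, gh.get_repo(repo_name), with a 5xx response from GitHub, and a restart recovered each time, which suggests the errors are transient. One possible mitigation is to retry that call a few times before giving up. The following is only a minimal sketch under that assumption, not the bot's actual fix: call_with_retry and is_transient_server_error are hypothetical helper names, and the attempt count and delays are arbitrary. It relies only on the raised exception exposing an integer status attribute, as the GithubException in the tracebacks does.

```python
import time


def is_transient_server_error(exc):
    """Return True if the exception carries an HTTP 5xx 'status' attribute."""
    status = getattr(exc, "status", None)
    return isinstance(status, int) and 500 <= status < 600


def call_with_retry(func, attempts=5, delay=2.0, sleep=time.sleep):
    """Call func(); on a 5xx error wait and retry, doubling the delay each time.

    Any other exception, or a 5xx error on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if not is_transient_server_error(exc) or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


# Hypothetical use at the call site shown in the tracebacks:
#   repo = call_with_retry(lambda: gh.get_repo(repo_name))
```

With the defaults above, a call that keeps failing is retried after 2, 4, 8 and 16 seconds before the exception is re-raised, so a GitHub outage lasting longer than that would still need to be handled by the caller, for example by skipping the job until the next iteration of the main loop.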
