Monitoring Memory Utilization #294

Open
StephanieArteaga opened this issue Aug 21, 2024 · 1 comment

Comments

@StephanieArteaga

I am running dsub on the AoU platform and was wondering if there is a way to monitor memory utilization to help troubleshoot failed jobs.

@mbookman
Contributor

Hi @StephanieArteaga !

This is a good question. dsub does not have built-in memory utilization insights, but there may be a few options.

What I have done in the past (admittedly it has been a while) is to use the GNU time command, which can report the "Maximum resident set size" of the process it runs. This can be used to validate that memory usage is indeed high, and it can potentially help you narrow down the amount of memory that you need.

For the latter, you'd need to run a test task on a larger VM (so that it runs to completion) and hopefully then glean from the time output a sense of memory used by your command.
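As a rough sketch of how that number could be turned into a memory request: the helper below is hypothetical (the name suggest_ram_gib and the 25% default headroom are my assumptions, not anything dsub provides). It takes the "Maximum resident set size" value in kbytes, adds headroom, and rounds up to a whole number of GiB.

```shell
# Hypothetical helper: convert a peak RSS (in kbytes, as reported by
# GNU time) into a whole-GiB memory request with some headroom.
# The 25% default headroom is an assumption; tune it for your workload.
suggest_ram_gib() {
  max_rss_kb="$1"
  headroom_pct="${2:-25}"
  awk -v kb="$max_rss_kb" -v pct="$headroom_pct" 'BEGIN {
    gib = kb * (1 + pct / 100) / (1024 * 1024)
    rounded = int(gib)
    if (rounded < gib) rounded++    # round up to the next whole GiB
    if (rounded < 1) rounded = 1    # never suggest less than 1 GiB
    print rounded
  }'
}

# Example: a peak of 6291456 kbytes (6 GiB) plus 25% headroom
suggest_ram_gib 6291456 25   # prints 8
```

The result is only a starting point for sizing the VM; peak usage can vary with input size, so it is worth measuring on a representative input.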

One tricky bit is that GNU time isn't necessarily included in all distributions (I don't believe), and it is different from the bash built-in time command. When present, it is typically at /usr/bin/time. For example:

$ /usr/bin/time -v ls
<output redacted>
	Command being timed: "ls"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 66%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2560
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 134
	Voluntary context switches: 1
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
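
If you only care about the peak memory line, one option is to have GNU time write its report to a file (the -o flag) and pull the value out afterwards. This is a minimal sketch; the function name max_rss_kb and the file name time_report.txt are just placeholders I made up for illustration.

```shell
# Hypothetical helper: print the "Maximum resident set size" value
# (in kbytes) from a saved GNU time -v report.
max_rss_kb() {
  awk -F': ' '/Maximum resident set size/ { print $2 }' "$1"
}

# Possible usage inside a task script (assumes GNU time is installed;
# "my_command" stands in for your real command):
#   /usr/bin/time -v -o time_report.txt my_command
#   max_rss_kb time_report.txt
```

For the sample output above, this would print 2560.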

So you'll want to check that the Docker image that you use has /usr/bin/time in it.
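
One way to check is to run something like the following inside the image. It is a sketch, not a definitive test: rather than only looking for the file, it runs the binary with -v on a trivial command and checks that the report includes the "Maximum resident set size" line. The function name time_reports_max_rss is hypothetical.

```shell
# Hypothetical check: does the given time binary exist and produce a
# verbose report that includes the peak memory line?
time_reports_max_rss() {
  time_bin="${1:-/usr/bin/time}"
  [ -x "$time_bin" ] || return 1
  # The report goes to stderr by default, so merge it into stdout.
  "$time_bin" -v true 2>&1 | grep -q 'Maximum resident set size'
}

if time_reports_max_rss /usr/bin/time; then
  echo "/usr/bin/time can report maximum resident set size"
else
  echo "/usr/bin/time is missing or does not report maximum resident set size"
fi
```

If the check fails, you would need to install a time package in the image or pick a different base image.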
