
Memory use in the combine function exceeds (by a factor of 2-4 or more) the memory limit passed in #638

Closed
mwcraig opened this issue Aug 10, 2018 · 6 comments · Fixed by #642

mwcraig (Member) commented Aug 10, 2018

The combine function allows one to set a limit on the amount of memory used. Unfortunately, the function actually uses more memory than that limit when performing the image combination.

I'm still tracking down the root cause, but I'll post some example memory profiles shortly.

mwcraig (Member Author) commented Aug 11, 2018

Running the memory profiles

To run the memory profiles shown below:

  1. Install memory_profiler: conda install memory_profiler (or use pip if you prefer).
  2. Check out the branch mwcraig/fix-open-files-in-combiner. The tests from that branch were run at commit c20a7bf.
  3. Copy the file run_with_file_number_limit.py to run_with_file_number_limit_copy.py (or something like that). Making a copy means the script is still available after you check out master later.
  4. Change line 185 from resource.setrlimit(resource.RLIMIT_NOFILE, (n, n)) to resource.setrlimit(resource.RLIMIT_NOFILE, (n + 20, n + 20)), or just grab the file from this gist.
  5. Run the profile with: mprof run --interval 0.05 run_with_file_number_limit_copy.py --overhead 6 --kind fits --open-by combine-chunk --size 2000 --frequent-gc 20
  6. Open the graph mprof produces with mprof plot.

Note: If you want markers showing where a particular function begins and ends, you need to decorate that function with @profile. You do not need to import profile. The markers are not strictly necessary, but I included them below to confirm when memory is being used and freed; a minimal sketch of the pattern follows.
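
For reference, this is roughly the decorator pattern I mean (the function name below is just a stand-in): when the script runs under mprof run, memory_profiler injects profile as a builtin, and the try/except fallback only keeps the script runnable outside the profiler.

```python
# Minimal sketch: `profile` is injected as a builtin when the script runs
# under `mprof run`, so no import is needed; the fallback below just keeps
# the script runnable without the profiler.
try:
    profile
except NameError:
    def profile(func):
        return func


@profile
def average_combine_step(stack):
    # Hypothetical stand-in for the function whose entry/exit you want
    # marked with brackets in the mprof plot.
    return stack.mean(axis=0)
```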

Parameters of profiling runs

  • Each run generates several FITS files of dimension 2000×2000, all zeros. [Fun fact: np.zeros does not allocate memory when called. It allocates when the array is first used.] This is not a good set of images for profiling clipping because nothing will be clipped.
  • The memory limit for the combine function is the size of one image, 32,000,000 bytes (2000 × 2000 × 8 bytes for float64); see the sketch after this list. That is probably a poor choice, as it makes interpretation of the results a little confusing.
  • The number of files generated and other parameters, like the memory limit imposed, are printed in the terminal.
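
For context, here is a minimal sketch (not the profiling script itself; the file name is illustrative) of one such input file and the arithmetic behind the 32,000,000-byte limit:

```python
import numpy as np
from astropy.io import fits

# A 2000 x 2000 float64 image is 2000 * 2000 * 8 bytes = 32,000,000 bytes,
# which is the memory limit used for these runs (the size of one image).
data = np.zeros((2000, 2000))
print(data.nbytes)  # 32000000

# The profiling script writes several such all-zero images to disk as FITS.
fits.PrimaryHDU(data).writeto("zeros_0.fits", overwrite=True)
```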

Interpreting the output

The graphs below show snippets from the memory profiles of two commits: one at the tip of the file refactor, the other on master (not the tip of master as of today; it is a few commits back). Prior to the interval shown, memory use in both cases has reached roughly 150 MB. That is before the actual image combination begins.

The cyan "brackets" show when Combiner.__init__ is entered (opening bracket [) and exited (closing bracket ]). The gold brackets show when Combiner.average_combine is entered and exited.

After refactor to limit number of open files (commit c20a7bf)

[memory profile plot: mem-profile-refactor1]

Discussion

I believe the jump from 150 MB to what looks like a new "baseline" of 180 MB is due to the creation of the CCDData object that holds the results. My expectation from the documentation is that, during the combination, memory use for performing the combination will not exceed the limit set (32,000,000 bytes, i.e. 32 MB), so memory use should not exceed 180 + 32 = 212 MB. In fact, peak memory usage regularly reaches roughly 270 MB, roughly 2x what I expected.

The sharp drops after average_combine exits are due to the explicit deletion of the tile_combiner and a couple of other objects; most of the drop comes from the tile_combiner.
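
For illustration only (this is not the actual Combiner code; the names are hypothetical), the pattern that produces drops like these looks roughly like this:

```python
import gc

import numpy as np


def combine_chunk(chunk):
    # `tile_combiner` is a hypothetical stand-in for the large intermediate
    # that dominates the drop in the profile.
    tile_combiner = np.array(chunk, dtype=np.float64)  # copies the input
    result = tile_combiner.mean(axis=0)
    del tile_combiner  # drop the reference to the large intermediate
    gc.collect()       # forcing a collection makes the release show up
                       # promptly in the mprof trace
    return result
```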

Recent master (commit c289d4e)

[memory profile plot: mem-profile-master]

Discussion

A few things jump out here. One is that peak memory usage is much further above the 150 MB baseline. The peak is 290 MB, and the floor on memory use never drops below 240 MB, i.e. about 60 MB above the starting level and about 30 MB above the limit. I am not sure exactly why that is, but the new version is definitely an improvement.

Summary

The easiest fix here is to increase the number of chunks the data are broken into by adding a "fudge factor" of 2 or 4; a rough sketch is below. I don't think that will slow the code appreciably, and I don't think we will be able to accurately predict what memory usage will be in all cases. As an example, I plan to open a PR to switch us to using astropy sigma clipping instead of our own. Currently astropy sigma clipping makes copies of its input (because sorting is done in place, I think), so memory usage will depend on whether sigma clipping is done. More broadly, we can't predict what memory use might be in other functions the user passes in.
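
A rough sketch of what such a fudge factor might look like (names and numbers are illustrative, not the actual ccdproc implementation):

```python
import math


def number_of_chunks(n_images, image_nbytes, mem_limit, fudge_factor=4):
    # Split the stack into enough chunks that the raw input data held in
    # memory at once is at most mem_limit / fudge_factor, leaving headroom
    # for copies made inside clipping/combining functions.
    total_bytes = n_images * image_nbytes
    return max(1, math.ceil(fudge_factor * total_bytes / mem_limit))


# e.g. ten 32 MB images, a 32 MB limit, and a fudge factor of 4 -> 40 chunks
print(number_of_chunks(10, 32_000_000, 32_000_000))
```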

To some extent, I think that is not our problem to solve. On the other hand, it seems safe to assume that memory use for combination will be at least twice the memory limit if the number of chunks is calculated based only on the size of the images to be combined, and it isn't hard to imagine it being 4 times.

I don't think we have to be super strict about the amount of memory, but in the PR I'll try to come up with a test we can add to make sure there are no bad regressions on this front.

MSeifert04 (Contributor) commented Aug 11, 2018

Thanks Matt for the careful analysis.

I wonder if it wouldn't be enough (especially given your "I don't think we will be able to accurately predict what memory usage will be in all cases" statement) to fix the documentation to clearly state that roughly that amount of input data will be used and that the effective memory usage during combination can be significantly higher (in general, 2-4 times is to be expected).

We probably should investigate whether we can reduce the memory usage, but given that we allow arbitrary combine and uncertainty functions, it will be completely impossible to know what fudge factor makes sense. Even in the default case we would rely on numpy implementation details...

MSeifert04 (Contributor) commented

> As an example, I plan to open a PR to switch us to using astropy sigma clipping instead of our own.

Just out of interest: did you happen to do memory profiling on #632? It probably isn't something we should use as-is (numba is cool, but without anaconda it is a real pain to install and use), but an equivalent Cython/C approach wouldn't be too complicated.

mwcraig (Member Author) commented Aug 12, 2018

> Just out of interest: did you happen to do memory profiling on #632?

Can do today; would be interesting to see what happens.

mwcraig (Member Author) commented Aug 12, 2018

> We probably should investigate whether we can reduce the memory usage

I'll try to write a more extensive set of profile cases today, along with a proper memory test runner instead of hacking the open-files runner. To some extent the examples above are really "toy" cases, since the memory limit is set to the size of one image. In a more real-world case the memory overhead may be much smaller...

mwcraig (Member Author) commented Aug 14, 2018

Updated discussion and more (and more realistic) profiles are in #642.
