Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #2954 and alternative way to give values to memoized function #2955

Merged
merged 1 commit into from
Nov 21, 2023

Conversation

pzinn
Copy link
Contributor

@pzinn pzinn commented Oct 8, 2023

  • fix for memoizeValues issue #2954: now all memoized functions have the same function body. I ended up removing initialValues from the codeHelper since they appear in the list of symbols anyway.
  • I also implemented the suggestion in the comments of memoizeValues issue #2954 to give values to a memoized function after defining it and added an example in the doc.

@d-torrance
Copy link
Member

Would it make things any simpler if memoize returned instances of some new subclass of FunctionClosure, maybe MemoizedFunction?

@pzinn
Copy link
Contributor Author

pzinn commented Oct 8, 2023

yeah that's another option, I've been thinking about it. I see nothing wrong with it (in fact, I'm experimenting with something like this for mode-dependent functions like print or show). It'd be an easy change, something like adding

MemoizedFunctionClosure = new Type of FunctionClosure
...
new MemoizedFunctionClosure from ( x -> ( ... ) )

I don't know what others' opinions are...

@mahrud
Copy link
Member

mahrud commented Oct 9, 2023

Like I said on Zulip, I wish memoize didn't cache results in non-garbage collected places, and until that is fixed I'd rather keep its use limited (e.g. strictly not used by packages, only for personal use).

I also pointed out that one problem of the Function Thing = ... notation is that it's really hard to search for in the code, and hence hard to figure out where something is being incorrectly cached when debugging. I used to find it cute, but because of this now I'd rather remove that syntax entirely (or at least limit it to personal use, not packages and the core).

@pzinn
Copy link
Contributor Author

pzinn commented Oct 9, 2023

I don't understand your comment on storefuns: currently there's no interaction between memoize and storefuns.
As to the second part, I think this is fixable, so rather than remove or not use the feature, one should improve debugging tools.

@mahrud
Copy link
Member

mahrud commented Oct 9, 2023

I don't understand your comment on storefuns: currently there's no interaction between memoize and storefuns.

Sorry, I got confused. I meant this non-garbage collected hash table in remember.m2:

          values := new MutableHashTable;

As to the second part, I think this is fixable, so rather than remove or not use the feature, one should improve debugging tools.

How do you propose debugging the situation that I described above? It's pretty much impossible to use search/grep/regex to find this. If you have a suggestion you should add that to the documentation as well.

@pzinn
Copy link
Contributor Author

pzinn commented Oct 9, 2023

maybe I don't understand how garbage collecting works in M2. in the stupid example

fib=memoize(n -> if n <= 1 then 1 else fib(n-1) + fib(n-2))

why wouldn't values get gc-ed when say fib is removed?

@mahrud
Copy link
Member

mahrud commented Oct 9, 2023

The trouble is that the memoized function is itself almost never removed. Suppose you have a memoized function myRank which takes in modules, and you have functions that generate semi-temporary modules over new rings and return the result of myRank. If people use memoize then the temporary rings and modules remain alive inside the values hash table and are not garbage collected until myRank is removed! You might say the correct thing is to use cacheValue, but there's no way to prevent users from memoizing functions that take in modules or rings!

@pzinn
Copy link
Contributor Author

pzinn commented Oct 9, 2023

OK, there are cases when memoize shouldn't be used, and cases when it's perfectly fine (say, when arguments are numbers), we can all agree.

@pzinn
Copy link
Contributor Author

pzinn commented Oct 9, 2023

incidentally I notice two strategies for accessing the closure of a function: either using frame, or using localDictionaries. what are the pros/cons of each?

@pzinn
Copy link
Contributor Author

pzinn commented Oct 11, 2023

Also I would like to hear others' thoughts on @d-torrance 's suggestion to introduce say MemoizedFunctionClosure. It would definitely make the code of memoizeClear and memoizeValues cleaner and would in particular fix the issue that this PR is for.

@pzinn
Copy link
Contributor Author

pzinn commented Oct 26, 2023

I've been thinking about the pros and cons of memoize vs cacheValue.
Maybe I'm mistaken, please correct me if I'm wrong (@mahrud ?) but one major issue of cacheValue is when it applies to objects which we can create multiple copies of, like modules. These objects will be undistinguishable with === and yet they'll each have their own cache table... which means if we compute something for one of them, it will not be memorized for use for any of the other undistinguishable objects. On the other hand, memoize will cache them all at once.

@DanGrayson
Copy link
Member

Like I said on Zulip, I wish memoize didn't cache results in non-garbage collected places (so in particular not storefuns), and until that is fixed I'd rather keep its use limited (e.g. strictly not used by packages, only for personal use).

There are no non-garbage-collected places.

@mahrud
Copy link
Member

mahrud commented Oct 26, 2023

There are no non-garbage-collected places.

What happens if you run this code?

f = memoize(n -> koszul vars(QQ[x_1..x_5]))
scan(10000, f)

The output is null, but the rings, modules, and complexes, are not garbage collected.

@mahrud
Copy link
Member

mahrud commented Oct 26, 2023

Here's a more realistic example:

mbasis = memoize basis
hilb = (n, d) -> numcols mbasis(d, QQ(monoid[vars(1..n)]))
apply(100, hilb_5)

You can watch memoize eat up all of your memory.

To be clear, mbasis is a function I do use locally because with careful programming it can be extremely useful, but it would 100% be abused if something like this was in the Core.

@mahrud
Copy link
Member

mahrud commented Oct 26, 2023

These objects will be undistinguishable with === and yet they'll each have their own cache table... which means if we compute something for one of them, it will not be memorized for use for any of the other undistinguishable objects. On the other hand, memoize will cache them all at once.

You're not wrong, but like in my example above, this is too easy to abuse if it were used in Core.

My preferred solution for this would be having an extra context object storing the cached information which can be garbage collected, so the user is forced to keep track of it. Maybe if instances of your proposed MemoizedFunctionClosure by default accepted an optional context object, like this:

f = memoize(n -> (...))
ctx = new MutableHashTable
f(10, Context => ctx)

In fact, the context object can be the parent ring, so that when the ring goes out of scope (and hence also the module), so does the cached data.

@pzinn
Copy link
Contributor Author

pzinn commented Oct 26, 2023

for the record, this whole discussion is not directly related to the PR, which does not change the way memoize works, but simply fixes a bug and adds a minor feature.

@mahrud
Copy link
Member

mahrud commented Oct 28, 2023

Agreed. I'm not objecting to this PR. I do like the added features, and would use them personally.

As to the second part, I think this is fixable, so rather than remove or not use the feature, one should improve debugging tools.

The specific reason I prefer avoiding using the Function Thing = ... notation in Core and packages is very negative experiences with bugs caused by this line in particular:

-- cache poincare
poincare cokernel m1 = hf);

I think @mikestillman also mentioned that it's really difficult to debug issues with this kind of caching, so I'm curious if you have ideals for how to improve debugging tools.

@pzinn
Copy link
Contributor Author

pzinn commented Oct 28, 2023

I do want to think about this debugging issue some more.
Your example reminded me of this 5-year old bug report: #732 -- I checked and the bug is still there ;-)

@mahrud
Copy link
Member

mahrud commented Nov 11, 2023

Could you rebase on top of upstream/development and force push instead of having multiple merge commits? This way the history remains linear.

@pzinn
Copy link
Contributor Author

pzinn commented Nov 11, 2023

what's the exact rebase command that's needed? so I don't spend too much time on this...

@pzinn
Copy link
Contributor Author

pzinn commented Nov 11, 2023

BTW the only reason I'm doing all these pulls, as you probably guessed, is that the build keeps failing for reasons that are most likely unrelated to the PR itself.

@mahrud
Copy link
Member

mahrud commented Nov 11, 2023

I was actually confused about why you merged in the first place, since it shows that your first commit (before the merges) passed all tests:
image

The command is:

git fetch upstream
git rebase upstream/development
git push --force

If rebase runs into conflicts and you don't know how to fix them, you could also just go back in history to the first commit that passed:

git reset --hard HEAD~2
git push --force

@pzinn
Copy link
Contributor Author

pzinn commented Nov 12, 2023

thanks, done. and now I'm really confused, thought the builds had always failed...

@DanGrayson DanGrayson merged commit 3be5db8 into Macaulay2:development Nov 21, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants