
Signals should be able to efficiently return to an old value #256

Open
DavidANeil opened this issue Jan 22, 2025 · 8 comments

@DavidANeil

DavidANeil commented Jan 22, 2025

In certain cases a Producer changes to a new value, but then later resets to an older value. In the current spec / implementation, all downstream Consumers must recompute, even if the values of all signals they consume are equal to the values from the previous time they were consumed.

Imagine this scenario (example taken from #197):

let n = 0;
const s = new Signal.State(0);
const c = new Signal.Computed(() => (n++, s.get()));
c.get(); // this triggers a computation of c (so n passes from 0 to 1)
s.set(1); // this does not trigger anything because the computation of c is lazy
s.set(0); // let's come back to the previous value of s
c.get(); // if we recompute c, there is no need to call the function as the last time c was computed was with s = 0
expect(n).toBe(1); // so I am expecting n to still be 1
// but this test fails: there is an (unneeded) re-computation

The best way for a Producer to be able to return to an old value and have consumers "just work" would be to allow it to return to an old version number. In order to support the ability to return to an old version, this check must be removed from the implementation:

    // First check the versions. A mismatch means that the producer's value is known to have
    // changed since the last time we read it.
    if (seenVersion !== producer.version) {
      return true;
    }

It can be improved by asking the Producer whether it could possibly produce the seenVersion again:

    if (seenVersion !== producer.version && !producer.canMaybeProduceVersion?.(seenVersion)) {
      return true;
    }

The Producer implementation is then allowed to store a cache of up to N previous values, keyed by their version numbers. If the seenVersion is in that map, canMaybeProduceVersion returns true, which forces a recalculation of the Producer so it can be checked against that version.

For a State, when a new value is set, and for a Computed, when a new value is calculated, the new value is compared against the cached values using the equals callback. The first cached entry for which equals returns true is reused, and the value's version is reset to that entry's version number.
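A minimal standalone sketch of this value-version cache (all names here are illustrative, not the polyfill's; the revert is done eagerly in set(), State-style, under the cacheSize option suggested below):

```javascript
// Illustrative sketch only: a producer that keeps a small cache of
// evicted values so that setting an old value back restores the old
// version number instead of minting a new one.
class CachingProducer {
  constructor(value, {cacheSize = 0, equals = Object.is} = {}) {
    this.value = value;
    this.version = 0;
    this.equals = equals;
    this.cacheSize = cacheSize;
    this.cache = []; // [{version, value}], most recent first
  }

  // The hook suggested above: could this producer plausibly return
  // to `seenVersion`, so that a consumer that saw it may stay clean?
  canMaybeProduceVersion(seenVersion) {
    return this.cache.some((entry) => entry.version === seenVersion);
  }

  set(newValue) {
    if (this.equals(this.value, newValue)) return;
    const hit = this.cache.find((entry) => this.equals(entry.value, newValue));
    if (this.cacheSize > 0) {
      // Stash the outgoing value so a later set() can revert to it.
      this.cache.unshift({version: this.version, value: this.value});
    }
    if (hit) {
      // Returning to a cached value: reuse its old version number.
      this.cache.splice(this.cache.indexOf(hit), 1);
      this.version = hit.version;
      this.value = hit.value;
    } else {
      this.version++;
      this.value = newValue;
    }
    // Entries evicted here are "dropped" (cf. #255).
    while (this.cache.length > this.cacheSize) this.cache.pop();
  }
}

const s = new CachingProducer(0, {cacheSize: 1});
s.set(1); // version 1
s.set(0); // version back to 0: consumers that last saw version 0 need not recompute
```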

I think this should be an option during the construction of State and Computed, like:

const computed = new Signal.Computed(() => state.get(), {cacheSize: 1});

If #255 is implemented then a value will be considered "dropped" when it is evicted from the cache. A newly calculated value will be immediately dropped if a cached value is going to be used instead.

@R2D221

R2D221 commented Jan 22, 2025

I'm not sure versions are the right way to look at this, considering a Computed can have many sources.

Consider this scenario:

const a = new Signal.State(1); //version 1
const b = new Signal.State(2); //version 1
const c = new Signal.State(3); //version 1

const abc = new Signal.Computed(() => a.get() + b.get() + c.get());
assert(abc.get() === 6); //combined versions 1,1,1

const watcher = new Signal.subtle.Watcher(() => {});
watcher.watch(abc);

a.set(11); //version 2
c.set(33); //version 2

a.set(1); //back to version 1

assert(abc.get() === 36); //combined versions 1,1,2 - must recompute

Even with something as simple as 3 sources, memoizing previous versions would become a mess. Now imagine 10 sources.

I think the only reasonable way to have your original scenario (isEven.get()) not recompute is by saving the last known value of each source.

Something like this:

const abc = new Signal.Computed(() => a.get() + b.get() + c.get());
watcher.watch(abc);
assert(abc.get() === 6);

/*
sources: [
    { signal: a, lastKnownValue: 1 },
    { signal: b, lastKnownValue: 2 },
    { signal: c, lastKnownValue: 3 },
]
*/

a.set(11);
c.set(33);

a.set(1);
c.set(3);

/*
sources: [
    { signal: a, lastKnownValue: 1 }, // same
    { signal: b, lastKnownValue: 2 }, // same
    { signal: c, lastKnownValue: 3 }, // same
                                        // = NOT RECOMPUTE
]
*/
assert(abc.get() === 6);

a.set(11);

/*
sources: [
    { signal: a, lastKnownValue: 1 }, // different
    { signal: b, lastKnownValue: 2 }, // same
    { signal: c, lastKnownValue: 3 }, // same
                                        // = RECOMPUTE
]
*/
assert(abc.get() === 16);
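That per-source bookkeeping can be sketched with toy stand-ins (these are simplified classes, not the proposal's; dependencies are declared explicitly here, whereas a real implementation would auto-track them during the computation):

```javascript
// Toy stand-ins illustrating the "lastKnownValue per source" idea.
class State {
  constructor(value) { this.value = value; }
  get() { return this.value; }
  set(value) { this.value = value; }
}

class Computed {
  constructor(fn, deps) {
    this.fn = fn;
    this.deps = deps;            // explicit for the sketch only
    this.sources = null;         // [{signal, lastKnownValue}]
    this.cached = undefined;
    this.computeCount = 0;       // for demonstration only
  }
  get() {
    const fresh = this.sources !== null &&
      this.sources.every(({signal, lastKnownValue}) =>
        Object.is(signal.get(), lastKnownValue));
    if (!fresh) {
      this.computeCount++;
      this.cached = this.fn();
      // Record each source's value as seen by this computation.
      this.sources = this.deps.map((signal) =>
        ({signal, lastKnownValue: signal.get()}));
    }
    return this.cached;
  }
}

const a = new State(1), b = new State(2), c = new State(3);
const abc = new Computed(() => a.get() + b.get() + c.get(), [a, b, c]);
abc.get();           // 6, computed once
a.set(11); c.set(33);
a.set(1);  c.set(3); // every source is back to its last known value
abc.get();           // still 6, and NOT recomputed
```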

@DavidANeil
Author

So my org's legacy implementation of Signals does that kind of thing, where it stores the lastValue on the edges of the dependency graph. It is a design decision we regret. Storing previous values on the node has a few advantages:

  • fewer copies of the data
  • lifecycle of data matches the lifecycle of the object
  • edges are just a Map<Signal, number>, which is much cheaper than a Map<Signal, {lastValue: unknown; lastValueId: number}>
  • Checking if something is stale is just a numerical === check, which is much faster than an unknowably complex equals check
  • Storing values on the edges requires explicit memory cleanup/disposal, otherwise nodes are rarely garbage collectible.

> Even with something as simple as 3 sources, memoizing previous versions would become a mess. Now imagine 10 sources.

The last seen version is already stored in the polyfill implementation, so downstream nodes don't need to change anything to take advantage of this.

@divdavem

Cf. also #197.
In tansu, we store the last value and version number on the edges, and this works well. We only call equal if the version number changed, and we cache the result.
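My reading of that scheme, as a standalone sketch (not tansu's actual code; edgeChanged and the field names are illustrative): the edge stores both the last value and the last version, equal runs only when the version changed, and its verdict is cached by syncing the edge's version afterwards.

```javascript
// Sketch of an edge-level staleness check with a cached equal() verdict.
function edgeChanged(edge, producer, equal = Object.is) {
  if (edge.lastVersion === producer.version) {
    return false; // cheap path: a plain numeric comparison
  }
  const changed = !equal(edge.lastValue, producer.value);
  edge.lastVersion = producer.version; // cache the comparison result
  if (changed) {
    edge.lastValue = producer.value;
    return true;
  }
  return false; // the value came back: no recomputation needed
}

const producer = {version: 2, value: 0};     // bumped to version 2...
const edge = {lastVersion: 1, lastValue: 0}; // ...but the value reverted
edgeChanged(edge, producer); // false, and the next call skips equal() entirely
```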

@DavidANeil
Author

I like using the internal cache because then it is behavior that you can specifically opt into in cases where it is actually helpful, and has almost no cost on the system as a whole.
Storing the values on the edges requires the whole system to cooperate for what might be a very small benefit.

Additionally, storing the values on the edges can lead to significant memory leaks. Our legacy implementation had to implement a toComparable callback on Signals, and then the equals is in charge of comparing those. This introduces additional complexity to the system, which can be entirely avoided with the cache.

Thanks for the examples in #197, I am going to edit the original post to use those, since I think they are clearer.
I considered closing this as a duplicate, but then I realized #197 is a PR, not an issue, so I'll leave this open.

@divdavem

What do you mean by your toComparable callback?

Here are some advantages of storing values on the edges (rather than having a cache on the node):

  • the whole system is guaranteed not to recompute dependencies when it is not needed, which I think is very important
  • there is no need to configure an arbitrary cache size on each node
  • the number of old values that are kept in memory exactly corresponds to what is required to avoid useless computations in the future
  • if all dependent signals are up-to-date, there is no old value stored anywhere
  • equal is only called between values that are useful to compare to avoid a re-computation at the time it is needed (vs preemptively with the whole cache if the cache is on the node)

Regarding memory, it is true that keeping a lot of references to dirty signals can lead to a lot of previous values being kept in memory, but that is already the case currently, as mentioned in #254. I would not call that a memory "leak". That memory is not lost; it is linked to the dirty signals, used when updating a signal, and freed when no longer needed (when the signal is up-to-date, or when it is garbage collected). As suggested in #254, maybe we could also have an explicit free function.

@DavidANeil
Author

We have some Puppeteer tests that load our application, put it into a state where we expect there to be hundreds of instances of class C, and then put it into a state where we expect there to be 0 instances. The test runs GC and then asserts that there are 0 instances of the class on the heap.
If the values are stored on the edges, the test fails, as dozens of instances have leaked into reactive edges.
Each of these objects contains ~10KB of data, so leaking them like that is catastrophic.

So we have a toComparable callback defined on signals, which in this case converts the value to an integer that can be compared to determine whether the computed value is equivalent.
The Signals store the lastValue and the lastComparable; edges store only the lastComparable.

A downside of storing the values on the edges is that you have to remember to opt in to the toComparable to prevent leaking memory where it matters, whereas an extraCacheSize option would default to not being greedy with memory, and could be enabled where the memory cost is worth it.

> if all dependent signals are up-to-date, there is no old value stored anywhere

Since the signal knows which valueVersions consumers last used, it is trivial for it to drop cached values that correlate to valueVersions that no consumers are depending on.
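A sketch of that pruning, assuming the producer keeps its cache keyed by version and can enumerate the versions its consumers last saw, e.g. from the dependency edges (pruneCache and these names are hypothetical):

```javascript
// Drop cached values whose version no consumer still depends on.
function pruneCache(cache, consumerSeenVersions) {
  const live = new Set(consumerSeenVersions);
  for (const version of [...cache.keys()]) {
    if (!live.has(version)) cache.delete(version); // value is "dropped"
  }
}

const versionedValues = new Map([[1, 'a'], [2, 'b'], [3, 'c']]);
pruneCache(versionedValues, [1, 3]); // consumers last saw versions 1 and 3
// versionedValues now holds only versions 1 and 3
```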

> the number of old values that are kept in memory exactly corresponds to what is required to avoid useless computations in the future

No, they correspond to the number of dependency edges there are, since each edge holds that data in memory. Obviously if they are each holding a reference to the same object then the only duplicated data is the reference.

> if all dependent signals are up-to-date, there is no old value stored anywhere

Same as the last point: even when you aren't holding old values, you are still holding many copies of the most recent value.

Our application has ~1 million dependency edges, so each extra byte on each dependency edge results in 1MB of additional memory usage.

@divdavem

> A downside of storing the values on the edges is that you have to remember to opt in to the toComparable to prevent leaking memory where it matters, whereas an extraCacheSize option would default to not being greedy with memory, and could be enabled where the memory cost is worth it.

I prefer having by default "the whole system is guaranteed not to recompute dependencies when it is not needed", and then I can opt out when I know I have to (because of memory constraints), rather than having to opt in to what many consider the normal behavior of a computed.

> Since the signal knows which valueVersions consumers last used, it is trivial for it to drop cached values that correlate to valueVersions that no consumers are depending on.

You can easily have a counter of the maximum number of consumers that are using a valueVersion, which allows getting rid of some values when it reaches 0, but it may not be the exact number if some consumers saw an old version and have since been garbage collected. Or are you keeping a reference to all consumers with a WeakRef, as mentioned in #252 (with an impact on performance)? Or do you use a FinalizationRegistry?

@DavidANeil
Author

> and then I can opt out when I know I have to

There is a fixed overhead of another ~8 bytes per edge that I can't think of a reasonable way to opt out of.

> Or are you keeping a reference to all consumers

The "reference to the consumer" is the "edge" of the dependency graph, so you need to store that one way or another for the signal to be reactive, and you would definitely need to store it to allow for storing lastValues on the edges.
