Referential transparency and independence of distributions #303

Hazelfire · 2022-04-16T05:55:05Z

Hazelfire
Apr 16, 2022
Maintainer

I just stumbled across a relatively large issue that we need to have a discussion about at some point:

I'm not sure what we should do about some of the semantics of these distributions. Say for instance:

normal(0, 1) - normal(0, 1)
Should probably return normal(0, sqrt(2)) as it currently does.

The only awkward thing is now, if I was to define a variable x:

x = normal(0, 1)
x - x

Intuitively, I would expect the answer to be 0, because both operands are the same variable and so are not independent. The current version of Squiggle disagrees with me and returns the same answer as above.
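The difference between the two readings can be seen in a small Monte Carlo sketch. This is plain Python rather than Squiggle, and the sample count and seed are arbitrary choices for illustration, not anything Squiggle does internally.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10_000

# Reading 1: the two operands are independent draws from normal(0, 1).
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
independent_diff = [ai - bi for ai, bi in zip(a, b)]

# Reading 2: both operands are the same variable x, so each sample
# of x is subtracted from itself.
x = [rng.gauss(0, 1) for _ in range(n)]
same_diff = [xi - xi for xi in x]

sd_independent = statistics.pstdev(independent_diff)  # close to sqrt(2)
sd_same = statistics.pstdev(same_diff)                # exactly 0.0
```

Under the first reading the spread is roughly sqrt(2), matching normal(0, sqrt(2)); under the second every sample is 0.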

I have a practical example of this problem when creating the GiveDirectly model:

amount_consumed = 20
baseline_consumption = 200 to 350
log_increase_in_initial_consumption = normalize(log(amount_consumed + baseline_consumption)) - normalize(log(baseline_consumption))
log_increase_in_initial_consumption

(Please ignore the normalize expressions; they are an artefact of working around #294.) What I mean here is that the two references to baseline_consumption in log_increase_in_initial_consumption are not independent. Squiggle assumes they are, and gives the following result:

What Squiggle has output is nonsense to me, because it puts a decent amount of probability mass on giving money to people actually decreasing their consumption!
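A rough way to check this outside Squiggle is to sample the model both ways. The sketch below is plain Python and assumes that `200 to 350` means a lognormal whose 5th and 95th percentiles are 200 and 350; that reading, the helper name `sample_baseline`, the seed, and the sample count are all assumptions made for illustration.

```python
import math
import random

rng = random.Random(0)
n = 10_000
amount_consumed = 20

# Assumption: "200 to 350" is a lognormal with 5%/95% quantiles 200 and 350.
z95 = 1.6448536269514722  # 95th percentile of the standard normal
mu = (math.log(200) + math.log(350)) / 2
sigma = (math.log(350) - math.log(200)) / (2 * z95)

def sample_baseline():
    """Draw one hypothetical baseline consumption value."""
    return rng.lognormvariate(mu, sigma)

# Treating the two references as independent: a fresh draw for each.
independent = [
    math.log(amount_consumed + sample_baseline()) - math.log(sample_baseline())
    for _ in range(n)
]

# Treating them as the same variable: one draw, used in both places.
shared = []
for _ in range(n):
    b = sample_baseline()
    shared.append(math.log(amount_consumed + b) - math.log(b))

frac_negative_independent = sum(v < 0 for v in independent) / n
frac_negative_shared = sum(v < 0 for v in shared) / n
```

With a shared draw the result is log(1 + 20 / b), which is positive for every positive b, so `frac_negative_shared` is 0. With independent draws a sizeable fraction of samples fall below 0.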

My intuition is that we should fix this, and the above expression should not have any probability mass below 0. This has an obvious problem: my intuition violates referential transparency. It would mean that Squiggle is no longer a pure language.

If I were to implement my intuition, then a number of changes would likely have to happen:

Distribution constructors like normal(5, 2) would have to be impure functions, returning a unique identifier with each call.
The distribution operations would become much more complicated, because they would have to track whether the distributions being combined are independent of one another. I'm not really sure how I would even begin with that.
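As a very limited starting point, the two changes above can be sketched for subtraction of normals only. In this Python sketch each constructor call takes a fresh id from a counter (the impure part), and a distribution is stored as a constant plus coefficients on independent standard-normal sources keyed by id. The names `Affine`, `normal`, and `_next_id` are hypothetical, and nothing here reflects how Squiggle is actually implemented.

```python
import itertools
import math
from dataclasses import dataclass

_next_id = itertools.count()  # global counter: this is what makes normal() impure

@dataclass
class Affine:
    """constant + sum(coeffs[i] * Z_i), where the Z_i are independent N(0, 1)."""
    constant: float
    coeffs: dict  # source id -> coefficient

    def __sub__(self, other):
        # Coefficients on a shared source id combine (and may cancel);
        # coefficients on distinct ids stay separate.
        coeffs = dict(self.coeffs)
        for key, c in other.coeffs.items():
            coeffs[key] = coeffs.get(key, 0.0) - c
        return Affine(self.constant - other.constant, coeffs)

    @property
    def mean(self):
        return self.constant

    @property
    def sd(self):
        return math.sqrt(sum(c * c for c in self.coeffs.values()))

def normal(mean, sd):
    # Each call gets a new source id, so two calls are independent.
    return Affine(mean, {next(_next_id): sd})

x = normal(0, 1)
same = x - x                               # shared id cancels: sd is 0
independent = normal(0, 1) - normal(0, 1)  # distinct ids: sd is sqrt(2)
```

This only covers linear combinations of normals. Nonlinear operations such as log, and non-normal distributions, would need a more general way to track dependence.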

If anyone can come up with a way to talk about distributions that are not independent without breaking referential transparency, I'd love to know!

Hazelfire · 2022-04-16T07:47:33Z

Hazelfire
Apr 16, 2022
Maintainer Author

A possible solution to this, which could keep purity + referential transparency, would be to allow an optional "name" argument to distributions, like: normal(0, 1, {name: "baseline_consumption"}). This name could be used by the operations to determine whether two operands should be treated as the same variable. You could then have syntactic sugar:

baseline_consumption ~ normal(0, 1)

Which is syntactic sugar for

baseline_consumption = normal(0, 1, {name: "baseline_consumption"})

If the argument is not defined, it is assumed that the distribution is independent of all other distributions.
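A minimal sketch of how an operation might use the name, in Python rather than Squiggle. The `Normal` record and `subtract` function are hypothetical, and only subtraction of normals is handled. The constructor stays pure here: the same arguments always give the same value, and dependence is decided by comparing names.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Normal:
    mean: float
    sd: float
    name: Optional[str] = None  # None means "independent of everything else"

def subtract(a: Normal, b: Normal) -> Normal:
    if a.name is not None and a.name == b.name:
        # Same name: treat both operands as the same random variable.
        if (a.mean, a.sd) != (b.mean, b.sd):
            raise ValueError(f"name {a.name!r} used for two different distributions")
        return Normal(0.0, 0.0)  # point mass at 0
    # Otherwise assume independence: variances add.
    return Normal(a.mean - b.mean, math.hypot(a.sd, b.sd))

named = subtract(Normal(0, 1, "x"), Normal(0, 1, "x"))
unnamed = subtract(Normal(0, 1), Normal(0, 1))
```

Here `named` is a point mass at 0 and `unnamed` has standard deviation sqrt(2). One open design question this sketch sidesteps is what name, if any, the result of an operation should carry.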

Could be quite a bit past alpha

OAGr · 2022-04-16T11:38:08Z

OAGr
Apr 16, 2022
Maintainer

Some of this (with sampleSet distributions) will be addressed here:
#287

@umuro said that it will be fixed with the upcoming Function functionality.
https://eaforecasting.slack.com/archives/C030T49UHSS/p1650069587390819?thread_ts=1649986604.795659&cid=C030T49UHSS

Current code is not making a difference between variables and functions. They are both lazy expressions. However, variables have to be eager instead of lazy. I have yet to write a switch statement that recognizes 0-parameter functions (0-parameter functions happen to be eager variables).
I already know this. It will be gone along with the functions release.
In short, I first developed lazy evaluation for the sake of functions. Then made variables as parameterless functions. let statements will reduce expressions if everything is bound (

However, we do need a larger solution, and this will be more tricky, as you point out.

SampleSet distributions can address this easily enough, as the samples can be ordered. However, it's not clear how we can best do this for symbolic+pointSet distributions.
