Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probability package updates #3478

Merged
merged 5 commits into from
Sep 18, 2024
Merged

Conversation

d-torrance
Copy link
Member

@d-torrance d-torrance commented Sep 14, 2024

Currently, the probability distribution of random(QQ) is kind of unusual. If X and Y both have the discrete uniform distribution on {1,...,h} for some positive integer h, then we get a sample from the distribution of X/Y. This maybe isn't great. For example, P(X/Y = 1) = 1/h. Indeed:

i1 : tally apply(100, i -> random(QQ, Height => 3))

o1 = Tally{1 => 29}
           2 => 12
           3 => 11
           1
           - => 9
           3
           1
           - => 12
           2
           2
           - => 14
           3
           3
           - => 13
           2

Instead, we propose sampling from the standard normal distribution and rounding the results to the nearest rational number with denominator bounded by h (using the Farey sequence). I think I mentioned this idea recently in some discussion, but I can't find it. After the change:

i1 : tally apply(100, i -> random(QQ, Height => 3))

o1 = Tally{-1 => 10 }
           -2 => 1
           0 => 12
           1 => 6
           2 => 2
             1
           - - => 5
             3
             1
           - - => 8
             2
             2
           - - => 12
             3
             3
           - - => 3
             2
             4
           - - => 6
             3
             5
           - - => 3
             3
             7
           - - => 1
             3
           1
           - => 6
           2
           1
           - => 15
           3
           2
           - => 3
           3
           3
           - => 3
           2
           4
           - => 2
           3
           7
           - => 1
           2
           7
           - => 1
           3

This PR also has a few other related changes:

  • Make the Farey sequence approximation routine available at top-level (but unexported).
  • Add a function to the engine that returns a random RR from the standard normal distribution and use it in the Probability package.
  • Improve the Probability package documentation.

This possibly closes #999, although that suggests using a uniform distribution.

@mahrud
Copy link
Member

mahrud commented Sep 15, 2024

Also see #1229. Is there a standard data type for distributions?

@d-torrance
Copy link
Member Author

Is there a standard data type for distributions?

There is in the Probability package -- see https://macaulay2.com/doc/Macaulay2/share/doc/Macaulay2/Probability/html/___Probability__Distribution.html

@mahrud
Copy link
Member

mahrud commented Sep 16, 2024

I know it'll be a lot of work, but I think it would be useful down the road if I could pass a distribution as an option to random.

@d-torrance
Copy link
Member Author

I know it'll be a lot of work, but I think it would be useful down the road if I could pass a distribution as an option to random.

That's already possible in the Probability package -- see random(ProbabilityDistribution).

I suppose you're suggesting things like random ring elements and random matrices? That would be pretty easy to implement in the package, I think, by adding more methods like random(Module, Module, ProbabilityDistribution).

@mahrud
Copy link
Member

mahrud commented Sep 17, 2024

I guess I want everything to be compatible, so random(QQ, ...) can be changed to random(QQ^n, ...) or random(QQ^m, QQ^n, ...), and further random(S = QQ[x], d, ...) or random(S^n, S^{...}, ...) would also pull coefficients from the same distributions. I wouldn't be opposed to preloading the Probability package if necessary.

More specifically about this PR, why did you choose normal distribution over uniform distribution?

@d-torrance
Copy link
Member Author

Good ideas! I'm going to remove most of the engine stuff from this PR (keeping the Probability package updates) and mull it over a bit for a future PR.

I went with normal because of the central limit theorem -- it shows up so frequently in applications! And also we don't have to worry about the support -- what interval would we use for a uniform distribution? But if we were to change the default for random QQ, then we should probably also change it for RR and CC as well.

@d-torrance d-torrance changed the title Use normal distribution for random(QQ) Probability package updates Sep 17, 2024
@d-torrance
Copy link
Member Author

I've simplified this to now just add rawRandomRRNormal to the engine, use it in the Probability package, and improve some documentation.

@mahrud
Copy link
Member

mahrud commented Sep 17, 2024

what interval would we use for a uniform distribution?

I think since random ZZ uses a uniform distribution in [0,10), a uniform distribution by default for QQ/RR/CC makes the most sense, and I would imagine the support should be fixed to be the interval [0,1), allowing users to scale or translate it as they like. Also, I liked using the Farey sequence here!

@d-torrance
Copy link
Member Author

That makes sense! I'll work on a revised random(QQ) in another PR.

Copy link
Member

@mahrud mahrud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily in this PR, but it would be good to rename rawRandomRR to rawRandomRRuniform for clarity.

@d-torrance d-torrance merged commit da74d02 into Macaulay2:development Sep 18, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants