Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: change variable inclusion mechanism #2452

Closed
sgouezel opened this issue Aug 23, 2023 · 8 comments · Fixed by #4814
Closed

RFC: change variable inclusion mechanism #2452

sgouezel opened this issue Aug 23, 2023 · 8 comments · Fixed by #4814
Labels
Mathlib4 high prio Mathlib4 high priority issue P-high We will work on this issue RFC Request for comments

Comments

@sgouezel
Copy link

sgouezel commented Aug 23, 2023

The current Lean 4 mechanism for including variables is the following: When starting the proof of a theorem (or the body of a definition, it doesn't make a difference), all variables in the context are included and may be used in the proof. When the proof is over, only those variables that have indeed been used are kept for the statement of the theorem (i.e., all the other ones are discarded). This looks nice, but we have faced several issues in mathib 4 because of this:

  • if in the context there is variable {n : ℕ} (hn : 1 < n) (h'n : 2 < n) and a theorem using hn but not h'n, then induction n will also pull h'n in the assumptions of the theorem. So will subst. The solution is to clear h'n before the induction even though h'n should not be in sight anywhere.
  • When cleaning up a proof, say by using lemma B instead of lemma A in the proof, new typeclass assumptions may be pulled in by lemma B and weaken the initial statement. All this will go unnoticed (no linter can help here).
  • In files which are heavy on variables or typeclass assumptions (say files on vector bundles), the goal view is cluttered by dozens of lines of assumptions which are not relevant for the lemma being currently proved, making it really hard to use.
  • Refactors in a file may change the statements of theorems in another file without breaking anything: if the proof of a theorem involves a typeclass classA α, and in the context there are assumptions classB α and classC α, both of them implying classA α, then the above mechanism will only keep classB α or classC α, and the one which will be kept depends on the details of the path chosen by typeclass inference. Therefore, adjusting instance priorities somewhere else may change which one will appear in the assumptions of the theorem.

No linter can guard against this lack of robustness. Therefore, most mathlib folks have been convinced that a less clever but more predictable behavior would be an improvement over the current situation. After several discussions on Zulip, the following scheme has been suggested:

  • Include for a lemma or a def all the variables that are mentioned explicitly in the statement of the lemma (not its body, even if it is in term mode, contrary to Lean 3)
  • Also include all the typeclass assumptions about these variables which are in the context (named or unnamed, contrary to Lean 3)
  • No automatic removal of unused typeclass assumptions at the end of the proof: Better let a linter signal it, and let the user fix it by hand (to avoid non-robustness issues pointed out in the discussion above).
  • Add a mechanism to force inclusion of variables that are in the context but are not explicit in the statement. Two variations have been suggested here. Both of them are local to a theorem or a definition, as the global include / omit mechanism in Lean 3 is felt by many people as a major source of non-readability. Assume that hf : ... and hg : ... are variables in the context.
    • (a) Allow include hf hg in before the statement of a theorem, to force inclusion of hf and hg (with the binders they have in the variables list, and at their position in the variables list). This could also be used to adust binder types, with include (hf) {hg} in for instance. Optional: also allow omit hf in to remove a named typeclass assumptions hf that would otherwise be included.
    • (b) Allow theorem foo ... (hf) ... {hg} ... to force inclusion of hf and hg, adjusting their binders and their position in the variables list. The interplay of this syntax with autoparams is not completely clear, and what one should do if other variables depend on hf and hg and appear before them in the list should also be clarified.

A poll was organized on Zulip (https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/automatic.20inclusion.20of.20variables.20in.20mathlib.204/near/386620103), proposing either to keep the Lean 4 behavior, or go back to the Lean 3 behavior, or use the above scheme with one of the two suggestions (a) or (b). No-one opted for the current Lean 4 behavior nor the Lean 3 behavior. 12 people voted only for (a), 4 only for (b), and 4 for (a) or (b).

@digama0
Copy link
Collaborator

digama0 commented Aug 23, 2023

Refactors in a file may change the statements of theorems in another file without breaking anything

Note that this is also a problem for proof parallelism, as it is necessary to compile the body of the proof to determine the statement of the theorem, which may be needed for the following theorem. This is not directly a factor in the decision, but we should consider whether we really mean to make proof parallelism impossible. (This is similarly relevant for finding things that can be skipped to be able to compile a file in "outline mode" when full checking is not required, e.g. if we only want to know the AST of the file or what declarations are in it.)

@kim-em kim-em added the Mathlib4 high prio Mathlib4 high priority issue label Aug 24, 2023
@kim-em
Copy link
Collaborator

kim-em commented Aug 24, 2023

After some internal discussion within the Lean FRO:

  • This is a reasonable request, and we've heard that there is consensus amongst Mathlib users and maintainers that this behaviour ought to be changed.
  • Proof parallelism and outline mode are desired as long term goals, and as pointed out this issue needs to be addressed for those.
  • There are two suggestions about preferred behaviour, without a clear consensus between these, so deciding between these could be decided by a future implementer.
  • This will be a significant change to the behaviour of Lean, requiring lots of new code, documentation, and tests.
  • We want such new code to have a clear owner within the FRO organisation, regardless of how implementation work is shared between that person and external contributors.
  • At present there is not capacity for anyone within the FRO to take this on.
  • There are more people joining the FRO within the next few months!

This RFC remains on our radar, we hope to be able to get to it, but not yet.

@eric-wieser
Copy link
Contributor

Here'a an example of how the current mechanism can bite:

-- I'm going to prove stuff both about `Nat` and `Fin` in this file
variable (n : Nat) (m : Fin n)

theorem ohno (h : n ≠ 0) : n - 1 ≠ n := by
  cases n with
  | zero => exact (h rfl).elim
  | succ n =>
    intro h
    cases h 

-- Help, this lemma needs `m`!
#check ohno -- ohno (n : Nat) (m : Fin n) (h : n ≠ 0) : n - 1 ≠ n

@eric-wieser
Copy link
Contributor

Recently we have been finding a lot of lemmas in Mathlib that pick up DecidableEq assumptions that were never intended to be used in the proof; I've linked two above.

@Kha
Copy link
Member

Kha commented Apr 3, 2024

Can you say whether these would have been addressed by the inclusion heuristics suggested so far?

Also include all the typeclass assumptions about these variables which are in the context (named or unnamed, contrary to Lean 3)

@kbuzzard
Copy link
Contributor

Here's another example of leaky variables causing problems:

import Mathlib.Algebra.Algebra.Defs

count_heartbeats in
def foo1 : ℕ := 0 -- 7 heartbeats

variable (R S M : Type) [CommRing R] [CommRing S] [Algebra R S]
  [AddCommGroup M] [Module R M] [Module S M] [IsScalarTower R S M]

count_heartbeats in
def foo2 : ℕ := 0 -- 1468 heartbeats

set_option trace.profiler true
def foo3 : ℕ := 0
/-
[Elab.command] [0.093040] def foo3 : ℕ :=
      0 ▼
  [step] [0.012494] expected type: Type, term
      Module R M 
  [step] [0.012286] expected type: Type, term
      Module S M 
  [step] [0.060222] expected type: Prop, term
      IsScalarTower R S M ▶
-/

It seems that those variables (which are a very standard set-up in the theory of Kaehler differentials, coincidentally the slowest file in mathlib) are adding 0.1 seconds to every declaration in the file.

Zulip thread

The explanation seems to be this message. Gouezel's summary: "My understanding is the following: when you define foo, then the variables are around, and might be used in the definition, so Lean has to make sense of these variables even before starting to process your definition (because it might involve these variables). So it has to understand what Algebra R S means, which means finding some typeclasses on R and S."

@kim-em
Copy link
Collaborator

kim-em commented May 23, 2024

@kbuzzard, I've given a Mathlib free minimization of this at #4253. Thanks for identifying this!

@kim-em
Copy link
Collaborator

kim-em commented Jul 21, 2024

The current inclusion mechanism is also an obstacle to using debug.byAsSorry, because the typeclasses which are pulled in by proofs will change.

@kim-em kim-em removed the postponed label Jul 22, 2024
@Kha Kha added the RFC Request for comments label Jul 25, 2024
@leanprover-bot leanprover-bot added the P-high We will work on this issue label Jul 26, 2024
@Kha Kha closed this as completed in #4814 Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Mathlib4 high prio Mathlib4 high priority issue P-high We will work on this issue RFC Request for comments
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants