RFC: change variable inclusion mechanism #2452

sgouezel · 2023-08-23T19:19:27Z

The current Lean 4 mechanism for including variables is the following: When starting the proof of a theorem (or the body of a definition, it doesn't make a difference), all variables in the context are included and may be used in the proof. When the proof is over, only those variables that have indeed been used are kept for the statement of the theorem (i.e., all the other ones are discarded). This looks nice, but we have faced several issues in mathib 4 because of this:

if in the context there is variable {n : ℕ} (hn : 1 < n) (h'n : 2 < n) and a theorem using hn but not h'n, then induction n will also pull h'n in the assumptions of the theorem. So will subst. The solution is to clear h'n before the induction even though h'n should not be in sight anywhere.
When cleaning up a proof, say by using lemma B instead of lemma A in the proof, new typeclass assumptions may be pulled in by lemma B and weaken the initial statement. All this will go unnoticed (no linter can help here).
In files which are heavy on variables or typeclass assumptions (say files on vector bundles), the goal view is cluttered by dozens of lines of assumptions which are not relevant for the lemma being currently proved, making it really hard to use.
Refactors in a file may change the statements of theorems in another file without breaking anything: if the proof of a theorem involves a typeclass classA α, and in the context there are assumptions classB α and classC α, both of them implying classA α, then the above mechanism will only keep classB α or classC α, and the one which will be kept depends on the details of the path chosen by typeclass inference. Therefore, adjusting instance priorities somewhere else may change which one will appear in the assumptions of the theorem.

No linter can guard against this lack of robustness. Therefore, most mathlib folks have been convinced that a less clever but more predictable behavior would be an improvement over the current situation. After several discussions on Zulip, the following scheme has been suggested:

Include for a lemma or a def all the variables that are mentioned explicitly in the statement of the lemma (not its body, even if it is in term mode, contrary to Lean 3)
Also include all the typeclass assumptions about these variables which are in the context (named or unnamed, contrary to Lean 3)
No automatic removal of unused typeclass assumptions at the end of the proof: Better let a linter signal it, and let the user fix it by hand (to avoid non-robustness issues pointed out in the discussion above).
Add a mechanism to force inclusion of variables that are in the context but are not explicit in the statement. Two variations have been suggested here. Both of them are local to a theorem or a definition, as the global include / omit mechanism in Lean 3 is felt by many people as a major source of non-readability. Assume that hf : ... and hg : ... are variables in the context.
- (a) Allow include hf hg in before the statement of a theorem, to force inclusion of hf and hg (with the binders they have in the variables list, and at their position in the variables list). This could also be used to adust binder types, with include (hf) {hg} in for instance. Optional: also allow omit hf in to remove a named typeclass assumptions hf that would otherwise be included.
- (b) Allow theorem foo ... (hf) ... {hg} ... to force inclusion of hf and hg, adjusting their binders and their position in the variables list. The interplay of this syntax with autoparams is not completely clear, and what one should do if other variables depend on hf and hg and appear before them in the list should also be clarified.

A poll was organized on Zulip (https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/automatic.20inclusion.20of.20variables.20in.20mathlib.204/near/386620103), proposing either to keep the Lean 4 behavior, or go back to the Lean 3 behavior, or use the above scheme with one of the two suggestions (a) or (b). No-one opted for the current Lean 4 behavior nor the Lean 3 behavior. 12 people voted only for (a), 4 only for (b), and 4 for (a) or (b).

The text was updated successfully, but these errors were encountered:

digama0 · 2023-08-23T20:22:30Z

Refactors in a file may change the statements of theorems in another file without breaking anything

Note that this is also a problem for proof parallelism, as it is necessary to compile the body of the proof to determine the statement of the theorem, which may be needed for the following theorem. This is not directly a factor in the decision, but we should consider whether we really mean to make proof parallelism impossible. (This is similarly relevant for finding things that can be skipped to be able to compile a file in "outline mode" when full checking is not required, e.g. if we only want to know the AST of the file or what declarations are in it.)

kim-em · 2023-08-24T04:10:00Z

After some internal discussion within the Lean FRO:

This is a reasonable request, and we've heard that there is consensus amongst Mathlib users and maintainers that this behaviour ought to be changed.
Proof parallelism and outline mode are desired as long term goals, and as pointed out this issue needs to be addressed for those.
There are two suggestions about preferred behaviour, without a clear consensus between these, so deciding between these could be decided by a future implementer.
This will be a significant change to the behaviour of Lean, requiring lots of new code, documentation, and tests.
We want such new code to have a clear owner within the FRO organisation, regardless of how implementation work is shared between that person and external contributors.
At present there is not capacity for anyone within the FRO to take this on.
There are more people joining the FRO within the next few months!

This RFC remains on our radar, we hope to be able to get to it, but not yet.

eric-wieser · 2024-02-16T21:07:30Z

Here'a an example of how the current mechanism can bite:

-- I'm going to prove stuff both about `Nat` and `Fin` in this file
variable (n : Nat) (m : Fin n)

theorem ohno (h : n ≠ 0) : n - 1 ≠ n := by
  cases n with
  | zero => exact (h rfl).elim
  | succ n =>
    intro h
    cases h 

-- Help, this lemma needs `m`!
#check ohno -- ohno (n : Nat) (m : Fin n) (h : n ≠ 0) : n - 1 ≠ n

eric-wieser · 2024-03-31T20:49:02Z

Recently we have been finding a lot of lemmas in Mathlib that pick up DecidableEq assumptions that were never intended to be used in the proof; I've linked two above.

Kha · 2024-04-03T08:41:48Z

Can you say whether these would have been addressed by the inclusion heuristics suggested so far?

Also include all the typeclass assumptions about these variables which are in the context (named or unnamed, contrary to Lean 3)

kbuzzard · 2024-05-22T15:36:49Z

Here's another example of leaky variables causing problems:

import Mathlib.Algebra.Algebra.Defs

count_heartbeats in
def foo1 : ℕ := 0 -- 7 heartbeats

variable (R S M : Type) [CommRing R] [CommRing S] [Algebra R S]
  [AddCommGroup M] [Module R M] [Module S M] [IsScalarTower R S M]

count_heartbeats in
def foo2 : ℕ := 0 -- 1468 heartbeats

set_option trace.profiler true
def foo3 : ℕ := 0
/-
[Elab.command] [0.093040] def foo3 : ℕ :=
      0 ▼
  [step] [0.012494] expected type: Type, term
      Module R M 
  [step] [0.012286] expected type: Type, term
      Module S M 
  [step] [0.060222] expected type: Prop, term
      IsScalarTower R S M ▶
-/

It seems that those variables (which are a very standard set-up in the theory of Kaehler differentials, coincidentally the slowest file in mathlib) are adding 0.1 seconds to every declaration in the file.

Zulip thread

The explanation seems to be this message. Gouezel's summary: "My understanding is the following: when you define foo, then the variables are around, and might be used in the definition, so Lean has to make sense of these variables even before starting to process your definition (because it might involve these variables). So it has to understand what Algebra R S means, which means finding some typeclasses on R and S."

kim-em · 2024-05-23T00:59:24Z

@kbuzzard, I've given a Mathlib free minimization of this at #4253. Thanks for identifying this!

kim-em · 2024-07-21T23:21:10Z

The current inclusion mechanism is also an obstacle to using debug.byAsSorry, because the typeclasses which are pulled in by proofs will change.

kim-em added the Mathlib4 high prio Mathlib4 high priority issue label Aug 24, 2023

kim-em added the postponed label Aug 24, 2023

girving mentioned this issue Feb 11, 2024

destructProducts can add dependencies on implicit variables leanprover-community/aesop#104

Open

eric-wieser mentioned this issue Feb 16, 2024

[Merged by Bors] - bug (Mathlib.Data.Finset.Sym) : suppress global variable (m : Sym alpha n) leanprover-community/mathlib4#10637

Closed

This was referenced Mar 31, 2024

[Merged by Bors] - chore(Finset/Basic): drop an unneeded DecidableEq assumption leanprover-community/mathlib4#11796

Closed

[Merged by Bors] - chore(DFinsupp/Basic): drop a DecidableEq assumption leanprover-community/mathlib4#11795

Closed

leodemoura mentioned this issue Jun 6, 2024

fix: variable must execute pending tactics and elaboration problems #4370

Merged

kim-em removed the postponed label Jul 22, 2024

Kha mentioned this issue Jul 23, 2024

feat: refined variable behavior in proofs #4814

Merged

Kha added the RFC Request for comments label Jul 25, 2024

leanprover-bot added the P-high We will work on this issue label Jul 26, 2024

Kha closed this as completed in #4814 Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: change variable inclusion mechanism #2452

RFC: change variable inclusion mechanism #2452

sgouezel commented Aug 23, 2023 •

edited by jcommelin

Loading

digama0 commented Aug 23, 2023

kim-em commented Aug 24, 2023

eric-wieser commented Feb 16, 2024

eric-wieser commented Mar 31, 2024

Kha commented Apr 3, 2024

kbuzzard commented May 22, 2024

kim-em commented May 23, 2024

kim-em commented Jul 21, 2024

RFC: change variable inclusion mechanism #2452

RFC: change variable inclusion mechanism #2452

Comments

sgouezel commented Aug 23, 2023 • edited by jcommelin Loading

digama0 commented Aug 23, 2023

kim-em commented Aug 24, 2023

eric-wieser commented Feb 16, 2024

eric-wieser commented Mar 31, 2024

Kha commented Apr 3, 2024

kbuzzard commented May 22, 2024

kim-em commented May 23, 2024

kim-em commented Jul 21, 2024

sgouezel commented Aug 23, 2023 •

edited by jcommelin

Loading