Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce csimp tactic #311

Closed
wants to merge 2 commits into from
Closed

Introduce csimp tactic #311

wants to merge 2 commits into from

Conversation

cdisselkoen
Copy link
Contributor

Issue #, if available:

Resolves #279

Description of changes:

Resolves #279 and also proposes a different (cleaner IMO) way to resolve the proof-stability problem with simp across Lean versions. Introduces a custom tactic csimp (Cedar-simp) that can be used like simp but internally uses a stable set of lemmas that will not change with the Lean version (unless we intentionally make changes).

csimp supports at syntax, e.g., all of the following are valid:

  • csimp
  • csimp at h
  • csimp at *

but csimp does not support extending the lemmas inline, e.g.,

  • csimp [h₂] (instead, use csimp ; simp only [h₂], or the other order)

only because I couldn't get that to work. Specifically, although I was able to get csimp to take an optional list argument of the appropriate type, and even append those items to the lemmas such that csimp [h₂] works, Lean's handling of the optional argument meant that it dropped the whole merged list of lemmas when trying to use csimp without the argument. Lean's documentation for macros doesn't explain details of how to use optional arguments to macros; it's very possible there's some simple syntax to accomplish this and I just don't know it. However, this PR demonstrates that even the current version of csimp is still very useful and improves proofs (making them more stable and readable).

Signed-off-by: Craig Disselkoen <[email protected]>
Copy link
Contributor

@khieta khieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall idea seems fine to me. Personally I like having a single tactic to try (csimp) instead of having to think about exactly which lemmas to include with simp only.

No comments on your questions about improving the tactic syntax since I don't have experience there 🙂

@emina
Copy link
Contributor

emina commented May 9, 2024

I prefer simp only. Even though it's noisier, it is standard, more flexible, and likely more performant because it is using only the lemmas that are needed for a specific call site. I personally don't mind the wordiness, and like seeing exactly which lemmas were needed for a proof.

My general rule of thumb for using tactics is this: (a) the tactic abstracts a specific pattern that repeats in many proofs, or (b) it's a general-purpose decision procedure (like omega).

I've only written tactics of kind (a) so far :)

This one doesn't fit either (a) or (b), so my vote is to stick to the standard solution (simp only).

@emina
Copy link
Contributor

emina commented May 9, 2024

Overall idea seems fine to me. Personally I like having a single tactic to try (csimp) instead of having to think about exactly which lemmas to include with simp only.

If you type simp? instead of simp in the VSCode IDE, Lean tells you how to transform this into simp only :)

@cdisselkoen
Copy link
Contributor Author

Am I correct that, absent proof stability concerns, all of us would prefer to use simp rather than any of the other options? (I assert this is true of @emina as well, because many of the existing proofs were written by her and use simp liberally in place of simp only.).

If I'm correct, then I claim that csimp provides nearly the same experience as simp, modulo a few ergonomics concerns. Like @khieta, I appreciate not having to do the simp? step (even though Lean/VSCode admittedly make it easier than it would otherwise be), and I especially appreciate having simpler/more-concise proofs.

@emina says she likes seeing exactly which lemmas were needed for a proof, but to me, I don't care about which basic lemmas (included in this csimp set) were used, as they are all intuitive and relatively uninteresting; it is much more helpful to see which non-basic lemmas, and especially which lemmas we defined ourselves in Cedar/Thm, were used. Using simp only frequently obscures this, as you're forced to look through the entire (sometimes long) list of lemmas, and you need the domain knowledge to recognize which ones are important / Cedar ones and which ones are just simp. Using csimp makes it very clear which non-simp lemmas were used.

I also claim that csimp is more approachable for new contributors, who might not have the list of simp lemmas memorized, and might not know the simp? trick. You see the use of csimp in the existing proofs, and you follow the lead and use csimp in your own proof. No extra tricks where you write one thing (simp) and commit another, less readable version (simp only).

likely more performant

For the performance argument, let's actually do some profiling before we reject this on performance grounds. We've made other decisions that favor proof readability over performance, like our heavy use of omega in proofs involving arithmetic. I suggest we make the same decision for csimp unless we have evidence to the contrary.

Comment on lines -1288 to +1271
cases xs
cases xs <;> csimp at *
case nil =>
simp only [Option.bind_eq_bind, List.foldlM, pure, Option.some.injEq, Option.bind_some_fun] at *
subst h₃; exact h₂
case cons hd tl =>
simp only [Option.bind_eq_bind, List.foldlM, Option.bind_eq_some] at *
cases h₄ : f x₂ hd <;> simp only [h₄, false_and, exists_false, Option.some.injEq, exists_eq_left'] at h₃
cases h₄ : f x₂ hd <;> rw [h₄] at h₃ <;> csimp at h₃
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example where I think csimp unquestionably improves readability. It also calls out much more clearly the one non-simp lemma (h4) which was previously used in these three long simp only invocations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think that the improvement in readability is minimal here.

case true =>
split <;> simp only [Except.bind_ok, Except.bind_err]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally like the new proofs, but I prefer the prior version here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, reverted this bit to use the explicit h_1 and h_2 tags and renames

@emina
Copy link
Contributor

emina commented May 9, 2024

I prefer to use standard tooling whenever possible. I disagree that it's easier to use csimp than simp only, especially within the Lean IDE where all of us write proofs.

To be clear, this is nothing specific to csimp: this is true of all non-standard tactics. You need to know what the tactic does, find it somewhere, etc., and then use it. This is, of course, a good thing when the tactic really saves you a lot of tedious work (like omega does). But this tactic doesn't pass that bar IMO.

@andrewmwells-amazon
Copy link
Contributor

andrewmwells-amazon commented May 9, 2024

In general, I like how the new proofs look. I agree with Craig that performance probably isn't an issue (as evidence, a lot of these used simp instead of simp only in prior versions, and were still plenty fast).

I don't think we should underestimate the value in writing idiomatic Lean. I find the argument that people will see and emulate our use of csimp dubious. At least so far, most external Lean contributors have more Lean experience than we do. For these people, adding a new best practice is annoying, but following what Mathlib does is natural.

Additionally, even for new Lean users, Google (and Loogle, LeanCopilot, or whatever) will probably give better results for issues with simp than issues with csimp.

@emina
Copy link
Contributor

emina commented May 9, 2024

I don't think we should underestimate the value in writing idiomatic Lean. I find the argument that people will see and emulate our use of csimp dubious. At least so far, most external Lean contributors have more Lean experience than we do. For these people, adding a new best practice is annoying, but following what Mathlib does is natural.

^^^ Andrew said it much better than I did. 100% this.

@cdisselkoen
Copy link
Contributor Author

You need to know what the tactic does, find it somewhere, etc, and then use it.

This is true of all standard tactics as well (and a frustration for me, but that's somewhat tangential. I have no way to discover new useful tactics other than see them used somewhere else.) I don't think it's harder to discover csimp than any other Lean tactic. If anything, it is easier because it is explicitly imported, and you can see it used in the existing proofs.

I find the argument that people will see and emulate our use of csimp dubious.

I think it depends entirely on whether we're optimizing for new contributions by people familiar with Lean but not Cedar (as @emina and @andrewmwells-amazon seem to be arguing), or for new contributions by people familiar with Cedar but not Lean (which was the boat I was recently in, and longer ago, I think @khieta as well). For people in the latter boat, I think they will 100% see and emulate csimp, and the use of simp only is extremely opaque, and very hard to use unless you know about simp? (and how would you discover that? The existing proofs just seem to materialize the right simp only out of thin air.)

@khieta
Copy link
Contributor

khieta commented May 9, 2024

I still like csimp for the reasons @cdisselkoen provided: it makes it easier for new contributors (like me). But happy to defer to those with more Lean proof experience 🍿 🕶️

Signed-off-by: Craig Disselkoen <[email protected]>
@andrewmwells-amazon
Copy link
Contributor

This is true of all standard tactics as well (and a frustration for me, but that's somewhat tangential. I have no way to discover new useful tactics other than see them used somewhere else.) I don't think it's harder to discover csimp than any other Lean tactic. If anything, it is easier because it is explicitly imported, and you can see it used in the existing proofs.

(I worry this sounds passive-aggressive, but that's not my intent. I'd like to hear of complaints about it if you have them.) Have you tried LeanCopilot? It's pretty great for this IMO.

@cdisselkoen
Copy link
Contributor Author

One more question -- this is resolving an existing issue #279 which no one objected to when it was filed. Are folks now saying they object to #279 in principle?

@cdisselkoen
Copy link
Contributor Author

Have you tried LeanCopilot? It's pretty great for this IMO

This is a good call. I have not tried it.

@emina
Copy link
Contributor

emina commented May 9, 2024

I think it depends entirely on whether we're optimizing for new contributions by people familiar with Lean but not Cedar (as @emina and @andrewmwells-amazon seem to be arguing), or for new contributions by people familiar with Cedar but not Lean (which was the boat I was recently in, and longer ago, I think @khieta as well). For people in the latter boat, I think they will 100% see and emulate csimp, and the use of simp only is extremely opaque, and very hard to use unless you know about simp? (and how would you discover that? The existing proofs just seem to materialize the right simp only out of thin air.)

I think it's fair to say we all know about simp? now :) It is also described in the Lean docs.

And once you learn that, you don't need the custom tactic. It just adds cognitive overhead and process burden without saving much IMO.

@andrewmwells-amazon
Copy link
Contributor

I think it depends entirely on whether we're optimizing for new contributions by people familiar with Lean but not Cedar (as @emina and @andrewmwells-amazon seem to be arguing), or for new contributions by people familiar with Cedar but not Lean (which was the boat I was recently in, and longer ago, I think @khieta as well).

I think this is the core issue. I have trouble believing we'll have contributors who learn Lean just to contribute to Cedar (beyond minor changes). As evidence, think about how much effort went into #206 . So my vote is to optimize for people already familiar with Lean.

@emina
Copy link
Contributor

emina commented May 9, 2024

One more question -- this is resolving an existing issue #279 which no one objected to when it was filed. Are folks now saying they object to #279 in principle?

I think we're disagreeing about where to draw the abstraction boundaries, and when the benefits of a new abstraction outweigh the cost.

My take is above: either the tactic is a general decision procedure (like omega), or it abstracts over a specific class of proofs that happen over and over again (and usually have several steps).

To give a concrete example, I have this tactic in the internal code that lets me work with the do notation more easily (deconstruct hypothesis of the form (do ...) = Except.ok ... or (do ...) = Except.error ... because they arise all the time in my proofs):

/--
This tactic simplifes assumptions of the form `do (let x ← expr ...)` by
performing a `cases` and `simp` on `expr`.
-/

syntax "simp_do_let" term (" at " ident)? : tactic

macro_rules
| `(tactic| simp_do_let $e $[at $h:ident]?) =>
  `(tactic|
    cases h' : $e <;>
    simp only [h', Except.bind_err, Except.bind_ok] $[at $h:ident]?)

@andrewmwells-amazon
Copy link
Contributor

One more question -- this is resolving an existing issue #279 which no one objected to when it was filed. Are folks now saying they object to #279 in principle?

My intent at the time was to apply this to particularly ugly cases within one file (so you could define the tactic within the file). I think even Emina will agree that
simp only [Ext.IPAddr.IPNet.lt, ne_eq, Bool.or_eq_true, or_false, UInt32.val_val_eq_toNat, not_and, not_false_eq_true, imp_self, Bool.and_eq_true, Ext.IPAddr.IPNet.V6.injEq, Ext.IPAddr.IPNet.V4.injEq, decide_eq_true_eq, or_true, Nat.lt_eq]
is hard to read 😄

There's a tradeoff here between being concise and being idiomatic. I'd like to be more concise in some really ugly cases, but still mostly be idiomatic.

@emina
Copy link
Contributor

emina commented May 9, 2024

There's a tradeoff here between being concise and being idiomatic. I'd like to be more concise in some really ugly cases, but still mostly be idiomatic.

Couldn't agree more :)

@emina
Copy link
Contributor

emina commented May 9, 2024

I think even Emina will agree that
simp only [Ext.IPAddr.IPNet.lt, ne_eq, Bool.or_eq_true, or_false, UInt32.val_val_eq_toNat, not_and, not_false_eq_true, imp_self, Bool.and_eq_true, Ext.IPAddr.IPNet.V6.injEq, Ext.IPAddr.IPNet.V4.injEq, decide_eq_true_eq, or_true, Nat.lt_eq]
is hard to read 😄

Yep, it's not the easiest thing on the eyes :D

@andrewmwells-amazon
Copy link
Contributor

I would be curious to hear opinions from Lean experts. Do you want to post a question on Zulip, link to this PR and ask if people have recommendations? (Or I can do it). There's also a public Lean office hours every Wednesday where we can get opinions (Leo and a rotating group of Lean / Mathlib maintainers attend and answer questions).

@cdisselkoen
Copy link
Contributor Author

Asked in the Zulip at link

@cdisselkoen
Copy link
Contributor Author

Based on feedback in this thread and on the Zulip, I'll close this. We'll stick with the simp only convention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor shared simp only with custom tactics
4 participants