Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: combine set filter and set map language constructs #10

Open
ahelwer opened this issue May 8, 2024 · 10 comments
Open

Proposal: combine set filter and set map language constructs #10

ahelwer opened this issue May 8, 2024 · 10 comments

Comments

@ahelwer
Copy link

ahelwer commented May 8, 2024

I often find myself wanting to both filter and map a set. The way to do this in TLA+ is currently:

op == { x \in { f(x) : x \in S} : p(x) }

or

op == { f(x) : x \in { x \in S : p(x) } }

I think it would be nicer to make this a single operation:

{ f(x) : x \in S : p(x) }

One possible semantic issue is that set mapping supports multiple quantifier bounds:

op == { f(x, y) : x \in S, y \in P }

while set filtering only supports a single quantifier:

op == { x \in S : p(x) }

because after all, what would { x \in S, y \in P : p(x, y) } even mean?

Fortunately having a map operation ensures that these bounds will coalesce into a single stream of elements. However, it does make things more difficult when trying to define the semantics of this combined map/filter operation, since you can't easily decompose it into a set map then a set filter. What does this mean, for example?

{ f(x, y) : x \in S, y \in P : p(x, y) }

It cannot be easily written in terms of the existing map and filter constructs. I believe the translation would have to be something like this:

op == {
  f(x, y) : <<x, y>> \in {
    <<x, y>> \in
      {<<x, y>> : x \in S, y \in P}
    : p(x, y)
  }
}

So it would be a map that wraps the multiple quantifier bounds in a tuple, nested inside the filter that recovers their names with a tuple destructuring, nested inside the map that recovers their names using tuple destructuring.

@muenchnerkindl
Copy link

I agree that the combined syntax would be convenient, and it has been used since Dijkstra.

For constructs involving multiple bound variables, an arguably more readable translation (even after your edit) is

LET Filter == { <<x, y>> \in S \X P : p(x, y) }
IN   { f(x, y) : <<x, y>> \in Filter }

@Calvin-L
Copy link

Calvin-L commented May 9, 2024

I don't love the second colon in { f(x) : x \in S : p(x) }.

The /\ operator makes more sense to me, since x \in S and p(x) are both tests that constrain what elements appear in the set:

{ f(x) : x \in S /\ p(x) }

Thoughts?

@lemmy
Copy link
Member

lemmy commented May 9, 2024

We've received numerous complaints from our target audience about the difficulty and confusion surrounding TLA+ syntax. I suggest we explore ways to assess whether these language extensions are beneficial or detrimental to addressing this issue. To that end, extensions could be required to come with a desugaring tool.

@ahelwer
Copy link
Author

ahelwer commented May 9, 2024

@lemmy to my mind this is a simplification of syntax. Users are well-acquainted with map/filter from many other languages, like Python or SQL. The current way of filtering then mapping is very verbose, confusing to read, and confusing to write. I would go as far as to say map and filter should be treated as specialized forms of this operation instead of this operation desugaring to map and filter.

@Calvin-L don't really like the use of /\ to be honest, at least in my mind I read : as "such that" so the dual : makes sense. Using /\ also greatly complicates parsing; S can be an arbitrarily-long TLA+ expression which could contain /\ within it. Trying to use existing infix operator tokens as keywords is fraught with difficulty, as seen in #9. Ideally I would have liked { f(x) : x \in S | p(x) } which fits with mathematical notation I've seen in the real world, but | is also an existing infix operator so would run into the same issue.

@lemmy
Copy link
Member

lemmy commented May 9, 2024

@lemmy to my mind this is a simplification of syntax. Users are well-acquainted with map/filter from many other languages, like Python or SQL. The current way of filtering then mapping is very verbose, confusing to read, and confusing to write.

Might very well be, but let's be data-driven: What's the background of the average target user? What do existing and potential users find confusing? How important is writability compared to readability? There is no need to rush here, and these RFCs should seek input from a broad group of users, not just four or five experts. Related: #1

Personally, I'm not against this proposal.

@ahelwer
Copy link
Author

ahelwer commented May 17, 2024

@hwayne provides the interesting suggestion that as part of these changes, set filter with multiple quantifier bounds could also be added to the language; { x \in S, y \in P : p(x, y) } would be defined to mean { <<x, y>> \in S \X P : p(x, y) }. There is precedent for this in the function definition syntax f[x \in S, y \in P] == ... meaning f[<<x, y>> \in S \X P] == ... as accessed by DOMAIN f. This would simplify the translation of the combined map/filter.

@Calvin-L
Copy link

don't really like the use of /\

What about comma? That would align with current TLA+ syntax, where bounds are comma-separated.

Users are well-acquainted with map/filter from many other languages, like Python or SQL.

That's a good point. Should we be surveying how other languages handle set comprehensions? For instance, Haskell allows a comma-separated list of clauses where you can mix binders and filters:

let catsInGoodHomes = [c | h <- homes, isGood h, c <- cats h]

You can even put let-bindings in the clauses; let x = y is the same as the clause x <- [y].

Python has similar syntax:

catsInGoodHomes = [c for h in homes if isGood(h) for c in cats(h)]

These are some examples of how that might be written in TLA+:

  • catsInGoodHomes == {c : h \in homes, c \in cats(h) : isGood(h)} (original proposal---but see note below)
  • catsInGoodHomes == {c : h \in homes : isGood(h) : c \in cats(h)} (can we reorder clauses?)
  • catsInGoodHomes == {c : h \in homes /\ isGood(h) /\ c \in cats(h)} (/\ proposal)
  • catsInGoodHomes == {c : h \in homes, isGood(h), c \in cats(h)} (, proposal)

On a related note, even though TLA+ does allow multiple bounds in set-filter expressions, it does not allow the bound a variable to depend on another. The TLA+ version of my example is not legal, even without a filter clause:

catsInGoodHomes == {c : h \in homes, c \in cats(h)}
\*                                         ^^^^^^^

I often want that feature.

@ahelwer
Copy link
Author

ahelwer commented May 17, 2024

The issue with commas is then it becomes difficult to tell whether you are entering another bound quantifier or entering the predicate. Given that { x \in S : p(x) } is already the syntax when filtering, don't you think it makes sense to keep it?

@hwayne
Copy link

hwayne commented May 18, 2024 via email

@ahelwer
Copy link
Author

ahelwer commented Jun 11, 2024

I like Hillel's idea. It would be interesting to use optional keywords instead of :, like {f(x) FOR x \in S WHERE p(x)}. This could also be added as alternatives in the existing set map and set filter constructs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants