User-defined keywords #3599

pzinn · 2024-11-28T08:12:20Z

Work in progress to implement user-defined keywords (cf. #3593):

i1 : makeKeyword "×"

o1 = ×

o1 : Keyword

i2 : ZZ × ZZ := times

o2 = times

o2 : CompiledFunction

i3 : 1×2

o3 = 2

The most important change is to have a default operation (which applies to 99% of existing keywords, and to the newly created user-defined keywords), which is to look up the corresponding method. So there is no need to define all these identical functions one by one.
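To make the "default operation" idea concrete, here is a minimal sketch in Python rather than Macaulay2. It is a model of the dispatch pattern only, not the interpreter's code: the names `methods`, `install_method` and `apply_binary` are hypothetical, and the lookup uses exact types, whereas real method lookup in Macaulay2 also walks the class hierarchy.

```python
# Hypothetical model of "default operation = method lookup"; not Macaulay2 internals.
methods = {}  # (operator, left type, right type) -> function


def install_method(op, left, right, fn):
    """Rough analogue of `ZZ × ZZ := times`: attach fn to (op, left, right)."""
    methods[(op, left, right)] = fn


def apply_binary(op, a, b):
    """Single default action shared by every binary keyword: look up, then call."""
    try:
        fn = methods[(op, type(a), type(b))]
    except KeyError:
        raise TypeError(
            f"no method for {type(a).__name__} {op} {type(b).__name__}"
        ) from None
    return fn(a, b)


install_method("×", int, int, lambda a, b: a * b)
print(apply_binary("×", 1, 2))  # prints 2
```

With one generic `apply_binary`, a new operator needs only an entry in the table, not a dedicated evaluation function.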

TODO:

- ~~treat all the different cases (unary, binary, precedence) by adding options to the m2 level makeKeyword~~
- ~~simplify drastically actors5.d by removing all unnecessary functions~~
- transfer the creation of keywords to m2 level (core)
- get rid of opsWithBinaryMethod etc
- get rid of getBinopName etc and rewrite pseudocode accordingly
- ~~simplify the tex routines~~
- possibly remove entirely the fictitious Keyword class, instead providing some information in the Symbol Afterprint
- ~~fix the other minor issues mentioned in User-created keywords #3593~~
- ~~document makeKeyword~~

pzinn · 2024-12-01T10:01:15Z

I might stop here, to keep this PR relatively small and to avoid the more controversial items on my list.

d-torrance · 2024-12-01T21:37:04Z

M2/Macaulay2/d/actors5.d

+	   w:=makeUniqueWord(s.v, parseinfo(prec,bprec,uprec,parsefuns(u,t)));
+	   when globalLookup(w) is x:Symbol do buildErrorPacket("symbol already in use")
+	   else (
+	   install(s.v,w);  -- TODO check whether install is really needed (for mathematical symbols as opposed to words)


Are there any updates on whether we need to call install here?

yeah this is rather subtle. my understanding is, it's harmless to always have install here (except maybe for a tiny slowdown of the parsing?). The point is, this is useless for keywords that are actual words (like and) because you need to separate them with spaces anyway (so they behave more similarly to regular symbols). But at the moment I've put limited constraints on allowed keywords (see isvalidkeyword -- which may require more restrictions), so people can have keywords like @*@ for which install is definitely necessary.
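The distinction above can be illustrated with a toy tokenizer. This is a Python sketch under stated assumptions, not the logic of lex.d: `installed`, `install` and `tokenize` are hypothetical names, and the identifier rule is a plain ASCII one. Word-like keywords such as and come out as single tokens because of the ordinary identifier rules, while a symbolic keyword such as @*@ is only recognised as one token after it has been installed.

```python
import re

# Hypothetical stand-in for the lexer's table of installed symbolic keywords.
installed = set()


def install(keyword):
    installed.add(keyword)


def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Words and numbers are delimited by the usual identifier rules,
        # so a word-like keyword needs no entry in `installed`.
        m = re.match(r"[A-Za-z_]\w*|\d+", text[i:])
        if m:
            tokens.append(m.group())
            i += m.end()
            continue
        # Symbolic keywords: the longest installed match wins.
        match = max(
            (k for k in installed if text.startswith(k, i)), key=len, default=None
        )
        if match is None:
            match = text[i]  # nothing installed here: emit a single character
        tokens.append(match)
        i += len(match)
    return tokens


print(tokenize("x and y"))  # ['x', 'and', 'y']
print(tokenize("a@*@b"))    # ['a', '@', '*', '@', 'b']
install("@*@")
print(tokenize("a@*@b"))    # ['a', '@*@', 'b']
```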

I rephrased slightly the comment, but haven't actually changed anything

I need to think about this some more. Relatedly, at this stage nothing would prevent us from removing the section in lex.d which you wrote (else if ismathoperator(peek2(file)) then ( etc.), except maybe unicode synonyms.

answering myself: no that doesn't work. math operators now have a slightly ambiguous status of being both "alphanumeric" (i.e., usable in a regular symbol) and have special parsing rules (so they can be used in keywords, e.g., binary operators).

well, right now, this PR doesn't change the status of math operators, so everything works fine as written.
but yes, as you pointed out, a nice feature of this PR is one can even define unary (prefix) operators which for all practical purposes work the same as a MethodFunction. so that would definitely be one way forward. might want to have a separate PR for this whole mess. In the meantime, I will fix the install thing today so we can resolve this particular comment.

it would be so much easier if we could declare math symbols to be non alphanumeric. In fact this was my original idea in my first PR on unicode symbols, but that breaks an existing package which defined the sum and product unicode symbols, so I had to change it. now we're paying the price for this backwards compatibility.

I don't follow this then. How does declaring math symbols to be non-alphanumeric break backwards compatibility with a package that uses non-alphanumeric method names? Unless you meant declare any non-alphanumeric string to be a math operator automatically?

I agree that we should aim for the ability to declare different types of operators (binary, unary left/right) with prescribed precedence.

> it would be so much easier if we could declare math symbols to be non alphanumeric. In fact this was my original idea in my first PR on unicode symbols, but that breaks an existing package which defined the sum and product unicode symbols, so I had to change it. now we're paying the price for this backwards compatibility.
>
> I don't follow this then. How does declaring math symbols to be non-alphanumeric break backwards compatibility with a package that uses non-alphanumeric method names?

you mean, with a package that uses math-operator method names?
because methods have ordinary symbols (as opposed to keywords), and these are required to be alphanumeric.

updated so install is only called when needed.

M2/Macaulay2/d/binding.d

M2/Macaulay2/m2/methods.m2

mahrud · 2024-12-03T04:13:33Z

Since lots of things are being refactored here, what do you think about also changing the terminology? My understanding is that the classes Symbol and Keyword are identical, except that keywords are symbols which are reserved by the language and can't be assigned values, etc. This means things like and, then, but also operators like +, etc. But with this PR users can define new keywords, which implies they are no longer reserved.

So I think we should create a new class Operator for only non-alphanumeric math operators which are distinct from reserved alphanumeric keywords.

pzinn · 2024-12-03T04:22:41Z

sure.

pzinn · 2024-12-03T04:24:26Z

though it would be nice to have a common ancestor, since I don't want to have two different methods makeKeyword and makeOperator which would do the exact same thing.

mahrud · 2024-12-03T04:40:56Z

I guess what I'm saying is we should not have makeKeyword at all, since by definition keywords are the reserved symbols.

edit: to clarify, a makeKeyword function to simplify declaring alphanumeric only reserved symbols (e.g. and, or etc.) in the interpreter is totally fine, I'm saying there shouldn't be a top level one. On the other hand, a makeOperator function to simplify things in the interpreter and also exporting it to allow users to define new operators is fine.

mahrud · 2024-12-03T04:55:34Z

M2/Macaulay2/m2/expressions.m2

+⋯:=symbol ⋯
+⋱:=symbol ⋱
+⋮:=symbol ⋮
+…:=symbol …
 -- used e.g. in chaincomplexes.m2


I'm confused about what's happening here, and also don't think this particular syntax is good practice.

For instance, maybe cdots = ⋯ = new Operator from symbol ⋯ is preferable, and later in this file it would be better to use the synonym cdots rather than ⋯.

well, what was there before was worse, since it defined a sub-Type of Symbol, which is not a good idea in general. this has the merit that it works.

but OK with synonyms, done

mahrud · 2024-12-03T05:00:13Z

M2/Macaulay2/m2/set.m2

 protect Flexible
-protect Binary
-protect Prefix
-protect Postfix


How come Flexible is kept here?

Tangentially, I think a syntax like:

otimes = ⊗ = new BinaryOperator from symbol ⊗
otimes.precedence = precedence symbol *

would be nice, though perhaps for flexible operators it's a bit more tricky.

the problem with using the syntax new ... from symbol ... is that there is no room for options, and I'm not sure how to implement this .precedence thing.

> How come Flexible is kept here?

dunno I didn't touch this

should I have moved Flexible too? I'm not even clear what it means.
more generally I don't know what operatorAttributes is for, so I didn't touch it.

> dunno I didn't touch this

You removed protect Binary and Prefix and Postfix but kept Flexible here. Why?

well, besides what I wrote above, the practical reason was that I didn't need Flexible.

pzinn · 2024-12-03T07:47:31Z

> I guess what I'm saying is we should not have makeKeyword at all, since by definition keywords are the reserved symbols.
>
> edit: to clarify, a makeKeyword function to simplify declaring alphanumeric only reserved symbols (e.g. and, or etc.) in the interpreter is totally fine, I'm saying there shouldn't be a top level one. On the other hand, a makeOperator function to simplify things in the interpreter and also exporting it to allow users to define new operators is fine.

err? that's exactly what this PR is doing... Example

makeKeyword "implies"
Boolean implies Boolean := (a,b) -> not a or b
false implies true

pzinn · 2024-12-03T07:51:44Z

to recap: two types of Keyword exist:

1. the ones that look like normal (alphanumeric) symbols, like and
2. the ones made of special characters, like **. These are the ones that require install for special parsing.

both are (and should be) definable using makeKeyword. The only confusing bit is that mathematical operators like × are classified as normal symbols (so in 1. above). This is what I was complaining about before, and should probably be reverted one day, though not in this PR since it would require a significant rewrite (not to mention the backward compatibility problem).

pzinn · 2024-12-03T07:58:23Z

oh yeah this PR also fixes this silly bug: getGlobalSymbol ""

mahrud · 2024-12-03T08:21:41Z

If a user can define implies as in your example, then it is by definition not a reserved symbol, therefore it shouldn't be called a keyword. It's just a new type of method attached to a symbol, which is very useful, but is different.

pzinn · 2024-12-03T08:24:37Z

sure, I'm happy to name it something else. but the point is, for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

mahrud · 2024-12-03T08:26:55Z

> for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

Then users should not be able to define them in the top level. They should be defined in the interpreter, compiled, and fixed in the language.

pzinn · 2024-12-03T08:28:19Z

> > for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.
>
> Then users should not be able to define them in the top level. They should be defined in the interpreter, compiled, and fixed in the language.

well, I disagree. It certainly doesn't harm to give users that power.

mahrud · 2024-12-03T08:36:53Z

It certainly does harm if it's hastily implemented without much longer discussion and evaluation. What happens if a package makes foo into a keyword and another makes it a symbol for a method or variable? What happens if there's a race condition in which one is defined first?

Again, I'm all in favor of allowing users to define non-alphanumeric math operators in the top level, because those can't be used as symbols for methods or variables, and we already know how to deal with different packages overriding the same method, but keywords are by definition reserved words which can't be altered by the user.

pzinn · 2024-12-03T08:40:07Z

actually, it makes no difference whether the keyword is alphanumeric or not. there will be a conflict if two packages define the same keyword. Right now it's a non issue because no package does so, but eventually we need a mechanism to resolve if say two different packages want to use × or ~~~ for different purposes. (The biggest problem I can see is, we can't reconcile them if they have different Syntax or Precedence)
edited to give an example that's actually non-alphanumeric (sigh)

pzinn · 2024-12-03T08:46:36Z

To clarify, Symbols can be in different dictionaries, but the underlying Words, which carry the parsing information, cannot (and shouldn't precisely because they affect parsing).
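The conflict described here can be sketched as a single global table of words, each carrying its parsing information. This is a hypothetical Python model, not the PR's implementation: `ParseInfo`, `words` and `make_keyword` are illustrative names, the precedence values are arbitrary, and accepting an identical redeclaration is an assumption of this sketch (the diff reviewed above rejects any symbol that is already in use).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseInfo:
    syntax: str      # e.g. "Binary", "Prefix" or "Postfix"
    precedence: int  # arbitrary illustrative scale


# One global table: parsing cannot depend on which package's dictionary is active.
words = {}


def make_keyword(package, text, syntax, precedence):
    info = ParseInfo(syntax, precedence)
    existing = words.get(text)
    if existing is not None and existing != info:
        # Two packages disagree on how the same characters should be parsed.
        raise ValueError(f"{package}: {text!r} already registered as {existing}")
    words[text] = info
    return info


make_keyword("PackageA", "~~~", "Binary", 30)
make_keyword("PackageB", "~~~", "Binary", 30)      # same parse info: compatible
try:
    make_keyword("PackageC", "~~~", "Prefix", 30)  # different syntax: conflict
except ValueError as err:
    print(err)
```

Two packages can still attach different methods to the same keyword, since that is resolved by ordinary method dispatch; what cannot be reconciled is a disagreement about syntax or precedence.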

pzinn · 2024-12-16T22:19:26Z

Let me try to summarise some of the changes that have been requested during the M2internals meeting:

- not let users define alphanumeric keywords. though I don't fully agree with this, allowing it does open a can of worms (as the hi example showed) and maybe for now it's simpler to prevent this.
- if we do that, that definitely forces another change that was not mentioned: currently math symbols are considered as alphanumeric. This is a hack I introduced a while ago due to the backwards compatibility issue mentioned above in the discussion. This will have to go, otherwise we won't be able to define any math symbol as a keyword.
- have more meaningful type names. Keyword is not really an appropriate name for symbols like ⊗. One could create a new type called Operator (which, just like Keyword, would be a fake type anyway... they're all Symbols internally).
- Even for these operators, there'd still be an issue if two different packages try to define the same say cow symbol. There needs to be a way of dealing with this, and this remains the most problematic issue. One could make the symbols local (right now all Keyword are really global symbols), but that wouldn't really solve the parsing issue -- if for some reason one package wants to use a symbol as unary and the other as binary, it will be hard to reconcile.
- Only have makeKeyword in core, not packages: this one I strongly disagree with, and will not implement. Users are not babies. If they want they can already easily break M2 in all kinds of ways (but why would they?). Keywords (or operators, or whatever they will be called) should be definable in packages.

mahrud · 2024-12-17T01:21:45Z

Limiting makeKeyword to the interpreter or the Core was mainly a solution to the problematic issue that you mentioned. Ultimately this PR introduces some useful features which can be added now, but jumping directly from idea to production for major changes is a recipe for ending up with compatibility nightmares and packages that only work with a specific version of M2 or indecipherable issues that take valuable time to resolve.

Perhaps a good place to start is a wiki or discussion page where you lay out a proposal (similar to Python Enhancement Proposals) with concise technical specifications of how you believe keywords/operators/etc should be handled, from parsing and binding in the interpreter all the way to the packages. This can simultaneously serve as documentation of your contribution to this part of the interpreter. It can then be discussed and amended until there's agreement.

pzinn marked this pull request as draft November 28, 2024 08:12

pzinn changed the title ~~1st attempt: m2 level makeKeyword~~ User-defined keywords Nov 28, 2024

1st attempt: m2 level makeKeyword

a0c7a5f

pzinn force-pushed the keyword branch from 5d4159c to a0c7a5f Compare November 28, 2024 08:18

start actors5.d cleanup

092541f

pzinn force-pushed the keyword branch from 16f6a48 to 092541f Compare November 28, 2024 09:50

implement user-defined prefix, postfix, binary keywords

45f7ca9

pzinn force-pushed the keyword branch from 4791da6 to 43b5d68 Compare November 28, 2024 23:28

fix augmented ops

5aac89c

pzinn force-pushed the keyword branch 4 times, most recently from 3736244 to 62b1eca Compare November 30, 2024 11:20

pzinn marked this pull request as ready for review December 1, 2024 10:01

d-torrance reviewed Dec 1, 2024

M2/Macaulay2/d/binding.d (outdated)

pzinn force-pushed the keyword branch from eb35b6b to a0cf2b2 Compare December 2, 2024 01:35

d-torrance reviewed Dec 2, 2024

M2/Macaulay2/m2/methods.m2

pzinn force-pushed the keyword branch from a0cf2b2 to 0fc2823 Compare December 3, 2024 04:28

mahrud reviewed Dec 3, 2024

pzinn force-pushed the keyword branch from 0fc2823 to 4bbe6ca Compare December 3, 2024 08:03

pzinn added 10 commits December 3, 2024 19:29

minor keyword/symbol fixes

4720867

clean up binding.d

f34fe1c

fix pseudocode/disassemble for keywords

288c882

prevent "new Keyword"

699cd90

makeKeyword checks and errors

9d48492

fix getSymbol bugs

d14dca3

add isvalidkeyword

5fe5403

removed unicode keywords, added tests

fe074d2

fix tex Symbol/Keyword

6449f20

doc makeKeyword

505f978

pzinn force-pushed the keyword branch from 4bbe6ca to 505f978 Compare December 3, 2024 08:30

d-torrance added the under discussion label Dec 16, 2024
