User-defined keywords #3599

Open: wants to merge 14 commits into base: development

pzinn (Contributor) commented Nov 28, 2024

Work in progress to implement user-defined keywords: (cf #3593)

i1 : makeKeyword "×"

o1 = ×

o1 : Keyword

i2 : ZZ × ZZ := times

o2 = times

o2 : CompiledFunction

i3 : 1×2

o3 = 2

The most important change is to have a default operation (which applies to 99% of existing keywords, and to the newly created user-defined keywords), which is to look up the corresponding method. So there is no need to define all these identical functions one by one.
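
As a rough top-level analogue of that default operation (the actual change lives in the interpreter's d code, which this sketch does not reproduce), one can look up a binary method by operator and operand classes and then apply it. The names `Money` and `defaultBinary` are hypothetical and made up for illustration; only `lookup` and ordinary method installation are assumed from existing Macaulay2.

```macaulay2
-- Hypothetical illustration; not code from this PR.
Money = new Type of BasicList
-- Install a binary method for + on the made-up type.
Money + Money := (a, b) -> new Money from {a#0 + b#0}

-- Made-up helper: find the method installed for (op, class x, class y) and apply it.
defaultBinary = (op, x, y) -> (
    f := lookup(op, class x, class y);   -- null if no method is installed
    if f === null then error "no method found";
    f(x, y))

defaultBinary(symbol +, new Money from {3}, new Money from {4})
```

With a default of this shape, an operator needs no dedicated function of its own: installing a method such as `ZZ × ZZ := times` is enough for the operator to work on those types.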

TODO:

  • treat all the different cases (unary, binary, precedence) by adding options to the m2 level makeKeyword
  • simplify drastically actors5.d by removing all unnecessary functions
  • transfer the creation of keywords to m2 level (core)
  • get rid of opsWithBinaryMethod etc
  • get rid of getBinopName etc and rewrite pseudocode accordingly
  • simplify the tex routines
  • possibly remove entirely the fictitious Keyword class, instead providing some information in the Symbol Afterprint
  • fix the other minor issues mentioned in User-created keywords #3593
  • document makeKeyword

@pzinn pzinn marked this pull request as draft November 28, 2024 08:12
@pzinn pzinn changed the title 1st attempt: m2 level makeKeyword User-defined keywords Nov 28, 2024
@pzinn pzinn force-pushed the keyword branch 4 times, most recently from 3736244 to 62b1eca November 30, 2024 11:20
pzinn (Contributor Author) commented Dec 1, 2024

I might stop here, to keep this PR relatively small and to avoid the more controversial items on my list.

@pzinn pzinn marked this pull request as ready for review December 1, 2024 10:01
w:=makeUniqueWord(s.v, parseinfo(prec,bprec,uprec,parsefuns(u,t)));
when globalLookup(w) is x:Symbol do buildErrorPacket("symbol already in use")
else (
install(s.v,w); -- TODO check whether install is really needed (for mathematical symbols as opposed to words)
Member

Are there any updates on whether we need to call install here?

Contributor Author

yeah this is rather subtle. my understanding is, it's harmless to always have install here (except maybe for a tiny slowdown of the parsing?). The point is, this is useless for keywords that are actual words (like and) because you need to separate them with spaces anyway (so they behave more similarly to regular symbols). But at the moment I've put limited constraints on allowed keywords (see isvalidkeyword -- which may require more restrictions), so people can have keywords like @*@ for which install is definitely necessary.

Contributor Author

I rephrased slightly the comment, but haven't actually changed anything

Contributor Author

I need to think about this some more. Relatedly, at this stage nothing would prevent us from removing the section in lex.d which you wrote, "else if ismathoperator(peek2(file)) then ( ..." etc., except maybe unicode synonyms.

Contributor Author

answering myself: no that doesn't work. math operators now have a slightly ambiguous status of being both "alphanumeric" (i.e., usable in a regular symbol) and have special parsing rules (so they can be used in keywords, e.g., binary operators).

Contributor Author (pzinn, Dec 2, 2024)

well, right now, this PR doesn't change the status of math operators, so everything works fine as written.
but yes, as you pointed out, a nice feature of this PR is one can even define unary (prefix) operators which for all practical purposes work the same as a MethodFunction. so that would definitely be one way forward. might want to have a separate PR for this whole mess. In the meantime, I will fix the install thing today so we can resolve this particular comment.

Member

> it would be so much easier if we could declare math symbols to be non alphanumeric. In fact this was my original idea in my first PR on unicode symbols, but that breaks an existing package which defined the sum and product unicode symbols, so I had to change it. now we're paying the price for this backwards compatibility.

I don't follow this then. How does declaring math symbols to be non-alphanumeric break backwards compatibility with a package that uses non-alphanumeric method names? Unless you meant declare any non-alphanumeric string to be a math operator automatically?

Member

I agree that we should aim for the ability to declare different types of operators, binary, unary left/right, with prescribed precedence.

Contributor Author (pzinn, Dec 3, 2024)

> > it would be so much easier if we could declare math symbols to be non alphanumeric. In fact this was my original idea in my first PR on unicode symbols, but that breaks an existing package which defined the sum and product unicode symbols, so I had to change it. now we're paying the price for this backwards compatibility.
>
> I don't follow this then. How does declaring math symbols to be non-alphanumeric break backwards compatibility with a package that uses non-alphanumeric method names?

you mean, with a package that uses math-operator method names?
because methods have ordinary symbols (as opposed to keywords), and these are required to be alphanumeric.

Contributor Author

updated so install is only called when needed.

M2/Macaulay2/d/binding.d (outdated, resolved)
mahrud (Member) commented Dec 3, 2024

Since lots of things are being refactored here, what do you think about also changing the terminology? My understanding is that the classes Symbol and Keyword are identical, except that keywords are symbols which are reserved by the language and can't be assigned values, etc. This means things like and, then, but also operators like +, etc. But with this PR users can define new keywords, which implies they are no longer reserved.

So I think we should create a new class Operator for only non-alphanumeric math operators which are distinct from reserved alphanumeric keywords.

pzinn (Contributor Author) commented Dec 3, 2024

sure.

pzinn (Contributor Author) commented Dec 3, 2024

though it would be nice to have a common ancestor, since I don't want to have two different methods makeKeyword and makeOperator which would do the exact same thing.

mahrud (Member) commented Dec 3, 2024

I guess what I'm saying is we should not have makeKeyword at all, since by definition keywords are the reserved symbols.

edit: to clarify, a makeKeyword function to simplify declaring alphanumeric-only reserved symbols (e.g. and, or etc.) in the interpreter is totally fine; I'm saying there shouldn't be a top-level one. On the other hand, a makeOperator function to simplify things in the interpreter, and also exported to allow users to define new operators, is fine.

⋯:=symbol ⋯
⋱:=symbol ⋱
⋮:=symbol ⋮
…:=symbol …
-- used e.g. in chaincomplexes.m2
Member

I'm confused about what's happening here, and also don't think this particular syntax is good practice.

Member

For instance, maybe cdots = ⋯ = new Operator from symbol ⋯ is preferable, and later in this file it would be better to use the synonym cdots rather than ⋯.

Contributor Author (pzinn, Dec 3, 2024)

well, what was there before was worse, since it defined a sub-Type of Symbol, which is not a good idea in general. this has the merit that it works.

Contributor Author (pzinn, Dec 3, 2024)

but OK with synonyms, done

Comment on lines 185 to -188
protect Flexible
protect Binary
protect Prefix
protect Postfix
Member

How come Flexible is kept here?

Member

Tangentially, I think a syntax like:

otimes = ⊗ = new BinaryOperator from symbol ⊗
otimes.precedence = precedence symbol *

would be nice, though perhaps for flexible operators it's a bit more tricky.

Contributor Author

the problem with using the syntax new ... from symbol ... is that there is no room for options, and I'm not sure how to implement this .precedence thing.

Contributor Author (pzinn, Dec 3, 2024)

> How come Flexible is kept here?

dunno I didn't touch this

Contributor Author (pzinn, Dec 3, 2024)

should I have moved Flexible too? I'm not even clear what it means.
more generally I don't know what operatorAttributes is for, so I didn't touch it.

Member

> dunno I didn't touch this

You removed protect Binary and Prefix and Postfix but kept Flexible here. Why?

Contributor Author

well, besides what I wrote above, the practical reason was that I didn't need Flexible.

pzinn (Contributor Author) commented Dec 3, 2024

> I guess what I'm saying is we should not have makeKeyword at all, since by definition keywords are the reserved symbols.
>
> edit: to clarify, a makeKeyword function to simplify declaring alphanumeric-only reserved symbols (e.g. and, or etc.) in the interpreter is totally fine; I'm saying there shouldn't be a top-level one. On the other hand, a makeOperator function to simplify things in the interpreter, and also exported to allow users to define new operators, is fine.

err? that's exactly what this PR is doing... Example:

makeKeyword "implies"
Boolean implies Boolean := (a,b) -> not a or b
false implies true

pzinn (Contributor Author) commented Dec 3, 2024

to recap: two types of Keyword exist:

  1. the ones that look like normal (alphanumeric) symbols, like and
  2. the ones made of special characters, like **. These are the ones that require install for special parsing.

both are (and should be) definable using makeKeyword. The only confusing bit is that mathematical operators like × are classified as normal symbols (so in 1. above). This is what I was complaining about before, and should probably be reverted one day, though not in this PR since it would require a significant rewrite (not to mention the backward compatibility problem).
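
The difference between the two kinds can be seen in how existing built-in keywords are read in current Macaulay2. The snippet below is only an illustration of that lexing behaviour: it uses built-in keywords, does not call makeKeyword, and the variable names are arbitrary.

```macaulay2
-- Both kinds of built-in keyword currently report the same class.
class symbol and    -- kind 1: looks like an ordinary alphanumeric symbol
class symbol **     -- kind 2: made of special characters

x = 2; y = 3;
x+y                 -- + is also of kind 2: recognised without surrounding spaces
true and false      -- an alphanumeric keyword must be separated by spaces
trueandfalse        -- without spaces this is read as one new symbol
```

The last line is why install matters only for the second kind: a word-like keyword is found by the usual symbol lexing, while a special-character one has to be registered so the lexer can split it off from its neighbours.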

pzinn (Contributor Author) commented Dec 3, 2024

oh yeah this PR also fixes this silly bug: getGlobalSymbol ""

mahrud (Member) commented Dec 3, 2024

If a user can define implies as in your example, then it is by definition not a reserved symbol, therefore it shouldn't be called a keyword. It's just a new type of method attached to a symbol, which is very useful, but is different.

pzinn (Contributor Author) commented Dec 3, 2024

sure, I'm happy to name it something else. but the point is, for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

mahrud (Member) commented Dec 3, 2024

> for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

Then users should not be able to define them in the top level. They should be defined in the interpreter, compiled, and fixed in the language.

pzinn (Contributor Author) commented Dec 3, 2024

> > for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.
>
> Then users should not be able to define them in the top level. They should be defined in the interpreter, compiled, and fixed in the language.

well, I disagree. It certainly doesn't harm to give users that power.

mahrud (Member) commented Dec 3, 2024

It certainly does harm if it's hastily implemented without much longer discussion and evaluation. What happens if a package makes foo into a keyword and another makes it a symbol for a method or variable? What happens if there's a race condition in which one is defined first?

Again, I'm all in favor of allowing users to define non-alphanumeric math operators in the top level, because those can't be used as symbols for methods or variables, and we already know how to deal with different packages overriding the same method, but keywords are by definition reserved words which can't be altered by the user.

pzinn (Contributor Author) commented Dec 3, 2024

actually, it makes no difference whether the keyword is alphanumeric or not. there will be a conflict if two packages define the same keyword. Right now it's a non-issue because no package does so, but eventually we need a mechanism to resolve the case where, say, two different packages want to use × or ~~~ for different purposes. (The biggest problem I can see is that we can't reconcile them if they have different Syntax or Precedence.)
edited to give an example that's actually non-alphanumeric (sigh)

pzinn (Contributor Author) commented Dec 3, 2024

To clarify, Symbols can be in different dictionaries, but the underlying Words, which carry the parsing information, cannot (and shouldn't precisely because they affect parsing).

pzinn (Contributor Author) commented Dec 16, 2024

Let me try to summarise some of the changes that have been requested during the M2internals meeting:

  • not let users define alphanumeric keywords. though I don't fully agree with this, allowing it does open a can of worms (as the hi example showed) and maybe for now it's simpler to prevent this.
  • if we do that, that definitely forces another change that was not mentioned: currently math symbols are considered as alphanumeric. This is a hack I introduced a while ago due to the backwards compatibility issue mentioned above in the discussion. This will have to go, otherwise we won't be able to define any math symbol as a keyword.
  • have more meaningful type names. Keyword is not really an appropriate name for symbols like . One could create a new type called Operator (which, just like Keyword, would be a fake type anyway... they're all Symbols internally).
  • Even for these operators, there'd still be an issue if two different packages try to define the same, say, cow symbol. There needs to be a way of dealing with this, and this remains the most problematic issue. One could make the symbols local (right now all Keywords are really global symbols), but that wouldn't really solve the parsing issue -- if for some reason one package wants to use a symbol as unary and the other as binary, it will be hard to reconcile.
  • Only have makeKeyword in core, not packages: this one I strongly disagree with, and will not implement. Users are not babies. If they want they can already easily break M2 in all kinds of ways (but why would they?). Keywords (or operators, or whatever they will be called) should be definable in packages.

mahrud (Member) commented Dec 17, 2024

Limiting makeKeyword to the interpreter or the Core was mainly a solution to the problematic issue that you mentioned. Ultimately this PR introduces some useful features which can be added now, but jumping directly from idea to production for major changes is a recipe for ending up with compatibility nightmares and packages that only work with a specific version of M2 or indecipherable issues that take valuable time to resolve.

Perhaps a good place to start is a wiki or discussion page where you lay out a proposal (similar to Python Enhancement Proposals) with concise technical specifications of how you believe keywords/operators/etc should be handled, from parsing and binding in the interpreter all the way to the packages. This can simultaneously serve as documentation of your contribution to this part of the interpreter. This can be discussed and amended, until there's agreement.

3 participants