Compiler doesn't always parse identifiers as atomic units #18229

Closed
MjrTom opened this issue Jan 12, 2025 · 6 comments

@MjrTom

MjrTom commented Jan 12, 2025

The compiler doesn't always parse identifiers (such as those whose components are delimited by '.') as atomic units. This causes erroneous identification of "keywords" where the compiler should not be looking for keywords. In fact, the compiler then attempts to perform that "keyword" operation.

Repro steps

Put the line:
open java.util.function
in an appropriate location in any .fs file. Regardless of any other issues, error FS0010 for the (reserved) keyword 'function' is faulty: 'function' is a component of an identifier and should not be scrutinized for keywords.

A zipped VS2022 Solution folder containing a project with a single file demonstrating the bug for all keywords known to me is attached.

FSharpIdentifierParseTest.zip

Expected behavior

Identifiers that contain only characters allowed in identifiers must not be split at component delimiters (such as '.') and checked for keywords. This does not refer to identifiers that consist only of a keyword, and there might be additional corner cases to consider (such as an attempt to declare an identifier that is a keyword followed by a delimiter and nothing else). A delimited identifier that starts with a keyword followed by a delimiter is a more complex case.

Actual behavior

In statements with identifiers that use '.' as a component delimiter in a permitted fashion, the compiler leaves identifier observation mode and erroneously parses for keywords, mistakenly identifying identifier components as keywords.

This bug completely blocks interop with established external code that cannot be changed and whose names are mis-parsed.

Known workarounds

None of the workarounds I've tried work, because the parser bug prevents them. This seems to be a general bug that may have no comprehensive workaround. The case where I ran into it was 'open java.util.function'. For this specific kind of example, I might be able to alias the mis-parsed namespaces in C# and access the aliases from F#, but I haven't tried that yet.

Related information

This might relate to:

#10043

  • Windows 11

  • .NET 9.101, .NET Framework Version 4.8.09032

  • Visual Studio Community 2022 (64-bit), version 17.12.3

VS Code shows the same result from the compiler's parser.

@brianrourkeboll
Contributor

What if you wrap the function in double backticks? ….``function``

@Martin521
Contributor

@MjrTom
I am afraid this is the expected behavior.
F#, like most languages, treats keywords as reserved words that cannot be used as identifiers.
(This applies also to C#, except for some newer contextual keywords.)
For a workaround, see the above comment. (In C#, add an @ prefix.)

@MjrTom
Author

MjrTom commented Jan 12, 2025

> @MjrTom I am afraid this is the expected behavior. F#, like most languages, treats keywords as reserved words that cannot be used as identifiers. (This applies also to C#, except for some newer contextual keywords.) For a workaround, see the above comment. (In C#, add a @ prefix.)

It seems to me that the dot delimiter should not break atomicity of identifiers, but maybe I'm mistaken.

@MjrTom
Author

MjrTom commented Jan 12, 2025

> What if you wrap the function in double backticks? ….``function``

Previously, I had only tried, for example:

open ``java.util.function``

I discovered that atomic namespace identifiers wrapped in double backticks are not resolved the way I'd expect. For example:

open ``java.util``

results in: "The namespace or module 'java.lang' is not defined."

Even though:

open java.util

works as expected. Information I had encountered led me to think that enclosing the whole thing would be the solution.

After reading your precise wording carefully, it appears:

open java.util.``function``

does work. Thank you for clarifying that; I had not come across this detail in my attempts to resolve it on my own.
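A minimal, self-contained sketch of that working form, written as an F# script. The nested modules here are hypothetical stand-ins for the external java.util.function namespace; in the real case those names come from the referenced Java interop assembly, which is not reproduced. Only the keyword component is escaped.

```fsharp
// Hypothetical stand-in for the external java.util.function namespace;
// in real interop code these names would come from a referenced assembly.
module java =
    module util =
        module ``function`` =
            let describe (x: int) = sprintf "value = %d" x

// open java.util.function      // fails with FS0010: 'function' is a keyword
// open ``java.util.function``  // fails: one identifier, so no such namespace or module
open java.util.``function``     // works: only the keyword component is escaped

printfn "%s" (describe 42)
```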

@Martin521
Contributor

> It seems to me that the dot delimiter should not break atomicity of identifiers, but maybe I'm mistaken.

The tokenizer (lexer) breaks the input text into tokens (such as keywords and identifiers) according to these rules.
In the next step, the parser parses long-idents according to these rules, accepting only valid identifiers (not keywords) as constituents.
This way, the long-ident represents a path that the compiler can easily search for.
By enclosing the whole long-ident in double backticks, you make it a single identifier that is not found in the tree of available modules.
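To make those two phases concrete, here is a toy model in F#. It is not the compiler's actual lexer or parser: the Token type, the small keyword subset, and the lex and parseOpen functions are hypothetical, and only the shape open IDENT (DOT IDENT)* is handled.

```fsharp
// Toy model (NOT the real F# compiler) of the two phases:
// 1. lex: text -> tokens, classifying reserved words as keywords;
// 2. parseOpen: builds a long-ident IDENT (DOT IDENT)*, rejecting keywords.
type Token =
    | Ident of string   // ordinary or ``backtick-quoted`` identifier
    | Keyword of string
    | Dot

// A small, hypothetical subset of F# keywords, enough for the example.
let keywords = set [ "open"; "function"; "type"; "module"; "namespace" ]

let lex (text: string) : Token list =
    let rec go i acc =
        if i >= text.Length then List.rev acc
        elif System.Char.IsWhiteSpace(text.[i]) then go (i + 1) acc   // whitespace only separates tokens
        elif text.[i] = '.' then go (i + 1) (Dot :: acc)
        elif text.[i] = '`' && i + 1 < text.Length && text.[i + 1] = '`' then
            // Everything up to the closing `` is ONE identifier, dots included.
            let close = text.IndexOf("``", i + 2, System.StringComparison.Ordinal)
            if close < 0 then failwith "unterminated ``identifier``"
            let name = text.Substring(i + 2, close - i - 2)
            go (close + 2) (Ident name :: acc)
        else
            let mutable j = i
            while j < text.Length && (System.Char.IsLetterOrDigit(text.[j]) || text.[j] = '_') do
                j <- j + 1
            if j = i then failwithf "unexpected character '%c'" (text.[i])
            let word = text.Substring(i, j - i)
            let tok = if keywords.Contains word then Keyword word else Ident word
            go j (tok :: acc)
    go 0 []

// Parse 'open' followed by a long-ident; the result is the path segments.
let parseOpen (tokens: Token list) : Result<string list, string> =
    let rec path acc toks =
        match toks with
        | Ident name :: Dot :: rest -> path (name :: acc) rest
        | [ Ident name ] -> Ok (List.rev (name :: acc))
        | Keyword kw :: _ -> Error (sprintf "unexpected keyword '%s' in open declaration" kw)
        | _ -> Error "malformed long-ident"
    match tokens with
    | Keyword "open" :: rest -> path [] rest
    | _ -> Error "expected 'open'"
```

Under this model, "open java.util.function" fails at the 'function' segment, "open java.util.``function``" yields the path ["java"; "util"; "function"], and "open ``java.util.function``" yields the single segment "java.util.function", which name resolution would then fail to find. Because whitespace is skipped between tokens, "open java . util" is accepted too, consistent with long-idents being allowed to contain whitespace.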

@MjrTom
Author

MjrTom commented Jan 12, 2025

> It seems to me that the dot delimiter should not break atomicity of identifiers, but maybe I'm mistaken.

> By enclosing the whole long-ident in double backticks, you make it a single identifier that is not found in the tree of available modules.

In reading those sections, I think what surprises me, and throws off my view, is that long-idents can include whitespace.

If they could not include whitespace, whitespace would terminate a long-ident, and it naively seems that the parsing order could be swapped: java.util.function would then be an atomically correct long-ident, and when it is scrutinized for its constituent components, the tree would be traversable with those components, each assumed valid until proven otherwise. In this example, the traversal would succeed.

I recognize that allowing whitespace in long-idents is codified, so a change away from that is surely quite unlikely. I'll accept that enclosing the offending component, and only it, in double backticks is just something to know in F# 9+.

Thanks for explaining it. I tried to read the specifications, but that didn't go well enough.
