Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Html script closing-tag not detected when occurs immediately after 2 slashes #49

Closed
SDP190 opened this issue Jun 16, 2017 · 15 comments
Closed
Labels
bug Issue identified by VS Code Team member as probable bug
Milestone

Comments

@SDP190
Copy link

SDP190 commented Jun 16, 2017

Hi, I found a bug, with the html grammar in VS Code, when you place a closing script-tag in the same line immediately after 2 fwd slashes, as:
<script> //</script>
the grammar will not detect the closing tag, and will display it with wrong color, see images.

(I'm not sure if the issue is related only to vscode-textmate, or it belongs to textmate/html.tmbundle).

VS Code:
sample

Visual Studio 2015:
capture


  • VSCode Version: Code 1.13.1 (379d2efb5539b09112c793d3d9a413017d736f89, 2017-06-14T18:21:47.485Z)
  • OS Version: Windows_NT ia32 10.0.15063

Steps to Reproduce:

  1. Open any html document with VSCode.
  2. Insert a script element in the document, and place the script's closing-tag immediately after 2 fwd slashes.
@alexdima
Copy link
Member

I think this is not a vscode-textmate issue. The problem here is that the html grammar includes source.js which is HTML agnostic and "eats up" all characters of a line comment to the end of the line for example. IMHO we would need to have a special source.html-embedded.js or something which is always looking for </script>. AFAIK there are no other workarounds for an embedded TM grammar.

@SDP190
Copy link
Author

SDP190 commented Aug 29, 2017

I posted this issue here on textmate's repository, here is what @infininight commented:
This does not happen in TextMate with the current bundle. You will need to talk with the VSCode devs to see if it is up-to-date or a difference in the parsing engines.

So it seams the problem is indeed related to VSCode only!?

@alexdima
Copy link
Member

Yes, this is a VS Code only problem that occurs only with the combination of HTML + JS grammars that are shipped with VS Code. The problem does not occur with the HTML + JS grammars that are shipped with TextMate.

In other words, it is not a problem with the vscode-textmate (a TM grammar interpreter), but with the grammars and their usage together.

@SDP190
Copy link
Author

SDP190 commented May 31, 2018

Still not fixed, WOW! This should be marked as a BUG, since a Closing-Tag after 2 slashes is valid HTML syntax, and so should it be interpreted in VS Code.

Who is responsible for this problem?

@infininight
Copy link

So an explanation of this issue so it can better be tracked down:

In Javascript generally // marks the rest of the line as a comment, in a pure JavaScript document </script> tags could be present so ending on them isn't proper. To cover for this eventuality in the HTML grammar we include a special comment rule that is aware of the closing tag and include it higher than the JavaScript grammar include so it should match before the generic comment rule from the JS grammar itself.

So the HTML grammar portion of this should be the same in VSCode as far as I know. So I suspect the issue would be on the JavaScript side of the grammar with it's rule matching instead of the special case comment rule. I have very little knowledge of how VSCode is structured so I don't know how JavaScript is handled so further investigations on that side will have to be done by someone else.

There is a wider issue of blocks in general being able to break the </script> tag closure so this may be something we investigate in the future on the TextMate side as I have just introduced the initial support of being able to highlight JavaScript code in the event handler attributes and this will require a specialized grammar variant.

@SDP190
Copy link
Author

SDP190 commented Jun 3, 2018

@infininight, please understand that this issue does only occur when the closing tag appears right after the slashes, however if there is a space or any character before the closing tag, it is working properly, see embedded picture, so that's that's confusing, if the problem is because the grammar interprets this full line as a javascript comment why does is not happening even if the closing tag is not immediately after the (//) slashes?

capture

@SDP190 SDP190 closed this as completed Jun 3, 2018
@SDP190 SDP190 reopened this Jun 3, 2018
@msftrncs
Copy link
Contributor

@SDP190 , the HTML.TMLANGUAGE.JSON file needs the following fix, making line 1850 (from VS Code 127.2) read as:

	"end": "(?=</script)|(?!\\G)",

@alexdima
Copy link
Member

fyi @aeschli . Not sure where we landed on this. Do we now have a special embedded JS grammar that is HTML aware or not? In any case #49 (comment) proposes a change to the html grammar...

@infininight
Copy link

@msftclas I have not tested this in VSCode but I don't see why that would work, unless VSCode doesn't support the \G flag which is a non-standard addition we've added in TextMate.

In TextMate \G matches the end of the begin match. So what your end should do is end when it sees </script as a look-ahead or as soon as \G is not true. Since \G should be the end of the begin that means as soon as we advance the parse one character, wether we match anything or not. This isn't the desired outcome as the comment rule would end after a single space character.

Now it may be that VSCode doesn't support \G in the same manner which is totally possible. In this case the behavior would be undefined and you might be seeing the right behavior for the wrong reasons.

Since your investigating: (?=</script) should already be ending the rule when necessary, see if you can track down why it is not matching.

@msftrncs
Copy link
Contributor

The reason it was not matching, is because the search term was //</script> at the start, it matched // to begin, and was waiting for (?!\G) to end, while waiting at the <, but the (?!\G) doesn't 'HIT' until the next character, and with nothing else to match in that block, the cursor advanced to the /script> point, making any possibility to match </script impossible. So by giving the end an alternate of the actual term we're looking for </script (using a lookahead) we can short-circuit the begin-end rule.

I wonder if textmate answers the (?!\G) without advancing the cursor? Like maybe its 'looking ahead' to see if the next position would not match the anchor?

@infininight
Copy link

So there are two rules at play here (shown in plist format, technically simplified slightly):

{   begin = '(>)';
    end = '((<))(?=/(?i:script))';
},
{   begin = '(^[ \t]+)?(?=//)';
    end = '(?!\G)';
    beginCaptures = { 1 = { name = 'punctuation.whitespace.comment.leading.js'; }; };
		patterns = (
        {   name = 'comment.line.double-slash.js';
            begin = '//';
            end = '(?=</script)|\n';
        },
    );
},

In the first rule we enter the script tag consuming the tags closing >, at this point the second rule becomes active. The code in question is:

        //</script>

So we enter the second match on the first column of this line for the optional whitespace match. The actual comment interior match starts matching on the // characters. So at this point the parser is at the here:

        //‸</script>

The end of this interior rule is (?=</script)|\n. This should immediately be matched here exiting the rule before any characters are consumed and before the parser advances past the caret. This the failsafe added to this pattern array because the standard JavaScript comment rule is not aware of the script tag and would happily consume to the end of the line. It should be the one matching due to it being a higher in the processing order. I would verify this is true by changing the scope of this rule to make sure it is the one matching.

At this point we should have exited the interior rule and should also immediately exit it's parent rule due to it's \G no longer bing true leaving us in the pattern array of the first rule which should also immediately match. It is not until this first rule that the parser should have consumed a single < character and advanced the parser.

Advancing the parser to the next column should happen for two reasons: You consume a character or characters advancing the parser to the end of the consumed characters, or after every rule in the active pattern array has been tried and not matched at which point the parser would advance a single character and retry.

So I don't know what is tripping up the VSCode parser here. While the fix you cited might work it is not consistant with how the parser should work. Since this basic comment rule structure is used on almost every single TextMate grammar it would be good to track down the actual interaction with the parser that is causing the actual issue so the root cause can be corrected.

I'm not at at all familiar enough with how VSCode works to do this tinkering but the way I would do it would be to deconstruct the rules in question to the smallest reproducible test case possible. Because of changes to the <script> tag parsing that section has ballooned to 153 lines of code, need to narrow down the search parameters. If someone wants to tackle this feel free to ask questions if your unsure how the grammar structure works and I'm available on freenode IRC if someone wants interactive feedback.

@msftrncs
Copy link
Contributor

msftrncs commented Sep 22, 2018

Okay, the difference I can see with VSCode-TextMate is that the \G anchor point when it hits the (?=</script)|\n is as you described, pointing to the < and there is where it stays when that end-match completes, so also when the previous BeginEnd end match is tested for, (?!\G). At this point, we're stuck in this simple rule. The (?!\G) can find a match at the next character location, and the only other rule // cannot find any matches on the rest of the line, so the cursor is advanced to the next character, the end rule was given the match, and the rule is popped. Now we're looking for a (?=</(?i:script)) as the end match for the previous rule, but the engine sees /script>, no match.

I suspect that TextMate's handling of the anchor position (for matching \G) is different when it pop's from a rule, than is VSCode-TextMate's. I know from wondering through some test runs that VSCode's behavior is that if the cursor hasn't advanced past the end of the last BEGIN rule's match, \G is given the match, and that would make a big difference if TextMate treats it, after ENDing a rule, it restores the previous rule's BEGIN match point for the anchor comparison. I could see the syntax (without my proposed change) working in that situation.

Here is a debug of vscode (trimmed) ( x,{y} is cursor position {anchor position}, |x| only the active part of the line is shown, and matched: x / y is matched cursor position / final cursor position. )

@@scanNext 0,{0}: |//</script>\n|
  scanning for
   - -1: (?=</script)|(?!\G)
   - 1043: //
matched: 0 / 2
  pushing BeginEndRule#1043 @ html.tmLanguage.json:1852 - //
  token: |//|
      * text.html.basic
      * meta.embedded.block.html
      * source.js
      * comment.line.double-slash.js
      * punctuation.definition.comment.js

@@scanNext 2,{2}: |</script>\n|
  scanning for
   - -1: (?=</script)|\n
matched: 2 / 2
  popping BeginEndRule#1043 @ html.tmLanguage.json:1852 - (?=</script)|\n

@@scanNext 2,{2}: |</script>\n|
  scanning for
   - -1: (?!\G)
   - 1043: //
matched: 3 / 3
  popping BeginEndRule#1041 @ html.tmLanguage.json:1843 - (?!\G)
  token: |<|
      * text.html.basic
      * meta.embedded.block.html
      * source.js

Notice how in the last segment, the end rule fired, but at position 3, and passed over the <. So this is the next match vscode attempts:

@@scanNext 3,{2}: |/script>\n|
  scanning for
   - -1: (?=</(?i:script))
   - 1041: (^[ \t]+)?(?=//)
   … (400 lines of javascript rules removed)
matched: 3 / 4
  matched MatchRule#597 @ JavaScript.tmLanguage.json:2924 - %|\*|/|-|\+
  token: |/|
      * text.html.basic
      * meta.embedded.block.html
      * source.js
      * keyword.operator.arithmetic.js

In this case, I am betting that after the BeginEndRule#1043 is popped in TextMate's engine, it reverts the anchor match condition to 0 where the prior lookahead BeginEnd rule matcheda value representing that no anchor exists on the current line. I could see this being the correct behavior. However, I would prefer something in documentation that tells tmlanguage developers that is the case, that the anchor \G only matches at the end of the BEGIN rule for the current active BeginEnd rule.

@msftrncs
Copy link
Contributor

Note, the {anchor position} is my addition in my fork. so I could diagnose an anchor problem.

@infininight
Copy link

@msftrncs Sorry missed your followup question but got pointed to this from a Pug related question.

In this example when we exit the comment rule by (?=</script)|\n this would invalidate that rules \G match. I could check to see if the parent rules \G is restored but I can't think of any time it would be relevant. The two uses of \G would be make sure we are still beside the begin or ensure we aren't. Entering and leaving a pattern array without consuming any characters would be the only way we would need to rematch a previous \G match but this would cause an infinite loop anyway so not really relevant.

So I think any issues can be solved by making sure the anchor is cleared when leaving an end and either restoring the parent rules state if any or just removing it entirely. Or to answer your closing question, yes the \G anchor only matches the begin of the current begin/end rule or begin/while rule.

cc #82

@alexdima
Copy link
Member

alexdima commented Apr 4, 2019

Thank you @msftrncs , this issue is also fixed by #85

@alexdima alexdima closed this as completed Apr 4, 2019
@alexdima alexdima added the bug Issue identified by VS Code Team member as probable bug label Apr 4, 2019
@alexdima alexdima added this to the April 2018 milestone Apr 4, 2019
david-driscoll pushed a commit to OmniSharp/vscode-textmate that referenced this issue Oct 7, 2019
Add support for named ref/out arguments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue identified by VS Code Team member as probable bug
Projects
None yet
Development

No branches or pull requests

4 participants