Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added TypeScript PhpLexerBase for php grammar #3994

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tomaspiaggio
Copy link

I added the TypeScript class for using the php grammar from the typescript target. It's very similar to the JavaScript one.

@kaby76
Copy link
Contributor

kaby76 commented Mar 1, 2024

This code doesn't compile, and even after fixing the compilation errors, it does not work.

Please make the following changes:

  • Add "TypeScript" to <targets> to the desc.xml.
diff --git a/php/desc.xml b/php/desc.xml
index 67714256..2f5d45a0 100644
--- a/php/desc.xml
+++ b/php/desc.xml
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="UTF-8" ?>
 <desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
    <antlr-version>^4.10</antlr-version>
-   <targets>CSharp;Java;Python3</targets>
+   <targets>CSharp;Java;Python3;TypeScript</targets>
 </desc>
  • Modify the declaration of PhpLexerBase.
$ diff Generated-TypeScript/PhpLexerBase.ts ./TypeScript/
6c6
< export default abstract class PhpLexerBase extends Lexer {
---
> export class PhpLexerBase extends Lexer {
  • Fix "here docs". It is not lexing correctly. This is what the lexer produces:
$ bash run.sh ../examples/heredoc.php -tokens
[@0,0:4='<?php',<4>,channel=4,1:0]
[@1,5:6='\n\n',<39>,channel=4,1:5]
[@2,7:9='foo',<225>,3:0]
[@3,10:10='(',<212>,3:3]
[@4,11:21='<<< HEREDOC',<235>,3:4]
[@5,22:22='\n',<243>,3:15]
[@6,23:38='Heredoc line 1.\n',<243>,4:0]
[@7,39:54='Heredoc line 2.\n',<243>,5:0]
[@8,63:63='HEREDOC\n;',<213>,7:0]
[@9,64:64=';',<220>,7:1]
[@10,65:66='\n\n',<39>,channel=4,7:2]
[@11,67:69='foo',<225>,9:0]
[@12,70:70='(',<212>,9:3]
[@13,71:82='<<< 'NOWDOC'',<234>,9:4]
[@14,83:83='\n',<243>,9:16]
[@15,84:98='Nowdoc line 1.\n',<243>,10:0]
[@16,99:113='Nowdoc line 2.\n',<243>,11:0]
[@17,114:120='NOWDOC\n',<243>,12:0]
[@18,121:123=');\n',<243>,13:0]
[@19,124:124='\n',<243>,14:0]
[@20,125:139='$str = "asdf";\n',<243>,15:0]
[@21,140:140='\n',<243>,16:0]
[@22,141:160='$str1 = <<<HEREDOC1\n',<243>,17:0]
[@23,161:173='Hello world!\n',<243>,18:0]
[@24,174:183='HEREDOC1;\n',<243>,19:0]
[@25,184:184='\n',<243>,20:0]
line 21:0 token recognition error at: '?>'
[@26,187:186='<EOF>',<-1>,21:2]
TypeScript 0 ../examples/heredoc.php fail 0.021
Total Time: 0.043
03/01-17:16:12 ~/issues/g4-3994/php/Generated-TypeScript

Here is what it should be:

03/01-17:17:03 ~/issues/g4-3994/php/Generated-CSharp
$ bash run.sh ../examples/heredoc.php -tokens
[@0,0:4='<?php',<4>,channel=4,1:0]
[@1,5:6='\n\n',<39>,channel=4,1:5]
[@2,7:9='foo',<225>,3:0]
[@3,10:10='(',<212>,3:3]
[@4,11:21='<<< HEREDOC',<235>,3:4]
[@5,22:22='\n',<243>,3:15]
[@6,23:38='Heredoc line 1.\n',<243>,4:0]
[@7,39:54='Heredoc line 2.\n',<243>,5:0]
[@8,63:63='HEREDOC\n;',<213>,7:0]
[@9,64:64=';',<220>,7:1]
[@10,65:66='\n\n',<39>,channel=4,7:2]
[@11,67:69='foo',<225>,9:0]
[@12,70:70='(',<212>,9:3]
[@13,71:82='<<< 'NOWDOC'',<234>,9:4]
[@14,83:83='\n',<243>,9:16]
[@15,84:98='Nowdoc line 1.\n',<243>,10:0]
[@16,99:113='Nowdoc line 2.\n',<243>,11:0]
[@17,121:121='NOWDOC\n;',<213>,13:0]
[@18,122:122=';',<220>,13:1]
[@19,123:124='\n\n',<39>,channel=4,13:2]
[@20,125:128='$str',<224>,15:0]
[@21,129:129=' ',<39>,channel=4,15:4]
[@22,130:130='=',<221>,15:5]
[@23,131:131=' ',<39>,channel=4,15:6]
[@24,132:132='"',<233>,15:7]
[@25,133:136='asdf',<239>,15:8]
[@26,137:137='"',<233>,15:12]
[@27,138:138=';',<220>,15:13]
[@28,139:140='\n\n',<39>,channel=4,15:14]
[@29,141:145='$str1',<224>,17:0]
[@30,146:146=' ',<39>,channel=4,17:5]
[@31,147:147='=',<221>,17:6]
[@32,148:148=' ',<39>,channel=4,17:7]
[@33,149:159='<<<HEREDOC1',<235>,17:8]
[@34,160:160='\n',<243>,17:19]
[@35,161:173='Hello world!\n',<243>,18:0]
[@36,174:183='HEREDOC1;\n',<220>,19:0]
[@37,184:184='\n',<39>,channel=4,20:0]
[@38,185:186='?>',<220>,21:0]
[@39,187:186='<EOF>',<-1>,21:2]

CSharp 0 ../examples/heredoc.php success 0.0923946
Total Time: 0.2273878
03/01-17:17:16 ~/issues/g4-3994/php/Generated-CSharp

Note the large number of missing tokens in your TypeScript port.

NB: There is a bug in the other targets with the token text for the closing parentheses. That should be fixed, but you don't have to fix that.

  • The grammar contains code that is basically target specific in the lexer.
    HtmlScriptOpen : '<script' { _scriptTag = true; } -> pushMode(INSIDE);
    HtmlStyleOpen : '<style' { _styleTag = true; } -> pushMode(INSIDE);
    . _scriptTag should be this._scriptTag, etc. (These should be wrapped in methods because "true" is not the correct syntax in some targets, but we'll ignore fixing that for now.)

@teverett teverett added the php label Mar 3, 2024
@tomaspiaggio
Copy link
Author

@teverett hi, thanks for reviewing this. I'm not too familiar with this repository. I had to implement this and it was working for what I was doing so I thought of contributing it. I fixed some of the problems but I'm not sure how to fix the HEREDOC problem. Could you point me in the right direction? Thank you!

// Add semicolon to the end of statement if it is absent.
// For example: <?php echo "Hello world" ?>
if (this._prevTokenType === PhpLexer.SemiColon || this._prevTokenType === PhpLexer.Colon || this._prevTokenType === PhpLexer.OpenCurlyBracket || this._prevTokenType === PhpLexer.CloseCurlyBracket) {
token = super.nextToken()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is wrong. Check out the CSharp port. The code should be token.channel = 1;. ("1" is the hidden channel.)

token = super.nextToken()
} else {
token = new CommonToken(undefined, PhpLexer.SemiColon, undefined, undefined, undefined);
token.text = ';';
Copy link
Contributor

@kaby76 kaby76 Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change these two lines to correspond to that in the CSharp port: token.type = PhpLexer.SemiColon; and delete the "new CommonToken()" call.

else if (this._mode === PhpLexer.HereDoc) {
// Heredoc and Nowdoc syntax support: http://php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc
if (token.type === PhpLexer.StartHereDoc || token.type === PhpLexer.StartNowDoc) {
this._heredocIdentifier = token.text.slice(3).trim().replace(/\'$/, '');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not correspond to the CSharp port. This fails because in C#, trim(char) removes both the leading and trailing char. But, in TypeScript, replace(pat, string) only replaces the first occurrence of the pattern. So, 'NOWDOC is replaced with NOWDOC'. The way I did this is to have two replace(), one for the beginning of the string, and another for the end. Or, you could just call replace(/\'/, '') twice.

if (token.text.trim().endsWith(';')) {
token = new CommonToken((CommonToken as any).EMPTY_SOURCE, PhpParser.SemiColon,
undefined, undefined, undefined)
token.text = `${heredocIdentifier};\n`;
Copy link
Contributor

@kaby76 kaby76 Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this line (token.text = '$........;\n';. This rewrite of the text makes it very confusing. In addition, just delete the line with token = new CommonToken(...); because it's not in the CSharp port.

token.text = `${heredocIdentifier};\n`;
} else {
token = super.nextToken()
token.text = `${heredocIdentifier}\n;`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, delete this line token.text = ...;. Same reason as above--very confusing. If anything, it should just set the text to ';'.

this._heredocIdentifier = token.text.slice(3).trim().replace(/\'$/, '');
}

if (token.type === PhpLexer.HereDocText) {
Copy link
Contributor

@kaby76 kaby76 Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really should be else if (token.type === PhpLexer.HereDocText) } because it does not follow exactly the switch statement in the CSharp port. Notice the "break" following the two cases.

. The TypeScript port code should have a one-to-one correspondence with the C# target.

@kaby76
Copy link
Contributor

kaby76 commented Mar 9, 2024

Note, the above changes takes care of a number of problems but the code still fails for a couple of tests in the examples/ directory. I'm working out why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants