Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use angular's parser to analyze expression instead of Regex #105

Open
yankee42 opened this issue Aug 1, 2015 · 2 comments
Open

Use angular's parser to analyze expression instead of Regex #105

yankee42 opened this issue Aug 1, 2015 · 2 comments

Comments

@yankee42
Copy link

yankee42 commented Aug 1, 2015

The extractor currently finds filter expressions using regex. This has several drawbacks:

  • It leads to false positive (e.g. {{"Hello"|lowercase}} {{"Second"|translate}} will cause Hello"|lowercase}} {{"Second to be found as translatable string
  • It does not recognize expressions expressions if they are slightly more complex. E.g. {{ foo ? ('foo'|translate) : ('bar'|translate)}} will go unnoticed
  • It is hard to read. The regex currently says new RegExp(start + '\\s*(\'|"|"|')(.*?)\\1\\s*\\|\\s*translate\\s*(' + end + '|\\|)', 'g'); which is hard to read even though it is still pretty simple. It will get much harder when it is extended with further options such as contexts and comments or when trying to do something about the previously mentioned problems.

Thus I explored other options and had a look at how angular is parsing these expressions. And they are using a small recursive descent parser with the manageable amount of 16 node types. Using their parser would put us in a state where there will be no trouble with comparing syntaxes between gettext's parsers and the original.

Unfortunately the angular guys did not seem to anticipate that someone else but them will use their parser. But with a little bit of hacking it is possible. I created a proof of concept here: https://github.com/yankee42/angular-gettext-tools/blob/extract-using-angular-ast/test/parse.js

It is actually quite easy except of the hacks I needed so that angular runs on nodejs without browser and the hack I needed to get hold of AST and Lexer. I am no nodejs expert so maybe someone can come up with a better idea? Possibly angular's parse.js file could be used stand alone, but this requires figuring out which files are all dependencies and need to be included as well. My current approach at least has the advantage that the parser can be kept up to date with npm.

If you think that this is interesting I'd try to actually integrate this into gettext.

@kvantetore
Copy link

I find this approach very interesting. The current regex approach is very limited, and most of the advantage of angular-gettext is the ability to automatically extract translation strings, so improving that aspect seems very important to me.

It's a pretty funky hack to get access to the angular parser. Glancing on parse.js, it looks like it's only dependent on $filter, and even that looks like its using $filter mostly to determine if an expression is constant or not (which we don't really need). So I guess it should be readily mockable.

@kvantetore
Copy link

In kvantetore/angular-gettext-tools@7e5e520, I've changed the parser from regex based to the angular expression parser based on the work by @yankee42. I can format a proper pull request if there is interest in folding into the mainline.

Instead of looking for interpolation expressions globally in the html template, the new approach is to parse the attributes and text-tags of each node separately. This has caused a minor regression in the test suite, namely in "extract/Extracts strings from non-delimited attribute", the line number of the second string is reported to be on line 3 instead of 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants