Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki expression parser based on pymbolic parser #272

Merged
merged 3 commits into from
Apr 18, 2024

Conversation

MichaelSt98
Copy link
Collaborator

@mlange05 and @reuterbal : could you please have a look so we can have a discussion about whether this is the best approach to parse expressions from strings?

As parse_fparser_expression is not always usable/sufficient, this is a demonstrator for a possible parser based on pymbolic's parser. For example process_dimension_pragmas is used within fparser and thus can't utilise parse_fparser_expression. With trivial dimension pragmas and/or if those dimensions read from the pragmas are not further transformed, this is not a problem. However, for more complex dimension pragmas like !$loki dimension(A,0:B/2,C,D) this becomes a problem ...

>>> from loki.expression.parser import loki_parse
>>> expr = loki_parse('((a + b)/(a - b))**3 + 3.1415')
>>> expr
Sum((Power(Quotient(Sum((..., ...)), Sum((..., ...))), 3), FloatLiteral('3.1415', None)))
>>> type(expr)
<class 'loki.expression.symbols.Sum'>
>>> [type(_) for _ in expr.children]
[<class 'loki.expression.symbols.Power'>, <class 'loki.expression.symbols.FloatLiteral'>]

>>> expr = loki_parse('a:b')
>>> expr
RangeIndex((Variable('a'), Variable('b'), None))
>>> type(expr)
<class 'loki.expression.symbols.RangeIndex'>

It is also possible to pass a scope expr = loki_parse('((a + b)/(a - b))**3 + 3.1415', scope=routine).

More examples in tests/test_expression.py.

Copy link

github-actions bot commented Apr 4, 2024

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/272/index.html

@mlange05
Copy link
Collaborator

mlange05 commented Apr 4, 2024

Ok, I only had a brief look, but the overall direction of this is fantastic! The use of the FParser as a stand-in string/expression parser was always born our of necessity and a pure expression parser for Loki-expressions like this is a fantastic idea, if you ask me. Go for it!

@MichaelSt98 MichaelSt98 force-pushed the nams_loki_string_parser branch 2 times, most recently from 013819d to db84b1d Compare April 8, 2024 10:09
@reuterbal
Copy link
Collaborator

100% agreed with Michael's comment. This is great and very much looking forward to bringing this in! The implementation looks sensible so far, so very happy if you manage to finish this.

@MichaelSt98 MichaelSt98 marked this pull request as ready for review April 15, 2024 11:23
Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated offline, this is a truly impressive piece of work and fantastic addition to Loki - not only unlocking the use of expressions in pragma annotations but also providing the tools for an entirely new way of creating expressions in transformations.

Overall I am very happy with how this looks and I have tried to get a rough understanding how it works but can't possibly claim to have really understood it in detail.

I flagged a few places where I think the implementation could be improved or where I see limitations. Some of these we have already discussed offline. There's also a few suggestions how to improve the documentation, which I think would be a very valuable thing to do.

Where I have flagged limitations, you can also decide to defer the fix to a later time and record it in an issue for now. I suspect we may find a few more oddities once we start using this more, and expect there will be a need for some follow-on PRs.

Pylint has also found a cyclic import in test_build:
R0401: Cyclic import (loki.expression -> loki.expression.parser) (cyclic-import)



# Fortran intrinsic procedures
intrinsic_procedures = ['DIM', 'SQRT', 'ADJUSTR', 'DATAN2', 'IEEE_SUPPORT_FLAG', 'MAXVAL', 'MAXVAL',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out offline: We could obtain the list of intrinsic names from fparser instead of maintaining our own, e.g. already used here:

try:
from fparser.two.Fortran2003 import Intrinsic_Name
_intrinsic_fortran_names = Intrinsic_Name.function_names
except ImportError:
_intrinsic_fortran_names = ()

We could consider making this a utility/constant of it's own, to avoid repeating this pattern too often.

return super().map_call(expr)


class LokiParser(ParserBase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest calling this ExpressionParser, as it might be misunderstood to be a full Fortran parser.


class LokiEvaluationMapper(EvaluationMapper):
"""
A mapper for evaluating expressions, based on pymbolic's `EvaluationMapper`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: you can directly link into the pymbolic documentation using :any:`pymbolic.mapper.evaluator.EvaluationMapper` . This is used, e.g., here:

node : :any:`Node` or :any:`pymbolic.primitives.Expression`

producing the output https://sites.ecmwf.int/docs/loki/main/loki.batch.item.html#loki.batch.item.ItemFactory.create_from_ir


class LokiParser(ParserBase):
"""
String Parser based on [pymbolic](https://github.com/inducer/pymbolic) 's parser.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, Sphinx understands only RST markup...

Suggested change
String Parser based on [pymbolic](https://github.com/inducer/pymbolic) 's parser.
String Parser based on `pymbolic's <https://github.com/inducer/pymbolic>`_ parser.


The Loki String Parser utilises and extends pymbolic's parser to incorporate
Fortran specific syntax and to map pymbolic expressions to Loki expressions, utilising
the mapper ``PymbolicMapper``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the mapper ``PymbolicMapper``.
the mapper :any:`PymbolicMapper`.

loki/expression/parser.py Show resolved Hide resolved
return sym.StringLiteral(s)


loki_parse = LokiParser()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest calling this parse_expr.

Please also add a docstring to this, e.g.,

Suggested change
loki_parse = LokiParser()
parse_expr = LokiParser()
"""
An instance of :any:`ExpressionParser` that allows parsing expression strings into a Loki expression tree.
See :any:`ExpressionParser.__call__` for a description of the available arguments.
"""

scope : :any:`Scope`
The scope to which symbol names inside the expression belong
min_precedence : int, optional
Minimum precedence
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate a little on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any use case right now to not pass min_precedence=0 which is the default. Therefore, I'll just delete this option?!


.. note::
**Example:**
Reference a declaration node that contains variable "a"
Copy link
Collaborator

@reuterbal reuterbal Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems a little out of sync with the example provided below. Generally, I think it would be helpful to add additional examples with variables and functions for evaluation, and the use of a scope.

Comment on lines 285 to 282
(_f_float, ("|", pytools.lex.RE(r"[0-9]+\.[0-9]*([eEdD][+-]?[0-9]+)?(_[a-zA-Z]*)", re.IGNORECASE))),
(_f_int, pytools.lex.RE(r"[0-9]+?(_[a-zA-Z]*)", re.IGNORECASE)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several limitations of these two patterns (depending how extensive you would like the support to be.

  • Currently it mandates a kind specifier (but only the underscore, not the actual specifier), I suppose we wouldn't want to enforce this?
  • Also note that kind specifiers can also include numbers or underscores (since it's a parameter name) and can even be only a number:
$ cat test_kind.F90
program main
implicit none
integer, parameter :: my_kind = selected_int_kind(5)
integer(kind=selected_int_kind(3)) :: i
i = 1_4 + 1_my_kind
print *,'i=',i
end program
$ gfortran test_kind.F90
$ ./a.out
 i=      2
  • Real literals can could in principle also be specified in scientific notation, e.g. 1e-8.

I leave it up to you what level of support you want to provide here, but just as an inspiration, the pattern that fparser uses for signed real literals:

>>> from fparser.two.pattern_tools import signed_real_literal_constant
>>> print(signed_real_literal_constant.pattern)
([+-])?\s*((\d+\s*[.]\s*(\d+)?|[.]\s*\d+)\s*([ED]\s*([+-])?\s*\d+)?\s*(_\s*(\d+|[A-Z][\w$]*))?|\d+\s*[ED]\s*([+-])?\s*\d+\s*(_\s*(\d+|[A-Z][\w$]*))?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice ... I suggest to allow for now: 2.4_jprb, 2._8, 2.4e18_my_kind8 and add corresponding tests.

@reuterbal reuterbal changed the title POC: Loki string parser based on pymbolic parser Loki expression parser based on pymbolic parser Apr 16, 2024
Loki string parser: correct ordering of statements, further improved functionality and testing

Load fortran intrinsic procedure names from FParser and expose via global var 'FORTRAN_INTRINSIC_PROCEDURES'

Renaming, improving documentation, improving fortran style numerical literals, use FParser fortran intrinsic procedure names
Copy link

codecov bot commented Apr 16, 2024

Codecov Report

Attention: Patch coverage is 96.29630% with 23 lines in your changes are missing coverage. Please review.

Project coverage is 94.94%. Comparing base (3e555d5) to head (44bccfd).

Files Patch % Lines
loki/expression/parser.py 92.77% 18 Missing ⚠️
loki/expression/tests/test_expression.py 98.57% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
+ Coverage   94.92%   94.94%   +0.02%     
==========================================
  Files         152      153       +1     
  Lines       31421    32008     +587     
==========================================
+ Hits        29827    30391     +564     
- Misses       1594     1617      +23     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 95.12% <96.29%> (+0.01%) ⬆️
transformations 92.11% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@mlange05 mlange05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - this looks extremely impressive and super useful! I have nothing to add but a big thank you!

GTG from me. 🙌 🙏 :shipit:

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks, this looks very good now. A small comment on how to better parametrize the tests, you can fix this here or include in a follow-up PR if you prefer.

@@ -1030,12 +1033,17 @@ def test_subexpression_match(expr, string, ref):
('5 + (4 + 3) - (2*1)', '5 + (4 + 3) - (2*1)'),
('a*(b*(c+(d+e)))', 'a*(b*(c + (d + e)))'),
])
def test_parse_fparser_expression(source, ref):
@pytest.mark.parametrize('parse', (parse_expr, parse_fparser_expression))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotating the parametrization with a condition like below makes the conditional in l.1042-1045 redundant

Suggested change
@pytest.mark.parametrize('parse', (parse_expr, parse_fparser_expression))
@pytest.mark.parametrize('parse', (
parse_expr,
pytest.param(parse_fparser_expression, marks=pytest.mark.skipif(not HAVE_FP))
))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew there was a better way to do this ... Thanks. I'll fix this in a follow-up PR already being in preparation.

@reuterbal reuterbal added the ready for merge This PR has been approved and is ready to be merged label Apr 18, 2024
@reuterbal reuterbal merged commit 6311d46 into main Apr 18, 2024
12 checks passed
@reuterbal reuterbal deleted the nams_loki_string_parser branch April 18, 2024 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for merge This PR has been approved and is ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants