Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ir generation is too slow. #227

Closed
ecossevin opened this issue Feb 7, 2024 · 4 comments
Closed

Ir generation is too slow. #227

ecossevin opened this issue Feb 7, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@ecossevin
Copy link

When parsing apl_arpege_parallel.F90 (https://github.com/pmarguinaud/IAL/blob/openacccpglagstack84/arpifs/phys_dmn/apl_arpege_parallel.F90) with Sourcefile.from_file("apl_arpege_parallel.F90)", it takes about 50 seconds, which is far too long for what I'm doing.

Is there any way of running it faster please?

@mlange05
Copy link
Collaborator

mlange05 commented Feb 9, 2024

Hi @ecossevin this one looks actually pretty loaded, so let me start with a few general comments:

  • The file you link is not actually publicly accessible, and in general we prefer neutral examples, not stuff directly ripped out of large code bases and in private repos. I can see why this particular case it might be different though. 😉
  • I got a separate version of this file internally, and it is ~7Kloc of auto-generated code! 😮 So for the record, Loki is not originally built for single-file processing speed (python and all that) and I'm not surprised this is slow. It is generally recommended to start from unprocessed files of a reasonable size!

Nevertheless, the ratio fparser's AST processing to Loki's IR generation seems off (I get a total of ~68s, with 8s for FP AST and 60s for Loki IR), which does indeed hint at a severe performance problem in our frontend (usually I'd expect roughly a 1-to-1 ratio). I've tracked this down to some very inefficient searches in some utility passes and a sub-optimal feature in our Transformer base class, but an immediate quick-fix had some bad fallout. I'll have another go at this soon (time permitting), but in the meantime please be patient as this one seems to sit deep.

So, many thanks for flagging this and please hold tight - we were aware of some sub-optimalities from the early days, but I wasn't aware how bad this one was. 🙏

@mlange05 mlange05 self-assigned this Feb 9, 2024
@ecossevin
Copy link
Author

Hi,
Thanks a lot for your answer.

@mlange05
Copy link
Collaborator

Hi, so after some digging I've managed to drop the frontend processing times for this example quite a bit - for a detailed description of the problem please see PR #229 . Taking a slightly different version of the source file linked in the issue description, I get the following timings:

=====  Loki main  ======
In [1]: from loki import Sourcefile, config_override

In [2]: with config_override({'log-level': 'perf'}):
   ...:     Sourcefile.from_file("apl_arpege_parallel.F90")
   ...: 
[Loki::Frontend] Executed sanitize_input in 0.12s
[Loki::Sourcefile] Constructed from apl_arpege_parallel.F90 in 68.48s

vs.

====  Loki naml-frontend-optimisations  =====
In [1]: from loki import Sourcefile, config_override

In [2]: with config_override({'log-level': 'perf'}):
   ...:     Sourcefile.from_file("apl_arpege_parallel.F90")
   ...: 
[Loki::Frontend] Executed sanitize_input in 0.12s
[Loki::FP] Executed parse_fparser_source in 7.60s
[Loki::Frontend] Executed sanitize_ir in 0.26s
[Loki::Frontend] Executed sanitize_ir in 6.07s
[Loki::Frontend] Executed sanitize_ir in 1.00s
[Loki::FP] Executed parse_fparser_ast in 20.11s
[Loki::Sourcefile] Constructed from apl_arpege_parallel.F90 in 27.85s

There's probably a little more that can be done for the ~6s call to sanitize_ir - it still uses two unoptimised search-and-replace utilities, but it was ~45s before the improvements in #229 . Certain overhead from transforming an AST of that size is also expected, but this should alleviate the problem a little for now.

@mlange05
Copy link
Collaborator

mlange05 commented Mar 19, 2024

Ok, following the merges of PRs #241 #242 and #245 we now have the FP-frontend parsing time for this pathological file down to this:

In [3]: with config_override({'log-level': 'perf'}):
   ...:     Sourcefile.from_file("apl_arpege_parallel.F90")
   ...: 
[Loki::Frontend] Executed sanitize_input in 0.12s
[Loki::FP] Executed parse_fparser_source in 7.67s
[Loki::Frontend] Executed sanitize_ir in 0.11s
[Loki::Frontend] Executed sanitize_ir in 2.73s
[Loki::Frontend] Executed sanitize_ir in 0.48s
[Loki::FP] Executed parse_fparser_ast in 5.40s
[Loki::Sourcefile] Constructed from apl_arpege_parallel.F90 in 13.23s

This might not fix the "many small, unscoped parses" issue (discussed offline), but I think this is enough for this one to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants