SGraph draft before Scheduler integration #88

Merged
reuterbal merged 23 commits into nabr-scheduler-refactoring from nabr-sgraph on Sep 5, 2023

Conversation

reuterbal
Collaborator

I'm filing this in a very rough shape against the scheduler staging development branch because the list of changes is already growing significantly and I want to give you at least a fighting chance to work through them. There is currently also some duplication in the item.py file because the current Scheduler infrastructure still exists; substantial parts of it are going to become redundant.

I fully intend to clean this up in the subsequent dev-PRs against the staging branch. In particular, proper documentation needs to be added. The downside of this approach is that the final picture is not yet clearly visible.

For context, the next steps would be:

  1. to swap out the existing callgraph under the Scheduler, and replace this with an SGraph;
  2. to add the SFilter view methodology for graph traversals;
  3. to annotate Transformations with the relevant metadata for the SFilter traversals;

Design philosophy

Items

Dependencies are based on Item nodes, currently with the following node implementations:

  • FileItem: represents a Sourcefile object
  • ModuleItem: represents a Module
  • ProcedureItem: represents a Subroutine object (i.e. Fortran subroutine or function)
  • TypeDefItem: represents a TypeDef object
  • GlobalVariableItem: represents a variable declared in a module
  • ProcedureBindingItem: represents a procedure binding to a typedef
  • InterfaceItem: represents a Fortran interface declaration (not implemented, yet)

Each Item object can provide two key properties:

  1. definitions: Everything that a node can define and that other nodes can potentially depend on (i.e. procedures, type definitions etc). Currently, this makes sense only on FileItem and ModuleItem.
  2. dependencies: Everything that a node depends on (e.g. Import, CallStatement).

Importantly, since we now base everything on (incomplete) IR nodes from the beginning, both these properties return proper IR nodes rather than strings as before.
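
To make the two properties concrete, here is a minimal sketch. It does not use Loki's actual classes: SketchNode and SketchItem are hypothetical stand-ins, and the only point is that both properties hand back node objects rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for an IR node (not one of Loki's IR classes)."""
    kind: str   # e.g. 'Subroutine', 'TypeDef', 'Import', 'CallStatement'
    name: str


@dataclass
class SketchItem:
    """Hypothetical item exposing the two properties described above."""
    name: str
    defined_nodes: tuple = ()
    dependency_nodes: tuple = ()

    @property
    def definitions(self):
        # Nodes that this item defines and that others may depend on
        return self.defined_nodes

    @property
    def dependencies(self):
        # Nodes that express what this item depends on
        return self.dependency_nodes


module_item = SketchItem(
    name='my_mod',
    defined_nodes=(SketchNode('Subroutine', 'kernel'), SketchNode('TypeDef', 'my_type')),
    dependency_nodes=(SketchNode('Import', 'other_mod'),),
)

definition_kinds = [node.kind for node in module_item.definitions]
dependency_names = [node.name for node in module_item.dependencies]
```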

Note: ProcedureItem takes the place of the current SubroutineItem but I prefer the new name because it avoids the implicit link to the Fortran concept of subroutines.

Populating the “main tree”

Take a look at the test_item_graph test case, which illustrates the tree building without the container objects that will later encapsulate it. It performs the following steps:

  1. We start with the set of FileItem for every path in the search tree, parsing only ProgramUnitClass (Module or Subroutine) nodes with the REGEX frontend, and fill an item_cache with those.
  2. For every such FileItem, we create the “definition items”, i.e. the ModuleItem or ProcedureItem entries that are defined on the top level of the source file (notably, this excludes all the procedures declared inside modules). These provide our basic search space for the next step.
  3. We instantiate the DiGraph and add the item corresponding to the seed routine as a node and as the only entry in a “work queue”.
  4. We run through the FIFO work queue, take out the first item, create the items corresponding to that item’s dependencies and add them to the graph as well as the queue.
  5. We repeat until the queue is empty.

In the actual Scheduler, it is intended that steps 1 and 2 are carried out by the Scheduler object, and steps 3-5 are done in the SGraph class.
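
Steps 3 to 5 amount to a breadth-first population with a FIFO work queue. The sketch below is an illustration under assumptions, not the code from test_item_graph: the DEPENDENCIES table is invented and stands in for querying each item's dependencies, and plain lists of nodes and edges stand in for the DiGraph.

```python
from collections import deque

# Hypothetical dependency table: stands in for what each item's
# `dependencies` would resolve to via the item cache.
DEPENDENCIES = {
    'driver': ['kernel_a', 'kernel_b'],
    'kernel_a': ['utils'],
    'kernel_b': ['utils'],
    'utils': [],
}


def populate(seed):
    """Build node and edge lists reachable from `seed` using a FIFO queue."""
    nodes = [seed]
    edges = []
    queue = deque([seed])
    while queue:
        item = queue.popleft()            # take out the first item
        for dep in DEPENDENCIES[item]:
            if dep not in nodes:          # enqueue every item only once
                nodes.append(dep)
                queue.append(dep)
            edges.append((item, dep))     # always record the dependency
    return nodes, edges


nodes, edges = populate('driver')
```

Checking membership before enqueueing is what keeps shared dependencies (here the invented `utils`) from being processed twice while still recording both incoming edges.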

Incremental parsing while populating the tree

The above population method is almost identical to what we had before, but the important difference is under the hood: The “definitions” and “dependencies” on items are now formalised, and every Item class has three class attributes:

  • _parser_class: The parser class that the REGEX frontend has to recurse into, in order to match the relevant IR nodes that are represented by that Item (Example: TypeDefItem._parser_class is RegexParserClass.TypeDefClass)
  • _defines_items: A list of Item class names that the Item can define, e.g., FileItem._defines_items is ('ModuleItem', 'SubroutineItem').
  • _depends_class: The parser class that the REGEX frontend has to recurse into, to be able to match the relevant dependencies of this Item (Example: ModuleItem._defines_items is ('ProcedureItem', 'TypeDefItem', 'GlobalVariableItem'), which are the three things that a module can depend on via USE statements).

Whenever definitions of an Item are queried, we first “concretise” that Item’s IR by calling make_complete on it, adding just the parser classes that the defined item types list as their _parser_class. For that, we store the previously used parser classes on the relevant IR objects (next to the existing _incomplete property), thus incrementally expanding the details of the file. If the requested parser classes are already included, nothing is done.

Similarly, when querying the dependencies of an Item, we concretise its IR with the _depends_class of the Item itself.
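
The incremental behaviour can be sketched with a flag enum that records which parser classes have already been applied. This is an assumption-laden illustration only: SketchParserClass and SketchIR are hypothetical, and the make_complete signature and return value here are invented for the example rather than taken from Loki.

```python
from enum import Flag, auto


class SketchParserClass(Flag):
    """Hypothetical parser-class flags (not Loki's RegexParserClass)."""
    PROGRAM_UNIT = auto()
    TYPEDEF = auto()
    IMPORT = auto()
    CALL = auto()


class SketchIR:
    """Hypothetical IR object that remembers what has been parsed so far."""

    def __init__(self):
        self.parsed = SketchParserClass.PROGRAM_UNIT  # initial discovery pass
        self.parse_count = 0

    def make_complete(self, requested):
        """Re-parse only if `requested` adds parser classes not yet covered."""
        if (requested & self.parsed) == requested:
            return False                  # already included: do nothing
        self.parsed |= requested          # remember the expanded set
        self.parse_count += 1             # stands in for the actual re-parse
        return True


ir = SketchIR()
first = ir.make_complete(SketchParserClass.TYPEDEF)
second = ir.make_complete(SketchParserClass.TYPEDEF)
third = ir.make_complete(SketchParserClass.IMPORT | SketchParserClass.CALL)
```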

All Item objects generated via either of these methods are always cached in the item_cache, ready to be used again in the future instead of being re-generated. The factory for this functionality is the Item._create_from_ir method, where dependency nodes are translated to the relevant item names and then either picked up from the cache or newly instantiated.
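
The cache lookup in that factory follows a plain get-or-create pattern. The sketch below shows only this pattern; get_or_create, SketchItem and the item name used are hypothetical, and the translation from IR nodes to item names that Item._create_from_ir performs is not modelled.

```python
class SketchItem:
    """Hypothetical item type; only carries a name."""

    def __init__(self, name):
        self.name = name


def get_or_create(item_cache, name, item_cls=SketchItem):
    """Return the cached item for `name`, instantiating it on first use."""
    if name not in item_cache:
        item_cache[name] = item_cls(name)
    return item_cache[name]


item_cache = {}
first = get_or_create(item_cache, 'my_mod_kernel')    # newly instantiated
second = get_or_create(item_cache, 'my_mod_kernel')   # picked up from the cache
```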

Scheduler

The new Scheduler is then intended to work similarly to before. This has not been implemented yet, but it should in principle be possible now using the new structure introduced in this PR, and it will be covered in a subsequent PR:

  1. "Discovery" takes only the minimum set of inventory for all files in the search tree, i.e., the top-level items in each source file: Modules or Subroutines - the latter if and only if (!) they are not wrapped in a module. This creates the initial item_cache that contains FileItem, ModuleItem and SubroutineItem nodes. This corresponds to step 1 and 2 above.
  2. "Populate" starts from the seed nodes to populate the actual dependency graph (SGraph)
  3. A full parse is triggered for the nodes in the graph
  4. Graph traversals to apply transformations, using SFilter to prune the tree to the relevant item classes
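
Since SFilter is not part of this PR, the following is only a guess at the idea behind step 4: a traversal order is pruned to the item classes a transformation cares about. All class and function names in the sketch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchItem:
    """Hypothetical base item."""
    name: str


class SketchModuleItem(SketchItem):
    pass


class SketchProcedureItem(SketchItem):
    pass


class SketchTypeDefItem(SketchItem):
    pass


def filter_items(ordered_items, item_filter):
    """Yield only the items that are instances of `item_filter`, keeping order."""
    for item in ordered_items:
        if isinstance(item, item_filter):
            yield item


# An invented traversal order over a small graph
traversal = [
    SketchModuleItem('my_mod'),
    SketchProcedureItem('driver'),
    SketchTypeDefItem('my_type'),
    SketchProcedureItem('kernel'),
]

procedure_names = [item.name for item in filter_items(traversal, SketchProcedureItem)]
```

Because isinstance accepts a tuple, the same helper could be called with several item classes at once.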

@github-actions

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/88/index.html

@reuterbal reuterbal changed the title from "Nabr sgraph" to "SGraph draft before Scheduler integration" May 23, 2023
@codecov-commenter

codecov-commenter commented May 23, 2023

Codecov Report

❗ No coverage uploaded for pull request base (nabr-scheduler-refactoring@1e18246).
The diff coverage is n/a.

❗ Current head 64b6a04 differs from pull request most recent head 1f3fefb. Consider uploading reports for the commit 1f3fefb to get more accurate results

@@                      Coverage Diff                      @@
##             nabr-scheduler-refactoring      #88   +/-   ##
=============================================================
  Coverage                              ?   91.81%           
=============================================================
  Files                                 ?       86           
  Lines                                 ?    16049           
  Branches                              ?        0           
=============================================================
  Hits                                  ?    14736           
  Misses                                ?     1313           
  Partials                              ?        0           
  Flag               Coverage Δ
  lint_rules         97.36% <0.00%> (?)
  loki               91.95% <0.00%> (?)
  transformations    88.81% <0.00%> (?)

Flags with carried forward coverage won't be shown.

@reuterbal reuterbal marked this pull request as ready for review August 2, 2023 16:00
Collaborator

@mlange05 mlange05 left a comment

Ok, first of all, apologies for the delay in reviewing this; this is obviously a huge amount of work and thought and the staging strategy is still much appreciated!

So, on the principle I clearly agree. Encoding the SGraph dependencies in a "skeleton" object that will let us filter the application space for transformations is obviously a huge improvement over the existing implementation.

I can't claim that I fully understood every detail of the new implementation, but the testing is thorough, as always, and I think the finer details will come out in the final integration stage. So for now, I think this is good to go for the staging branch, and I'll fully stress-test this once it goes in, naturally.

@reuterbal reuterbal merged commit efdf6c5 into nabr-scheduler-refactoring Sep 5, 2023
11 checks passed
@reuterbal reuterbal deleted the nabr-sgraph branch September 5, 2023 11:59