SGraph draft before Scheduler integration #88

reuterbal · 2023-05-22T23:38:28Z

I'm filing this in very rough shape against the scheduler staging development branch because the list of changes is already growing significantly and I want to give you at least a fighting chance to work through them. There is currently also some duplication in the item.py file because the current Scheduler infrastructure still exists alongside the new code; substantial amounts of the existing infrastructure are going to become redundant.

I fully intend to clean this up in the subsequent dev-PRs against the staging branch. In particular, proper documentation needs to be added. The downside of this approach is that the final picture is not clearly visible yet.

For context, the next steps would be:

to swap out the existing callgraph under the Scheduler and replace it with an SGraph;
to add the SFilter view methodology for graph traversals;
to annotate Transformations with the relevant metadata for the SFilter traversals.

Design philosophy

Items

Dependencies are based on Item nodes, currently with the following node implementations:

FileItem: represents a Sourcefile object
ModuleItem: represents a Module
ProcedureItem: represents a Subroutine object (i.e. Fortran subroutine or function)
TypeDefItem: represents a TypeDef object
GlobalVariableItem: represents a variable declared in a module
ProcedureBindingItem: represents a procedure binding to a typedef
InterfaceItem: represents a Fortran interface declaration (not implemented, yet)

Each Item object can provide two key properties:

definitions: all the things that a node can define and that other nodes can potentially depend on (e.g. procedures, type definitions, etc.). Currently, this makes sense only on FileItem and ModuleItem.
dependencies: all the things that a node depends on (e.g. Import, CallStatement).

Importantly, since we now base everything on (incomplete) IR nodes from the beginning, both these properties return proper IR nodes rather than strings as before.

Note: ProcedureItem takes the place of the current SubroutineItem but I prefer the new name because it avoids the implicit link to the Fortran concept of subroutines.
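
To make the two properties more tangible, here is a minimal, self-contained sketch. It does not use the actual Loki classes: every `Toy*` name is a hypothetical stand-in introduced only for illustration, and the real Item API in this PR may well differ. The point it shows is that both `definitions` and `dependencies` hand back node objects rather than strings.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for IR nodes; not the real Loki IR classes
@dataclass(frozen=True)
class ToyImport:
    module: str
    symbols: Tuple[str, ...] = ()


@dataclass(frozen=True)
class ToyCall:
    routine: str


@dataclass(frozen=True)
class ToyProcedure:
    name: str
    body: Tuple[object, ...] = ()


@dataclass(frozen=True)
class ToyModule:
    name: str
    procedures: Tuple[ToyProcedure, ...] = ()
    imports: Tuple[ToyImport, ...] = ()


class ToyItem:
    """Hypothetical base class wrapping one IR node."""

    def __init__(self, name, ir):
        self.name = name
        self.ir = ir

    @property
    def definitions(self):
        # By default an item defines nothing that others can depend on
        return ()

    @property
    def dependencies(self):
        return ()


class ToyModuleItem(ToyItem):
    @property
    def definitions(self):
        # Nodes declared in the module, returned as IR nodes
        return self.ir.procedures

    @property
    def dependencies(self):
        return self.ir.imports


class ToyProcedureItem(ToyItem):
    @property
    def dependencies(self):
        # Imports and calls found in the procedure body
        return tuple(n for n in self.ir.body if isinstance(n, (ToyImport, ToyCall)))


kernel = ToyProcedure('kernel', body=(ToyImport('params_mod', ('nlev',)), ToyCall('compute')))
module = ToyModule('driver_mod', procedures=(kernel,), imports=(ToyImport('params_mod'),))
module_item = ToyModuleItem('driver_mod', module)
procedure_item = ToyProcedureItem('kernel', kernel)
```

In this toy setup, `module_item.definitions` contains the `kernel` node itself, while `procedure_item.dependencies` contains the import and call nodes from its body.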

Populating the “main tree”

Take a look at the test_item_graph test case, which illustrates the tree building without the container objects that will later encapsulate it. It performs the following steps:

1. Start with a FileItem for every path in the search tree, parsing only ProgramUnitClass (Module or Subroutine) nodes with the REGEX frontend, and fill an item_cache with those.
2. For every such FileItem, create the “definition items”, i.e. the ModuleItem or ProcedureItem entries that are defined at the top level of the source file (notably, this excludes all procedures declared inside modules). These provide the basic search space for the next step.
3. Instantiate the DiGraph and add the item corresponding to the seed routine as a node and as the only entry in a “work queue”.
4. Run through the FIFO work queue: take out the first item, create the items corresponding to that item’s dependencies, and add them to the graph as well as to the queue.
5. Repeat until the queue is empty.

In the actual Scheduler, it is intended that steps 1 and 2 are carried out by the Scheduler object and steps 3-5 are done in the SGraph class.
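
The work-queue part (steps 3-5) can be sketched in a few lines. This is only an illustration under simplifying assumptions: a plain adjacency dict stands in for the DiGraph, item names stand in for Item objects, and the hypothetical `TOY_DEPENDENCIES` table replaces the lookup of an item's dependencies.

```python
from collections import deque

# Hypothetical dependency table standing in for what the items would report
TOY_DEPENDENCIES = {
    'driver': ['kernel_a', 'kernel_b'],
    'kernel_a': ['utils_mod'],
    'kernel_b': ['utils_mod', 'kernel_a'],
    'utils_mod': [],
    'unused': ['utils_mod'],  # part of the search space, never reached from the seed
}


def populate(seed, dependencies):
    """Build a dependency graph as an adjacency dict, starting from ``seed``."""
    graph = {seed: []}      # step 3: the seed is the only node ...
    queue = deque([seed])   # ... and the only entry in the work queue
    while queue:            # step 5: repeat until the queue is empty
        item = queue.popleft()  # step 4: FIFO, take the oldest entry
        for dep in dependencies[item]:
            if dep not in graph:
                # First time we see this dependency: new node, new queue entry
                graph[dep] = []
                queue.append(dep)
            graph[item].append(dep)
    return graph


graph = populate('driver', TOY_DEPENDENCIES)
```

Only items reachable from the seed end up in `graph`; the `unused` entry stays in the search space without becoming a node.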

Incremental parsing while populating the tree

The above population method is almost identical to what we had before, but the important difference is under the hood: The “definitions” and “dependencies” on items are now formalised, and every Item class has three class attributes:

_parser_class: The parser class that the REGEX frontend has to recurse into, in order to match the relevant IR nodes that are represented by that Item (Example: TypeDefItem._parser_class is RegexParserClass.TypeDefClass)
_defines_items: A list of Item class names that the Item can define, e.g., FileItem._defines_items is ('ModuleItem', 'SubroutineItem').
_depends_class: The parser class that the REGEX frontend has to recurse into, to be able to match the relevant dependencies of this Item (Example: ModuleItem._defines_items is ('ProcedureItem', 'TypeDefItem', 'GlobalVariableItem'), which are the three things that a module can depend on via USE statements).

Whenever the definitions of an Item are queried, we first “concretise” that Item’s IR by calling make_complete on it, adding just the parser classes that the defined item types list as their _parser_class. For that, we store the previously used parser classes on the relevant IR objects (next to the existing _incomplete property), thus incrementally expanding the level of detail available for the file. If the requested parser classes are already included, nothing needs to be done.
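
The bookkeeping behind this incremental expansion can be illustrated with a bit-flag enum. The sketch below is not the Loki implementation: `ToyParserClass` is a hypothetical stand-in for RegexParserClass, and `ToySource.make_complete` only mimics the idea of remembering which parser classes were already applied, with a counter where a real frontend would re-parse.

```python
from enum import Flag, auto


class ToyParserClass(Flag):
    """Hypothetical stand-in for RegexParserClass: one bit per group of patterns."""
    EMPTY = 0
    PROGRAM_UNIT = auto()
    TYPEDEF = auto()
    IMPORT = auto()
    CALL = auto()


class ToySource:
    """Hypothetical IR object that remembers which parser classes were applied."""

    def __init__(self):
        self.parser_classes = ToyParserClass.EMPTY
        self.parse_count = 0

    def make_complete(self, requested):
        # Keep only the requested classes that have not been used before
        missing = requested & ~self.parser_classes
        if not missing:
            return  # everything requested is already included: nothing to do
        self.parse_count += 1  # a real frontend would re-parse with `missing` here
        self.parser_classes |= missing


source = ToySource()
source.make_complete(ToyParserClass.PROGRAM_UNIT)
source.make_complete(ToyParserClass.PROGRAM_UNIT | ToyParserClass.TYPEDEF)
source.make_complete(ToyParserClass.TYPEDEF)  # no-op, already covered
```

After these three calls the toy source has been parsed twice, and the third request is answered from what is already known.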

Similarly, when querying the dependencies of an Item, we concretise with the _depends_class of the Item itself.

All Item objects generated via either of these methods are cached in the item_cache, ready to be reused instead of being regenerated. The factory for this functionality is the Item._create_from_ir method, where dependency nodes are translated to the relevant item names and then either picked up from the cache or newly instantiated.
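
The cache lookup itself follows the usual get-or-create pattern. The following sketch uses a plain dict as the item_cache and a hypothetical `get_or_create` helper; it is not the signature of Item._create_from_ir, which additionally has to translate IR nodes into item names.

```python
class ToyItem:
    """Hypothetical minimal item that only carries a name."""

    def __init__(self, name):
        self.name = name


def get_or_create(name, item_cache, item_cls=ToyItem):
    """Return the cached item for ``name``, creating and caching it if needed."""
    if name not in item_cache:
        item_cache[name] = item_cls(name)
    return item_cache[name]


item_cache = {}
first = get_or_create('kernel', item_cache)
second = get_or_create('kernel', item_cache)  # served from the cache
other = get_or_create('utils_mod', item_cache)
```

Because the second request returns the very same object, any detail added to an item's IR by incremental parsing is visible to every later user of that item.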

Scheduler

The new Scheduler is then intended to work similarly to before. This has not been implemented yet, but should in principle be possible now using the new structure introduced in this PR, and will be covered in a subsequent PR:

"Discovery" takes only the minimum inventory for all files in the search tree, i.e., the top-level items in each source file: Modules or Subroutines, the latter if and only if (!) they are not wrapped in a module. This creates the initial item_cache that contains FileItem, ModuleItem and SubroutineItem nodes. This corresponds to steps 1 and 2 above.
"Populate" starts from the seed nodes to populate the actual dependency graph (SGraph).
A full parse is triggered for the nodes in the graph.
Graph traversals apply the transformations, using SFilter to prune the tree to the relevant item classes.
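
Since SFilter is not part of this PR, the following is only a guess at what such a pruned traversal could look like: a topological walk over a toy adjacency dict that yields only the nodes of the requested kinds. The `item_kind` mapping, the kind labels and both function names are hypothetical, and the traversal direction (dependents before dependencies) is an arbitrary choice for this sketch.

```python
def topological_order(graph):
    """Return the nodes of a DAG so that each node precedes its dependencies."""
    visited, postorder = set(), []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for dep in graph[node]:
            visit(dep)
        postorder.append(node)  # appended only after all dependencies

    for node in graph:
        visit(node)
    return postorder[::-1]  # reverse post-order is a topological order


def filtered(graph, kinds, item_kind):
    """Yield nodes in topological order, keeping only those whose kind is in ``kinds``."""
    for node in topological_order(graph):
        if item_kind[node] in kinds:
            yield node


toy_graph = {
    'driver': ['kernel', 'params_mod'],
    'kernel': ['params_mod', 'state_type'],
    'params_mod': [],
    'state_type': [],
}
item_kind = {
    'driver': 'procedure',
    'kernel': 'procedure',
    'params_mod': 'module',
    'state_type': 'typedef',
}
procedures = list(filtered(toy_graph, {'procedure'}, item_kind))
```

A transformation that only cares about procedures would then see `driver` and `kernel`, in that order, without the module and typedef nodes.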

github-actions · 2023-05-22T23:41:19Z

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/88/index.html

codecov-commenter · 2023-05-23T11:11:50Z

Codecov Report

❗ No coverage uploaded for pull request base (nabr-scheduler-refactoring@1e18246).
The diff coverage is n/a.

❗ Current head 64b6a04 differs from pull request most recent head 1f3fefb. Consider uploading reports for the commit 1f3fefb to get more accurate results

@@                      Coverage Diff                      @@
##             nabr-scheduler-refactoring      #88   +/-   ##
=============================================================
  Coverage                              ?   91.81%           
=============================================================
  Files                                 ?       86           
  Lines                                 ?    16049           
  Branches                              ?        0           
=============================================================
  Hits                                  ?    14736           
  Misses                                ?     1313           
  Partials                              ?        0

Flag	Coverage Δ
lint_rules	`97.36% <0.00%> (?)`
loki	`91.95% <0.00%> (?)`
transformations	`88.81% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

mlange05

Ok, first of all, apologies for the delay in reviewing this; this is obviously a huge amount of work and thought and the staging strategy is still much appreciated!

So, in principle I clearly agree. Encoding the SGraph dependencies in a "skeleton" object that lets us filter the application space for transformations is obviously a huge improvement over the existing implementation.

I can't claim that I fully understood every detail of the new implementation, but the testing is thorough, as always, and I think the finer details will come out in the final integration stage. So for now, I think this is good to go for the staging branch, and I'll fully stress-test it once it goes in, naturally.

reuterbal changed the title ~~Nabr sgraph~~ SGraph draft before Scheduler integration May 23, 2023

reuterbal force-pushed the nabr-scheduler-refactoring branch from 8c1923c to 1e18246 August 2, 2023 13:59

reuterbal added 23 commits August 2, 2023 15:01

Remove typedefs from sourcefile definitions

54d28d2

Store parser_classes on program units and sourcefiles

ac2f2f8

WIP: Towards SGraph

4405f13

Expose defined IR nodes in Module and Subroutine

62b0e5c

Incremental parsing stubs via Item.definitions

fd94284

Initial sources for batch processing tests

315fd7b

Link items to parser classes and name dependencies/definitions

107d39f

Expose global variables declared in a module

d2bad75

Introduce an empty RegexParserClass

d0b2145

First draft of on-demand regex logic and graph build

65facf8

Homogenize access to typedefs and imports in program units

22426a8

regex: parse kind in declarations

cc844f7

On-demand graph building improved and more tests

df58f87

Add a rudimentary SGraph implementation

63394be

Refactored item creation

2507811

Enhance the SchedulerConfig with accessors

19305f2

Enhance Item creation with config and support for disable

1f6112a

Expand disable testing

7d187b0

Some documentation on new items

4b63d8a

Support for ignore and block in SGraph

92cff45

Fix typos and Linter warnings

7f59831

Resilience against items that cannot be found in non-strict discovery

6912641

Python 3.8 compatibility

1f3fefb

reuterbal force-pushed the nabr-sgraph branch from 64b6a04 to 1f3fefb August 2, 2023 14:50

reuterbal marked this pull request as ready for review August 2, 2023 16:00

mlange05 approved these changes Aug 30, 2023


reuterbal merged commit efdf6c5 into nabr-scheduler-refactoring Sep 5, 2023
11 checks passed

reuterbal deleted the nabr-sgraph branch September 5, 2023 11:59
