-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SGraph draft before Scheduler integration #88
Conversation
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/88/index.html |
Codecov Report
@@ Coverage Diff @@
## nabr-scheduler-refactoring #88 +/- ##
=============================================================
Coverage ? 91.81%
=============================================================
Files ? 86
Lines ? 16049
Branches ? 0
=============================================================
Hits ? 14736
Misses ? 1313
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. |
8c1923c
to
1e18246
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, first of all, apologies for the delay in reviewing this; this is obviously a huge amount of work and thought and the staging strategy is still much appreciated!
So, for the principle I clearly agree. The points raised about encoding the SGraph dependencies in a "skeleton" object that will let us filter the application space for transformations is obviously a huge improvement over the existing implementation.
I can't claim that I fully understood every detail of the new implementation, but the testing detail is thorough, as always, and I think the finer details will come out in the final integration stage. So for now, I think this is good to go for staging branch, and I'll fully stress-test this once it goes in, naturally.
I'm filing this in a very rough shape against the scheduler staging development branch because the list of changes is already growing significantly and I want to give you at least a fighting chance to work through them. There is currently also some duplication in the
item.py
file, while the current Scheduler infrastructure still exists, of which substantial amounts are going to become redundant.I fully intending to clean this up in the subsequent dev-PRs against the staging branch. In particular, proper documentation needs to be added. The downside of this approach is that the final picture is not clearly visible, yet.
For context, the next steps would be:
Design philosophy
Items
Dependencies are based on
Item
nodes, currently with the following node implementations:FileItem
: represents aSourcefile
objectModuleItem
: represents aModule
ProcedureItem
: represents aSubroutine
object (i.e. Fortran subroutine or function)TypeDefItem
: represents aTypeDef
objectGlobalVariableItem
: represents a variable declared in a moduleProcedureBindingItem
: represents a procedure binding to a typedefInterfaceItem
: represents a Fortran interface declaration (not implemented, yet)Each
Item
object can provide two key properties:definitions
: All the things, that a node can define and which other nodes can potentially depend on (i.e. procedures, type definitions etc). Currently, this makes sense only onFileItem
andModuleItem
.dependencies
: All the things, that a node depends on (e.g.Import
,CallStatement
)Importantly, since we now base everything on (incomplete) IR nodes from the beginning, both these properties return proper IR nodes rather than strings as before.
Note:
ProcedureItem
takes the place of the currentSubroutineItem
but I prefer the new name because it avoids the implicit link to the Fortran concept of subroutines.Populating the “main tree”
Take a look at the
test_item_graph
test case, which illustrates the tree building without the container objects, which will then encapsulate this. It performs the following steps:FileItem
for every path in the search tree, only parsingProgramUnitClass
(Module or subroutine) nodes with theREGEX
frontend, and fills anitem_cache
with those.FileItem
it creates the “definition items”, i.e. theModuleItem
orProcedureItem
entries that are defined on the top level of the source file (notably, this excludes all the procedures declared inside modules). These provide our basic search space for the next stepDiGraph
and add the item corresponding to the seed routine as a node and as the only entry in a “work queue”.In the actual Scheduler, it is intended that steps 1 and 2 are carried out by the
Scheduler
object, and step 3-5 is done in theSGraph
class.Incremental parsing while populating the tree
The above population method is almost identical to what we had before, but the important difference is under the hood: The “definitions” and “dependencies” on items are now formalised, and every
Item
class has three class attributes:_parser_class
: The parser class that the REGEX frontend has to recurse into, in order to match the relevant IR nodes that are represented by thatItem
(Example:TypeDefItem._parser_class
isRegexParserClass.TypeDefClass
)_defines_items
: A list of Item class names that theItem
can define, e.g.,FileItem._defines_items
is(’ModuleItem’, ‘SubroutineItem’)
._depends_class
: The parser class that the REGEX frontend has to recurse into, to be able to match the relevant dependencies of thisItem
(Example:ModuleItem._defines_items
is('ProcedureItem', 'TypeDefItem', 'GlobalVariableItem’)
, which are the three things that a module can depend on viaUSE
statements).Whenever
definitions
of anItem
are queried, we first “concretise” that Item’s IR by callingmake_complete
on it, adding just the parser classes that the defined item types list as their_parser_class
. For that, we store the previously used parser classes on the relevant IR objects (next to the existing_incomplete
property), thus incrementally expanding the details of the file. Or not doing anything if the requested parser classes are already included.Similarly when querying
dependencies
of anItem
, we are concretising with the_depends_class
of theItem
itself.All generated
Item
objects via either of this method are always cached in theitem_cache
, ready to be used again in the future instead of re-generating them. The factory for this functionality is theItem._create_from_ir
method, where dependency nodes are translated to the relevant item names and then either picked up from the cache or newly instantiated.Scheduler
The new Scheduler is then intended to work similar to before. This has not been implemented, yet, but should in principle be possible now using the new structure introduced in this PR, and will be covered in a subsequent PR:
item_cache
that containsFileItem
,ModuleItem
andSubroutineItem
nodes. This corresponds to step 1 and 2 above.SFilter
to prune the tree to the relevant item classes