-
Notifications
You must be signed in to change notification settings - Fork 24
Planning dynamic indexing implementation
The idea here is to support run-time determination of descendant nodes but (at first) do so for a static graph. This will allow benchmarking of the computational cost. For dynamic dependencies, we will also need to do topological sorting at run-time, and we can add that step later. If we get getDependencies
working, I think it will be simple to add a topological sorting step with perhaps just one additional function call. (CJP: I think the conclusion is that the last two sentences should be removed.)
-
Seeing what goes into C++ getDependencies from R currently: When we do
model$getDependencies
in R, the call to C++ comes from here which goes through here. -
Seeing the C++
nimbleGraph
andgraphNode
classes: In C++, the relevant classes are in include/nimble/nimbleGraph.h and include/CppCode/nimbleGraph.cpp -
Seeing where the
getDependencies
is called from inside C++ when called from R: The SEXP entry point is here. This unpacks animbleGraph*
(passed in a SEXP external pointer) and calls itsgetDependencies
method. -
Seeing the C++ representation of a "nodeFunctionVector:" The C++ "nodeFunctionVector" class is
NodeVectorClassNew
. (There is an old version lying around the code that should probably be cleaned up). -
Seeing how a
nodeVectorClassNew
object is used bycalculate()
: A function likecalculate
iterates through thevector<oneNodeUseInfo> useInfoVec
. EachoneNodeUseInfo
has anodeFun*
and auseInfoForIndexedNodeInfo
. This last class contains only avector<int>
, but it is set up as a class so it is easier to put additional content into it if we need to in the future. If we use the idea of wrapping a nodeFunctionVector in a "getNodes
" function, we won't need to modifycalculate
etc. -
Seeing where a
nodeFunctionVector
is created in R. This is normally done bynewSetupCode
generated from keyword processing ofcalculate
or one of the other model methods. The keyword processor uses thenodeFunctionVector_SetupTemplate
, which shows it will create a call tonimble:::nodeFunctionVector
. If you want to actually see thenewSetupCode
, here is an example of how to do so:
library(nimble)
m <- nimbleModel( nimbleCode({a ~ dnorm(0,1)}))
nf <- nimbleFunction(
setup = function(model, node) {},
run = function() {
ans <- model$calculate(node)
return(ans)
returnType(double())
}
)
nf1 <- nf(m, 'a')
compiled <- compileNimble(m, nf1)
getNimbleProject(compiled$nf1)$getNimbleFunctionNFproc(compiled$nf1)$newSetupCode
which shows this line of newSetupCode
:
model_node_nodeFxnVector_includeData_TRUE_SU <- nimble:::nodeFunctionVector(model = model,
nodeNames = node, excludeData = FALSE, sortUnique = TRUE,
errorContext = "calculate(model = model, nodes = node)")
-
Seeing where the contents of a C++
nodeFunctionVectorNew
are populated from R duringcompileNimble
: This is done bypopulateNodeFxnVecNew
There are two places from which this is called, corresponding to our "full interfaces" and "multi-interfaces":copyFromRObjectViaActiveBindings
(which for anodeFxnVec
does not actually use active bindings, despite the function name); andcopyFromRobject
-
Seeing the mapping from BUGS declaration IDs to nodeFunction pointers: For convenience when populating C++ objects from R, early on we create a C++ object that stores the nodeFunction pointer for every BUGS declaration ID. Then when we need them later we can easily get them. An example of using this object is here. That external pointer SEXP may be needed in the suggested plan below. (For future reference, it is populated here )
I'm sure this won't be all right, but here is a try at outlining the steps.
-
Make a version of
model$getDependencies
that returns an object annotated to need run-time dependencies. Since the current return object is a character vector, I think it will be easiest to simply add an attribute to it. Making a new class for it would require modification at every step where it is used, and that could become a huge pain since we sometimes do expect it to behave as a character vector. [CJP: could the new class behave such that trying to get its value as if it were a character vector behaves as it currently does, but potentially via more complicated processing? Sort of like the indexVar idea for triggering run-time graph updates...] -
Make the keyword processor for
calculate
(assuming that is the trial function for now) change from substituting oncalculate(nodeFxnVector = NODEFUNVEC_NAME)
to substituting oncalculate(nodeFxnVector = getNodes(NODEFUNVEC_NAME))
. -
For R execution: write
getNodes(nodes)
so ifnodes
is annotated for run-time dependencies, it does that step and returns the dependencies. Otherwise it just returnsnodes
as is. -
Modify C++
NodeVectorClassNew
to have a new variable flagging whether it requires run-time dependency lookup. It probably also needs to have animbleGraph*
, unless we plumb that information in via another route. And it probably needs to have access to the.nodeFxnPointers_byDeclID
(or equivalent) in order to convert BUGS declaration IDs into nodeFunction pointers. -
Modify R
populateNodeFxnVecNew
and the C++ SEXP function it calls to populate additional new content of theNodeVectorClassNew
when necessary. Note that inpopulateNodeFxnVecNew
, theRobject
will be whatever was returned fromnimble:::nodeFunctionVector
(in thenewSetupCode
constructed from the setup template during keyword processing). -
Write a C++ version of
getNodes(nodeFxnVec)
that either returnsnodeFxnVec
as is (if run-time dependency is not needed) or uses thenimbleGraph
to obtain run-time dependencies as needed. -
Either in C++
getNodes
or a helper function, rearrange the results ofnimbleGraph::getDependencies
into thenodeFxnVec
member data, similarly to how it comes out viapopulateNodeFxnVecNew
. This will require looking up the nodeFunction from BUGS declaration IDs using the "NumberedObjects" object pointed to by.nodeFxnPointers_byDeclID
in R.
(TheNumberedObjects
class is an abstraction that didn't really take off and so may look silly.) Also beware of 1-based vs. 0-based indexing. When populating aNodeVectorClassNew
from R, we go throughpopulateNodeFxnVectorNew_byDeclID
, which subtracts 1s from indices.
- Move C++ version of
topologicalSort
into working branch and get it called at run-time as well. (Per Slack discussion, this probably won't be necessary, since we can do topologicalSort at compile-time.)
-
make
indexVar
class that contains:- value
- assignment operator that also triggers update of graph
- graphID whose value this is
- etc.
-
write graph update function that takes dynamic index node, and updates parent info for its indexed dependents and updates child info of the relevant parents
-
create
getIndices
analog togetParam
to compute current values of indices; this will allow recomputation of edges when a node involved in a stochastic index changes
-
The monster function, misnamed, that does a huge amount of graph construction steps and is crying to be refactored, is
modelDefClass::genExpandedNodeAndParentNames3
. -
I suggest we get scalar dynamic indexing working before expanding to cases like
mu[static_i, foo(stochastic_k[i])]
ormu[ stochastic_k[i], stochastic_j[i] ]
. (CJP changed:
to,
as I believe that is what PdV intended)
-
Modify
getSymbolicParentNodes
so that if we havey[i] ~ dnorm(mu[ foo(k[i]) ], 1)
on the RHS, withk[i]
non-constant, theBUGSdeclClass
objects in thedeclInfo
list will have:-
symbolicParentNodes
will includemu[NA]
andUSED_IN_INDEX_(k[i])
(or similar idea). -
parentIndexNamePieces
has some suitable entry formu[NA]
, maybelist(NA)
.
-
-
Update uses of
symbolicParentNodes
andparentIndexNamePieces
(e.g. for determining variables and their declared ranges) to not break for entries likemu[NA]
. -
At some step in
genExpandedNodeAndParentNames3
, create a vector or "vars" environment to recordusedInDynamicIndex
status (TRUE | FALSE) and strip the "USED_IN_INDEX_" fromsymbolicParentNodes
entries. (CJP: expecting this to be straightforward but not yet done) -
(CJP) I think we may want an object within each declInfo that indicates the nodes that stochastically determine the indexing of any parameters (somewhat analogous to parentIndexNamePieces) -- this will allow us to know that a node's parents for a given parameter are determined by stochastic indexing. But it may be that simply having mu[i] depend on k[i] (with k[i] flagged as a stochastic index) and having getIndex() will allow us to do the edge determination we will need to do when we get dependencies.
-
At some step in
genExpandedNodeAndParentNames3
, create vertices likemu_UNKNOWN_INDEX_
for nodes that appear anywhere withNA
in asymbolicParentNodes
entry. -
Add collection of edges from each
mu[i]
tomu_UNKNOWN_INDEX_
and frommu_UNKNOWN_INDEX
to each 'y[i]`. -
Determine where
mu_UNKOWN_INDEX_
fits in the graph if there are also split vertices formu
. I think it would be a descendent of any split vertices. (CJP note - yes that is how I've implemented it) -
We discussed not lifting expressions of dynamic indices to such that don't need to worry about deterministic dependencies when recalculate graph upon change to the relevant node values
-
(General opportunity for BUGS model processing) Write unit tests of individual steps, not just as we have of at entire model at once. This will help us keep clear in future on relevant use cases.
-
save dependencies when no stochastic indices are present and simply return them
-
save fixed dependencies and don't recompute them; simply concatenate them with the recomputed dependencies
-
explore possibility of checking in the dependencies object if graph has been recomputed; if not simply return cached dependencies; this might rely on having a running counter in the graph that indicates the present state of the graph; a dependencies object can store what it thinks is the current state and only recalculate if the states are discordant