
Design of type declarations and object creation

Christopher Paciorek edited this page Nov 20, 2016 · 19 revisions

We have had an ongoing and evolving discussion about how to declare and create objects. These issues get pretty thorny.

Places in code where this is relevant:

  • run-code argument type declarations, currently of the form x = double(2) for a matrix, etc.
  • the returnType statement, which uses the same format as argument type declarations.
  • numeric object creation via x <- numeric(2, value = 1), along with matrix, array, and integer, where the "2" is a length, not a number of dimensions.
  • in the near future: declaration of nimbleList elements, e.g. myListDef <- nimbleList(A = double(2), B = integer()) or some other system TBD.
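
For reference, the current forms listed above can be collected in one sketch (nimble DSL as it currently stands, not base R; the body is a placeholder):

```r
library(nimble)  # assumes the nimble package

nf <- nimbleFunction(
  run = function(x = double(2)) {   # declaration: x is a 2-dimensional double, i.e. a matrix
    y <- numeric(2, value = 1)      # creation: a length-2 vector of ones ("2" is length here)
    returnType(double(2))           # return type uses the same format as argument declarations
    return(x)
  }
)
```

Note that "2" means number of dimensions in the declaration but length in the creation call, which is exactly the inconsistency discussed below.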

Issues / concerns / problems with the current system:

  • The names used in type declarations ("integer", "double", and "logical") do not behave the same way as the same-named functions in R.
  • Declaration and creation use different systems, e.g. "double(2)" for a declaration but "matrix(9, nrow = 3)" for creation.
  • Arguments in nimbleFunction run-code look like R default values but are really type declarations.
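
To make the first mismatch concrete, here is what the same names mean in base R versus in nimble run-code (the base R behavior is standard; the nimble meanings are as described above):

```r
## Base R: these names create initialized vectors.
## nimble run-code: the same names declare types.

double(2)              # base R: a numeric vector of length 2, i.e. c(0, 0)
                       # nimble declaration: a 2-dimensional double (a matrix)
integer(3)             # base R: c(0L, 0L, 0L)
                       # nimble declaration: a 3-dimensional integer
matrix(9, nrow = 3)    # base R creation: a 3x1 matrix filled with 9
```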

Possible solutions

Option 1: A function-like syntax for type declaration

This would look like

  • nimbleFunction( run = function(x = nimType(numeric(2))) {...})

or something like that, with the syntax within the nimType() statement TBD.

Pros:

  • It makes type declaration more explicit and hence unmistakably distinct from object creation.

Cons:

  • It seems cumbersome.
  • We'd need a discussion similar to the present one about what goes inside nimType().
  • It's still inconsistent with default-argument syntax in R.

Option 2: Character strings

This would look like

  • nimbleFunction( run = function(x = 'numeric[,]') {...})

or something like that.

Pros:

  • It would avoid ambiguity with R's double(), etc.

Cons:

  • We'd still have to discuss formatting details.
  • It uses a string instead of code (could be argued as a "pro").
  • It's still inconsistent with default-argument syntax in R.

Option 3: Something more C-like for declarations

This would look like

  • nimbleFunction( run = function(numeric(x, ...)) {...})
  • (Note that function( numeric x ) {} would not survive R's parser.)

or something like that.

Pros:

  • It highlights that this is a type declaration and reduces ambiguity with default arguments.

Cons:

  • It's non-R-like in that the declared variable appears as an argument (could be argued as a pro).
  • The details would have to be discussed (dimensions, initial scalar value, etc.)

Option 4: As closely as possible, adopt the R-like type creation system for declarations.

This would look like:

  • nimbleFunction( run = function(x = matrix()) {...})

or something like that.

Pros:

  • Syntax would be similar for creation and declaration (could be argued as a con, since these are distinct purposes).
  • More R-like and compact than some of the other options

Cons:

  • Would still look like a default value entry but it isn't.
  • Syntax may still have to differ somewhat from object creation, since sizes are really necessary upon creation but not at declaration. Maybe this isn't an issue if there are default sizes.

Some syntax extensions

Upon discussion, we talked about having keywords constructed by joining [numeric|integer|logical] to [Vector|Matrix|Array], e.g. numericArray or integerVector, etc.

These could be implemented in the DSL as well (e.g. X <- numericArray(...)), to maintain similarity between type declaration and object creation.

These could either:

  • replace the type argument in the existing system, i.e. once numericArray(...) is available, then array(..., type = "numeric") would be deprecated.
  • OR supplement the existing system, i.e. either numericArray(...) or array(..., "numeric") would work.
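
A hypothetical sketch of what the joined keywords might look like in the DSL (the argument names here, such as dim and length, are assumptions; the page leaves those details TBD):

```r
## Hypothetical joined-keyword syntax (not implemented; argument names assumed):
x <- numericArray(dim = c(2, 3, 4))             # type in the keyword, not an argument
y <- array(dim = c(2, 3, 4), type = "numeric")  # existing form it would replace or supplement
z <- integerVector(length = 5)                  # [integer|numeric|logical] joined to Vector
```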

Thoughts?

The issue of scalars

Upon discussion, we want to think more about an appropriate way to declare scalars in this option. The underlying problem is that R has no native scalar type, so there is nothing naturally R-like to imitate if nimble is going to have a native scalar type.

Some ideas:

  • include scalar(..., type = "numeric") and/or numericScalar(). Pros: It's consistent. Cons: Scalars are the simplest type, so typing "numericScalar()" seems heavy.
  • modify numeric(), integer() and logical() to allow creation of scalars in some way, but what? Pros: It's consistent to use those functions/declarations. Cons: It's not consistent with imitating R.
  • allow type inference from default values, e.g. x = 0 means numeric, x = TRUE means logical, x = 0L means integer. It was pointed out in discussion that this idea re-mixes defaults with type declarations, creating potential confusion. However, if we get through option 5 below at some stage, then defaults would make more sense as a way to declare types. Pros: It's simple, and if we ever do support defaults it would make the most sense. Cons: Most people aren't familiar with "0L" etc., but it would be dangerous to assume that "0" means integer. Until we support defaults, it's awkward for scalars to be the only type inferred by example rather than declaration.

Thoughts?

  • (DT) I'm not too set on this, but my initial feeling is to have scalar() take a single "type" argument, with default value "numeric". So scalar() is for a numeric scalar (perhaps the most common?), and scalar("integer") and scalar("logical") for integer and logical scalars. Then imitate numeric(), matrix() and array() syntax from the DSL, for higher-dimension objects.
  • (DT) Then, when we get to handling default values, if we decide it's not too confusing, also allowing (x = 3) to denote a numeric scalar with default value 3, and similarly x=3L, or x=TRUE, for other scalar types with default values. But also supporting scalar(default=3), scalar("integer", default=3L), and scalar("logical", default=TRUE).
  • (DT) But like I said, I'm not set on this. It seems the best to me right now, but I could be convinced otherwise if there's a really compelling suggestion.
  • numericScalar(), integerScalar(), logicalScalar(), and numericVector(), integerVector(), logicalVector(), and numericArray(), integerArray(), logicalArray(), could be another alternative. It's nicely self-consistent, but very long-winded! Also, does not generalize well for higher dimensions, so the array() version would end up having to take a dim argument, anyway.
  • (CP) Something about having the dimensionality or type as arguments bugs me, so I still like the long-winded [integer|numeric|logical][Scalar|Vector|Matrix|Array], with [integer|numeric|logical] defaulting to scalars, but for arrays we will need the dimensionality as an argument, unless we do Array1d, Array2d, Array3d, Array4d,...
  • (CP) I had been hoping that the type decs and the object creation could be similar syntax but I'm not seeing a natural way to do that, unless in type declarations we have numeric/integer/logical default to scalars and in object creation we have it default to vectors (of course this could be confusing), and then we use [integer|numeric|logical][Vector|Matrix|Array] as the full names in both cases. Of course in object creation they then take size and initial value arguments, unlike in type declaration.
  • (CP) For backward compatibility, we could have numeric() = double() and have numeric(0), numeric(1), ... and integer(0), integer(1), ... and logical(0), logical(1), ... behave as they do now. So numeric/integer/logical would have dimension as their first argument, with a default of 0.

Here is my (PD) attempt at merging and choosing among ideas:

  • Support both [integer|numeric|logical][Scalar|Vector|Matrix|Array]() and scalar(), numeric(), matrix(), array() for both type declaration and object creation. I agree the former are clearer because the type is not really dynamic and so is better indicated in the keyword, not an argument. But the latter are convenient because they more closely mimic familiar R functions. I agree there is an explicit need to declare scalars. The only difference is that the latter four "functions" can take a type argument that defaults to "numeric".
  • Also support scalar type declaration via providing a default, such as x = 3 or x = 3L or x = TRUE. Since this is feasible to do now, I don't think we should wait until we provide support for non-scalar defaults.
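
Putting the synthesis together, a hypothetical declaration under this merged proposal might look like the following (none of this syntax exists yet; the keyword names and defaults are as proposed in the bullets above):

```r
## Hypothetical syntax under the PD synthesis (a sketch, still under discussion):
nf <- nimbleFunction(
  run = function(a = numericMatrix(),     # long-winded form: type in the keyword
                 b = scalar("integer"),   # short form: type argument defaulting to "numeric"
                 c = 3,                   # numeric scalar declared via a default value
                 d = TRUE) {              # logical scalar declared via a default value
    returnType(double(0))
    return(c)
  }
)
```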

More CP comments:

  • PD synthesis sounds reasonable to me. I'd like to have us have some discussion of defaults and argument ordering for these various functions/declarations.
  • Of scalar(), numeric(), matrix() and array(), one of those (numeric) is not like the others. Would we also allow integer() and logical()? If so, we actually have three approaches:
    • [integer|numeric|logical][Scalar|Vector|Matrix|Array]()
    • scalar(), matrix(), array()
    • numeric(), logical(), integer()
  • How about we allow the default to be what is currently "double(0)", so that someone could simply do: run = function(x, y) and have x and y be scalar doubles.
  • Will we have backward compatibility and allow integer(dim), logical(dim), double(dim) type declarations at least for a version or two?

Option 5: Option 4 with extension of nimbleFunctions to use limited default arguments

Syntax would look like option 4.

The idea here is that we could arrange for a limited set of default-argument styles to be valid. E.g., if x is omitted, we can create it as a size-1 matrix with uninitialized values.

It could still be confusing that x must always be the same type as its default value, but that is consistent with the rest of the nimble DSL: one can't re-use the same variable for a different type.

With some thought it might work to evaluate more meaningful default arguments, e.g. run = function(x = matrix(), y = chol(x)) {...}. I think what the compiler would have to do is add a line in a calling function to evaluate the default argument if it is needed (i.e. if it was omitted from the call, which is known at compile time). We'd have to sort out the order of default-argument evaluation and do type and size processing on them, e.g., determine that y is going to be a matrix based on chol(x) and the type of x. But at the moment it seems feasible, since we already have such a system that would just have to be used in a slightly new context.
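
As a sketch of what the compiler might do for a call that omits y (hypothetical generated code; `callee` stands in for the nimbleFunction whose default was omitted and is not a real function here):

```r
## Hypothetical compiler output for option 5 (a sketch, not actual nimble behavior):
caller <- nimbleFunction(
  run = function(x = matrix()) {
    y <- chol(x)          # inserted line: evaluate the omitted default argument
    ans <- callee(x, y)   # then call with both arguments supplied (callee is hypothetical)
    returnType(double(2)) # type/size processing infers y is a matrix from chol(x)
    return(ans)
  }
)
```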

Example code posted previously about the issues on this page (preserved here).

Modify how we specify nimbleFunction argument types/dims and return-value types/dims to be more consistent with the new use of numeric(), matrix(), integer(), and array(), and to make clearer that the information is parsed rather than evaluated as function calls. E.g., below is a confusing mix:

    rDynamicOccupancy <- nimbleFunction(
      run = function(n = integer(),
        nrep = integer(),
        psi1 = double(),
        phi = double(1),
        gamma = double(1),
        p = double(1),
        log = integer(0, default = 0)) {
        nyear <- length(p)
        ans <- matrix()   # a user naturally had written this:  ans <- double(2)
          # also note that old DSL would have had:  declare(ans, double(2))
        setSize(ans, nrep, nyear)
        ## could populate ans here, but I'm just doing this as a placeholder
        returnType(double(2))
        return(ans)
      }
    )