Developer Guide Transforms

Ben Murray edited this page Nov 23, 2022 · 9 revisions

Transforms

Note: transform design is changing due to work for lazy resampling (#4855). This documentation will be updated in due course, but for now, the proposed design changes can be found [here](Developer Guide Lazy Transforms).

scope

Transforms (or preprocessors) are callables that convert input data into a form that is ready to be consumed by deep learning models. In general, a transform can also have internal states, so that calling it with different data inputs will give consistently processed outputs.

In MONAI, the transform takes the following pattern:

class Transform:

  def __init__(self, system_params):
    # set states using system parameters
    self.some_states = ...  # from system_params
  
  def __call__(self, input_data, data_params):
    # using self.some_states and data_params,
    #   process data and return output_data
    output_data = ...  # computed from input_data
    return output_data

A typical usage of Transform is:

transform = Transform(system_params)  # construct a transform instance
output_data = transform(input_data, data_params)  # apply the transform
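As a concrete illustration of this pattern, here is a minimal sketch. The class ScaleAndShift and its scale and offset arguments are hypothetical and not part of MONAI; scale plays the role of system_params and offset the role of data_params.

```python
import numpy as np


class ScaleAndShift:
    """Hypothetical transform for illustration only (not a MONAI class)."""

    def __init__(self, scale):
        # system parameter: known and fixed before any data is seen
        self.scale = scale

    def __call__(self, input_data, offset=0.0):
        # offset is a runtime (data-dependent) parameter
        return np.asarray(input_data) * self.scale + offset


transform = ScaleAndShift(scale=2.0)  # construct a transform instance
output_data = transform(np.array([1.0, 2.0]), offset=1.0)  # apply the transform
print(output_data)
```

The same instance can be reused on other inputs, each with its own offset, while scale stays fixed.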

Users and developers interact directly with these interfaces.

With the goal of readability and flexibility, the following sections discuss assumptions and designs of the interfaces.

system_params

System parameters are "static" information that is not data-dependent. They are known and fixed before input_data is available.

The parameters are stored as instance variables (self.some_states) in each transform.

In a multi-processing context, the transform's instance variables are not shared among different workers. Once constructed, each worker process will operate on its own states.
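The effect of unshared states can be sketched without spawning real processes. The example below is a simulation under stated assumptions: copy.deepcopy stands in for the copy of the transform that each worker process receives, and CountingTransform is a hypothetical class, not part of MONAI.

```python
import copy


class CountingTransform:
    """Hypothetical transform that counts how many items it has processed."""

    def __init__(self, offset):
        self.offset = offset  # system parameter
        self.num_calls = 0    # internal state, private to each copy

    def __call__(self, value):
        self.num_calls += 1
        return value + self.offset


original = CountingTransform(offset=10)

# Assumption: each worker ends up with an independent copy of the transform.
worker_a = copy.deepcopy(original)
worker_b = copy.deepcopy(original)

results_a = [worker_a(v) for v in (1, 2, 3)]
results_b = [worker_b(v) for v in (4,)]

# Each copy tracks only its own calls; the original is untouched.
print(worker_a.num_calls, worker_b.num_calls, original.num_calls)
```

A transform that relies on state accumulated across calls therefore sees only the items handled by its own worker.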

input_data, data_params

MONAI provides both vanilla transforms and dictionary-based transforms.

The rationale is described here. The main differences are in the assumptions on input_data and data_params:

vanilla transform

A vanilla transform should work seamlessly with numpy ndarrays. Its call method has the form:

def __call__(self, input_data, data_params):
  # process input_data

  • input_data: a multi-dimensional array.
  • data_params: additional information from the data, used when processing input_data. The data_params are runtime parameters; any static parameters should go into system_params in the transform's constructor.

For example, a vanilla RandRotate90 transform works as follows:

import numpy as np
from monai.transforms import RandRotate90

img = np.array((1, 2, 3, 4)).reshape((1, 2, 2))
rotator = RandRotate90(prob=0.0, max_k=3, axes=(1, 2))
img_result = rotator(img)
print(type(img_result))
print(img_result)

# output:
>>> 
<class 'numpy.ndarray'>
[[[1 2]
  [3 4]]]

The vanilla transforms are located in monai/transforms/transforms.py in the codebase.

dictionary-based transform

A dictionary-based transform assumes input_data is a dictionary whose values include ndarrays, and data_params is also part of the dictionary. The transform's call method therefore has the form:

def __call__(self, input_data):
  # process dict(input_data)

For example, a dictionary-based RandRotate90d transform works as follows:

import numpy as np
from monai.transforms import RandRotate90d

data = {
    'img': np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    'seg': np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    'unused': 5,
}
rotator = RandRotate90d(keys=('img', 'seg'), prob=0.8)
data_result = rotator(data)
print(data_result)

# example output (the rotation is applied at random, so results vary):
>>>
{'unused': 5, 'img': array([[[4, 3],
        [2, 1]]]), 'seg': array([[[4, 3],
        [2, 1]]])}

These transforms are adaptors on top of the vanilla transforms, which makes it easier to compose multiple transforms:

composed = monai.transforms.Compose([Transform1d(system_params), 
                                     Transform2d(system_params),
                                     ...])
output_data = composed(input_data)  # input_data is a dictionary
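The chaining idea can be sketched in a few lines. SimpleCompose and the two helper functions below are hypothetical stand-ins written for illustration; they are not MONAI's Compose, which may do considerably more.

```python
class SimpleCompose:
    """Hypothetical composition helper: applies callables one after another."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, input_data):
        for transform in self.transforms:
            # the output of one transform is the input of the next
            input_data = transform(input_data)
        return input_data


def add_one_to_img(data):
    d = dict(data)  # shallow copy so the caller's dictionary is not modified
    d["img"] = d["img"] + 1
    return d


def double_img(data):
    d = dict(data)
    d["img"] = d["img"] * 2
    return d


composed = SimpleCompose([add_one_to_img, double_img])
output_data = composed({"img": 3, "unused": 5})
print(output_data)
```

Because every step takes and returns a dictionary, steps can be reordered or swapped without changing the calling code.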

The dictionary-based transforms are located in monai/transforms/composables.py in the codebase.

These transforms take [TransformClassName]d as the class name, indicating that it is a dictionary-based adaptor for the vanilla transform monai.transforms.transforms.TransformClassName.
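The adaptor relationship might look roughly like the sketch below. SimpleFlip and SimpleFlipd are hypothetical classes written for illustration (they are not MONAI's own classes); the "d" variant holds an instance of the vanilla one and applies it to each selected key.

```python
import numpy as np


class SimpleFlip:
    """Hypothetical vanilla transform: flips an array along one axis."""

    def __init__(self, axis):
        self.axis = axis

    def __call__(self, img):
        return np.flip(img, axis=self.axis)


class SimpleFlipd:
    """Hypothetical dictionary-based adaptor around SimpleFlip."""

    def __init__(self, keys, axis):
        self.keys = keys
        self.flipper = SimpleFlip(axis)

    def __call__(self, data):
        d = dict(data)  # shallow copy; keys not listed pass through unchanged
        for key in self.keys:
            d[key] = self.flipper(d[key])
        return d


data = {"img": np.arange(4).reshape((1, 2, 2)), "unused": 5}
data_result = SimpleFlipd(keys=("img",), axis=2)(data)
print(data_result["img"])
```

Keeping the array logic in the vanilla class means the dictionary version only has to deal with key selection.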

shape convention

Most of the pre-processing transforms assume the input ND arrays have the shape: [num_channels, spatial_dims], where

  • spatial_dims may have
    • 0 element ([num_channels], e.g. classification labels),
    • 1 element ([num_channels, w], spatially 1D),
    • 2 elements ([num_channels, h, w], spatially 2D)
    • ...
    • N elements ([num_channels, d, h, w, ...], spatially ND).
  • num_channels must be greater than or equal to 1 (input data with shape [spatial_dims] has to be reshaped into [1, spatial_dims] beforehand).
  • each transform may or may not support all spatially ND inputs.
  • the returned ND arrays from a transform should take the same shape convention.
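As a small illustration of the convention above, the following sketch adds a leading channel dimension to a spatially 2D array using plain numpy. The variable names are chosen for this example only.

```python
import numpy as np

# A spatially 2D image without a channel dimension: shape [h, w]
image_2d = np.zeros((4, 5))

# Reshape [spatial_dims] into [1, spatial_dims] before applying transforms
channel_first = image_2d[np.newaxis, ...]

# Split the shape into the channel count and the spatial shape
num_channels, *spatial_shape = channel_first.shape
print(channel_first.shape, num_channels, spatial_shape)
```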

Most of the post-processing transforms assume the input ND arrays have the shape [num_channels, spatial_dims] (updated since v0.6; previously [batch_size, num_channels, spatial_dims]).

randomized transforms

MONAI provides a randomizable interface so that each transform can generate processed data subject to some random factors (often used in training data augmentation).

The interface has:

  • an R variable to store the random number generator container np.random.RandomState. All derived classes should use self.R instead of np.random to generate random factors, e.g., np.random.rand() should be replaced by self.R.rand().
  • a randomize() method, where all self.R related random factors are generated.
  • a set_random_state method, to set the random number generator container's state.

The interface is located at monai/transforms/compose.py in the codebase.
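A minimal sketch of such an interface is shown below. SimpleRandomizable and SimpleRandShift are hypothetical classes written for illustration; the method signatures are assumptions and may differ from the actual MONAI interface.

```python
import numpy as np


class SimpleRandomizable:
    """Hypothetical base class holding the random number generator in self.R."""

    R = np.random.RandomState()

    def set_random_state(self, seed=None, state=None):
        # Assumed signature: accept either a seed or a ready RandomState.
        if seed is not None:
            self.R = np.random.RandomState(seed)
        elif state is not None:
            self.R = state
        else:
            self.R = np.random.RandomState()
        return self

    def randomize(self):
        raise NotImplementedError


class SimpleRandShift(SimpleRandomizable):
    """Hypothetical transform: randomly adds an offset to the input."""

    def __init__(self, prob, max_offset):
        self.prob = prob
        self.max_offset = max_offset
        self._do_transform = False
        self._offset = 0.0

    def randomize(self):
        # all random factors come from self.R, never from np.random directly
        self._do_transform = self.R.random_sample() < self.prob
        self._offset = self.R.uniform(-self.max_offset, self.max_offset)

    def __call__(self, img):
        self.randomize()
        return img + self._offset if self._do_transform else img


img = np.zeros((1, 2, 2))
a = SimpleRandShift(prob=0.5, max_offset=1.0).set_random_state(seed=0)
b = SimpleRandShift(prob=0.5, max_offset=1.0).set_random_state(seed=0)
out_a = [a(img) for _ in range(5)]
out_b = [b(img) for _ in range(5)]
```

Two instances seeded identically produce the same sequence of outputs, which is the main reason to route all randomness through self.R.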

These transforms take Rand[TransformClassName][d] as the class name, indicating that it is a randomized transform for monai.transforms.transforms.TransformClassName[d].

For randomized dictionary-based transforms, when len(keys) is greater than 1, the values of the corresponding keys are expected to be updated simultaneously (most transforms of this kind follow this pattern).
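One way to read this expectation is that the random factors are drawn once per call and then applied to every key, so an image and its segmentation stay aligned. The sketch below follows that reading; SimpleRandRotate90d and its seed argument are hypothetical and not the MONAI implementation.

```python
import numpy as np


class SimpleRandRotate90d:
    """Hypothetical randomized dictionary transform (illustration only)."""

    def __init__(self, keys, prob=0.1, max_k=3, seed=None):
        self.keys = keys
        self.prob = prob
        self.max_k = max_k
        self.R = np.random.RandomState(seed)
        self._do_transform = False
        self._k = 0

    def randomize(self):
        self._k = self.R.randint(self.max_k) + 1  # number of 90-degree turns
        self._do_transform = self.R.random_sample() < self.prob

    def __call__(self, data):
        self.randomize()  # drawn once, shared by all keys in this call
        d = dict(data)
        if self._do_transform:
            for key in self.keys:
                # rotate in the spatial plane, leaving the channel axis alone
                d[key] = np.rot90(d[key], self._k, axes=(1, 2))
        return d


data = {
    "img": np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    "seg": np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    "unused": 5,
}
rotator = SimpleRandRotate90d(keys=("img", "seg"), prob=0.8, seed=0)
data_result = rotator(data)
```

Calling randomize() inside the per-key loop instead would let "img" and "seg" receive different rotations, which is usually not what is wanted for paired data.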
