[ENHANCEMENT] Add layer name in a layer to improve code debugging #1198

rybakov · 2024-10-04T23:48:43Z

Is your feature request related to a problem? Please describe.
I am adding new features to TransformerEngine (TE) and observing model-quality issues (a gap in the loss, plus loss spikes).
I am debugging Megatron with TE by storing tensor statistics from the affected layers.
However, I do not have information about each layer's name or its order (index) in the model topology.

Describe the solution you'd like
It would be great to give each layer a proper name that includes its order in the model, so that customers can use it for model debugging.

Describe alternatives you've considered
Several frameworks already support this simple feature, e.g.:
Lingvo based on TF
Praxis based on JAX

Proposed implementation
I propose adding a layer_name field that holds a unique name encoding the layer hierarchy and, when several layers share the same name, the layer's index/order.
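A minimal sketch of the naming scheme described above, in plain Python. The helper unique_layer_names is hypothetical (it is not part of Megatron or TE): it joins the parent name and the child name with a dot and appends a 1-based index only when a child name repeats under the same parent, so every name is unique and still reflects the layer order.

```python
from collections import Counter
from typing import List


def unique_layer_names(parent: str, children: List[str]) -> List[str]:
    """Hypothetical helper: one unique, hierarchical name per child layer.

    Names have the form "<parent>.<child>". If a child name occurs more than
    once under the same parent, a 1-based index is appended so that both
    uniqueness and the layer order are preserved.
    """
    totals = Counter(children)  # how often each child name occurs overall
    seen = Counter()            # how many times each name has been emitted so far
    names = []
    for child in children:
        seen[child] += 1
        name = f"{parent}.{child}" if parent else child
        if totals[child] > 1:
            name = f"{name}.{seen[child]}"
        names.append(name)
    return names


# Illustrative names only: two repeated "blocks" children and one unique child.
print(unique_layer_names("TransformerBlock", ["blocks", "blocks", "final_layernorm"]))
# → ['TransformerBlock.blocks.1', 'TransformerBlock.blocks.2', 'TransformerBlock.final_layernorm']
```

Applying the same helper recursively (passing a child's full name as the next parent) yields names such as "TransformerBlock.blocks.1.mlp" for nested submodules.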

Here is an example:
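```python
import torch
from megatron.core.transformer.module import MegatronModule


class TransformerBlock(MegatronModule):
    """Transformer class."""

    def __init__(
        self,
        # ... other existing constructor arguments ...
        layer_name: str = "TransformerBlock",
    ):
        # ... existing initialization (super().__init__, self.submodules, ...) ...

        # Keep the name so it can be passed down to the child layers.
        self.layer_name = layer_name

        # offset is implicit in TransformerLayer
        # build_layer is the existing layer factory, extended here with a
        # name-prefix argument (proposed change, not shown).
        self.layers = torch.nn.ModuleList(
            [
                # Each layer receives its 1-based layer number and the
                # hierarchical prefix, e.g. "TransformerBlock.blocks".
                build_layer(layer_spec, i + 1, f"{self.layer_name}.blocks" if self.layer_name else None)
                for i, layer_spec in enumerate(self.submodules.layer_specs)
            ]
        )
```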
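To illustrate how a layer could consume the prefix and layer number that build_layer receives in the example, here is a self-contained mock. SketchLayer, its forward method, and this build_layer are hypothetical stand-ins, not the Megatron or TE APIs; each layer composes "<prefix>.<layer_number>" and uses it as the key for a debugging statistic.

```python
from typing import Dict, List, Optional


class SketchLayer:
    """Hypothetical stand-in for a transformer layer that knows its own name."""

    def __init__(self, layer_number: int, name_prefix: Optional[str]) -> None:
        self.layer_number = layer_number
        # Combine the parent's prefix with the 1-based layer number so that
        # every layer in the block gets a unique, ordered name.
        self.layer_name = f"{name_prefix}.{layer_number}" if name_prefix else None

    def forward(self, x: List[float], stats: Dict[str, dict]) -> List[float]:
        out = [2.0 * v for v in x]  # placeholder computation
        if self.layer_name is not None:
            # Tag the debugging statistic with the layer's unique name.
            stats[self.layer_name] = {"max_abs": max((abs(v) for v in out), default=0.0)}
        return out


def build_layer(layer_spec: object, layer_number: int, name_prefix: Optional[str]) -> SketchLayer:
    # layer_spec is ignored in this sketch; in the issue's example it describes the layer.
    return SketchLayer(layer_number, name_prefix)


# Usage mirroring the list comprehension in TransformerBlock above.
layer_name = "TransformerBlock"
layer_specs = [None, None]
layers = [
    build_layer(spec, i + 1, f"{layer_name}.blocks" if layer_name else None)
    for i, spec in enumerate(layer_specs)
]

stats: Dict[str, dict] = {}
x = [0.5, -1.0]
for layer in layers:
    x = layer.forward(x, stats)
print(stats)
# → {'TransformerBlock.blocks.1': {'max_abs': 2.0}, 'TransformerBlock.blocks.2': {'max_abs': 4.0}}
```

Because each statistic is keyed by a name that encodes both hierarchy and order, a loss spike can be traced back to a specific layer instead of an anonymous module.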
Additional context
In our local branch, this feature is already used by multiple people.
