Is your feature request related to a problem? Please describe.
I am adding new features to TransformerEngine (TE) and observe model-quality issues: a gap in the loss, along with loss spikes.
To debug Megatron with TE, I store tensor statistics for the affected layers.
However, I have no information about each layer's name or its order (index) in the model topology.
Describe the solution you'd like
It would be great to give each layer a proper name that includes its order in the model, so that customers can use it for model debugging.
Describe alternatives you've considered
Several frameworks already support this simple feature, e.g. Lingvo (based on TF) and Praxis (based on JAX).
Proposed implementation
I propose adding a layer_name field: a unique name that encodes the layer hierarchy and the layer's index/order (when there are multiple layers with the same name).
Here is an example:
```python
class TransformerBlock(MegatronModule):
    """Transformer class."""

    def __init__(
        self,
        ...
        layer_name: str = "TransformerBlock",
    ):
        # Assumed: the name is stored before child names are derived from it.
        self.layer_name = layer_name
        # offset is implicit in TransformerLayer
        self.layers = torch.nn.ModuleList(
            [
                build_layer(layer_spec, i + 1, f"{self.layer_name}.blocks" if self.layer_name else None)
                for i, layer_spec in enumerate(self.submodules.layer_specs)
            ]
        )
```
Additional context
In our local branch, this feature is already used by multiple people.
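To illustrate the naming scheme I have in mind, here is a minimal sketch in plain Python, with no Megatron or TE imports. `NamedLayer`, `build_block`, `collect_names`, and the `<parent>.<index>` convention for child names are assumptions made for this sketch only; they are not existing Megatron APIs, and the `build_layer` here is a simplified stand-in for the one in the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NamedLayer:
    """Hypothetical stand-in for a module that carries a `layer_name`."""

    layer_name: Optional[str]
    children: List["NamedLayer"] = field(default_factory=list)


def build_layer(layer_number: int, parent_name: Optional[str]) -> NamedLayer:
    # Assumed convention: "<parent>.<1-based index>"; None disables naming.
    name = f"{parent_name}.{layer_number}" if parent_name else None
    return NamedLayer(layer_name=name)


def build_block(num_layers: int, layer_name: Optional[str] = "TransformerBlock") -> NamedLayer:
    # The block passes "<its name>.blocks" down to every child layer.
    block = NamedLayer(layer_name=layer_name)
    prefix = f"{layer_name}.blocks" if layer_name else None
    block.children = [build_layer(i + 1, prefix) for i in range(num_layers)]
    return block


def collect_names(root: NamedLayer) -> List[str]:
    """Return layer names in depth-first order, skipping unnamed layers."""
    names = [root.layer_name] if root.layer_name else []
    for child in root.children:
        names.extend(collect_names(child))
    return names


block = build_block(num_layers=3)
names = collect_names(block)
print(names)
```

With names like these, tensor statistics can be keyed by `layer_name`, so a loss spike can be traced back to a specific layer and its position in the model.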