What is the intended way of doing EMA with PEFT? #1557

samedii · 2024-03-12T16:46:28Z

samedii
Mar 12, 2024

Currently I'm doing a full copy of the entire model instead of just the LoRA. Would be great to not have to do this. Maybe if there was a way to easily get only the LoRA weights in a nn.Module.

I normally use PostHocEMA from https://github.com/lucidrains/ema-pytorch.

Is there an intended way of doing this that I've just missed?

Answered by samedii

Mar 13, 2024

I ended up using a wrapper that keeps track of all the trained parameters.

from torch import nn


class TrainablesContainer(nn.ModuleDict):
    @classmethod
    def from_module(cls, module: nn.Module, parent_name=""):
        module_dict = cls()
        for name, sub_module in module.named_children():
            full_name = f"{parent_name}.{name}" if parent_name else name
            if list(sub_module.children()):  # If the submodule has children, recurse
                module_dict[name] = cls.from_module(sub_module, full_name)
            else:
                # Create a ParameterDict for leaf module
                param_dict = nn.ParameterDict()
                for param_name, param…

View full answer

BenjaminBossan · 2024-03-12T17:00:18Z

BenjaminBossan
Mar 12, 2024
Maintainer

Just did a quick glance. I think the issue is this:

https://github.com/lucidrains/ema-pytorch/blob/c77058c4ff6d1eefbe512776eb5059d760897d5a/ema_pytorch/ema_pytorch.py#L140-L144

Here, all parameters are used and there is no way on the PEFT side that we could prevent this. There is a filter for parameter_names, but that's only filtering for the dtype and cannot be changed by the user. If you modify your local copy of the code, you could add extra logic here to skip parameters without "lora_" in the name.

Not sure if lucidrains accepts PRs, but you could try to suggest adding a filter_fn parameter to EMA that would allow passing a callable or regex so that users can determine how to filter the parameters. Then this line could be changed:

- self.parameter_names = {name for name, param in self.ema_model.named_parameters() if param.dtype in [torch.float, torch.float16]}
+ self.parameter_names = {name for name, param in self.ema_model.named_parameters() if param.dtype in [torch.float, torch.float16] and filter_fn(name)}

1 reply

samedii Mar 13, 2024
Author

Thanks for the response! I could indeed make a PR or just copy the code since it's basically just one file anyway.

samedii · 2024-03-13T19:13:23Z

samedii
Mar 13, 2024
Author

I ended up using a wrapper that keeps track of all the trained parameters.

from torch import nn


class TrainablesContainer(nn.ModuleDict):
    @classmethod
    def from_module(cls, module: nn.Module, parent_name=""):
        module_dict = cls()
        for name, sub_module in module.named_children():
            full_name = f"{parent_name}.{name}" if parent_name else name
            if list(sub_module.children()):  # If the submodule has children, recurse
                module_dict[name] = cls.from_module(sub_module, full_name)
            else:
                # Create a ParameterDict for leaf module
                param_dict = nn.ParameterDict()
                for param_name, parameter in sub_module.named_parameters():
                    if parameter.requires_grad:
                        # Use the parameter's name directly, without the module prefix
                        param_dict[param_name] = nn.Parameter(
                            parameter.data, requires_grad=False
                        )
                module_dict[name] = param_dict
        return module_dict

This can then just be given to the EMA like this:

ema = EMA(TrainablesContainer.from_module(my_peft_model))

And you can load the trainables state dict into the original model because it has the same structure.

my_peft_model.load_state_dict(my_trainables.state_dict(), strict=False)

I feel like I've ran into problems related of not being able to access the PEFT part model twice in a short time because it is always injected or part of the larger model.

7 replies

BenjaminBossan Mar 14, 2024
Maintainer

The latter should be possible. To functionally run inference without the adapter, use the with model.disable_adapter() context. To actually remove the adapter, use model.unload().

samedii Mar 14, 2024
Author

Sure but I would like to replace the adapter with an ema-version, run sampling, and then continue training. It would've been nice if for example model.unload() returned a "peft object" with state that I can inject again later.

I'm sure that it's possible already to do what I want with some combination of the current functions, deepcopy, and using named_parameters and matching names with "lora". It would've been nice if the design made it feel easier / more natural.

BenjaminBossan Mar 14, 2024
Maintainer

Did you try adding the ema-adapter to the existing peft model using load_adapter, then switching between the adapters using set_adapter? That way, you should be able to avoid unloading. However, as you said earlier, it would require saving the corresponding adapters to disk each time they change, since there is currently no way to load an adapter from a Python object.

samedii Mar 14, 2024
Author

Yes this was one of the methods I tried. Really liking PEFT so far and this being unable to get a "LoRA" object to save and manipulate has really been the only gripe in the design for me.

BenjaminBossan Mar 15, 2024
Maintainer

I think it should be possible to add this. When calling unload, we could optionally return a dict with one entry per adapter (key being the adapter name), whose values would be another dict that contains the config and the state_dict. And for load_adapter, we could modify it to accept state_dicts as well, so that the unloaded adapters could be reloaded again.

This would probably be a very niche feature. Still, if you want to give this a go and create a PR, feel free to do so.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is the intended way of doing EMA with PEFT? #1557

{{title}}

Replies: 2 comments 8 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

What is the intended way of doing EMA with PEFT? #1557

samedii Mar 12, 2024

Replies: 2 comments · 8 replies

BenjaminBossan Mar 12, 2024 Maintainer

samedii Mar 13, 2024 Author

samedii Mar 13, 2024 Author

BenjaminBossan Mar 14, 2024 Maintainer

samedii Mar 14, 2024 Author

BenjaminBossan Mar 14, 2024 Maintainer

samedii Mar 14, 2024 Author

BenjaminBossan Mar 15, 2024 Maintainer

samedii
Mar 12, 2024

Replies: 2 comments 8 replies

BenjaminBossan
Mar 12, 2024
Maintainer

samedii Mar 13, 2024
Author

samedii
Mar 13, 2024
Author

BenjaminBossan Mar 14, 2024
Maintainer

samedii Mar 14, 2024
Author

BenjaminBossan Mar 14, 2024
Maintainer

samedii Mar 14, 2024
Author

BenjaminBossan Mar 15, 2024
Maintainer