Enabling custom MoE routing distributions at runtime #2497

Closed
Mutinifni opened this issue Nov 25, 2024 · 1 comment
Labels: feature request (New feature or request), triaged (Issue has been triaged by maintainers)

Comments

Mutinifni commented Nov 25, 2024

For MoE models such as Mixtral, I want to override router expert selection to use custom routing distributions (as mentioned in #2331). To run my benchmarks, I am using gptManagerBenchmark, which requires a pre-built engine. However, I want to pass the routing distribution as a runtime config parameter that I can modify across runs without needing to rebuild the engine. This would be similar to the routing string in the MoE layer microbenchmarks, but for the entire model instead of a single layer. Is there any way to do this dynamically? Thanks!

My current approach requires rebuilding the engine for different routing distributions. Specifically, I pass the routing distribution for each layer as a list of probabilities to the MixtureOfExperts module. The distribution is read from an external file.

class MixtureOfExperts(Module):

    def __init__(self,
                 [...]
                 experts_distribution: Optional[List[float]] = None):  # per-expert routing probabilities
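
A minimal sketch of how the per-layer distributions might be read from the external file and handed to each MixtureOfExperts instance at build time; the JSON file name and layout are assumptions for illustration, not something specified in the issue:

import json

import torch

# Hypothetical file layout: one probability list per MoE layer, e.g.
# {"0": [0.25, 0.25, ...], "1": [...], ...}
with open("routing_distributions.json") as f:
    per_layer = json.load(f)

# Normalize each layer's list and keep it as a tensor for sampling later.
experts_distributions = {
    int(layer): torch.tensor(probs, dtype=torch.float32) / sum(probs)
    for layer, probs in per_layer.items()
}

# Each MixtureOfExperts would then receive
# experts_distribution=experts_distributions[layer_idx] at build time,
# which is why every new distribution currently forces an engine rebuild.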

I then override the router output in forward() by sampling according to the experts_distribution:

def adjust_routing_to_distribution(self, routing, experts_distribution, k):
    num_tokens, num_experts = routing.shape
    assert experts_distribution.shape[0] == num_experts, "Distribution size must match number of experts"

    # Sample top-k experts based on the distribution
    adjusted_routing = torch.zeros_like(routing)
    for i in range(num_tokens):
        sampled_experts = torch.multinomial(experts_distribution, k, replacement=False)
        adjusted_routing[i, sampled_experts] = routing[i, sampled_experts]

    # Normalize the adjusted routing probabilities to ensure they sum to 1
    adjusted_routing = adjusted_routing / adjusted_routing.sum(dim=1, keepdim=True)

    return adjusted_routing
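
As a quick sanity check of the sampling logic, the method can be exercised unbound on dummy router outputs; the shapes and the skewed distribution below are purely illustrative, and since the method never reads self, no full MixtureOfExperts instance is needed:

import torch

num_tokens, num_experts, k = 4, 8, 2

# Dummy router output: a softmax over experts for each token.
routing = torch.softmax(torch.randn(num_tokens, num_experts), dim=-1)

# Skewed target distribution over the eight experts, normalized to sum to 1.
experts_distribution = torch.tensor([4., 2., 1., 1., 1., 1., 1., 1.])
experts_distribution = experts_distribution / experts_distribution.sum()

adjusted = MixtureOfExperts.adjust_routing_to_distribution(
    None, routing, experts_distribution, k)

# Every token should keep exactly k non-zero expert weights that sum to 1.
assert (adjusted > 0).sum(dim=1).eq(k).all()
assert torch.allclose(adjusted.sum(dim=1), torch.ones(num_tokens))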
Mutinifni changed the title from "Enabling runtime model configs" to "Enabling custom MoE routing distributions at runtime" on Nov 25, 2024
hello-11 added the triaged (Issue has been triaged by maintainers) label on Dec 2, 2024
djns99 added the feature request (New feature or request) label on Dec 4, 2024
djns99 (Collaborator) commented Dec 4, 2024

Hi @Mutinifni, thanks for your request. We don't currently have a plan to implement a generic "fake routing" module like you describe, so I think your manual solution is probably the best approach for now.
To avoid the engine rebuild, the best option is likely to add a new input tensor to the TRT network that replaces the router output and feed it directly into the MoE plugin. You can then set it to whatever distribution you like at runtime. Either way, any approach will require some manual modifications for the time being.
I'll close this issue for now; feel free to comment further if you have anything to add.
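
For concreteness, a rough sketch of that suggestion, following the pattern TensorRT-LLM models use to declare network inputs; the tensor name routing_override, its shape, and how the MoE plugin would consume it are assumptions to adapt, not an existing interface:

import tensorrt as trt
from tensorrt_llm.functional import Tensor

num_experts = 8  # e.g. Mixtral 8x7B

# Extra network input carrying the routing scores for each token,
# declared alongside the other model inputs at build time.
routing_override = Tensor(name='routing_override',
                          dtype=trt.float32,
                          shape=[-1, num_experts])

# Inside MixtureOfExperts.forward(), this tensor would be fed to the MoE
# plugin in place of the router's output. At runtime the benchmark binds
# 'routing_override' to values sampled from the desired
# experts_distribution, so changing the distribution no longer requires
# rebuilding the engine.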
