For MoE models such as Mixtral, I want to override router expert selection to use custom routing distributions (as mentioned in #2331). To run my benchmarks, I am using gptManagerBenchmark, which requires a pre-built engine. However, I want to pass the routing distribution as a runtime config parameter which I can modify across runs without needing to rebuild a new engine. This would be similar to the routing string in the MoE layer microbenchmarks, but for the entire model instead of just 1 layer. Is there any way to do this dynamically? Thanks!
My current approach requires rebuilding the engine for different routing distributions. Specifically, I pass the routing distribution for each layer as a list of probabilities to the MixtureOfExperts module. The distribution is read from an external file.
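For context, a minimal sketch of how the per-layer distributions could be read in (the file name and JSON layout here are illustrative, not anything TensorRT-LLM defines):

import json
import torch

# Hypothetical layout: {"layer_0": [0.6, 0.1, ...], "layer_1": [...], ...}
with open("experts_distribution.json") as f:
    per_layer = json.load(f)

# One probability vector per MoE layer, each with num_experts entries
experts_distributions = {
    layer: torch.tensor(probs, dtype=torch.float32)
    for layer, probs in per_layer.items()
}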
I then override the router output in forward() by sampling according to the experts_distribution:
def adjust_routing_to_distribution(self, routing, experts_distribution, k):
    num_tokens, num_experts = routing.shape
    assert experts_distribution.shape[0] == num_experts, "Distribution size must match number of experts"
    # Sample top-k experts based on the distribution
    adjusted_routing = torch.zeros_like(routing)
    for i in range(num_tokens):
        sampled_experts = torch.multinomial(experts_distribution, k, replacement=False)
        adjusted_routing[i, sampled_experts] = routing[i, sampled_experts]
    # Normalize the adjusted routing probabilities to ensure they sum to 1
    adjusted_routing = adjusted_routing / adjusted_routing.sum(dim=1, keepdim=True)
    return adjusted_routing
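A quick standalone check of this override (the shapes and the uniform distribution are illustrative only; None stands in for self since the method does not use it):

import torch

num_tokens, num_experts, k = 4, 8, 2
routing = torch.softmax(torch.randn(num_tokens, num_experts), dim=-1)   # dummy router output
experts_distribution = torch.full((num_experts,), 1.0 / num_experts)    # target routing distribution

adjusted = adjust_routing_to_distribution(None, routing, experts_distribution, k)
print(adjusted.sum(dim=1))  # each row sums to 1 over the k sampled experts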
Mutinifni changed the title from "Enabling runtime model configs" to "Enabling custom MoE routing distributions at runtime" on Nov 25, 2024.
Hi @Mutinifni, thanks for your request. We don't currently have a plan to do a generic implementation of a "fake routing" module like you describe. I think your manual solution is probably the best approach for now.
To avoid the engine rebuild, the best way is likely to add a new input tensor to the TRT network that replaces the router output and use that as the input to the MoE plugin. You can then initialise it with whatever distribution you like at runtime. Any approach will require some manual modifications for the time being, though.
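As a rough illustration of that idea (plain PyTorch rather than the actual TRT network/plugin API; the class and method names are made up), the "router" becomes a module that just returns whatever routing tensor you hand it at runtime:

import torch
import torch.nn as nn

class FixedRouter(nn.Module):
    """Ignores hidden states and returns routing logits supplied at runtime."""

    def __init__(self, num_experts):
        super().__init__()
        self.num_experts = num_experts
        self.routing_logits = None  # set per run, no engine rebuild needed

    def set_routing(self, routing_logits):
        # routing_logits: [num_tokens, num_experts], e.g. log of a target distribution
        assert routing_logits.shape[-1] == self.num_experts
        self.routing_logits = routing_logits

    def forward(self, hidden_states):
        # The returned logits feed the usual top-k expert selection downstream
        num_tokens = hidden_states.shape[0]
        return self.routing_logits[:num_tokens]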
I'll close this issue for now; feel free to comment further if you have anything to add.