Enable IPEXModel on XPU #663
Hi @jiqing-feng, would it be a similar integration to what was done in ipex-llm?
Not exactly the same: we plan to keep only one attn forward, but it will be split into different parts, and each tensor's device will determine which op is used, as in llama_attn_forward:
```python
# Each step is a separate op, so the kernel can be chosen per device:
key_cache, value_cache = preprocess_for_optimize(hidden_states, past_key_value, kwargs)
query, key, value = self.qkv_gemm(hidden_states, key_cache, value_cache, kwargs)
key, value = self.rope(key, value, position_ids, past_key_value, kwargs)
present = get_present(key, value, past_key_value)
attn_output, attn_weights, past_key_value = self.sdpa(query, key, value, attention_mask, past_key_value, kwargs)
attn_output = attn_output.transpose(1, 2)
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
if not output_attentions:
    attn_weights = None
return attn_output, attn_weights, past_key_value
```
self.sdpa:

```python
if query.device.type == "cpu":
    sdpa = self.ipex_scale_dot_product
elif query.device.type == "xpu":
    sdpa = self.sdpa_xpu
attn_output, attn_weights, past_key_value = sdpa(
    query,
    key,
    value,
    math.sqrt(self.head_dim),
    past_key_value,
    None,
    attention_mask,
)
return attn_output, attn_weights, past_key_value
```
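Here is a self-contained sketch of that device-based dispatch, runnable on any machine. `ipex_scale_dot_product` and `sdpa_xpu` are stand-ins for the real fused kernels; both fall back to PyTorch's reference SDPA (which applies the default `1/sqrt(head_dim)` scaling internally, so the `scale` argument is unused in the stand-ins):

```python
import math

import torch
import torch.nn.functional as F


def ipex_scale_dot_product(query, key, value, scale, attention_mask=None):
    # Stand-in for the fused IPEX CPU kernel; the reference op keeps the
    # example runnable anywhere.
    return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)


def sdpa_xpu(query, key, value, scale, attention_mask=None):
    # Stand-in for the XPU kernel.
    return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)


def dispatch_sdpa(query, key, value, head_dim, attention_mask=None):
    # The input tensor's device picks the op, so a single forward serves
    # both backends without a model-level flag.
    sdpa = ipex_scale_dot_product if query.device.type == "cpu" else sdpa_xpu
    return sdpa(query, key, value, math.sqrt(head_dim), attention_mask)


# (batch, heads, seq_len, head_dim)
q = k = v = torch.randn(1, 8, 16, 64)
attn_output = dispatch_sdpa(q, k, v, head_dim=64)
```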
For me it would make sense to keep this integration in ipex-llm and to only enable loading of the exported model in optimum-intel (through `IPEXModel` for example).
Hi @jiqing-feng, I see that modified llama modeling (and other additional architectures) was introduced in both ipex and ipex-llm to apply the ipex optimizations. I think redefining the transformers modeling (for different architectures and different optimizations) is not something that we want to introduce in optimum-intel. For these reasons I'd be in favor of keeping modeling_utils only for the changes that are required for the export (like done in https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/utils/modeling_utils.py#L25), and of moving the rest to another repo (itrex or ipex-llm could be good candidates for example) that could be used by optimum-intel.
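As a rough illustration of that split (not existing optimum-intel code; `patch_model_for_export` and `_llama_export_forward` are illustrative names), an export-only modeling_utils could boil down to a small monkey-patching helper, with runtime kernels left to ipex / ipex-llm:

```python
import types


def _llama_export_forward(self, *args, **kwargs):
    # An export-friendly forward would go here, e.g. one that avoids
    # data-dependent control flow that torch.jit.trace cannot capture.
    return self._orig_forward(*args, **kwargs)


def patch_model_for_export(model):
    # Swap in the export-friendly forward on the attention modules only.
    for module in model.modules():
        if module.__class__.__name__ == "LlamaAttention":
            module._orig_forward = module.forward
            module.forward = types.MethodType(_llama_export_forward, module)
    return model
```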
Hi @echarlaix. I want to enable all the model utils in ipex (modeling_utils) on XPU. It may need some changes, including either another if-branch in the forward or two forward functions (one for CPU and one for GPU); the KV cache is also different.
Is there anything on the optimum-intel side that may block our work on XPU, like the XPU version or CI tests? I would also appreciate your advice on the integration. Thanks!
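A minimal sketch of the two-forward-functions variant mentioned above (illustrative names, not the actual IPEXModel code); the single-if-branch variant would inline the same device check into one forward:

```python
import torch


class IPEXAttention(torch.nn.Module):
    # Dispatch once per call on the input's device; each path can then keep
    # its own fused kernels and its own KV-cache layout.
    def forward(self, hidden_states, **kwargs):
        if hidden_states.device.type == "cpu":
            return self.forward_cpu(hidden_states, **kwargs)
        return self.forward_xpu(hidden_states, **kwargs)

    def forward_cpu(self, hidden_states, **kwargs):
        raise NotImplementedError  # fused IPEX CPU ops + CPU KV-cache layout

    def forward_xpu(self, hidden_states, **kwargs):
        raise NotImplementedError  # XPU kernels + device-specific KV cache
```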