
Enable intfloat/e5-mistral-7b-instruct model with 32k token lens on hpu device #2715

Open

wants to merge 2 commits into master
Conversation

ZhengHongming888 (Contributor) commented on Jun 4, 2024

This PR is part of the effort to enable Intel Gaudi2 (HPU) support for Sentence Transformers inference and training.

It enables the intfloat/e5-mistral-7b-instruct model with 32k-token inputs on the HPU device and is a revision of PR #2656.

There are two parts to the update:

  1. More efficient padding for long inputs: sequence lengths are padded to the next multiple of 128 instead of the next power of two, which greatly reduces padding overhead once inputs get long (see the sketch after this list).

  2. Support for the 7B Mistral model with 32k-token inputs on the HPU device, configured through explicit arguments to the high-level encode() call rather than hard-coded values as in the previous PR.
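
To illustrate point 1, here is a minimal sketch of the padded length under the two schemes (the helper names are illustrative, not the actual functions in this PR):

```python
import math

def pad_to_power_of_2(seq_len: int) -> int:
    # Original scheme: round the sequence length up to the next power of two.
    return 2 ** math.ceil(math.log2(seq_len))

def pad_to_multiple_of_128(seq_len: int) -> int:
    # New scheme: round the sequence length up to the next multiple of 128.
    return math.ceil(seq_len / 128) * 128

# For a long input of roughly 20,000 tokens:
print(pad_to_power_of_2(20000))       # 32768 -> ~12.8k padding tokens
print(pad_to_multiple_of_128(20000))  # 20096 -> 96 padding tokens
```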

The usage example for the 7B Mistral model with 32k-token inputs:

```python
hpu_kwargs = {
    "attn_softmax_bf16": True,
    "reuse_cache": True,
    "use_flash_attention": True,
    "flash_attention_recompute": True,
    "flash_attention_causal_mask": True,
}
emb = model.encode(sentences, batch_size=32, kwargs={"hpu_kwargs": hpu_kwargs})
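
For completeness, a fuller end-to-end sketch; the model loading and "hpu" device selection below are assumptions about how this PR is used and may differ slightly from the merged wiring:

```python
from sentence_transformers import SentenceTransformer

# Assumes HPU support from this PR; device string and kwargs pass-through are assumptions.
model = SentenceTransformer("intfloat/e5-mistral-7b-instruct", device="hpu")

sentences = ["Instruct: Given a web search query, retrieve relevant passages.\nQuery: how to pad long inputs efficiently"]

hpu_kwargs = {
    "attn_softmax_bf16": True,
    "reuse_cache": True,
    "use_flash_attention": True,
    "flash_attention_recompute": True,
    "flash_attention_causal_mask": True,
}
emb = model.encode(sentences, batch_size=32, kwargs={"hpu_kwargs": hpu_kwargs})
print(emb.shape)
```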

Any questions, please comment in this thread.

Thanks.
