Support Gemma2-2b model for inference in Android #5610
Related issue: #5570, from the MediaPipe official website. It says we can use AI Edge Torch to convert Gemma2-2b to a suitable format, but there are no further details. It would be good if the MediaPipe Python convert tool could support this conversion. Thanks to all of you developers. |
Hi @FranzKafkaYu, I came across a similar source. Thanks in advance. Other related issues: |
It seems that the issue was raised based on the following link: The method for converting using AI Edge Torch is detailed in the guidelines provided in the above link. Based on this, it seems the conversion process would be as follows:

```mermaid
graph TD
    A[Kaggle .ckpt file] --> B[AI Edge Torch .tflite conversion]
    B --> C[MediaPipe Android inference]
```
P.S. It seems there might be a typo in the Android guide: "AI Edge Troch" should be corrected to "AI Edge Torch" on the website. |
I think the documentation should mention https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/gemma/convert_gemma2_to_tflite.py |
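For readers landing here, the linked script boils down to roughly the following. This is a minimal sketch, not the official script: module paths, builder names (`build_2b_model`), and converter parameters (`kv_cache_max_len`, `prefill_seq_len`, `tflite_path`) vary between ai-edge-torch versions, and all paths and values below are illustrative assumptions, so check the script in your checkout before running.

```python
# Hedged sketch of the Gemma 2 2B -> .tflite conversion, modeled on
# convert_gemma2_to_tflite.py. Paths and parameter values are
# illustrative assumptions, not verified defaults.
from ai_edge_torch.generative.examples.gemma import gemma2
from ai_edge_torch.generative.utilities import converter

# Rebuild the PyTorch Gemma 2 2B model from a downloaded Kaggle checkpoint.
pytorch_model = gemma2.build_2b_model(
    "/path/to/gemma2-2b-checkpoint", kv_cache_max_len=1024
)

# Export a quantized .tflite file for on-device inference.
converter.convert_to_tflite(
    pytorch_model,
    tflite_path="/tmp/gemma2_2b_q8.tflite",
    prefill_seq_len=512,
    quantize=True,
)
```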
@KennethanCeyer Hi Ken, you can find more details via this link. If you want to use a model with MediaPipe Solutions/Framework, you need to convert the model: safetensors/PyTorch format -> tflite format -> MediaPipe format. MediaPipe provides a Python library for converting safetensors/PyTorch format -> MediaPipe format with two different methods (details here), but this library doesn't support Gemma2-2b yet. I have checked AI Edge Torch, but it lacks details on how to complete the first conversion, and the MediaPipe LLM Inference API demonstrations give little information about how to use these "bundled models", which end with *.task; the sample code used a native model, which ends with *.bin. I have tried other projects, like llama.cpp and gemma.cpp, but the performance is not good because they mainly use the CPU to execute inference. You can give them a try, but I think MediaPipe with a GPU backend would be better. I am not a native English speaker, so my English is not very good. I hope this info can help you. |
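As a pointer for the tflite format -> MediaPipe format step mentioned above: a *.task bundle can be assembled from a converted .tflite plus the checkpoint's tokenizer using MediaPipe's genai bundler. A minimal sketch, assuming illustrative file paths and Gemma-style start/stop tokens (verify both against your checkpoint and the LLM Inference docs):

```python
# Hedged sketch: pack a converted .tflite into a MediaPipe .task bundle.
# File paths and token strings are assumptions for illustration only.
from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model="/tmp/gemma2_2b_q8.tflite",
    tokenizer_model="/path/to/tokenizer.model",  # SentencePiece model from the checkpoint
    start_token="<bos>",
    stop_tokens=["<eos>", "<end_of_turn>"],
    output_filename="/tmp/gemma2_2b.task",
    enable_bytes_to_unicode_mapping=False,
)
bundler.create_bundle(config)
```

The resulting .task file is the "bundled model" format the comment above refers to, as opposed to the older *.bin models used in the sample code.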
Good, I will try this script and see whether we can go to the next step. |
Hi @FranzKafkaYu, Thank you for the explanation; you've done an excellent job explaining the situation. I've actually been investigating the same issue of using Gemma 2 with LiteRT. Based on the most recent visible documentation, it appears we need to convert the .ckpt file to .tflite using AI Edge Torch, and then use it according to each specific use case. (It seems like the documentation is lacking; it doesn't look like it's been around for very long.) The code I mentioned above seems to be the closest thing to an official guide at the moment. I'm currently working on this myself, and I'm planning to write a blog post about it when I'm done. Once it's ready, I'll make sure to share the link here in this issue for reference. Thanks again for your helpful insights and for creating this issue, Franz. |
With quite a few questions expected around running Gemma 2 with MediaPipe, I made a Colab for the conversion, along with related issues and PRs. The notebook will be continuously updated until the official tflite or MediaPipe tasks are released. |
Hi @FranzKafkaYu, Apologies for the delayed response. Support for Gemma 2-2B is now available, and ongoing discussions are happening here. Please let us know if you require any further assistance, or if we can proceed to close the issue and mark it as internally resolved, as the feature has been implemented. Thank you!! |
Hi @FranzKafkaYu, On Google Colab: On local machine: The same line outputs a segmentation fault. I've checked my system's memory, and it's not an issue of insufficient memory. The same error occurs consistently in both environments. Any suggestions on what could be causing this segmentation fault or how to troubleshoot further would be greatly appreciated! Thanks in advance! Colab logs: |
Hi, just wanted to update this issue with the latest info. Previously (as discussed in this issue), Gemma 2 2B was only available in the LLM Inference API by going through a conversion pathway via They have the extension |
Tiny correction: the CPU model is a |
MediaPipe Solution (you are using)
Android library: com.google.mediapipe:tasks-genai:0.10.14
Programming language
Android Java
Are you willing to contribute it
None
Describe the feature and the current behaviour/state
Currently there is no suitable MediaPipe-format model for running Gemma2-2b on Android, and the MediaPipe Python libraries can't complete the conversion.
Will this change the current API? How?
no
Who will benefit from this feature?
all of us
Please specify the use cases for this feature
Use the latest Gemma2 model with MediaPipe.
Any Other info
No response