We had engineering problems with Phi 3 and decided not to spend the effort fixing it. I don't believe we included it in the paper (although please double check). The other models should work though.
Hi,
Thank you for your great work!
I've been trying to use the Phi-3-Instruct-4B VLM models, but encountered several issues:
https://github.com/RylanSchaeffer/prismatic-vlms/blob/95e3097f7a3bcc7f5ac95357daccb28b33a19363/prismatic/models/backbones/llm/phi.py#L9C1-L30C10
First, I noticed that the PhiForCausalLM class at L27 of that file should be Phi3ForCausalLM instead. Without this change, loading the model fails with a config error:
AttributeError: 'Phi3Config' object has no attribute 'partial_rotary_factor'.
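A mismatch like the one above could be caught before any weights are loaded. The following is only a minimal sketch, assuming the config's model_type string is available ("phi" for the Phi-1/Phi-2 family and "phi3" for Phi-3 in Hugging Face transformers). The lookup table and the check_backbone_class helper are hypothetical names introduced for illustration; they are not part of the prismatic-vlms code.

```python
# Hypothetical lookup table (not from the repository): maps a Hugging Face
# config `model_type` to the name of the matching causal-LM class.
CAUSAL_LM_CLASS_BY_MODEL_TYPE = {
    "phi": "PhiForCausalLM",    # Phi-1 / Phi-2 family
    "phi3": "Phi3ForCausalLM",  # Phi-3 family
}


def check_backbone_class(model_type: str, llm_cls_name: str) -> None:
    """Raise early if the chosen LLM class does not match the config type.

    Hypothetical helper: it only compares names, it loads nothing.
    """
    expected = CAUSAL_LM_CLASS_BY_MODEL_TYPE.get(model_type)
    if expected is None:
        raise KeyError(f"unknown model_type: {model_type!r}")
    if llm_cls_name != expected:
        raise TypeError(
            f"model_type {model_type!r} expects {expected}, got {llm_cls_name}"
        )


# Passes silently: a Phi-3 config paired with the Phi-3 class.
check_backbone_class("phi3", "Phi3ForCausalLM")
```

Pairing "phi3" with "PhiForCausalLM" in this sketch raises a TypeError, which is easier to diagnose than a missing-attribute error surfacing deep inside model construction.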
Even with the above fix, the pre-trained VLM is not usable. I inspected the saved checkpoint you provide (phi-instruct-3+4b+clip) and printed its keys and corresponding values.
The following are the keys and values of the checkpoint in microsoft/Phi-3-mini-4k-instruct
The following are the keys and values of the checkpoint in phi-2+3b
Upon investigating the structure of Phi2-3b, I found that the saved checkpoint's parameters match it, except for the tensor sizes. This leads me to suspect that training may have used the Phi-2 architecture instead of Phi-3, but with Phi-3's dimensions.
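The kind of comparison described here can be sketched as a diff over parameter names and tensor shapes. This is an illustrative sketch only: diff_shapes is a hypothetical helper, and the parameter names and shapes in the toy example are made up, not taken from either checkpoint.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(
    a: Dict[str, Shape], b: Dict[str, Shape]
) -> Tuple[List[str], List[str], Dict[str, Tuple[Shape, Shape]]]:
    """Compare two {parameter name: shape} mappings.

    Returns (names only in a, names only in b, shared names whose shapes differ).
    """
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatched = {
        name: (a[name], b[name])
        for name in sorted(set(a) & set(b))
        if a[name] != b[name]
    }
    return only_a, only_b, mismatched


# Toy data (made up for illustration): same names, different sizes.
saved = {"layers.0.dense.weight": (6, 6), "layers.0.dense.bias": (6,)}
reference = {"layers.0.dense.weight": (4, 4), "layers.0.dense.bias": (4,)}

only_saved, only_reference, mismatched = diff_shapes(saved, reference)
print(only_saved, only_reference)  # both empty: the key sets are identical
print(mismatched)                  # every shared key differs in shape
```

With PyTorch, the shape mappings could be built from a loaded state dict as `{k: tuple(v.shape) for k, v in state_dict.items()}`. Identical key sets with differing shapes would be consistent with the suspicion above, while keys present on only one side would point to a different architecture.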
I believe this discrepancy won't affect most of the conclusions in the paper, but could you please look into this issue?
Thanks for your attention to this matter.
Qinyu