
Phi-3-Based VLMs Not Usable Possibly Due to Incorrect Model Configuration #25

Open
Qinyu-Allen-Zhao opened this issue Aug 15, 2024 · 2 comments

Comments

@Qinyu-Allen-Zhao

Hi,

Thank you for your great work!

I've been trying to use the Phi-3-Instruct-4B VLM models, but encountered several issues:

  • Incorrect LLM backbone choice in phi.py:

https://github.com/RylanSchaeffer/prismatic-vlms/blob/95e3097f7a3bcc7f5ac95357daccb28b33a19363/prismatic/models/backbones/llm/phi.py#L9C1-L30C10

Initially, I noticed that the PhiForCausalLM class on line 27 of phi.py should be Phi3ForCausalLM instead. If this is not corrected, loading the model fails with a config error:

AttributeError: 'Phi3Config' object has no attribute 'partial_rotary_factor'.
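A minimal sketch of the fix (the registry entry below is illustrative, not the repo's exact code, and it assumes a transformers version that ships Phi3ForCausalLM):

from transformers import Phi3ForCausalLM  # instead of PhiForCausalLM

# Illustrative registry entry for prismatic/models/backbones/llm/phi.py;
# the field names are guesses modeled on how other LLM backbones are registered.
PHI3_MODELS = {
    "phi-instruct-3+4b": {
        "llm_family": "phi",
        "llm_cls": Phi3ForCausalLM,
        "hf_hub_path": "microsoft/Phi-3-mini-4k-instruct",
    },
}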

  • Possibly incorrect model architecture used during training

Despite the above fix, the pre-trained VLM is still not usable. I checked the saved checkpoint you provide (phi-instruct-3+4b+clip) and printed its keys and parameter shapes.
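For reference, the keys below were printed with roughly the following (a sketch only; the checkpoint filename and the nesting of the state dict under a "model" key are assumptions about how the prismatic checkpoints are saved):

import torch

# The path inside the downloaded run directory is assumed; adjust as needed.
ckpt = torch.load("checkpoints/latest-checkpoint.pt", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # top-level layout is a guess
for key, param in state_dict.items():
    print(f"Key: {key} | Param shape: {param.shape}")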

Key: llm.model.embed_tokens.weight | Param shape: torch.Size([32064, 3072])
Key: llm.model.layers.0.self_attn.q_proj.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.q_proj.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.self_attn.k_proj.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.k_proj.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.self_attn.v_proj.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.v_proj.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.self_attn.dense.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.dense.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.mlp.fc1.weight | Param shape: torch.Size([8192, 3072])
Key: llm.model.layers.0.mlp.fc1.bias | Param shape: torch.Size([8192])
Key: llm.model.layers.0.mlp.fc2.weight | Param shape: torch.Size([3072, 8192])
Key: llm.model.layers.0.mlp.fc2.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.input_layernorm.weight | Param shape: torch.Size([3072])
Key: llm.model.layers.0.input_layernorm.bias | Param shape: torch.Size([3072])
....

For comparison, the following are the keys and parameter shapes of the checkpoint in microsoft/Phi-3-mini-4k-instruct:

Key: model.embed_tokens.weight | Param shape: torch.Size([32064, 3072])
Key: model.layers.0.self_attn.o_proj.weight | Param shape: torch.Size([3072, 3072])
Key: model.layers.0.self_attn.qkv_proj.weight | Param shape: torch.Size([9216, 3072])
Key: model.layers.0.mlp.gate_up_proj.weight | Param shape: torch.Size([16384, 3072])
Key: model.layers.0.mlp.down_proj.weight | Param shape: torch.Size([3072, 8192])
Key: model.layers.0.input_layernorm.weight | Param shape: torch.Size([3072])
Key: model.layers.0.post_attention_layernorm.weight | Param shape: torch.Size([3072])

And the following are the keys and parameter shapes of the checkpoint in phi-2+3b:

llm.model.embed_tokens.weight torch.Size([50304, 2560])
llm.model.layers.0.self_attn.q_proj.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.q_proj.bias torch.Size([2560])
llm.model.layers.0.self_attn.k_proj.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.k_proj.bias torch.Size([2560])
llm.model.layers.0.self_attn.v_proj.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.v_proj.bias torch.Size([2560])
llm.model.layers.0.self_attn.dense.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.dense.bias torch.Size([2560])
llm.model.layers.0.mlp.fc1.weight torch.Size([10240, 2560])
llm.model.layers.0.mlp.fc1.bias torch.Size([10240])
llm.model.layers.0.mlp.fc2.weight torch.Size([2560, 10240])
llm.model.layers.0.mlp.fc2.bias torch.Size([2560])
llm.model.layers.0.input_layernorm.weight torch.Size([2560])
llm.model.layers.0.input_layernorm.bias torch.Size([2560])

Upon investigating the structure of phi-2+3b, I found that the parameter names match exactly; only the tensor sizes differ. This leads me to suspect that during training, the model actually used the Phi-2 architecture (PhiForCausalLM) rather than Phi-3, but with Phi-3's dimensions.
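One quick way to check this suspicion (a sketch: the head count is taken from Phi-3-mini, and only a single layer is instantiated to keep it cheap):

from transformers import PhiConfig, PhiForCausalLM

# Instantiate the Phi-1/Phi-2 architecture (PhiForCausalLM) with Phi-3-mini's
# dimensions; one layer is enough to compare parameter names and shapes.
cfg = PhiConfig(
    vocab_size=32064,
    hidden_size=3072,
    intermediate_size=8192,
    num_hidden_layers=1,
    num_attention_heads=32,
)
model = PhiForCausalLM(cfg)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

The printed q_proj / k_proj / v_proj / dense and mlp.fc1 / mlp.fc2 entries line up with the llm.model.layers.* keys in the saved checkpoint above, whereas Phi3ForCausalLM uses fused qkv_proj and gate_up_proj weights.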

I believe this discrepancy won't affect most of the conclusions in the paper, but could you please look into this issue?

Thanks for your attention to this matter.

Qinyu

@RylanSchaeffer
Owner

We had engineering problems with Phi 3 and decided not to spend the effort fixing it. I don't believe we included it in the paper (although please double check). The other models should work though.

@Qinyu-Allen-Zhao
Author

Thank you for your response!
