
Phi-3-Based VLMs Not Usable Possibly Due to Incorrect Model Configuration #25

Open
Qinyu-Allen-Zhao opened this issue Aug 15, 2024 · 2 comments

Comments

@Qinyu-Allen-Zhao

Hi,

Thank you for your great work!

I've been trying to use the Phi-3-Instruct-4B VLM models, but encountered several issues:

  • Incorrect LLM backbone choice in phi.py:

https://github.com/RylanSchaeffer/prismatic-vlms/blob/95e3097f7a3bcc7f5ac95357daccb28b33a19363/prismatic/models/backbones/llm/phi.py#L9C1-L30C10

Initially, I noticed that the PhiForCausalLM class on line 27 of phi.py should be Phi3ForCausalLM instead. If this is not corrected, loading the model fails with a config error:

AttributeError: 'Phi3Config' object has no attribute 'partial_rotary_factor'.
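A minimal sketch of the fix (the registry entry below is illustrative, not the repo's exact code, and it assumes a transformers version that ships Phi3ForCausalLM):

from transformers import Phi3ForCausalLM  # instead of PhiForCausalLM

# Illustrative registry entry for prismatic/models/backbones/llm/phi.py;
# the field names are guesses modeled on how other LLM backbones are registered.
PHI3_MODELS = {
    "phi-instruct-3+4b": {
        "llm_family": "phi",
        "llm_cls": Phi3ForCausalLM,
        "hf_hub_path": "microsoft/Phi-3-mini-4k-instruct",
    },
}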

  • Possibly incorrect model architecture used during training

Despite the above fix, the pre-trained VLM is still not usable. I checked the saved checkpoint you provide (phi-instruct-3+4b+clip) and printed its keys and parameter shapes.
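For reference, the keys below were printed with roughly the following (a sketch only; the checkpoint filename and the nesting of the state dict under a "model" key are assumptions about how the prismatic checkpoints are saved):

import torch

# The path inside the downloaded run directory is assumed; adjust as needed.
ckpt = torch.load("checkpoints/latest-checkpoint.pt", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # top-level layout is a guess
for key, param in state_dict.items():
    print(f"Key: {key} | Param shape: {param.shape}")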

Key: llm.model.embed_tokens.weight | Param shape: torch.Size([32064, 3072])
Key: llm.model.layers.0.self_attn.q_proj.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.q_proj.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.self_attn.k_proj.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.k_proj.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.self_attn.v_proj.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.v_proj.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.self_attn.dense.weight | Param shape: torch.Size([3072, 3072])
Key: llm.model.layers.0.self_attn.dense.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.mlp.fc1.weight | Param shape: torch.Size([8192, 3072])
Key: llm.model.layers.0.mlp.fc1.bias | Param shape: torch.Size([8192])
Key: llm.model.layers.0.mlp.fc2.weight | Param shape: torch.Size([3072, 8192])
Key: llm.model.layers.0.mlp.fc2.bias | Param shape: torch.Size([3072])
Key: llm.model.layers.0.input_layernorm.weight | Param shape: torch.Size([3072])
Key: llm.model.layers.0.input_layernorm.bias | Param shape: torch.Size([3072])
....

For comparison, the following are the keys and parameter shapes of the checkpoint in microsoft/Phi-3-mini-4k-instruct:

Key: model.embed_tokens.weight | Param shape: torch.Size([32064, 3072])
Key: model.layers.0.self_attn.o_proj.weight | Param shape: torch.Size([3072, 3072])
Key: model.layers.0.self_attn.qkv_proj.weight | Param shape: torch.Size([9216, 3072])
Key: model.layers.0.mlp.gate_up_proj.weight | Param shape: torch.Size([16384, 3072])
Key: model.layers.0.mlp.down_proj.weight | Param shape: torch.Size([3072, 8192])
Key: model.layers.0.input_layernorm.weight | Param shape: torch.Size([3072])
Key: model.layers.0.post_attention_layernorm.weight | Param shape: torch.Size([3072])

And the following are the keys and parameter shapes of the checkpoint in phi-2+3b:

llm.model.embed_tokens.weight torch.Size([50304, 2560])
llm.model.layers.0.self_attn.q_proj.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.q_proj.bias torch.Size([2560])
llm.model.layers.0.self_attn.k_proj.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.k_proj.bias torch.Size([2560])
llm.model.layers.0.self_attn.v_proj.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.v_proj.bias torch.Size([2560])
llm.model.layers.0.self_attn.dense.weight torch.Size([2560, 2560])
llm.model.layers.0.self_attn.dense.bias torch.Size([2560])
llm.model.layers.0.mlp.fc1.weight torch.Size([10240, 2560])
llm.model.layers.0.mlp.fc1.bias torch.Size([10240])
llm.model.layers.0.mlp.fc2.weight torch.Size([2560, 10240])
llm.model.layers.0.mlp.fc2.bias torch.Size([2560])
llm.model.layers.0.input_layernorm.weight torch.Size([2560])
llm.model.layers.0.input_layernorm.bias torch.Size([2560])

Upon investigating the structure of phi-2+3b, I found that the parameter names match exactly; only the tensor sizes differ. This leads me to suspect that during training, the model actually used the Phi-2 architecture (PhiForCausalLM) rather than Phi-3, but with Phi-3's dimensions.
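One quick way to check this suspicion (a sketch: the head count is taken from Phi-3-mini, and only a single layer is instantiated to keep it cheap):

from transformers import PhiConfig, PhiForCausalLM

# Instantiate the Phi-1/Phi-2 architecture (PhiForCausalLM) with Phi-3-mini's
# dimensions; one layer is enough to compare parameter names and shapes.
cfg = PhiConfig(
    vocab_size=32064,
    hidden_size=3072,
    intermediate_size=8192,
    num_hidden_layers=1,
    num_attention_heads=32,
)
model = PhiForCausalLM(cfg)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

The printed q_proj / k_proj / v_proj / dense and mlp.fc1 / mlp.fc2 entries line up with the llm.model.layers.* keys in the saved checkpoint above, whereas Phi3ForCausalLM uses fused qkv_proj and gate_up_proj weights.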

I believe this discrepancy won't affect most of the conclusions in the paper, but could you please look into this issue?

Thanks for your attention to this matter.

Qinyu

@RylanSchaeffer
Owner

We had engineering problems with Phi 3 and decided not to spend the effort fixing it. I don't believe we included it in the paper (although please double check). The other models should work though.

@Qinyu-Allen-Zhao
Author

Thank you for your response!
