Three questions #55
Comments
I hope this can be of help for anybody who struggled as I did to understand point 2 (and, as a consequence, point 1, I guess) of the question quoted below: the reason the values of the embeddings are projected into [-pi, pi] is that, if we initialized the weights uniformly, as with Xavier initialization for example, the values assigned to the relation embeddings would be very close to zero. According to some experiments I ran, the model in this case tends to learn rotations with angles very close to zero, making triples like (head, relation, head) extremely plausible: the rotation would be almost null, so that h ∘ r ≈ h and the distance between h ∘ r and h is nearly zero. Instead, if we project the values of the relation embeddings into the range [-pi, pi], the rotation angles cover the whole unit circle uniformly. In light of this, I believe the initialization of the relations as in point 1 of the question is just a convenient way of getting a uniform initialization (as with Xavier), but with more straightforward extremes.
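A quick sketch of what I mean (my own illustrative snippet, not code from this repository):

# Illustrative sketch: a Xavier-style uniform init keeps phases near zero,
# so the rotation cos(phase) + i*sin(phase) stays close to the identity;
# rescaling the same values onto [-pi, pi] covers the whole unit circle.
import math
import torch

hidden_dim = 1000
bound = math.sqrt(6.0 / (hidden_dim + hidden_dim))   # Xavier uniform bound
raw = torch.empty(100000).uniform_(-bound, bound)

print(raw.abs().max().item())        # ~0.055: angles almost zero
print(torch.cos(raw).min().item())   # ~0.9985: rotations nearly the identity

projected = raw / bound * math.pi    # same samples rescaled onto [-pi, pi]
print(projected.min().item(), projected.max().item())  # ~(-3.14, 3.14)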
For Question 3, after researching and comparing the paper, I found an explanation that may help. The original concept is based on complex numbers: RotatE models a relation as a rotation in the complex plane, t = h ∘ r with |r_i| = 1, so each relation coordinate is a unit complex number r = e^{i*phase} = cos(phase) + i*sin(phase). Thus, in the code, the entity and relation embeddings are split into real and imaginary parts. An entity and a relation can therefore be written as h = re_head + i*im_head and r = re_relation + i*im_relation = cos(phase_relation) + i*sin(phase_relation). The rotation of the entity is then the complex product h ∘ r = (re_head*re_relation - im_head*im_relation) + i*(re_head*im_relation + im_head*re_relation), which is exactly the code corresponding to Question 3 (the else branch below; the 'head-batch' branch instead recovers the head by multiplying the tail with the conjugate e^{-i*phase}, i.e. the inverse rotation). I hope this helps with understanding this part, and please feel free to correct me if there are any mistakes.
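As a sanity check (my own snippet, not from the repository), the split real/imaginary arithmetic is exactly complex multiplication by e^{i*phase}:

# Sanity check: the split real/imaginary form equals h * exp(i * phase).
import math
import torch

torch.manual_seed(0)
re_head, im_head = torch.randn(5), torch.randn(5)
phase = torch.empty(5).uniform_(-math.pi, math.pi)
re_relation, im_relation = torch.cos(phase), torch.sin(phase)

# Split form, as in the 'tail-batch' (else) branch of the code below:
re_rot = re_head * re_relation - im_head * im_relation
im_rot = re_head * im_relation + im_head * re_relation

# Native complex form:
rot = torch.complex(re_head, im_head) * torch.exp(1j * phase)

assert torch.allclose(rot.real, re_rot)
assert torch.allclose(rot.imag, im_rot)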
Thank you for your work. I have three questions:

1. Why are the entity and relation embeddings initialized uniformly over [-embedding_range, embedding_range], with embedding_range = (gamma + epsilon) / hidden_dim?
# Non-trainable scalar that fixes the uniform initialization range
self.embedding_range = nn.Parameter(
    torch.Tensor([(self.gamma.item() + self.epsilon) / hidden_dim]),
    requires_grad=False
)
self.entity_embedding = nn.Parameter(torch.zeros(nentity, self.entity_dim))
nn.init.uniform_(
    tensor=self.entity_embedding,
    a=-self.embedding_range.item(),
    b=self.embedding_range.item()
)
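For context, with illustrative hyperparameter values (gamma = 24.0, epsilon = 2.0, hidden_dim = 1000; my assumptions, not necessarily the defaults here), the initialization interval is tiny:

gamma, epsilon, hidden_dim = 24.0, 2.0, 1000
embedding_range = (gamma + epsilon) / hidden_dim
print(embedding_range)  # 0.026, so components start in [-0.026, 0.026]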
2. Why is the relation embedding divided by (embedding_range / pi) when computing the phase?

phase_relation = relation / (self.embedding_range.item() / pi)
# Unit-modulus complex coordinates: cos(phase) + i*sin(phase)
re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)
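If I understand correctly, this rescales the uniformly initialized values onto [-pi, pi]; a standalone check (my own snippet, with an illustrative range):

import math
import torch

embedding_range = 0.026
relation = torch.empty(100000).uniform_(-embedding_range, embedding_range)
phase = relation / (embedding_range / math.pi)
print(phase.min().item(), phase.max().item())  # ~(-3.14, 3.14)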
3. Why is the score computed this way, and why do the 'head-batch' and 'tail-batch' modes differ?

if mode == 'head-batch':
    # Rotate the tail by the inverse relation, then compare with the head
    re_score = re_relation * re_tail + im_relation * im_tail
    im_score = re_relation * im_tail - im_relation * re_tail
    re_score = re_score - re_head
    im_score = im_score - im_head
else:
    # Rotate the head by the relation, then compare with the tail
    re_score = re_head * re_relation - im_head * im_relation
    im_score = re_head * im_relation + im_head * re_relation
    re_score = re_score - re_tail
    im_score = im_score - im_tail
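For what it's worth, my reading of the 'head-batch' branch is that it applies the inverse rotation, multiplying the tail by the conjugate e^{-i*phase}; a standalone check of that reading (my own snippet):

import math
import torch

torch.manual_seed(0)
re_tail, im_tail = torch.randn(5), torch.randn(5)
phase = torch.empty(5).uniform_(-math.pi, math.pi)
re_relation, im_relation = torch.cos(phase), torch.sin(phase)

# Split form from the 'head-batch' branch above:
re_score = re_relation * re_tail + im_relation * im_tail
im_score = re_relation * im_tail - im_relation * re_tail

# Complex form: rotate the tail backwards by the relation angle
inv = torch.complex(re_tail, im_tail) * torch.exp(-1j * phase)

assert torch.allclose(inv.real, re_score)
assert torch.allclose(inv.imag, im_score)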