Three questions #55

Open · zulihit opened this issue May 26, 2022 · 2 comments

Comments

zulihit commented May 26, 2022

Thank you for your work. I have three questions:

1. Why do you use this method to compute the initialization range? I couldn't find it described in your paper. What is its purpose?

self.embedding_range = nn.Parameter(
    torch.Tensor([(self.gamma.item() + self.epsilon) / hidden_dim]),
    requires_grad=False
)

self.entity_embedding = nn.Parameter(torch.zeros(nentity, self.entity_dim))
nn.init.uniform_(
    tensor=self.entity_embedding,
    a=-self.embedding_range.item(),
    b=self.embedding_range.item()
)

2. This range is also used when converting relation embeddings into phases. Why is that valid?

phase_relation = relation/(self.embedding_range.item()/pi)
re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)

3. In the RotatE model, the head-batch and tail-batch computations differ in sign, but I can't find the head-batch part in the paper and I don't understand it.

if mode == 'head-batch':
    re_score = re_relation * re_tail + im_relation * im_tail
    im_score = re_relation * im_tail - im_relation * re_tail
    re_score = re_score - re_head
    im_score = im_score - im_head
else:
    re_score = re_head * re_relation - im_head * im_relation
    im_score = re_head * im_relation + im_head * re_relation
    re_score = re_score - re_tail
    im_score = im_score - im_tail

@albernar

I hope this can be of help to anybody who struggled as I did to understand point 2 (and, as a consequence, point 1, I guess): the reason the embedding values are projected into $[-\pi, \pi]$ is that if we initialized the weights uniformly as with Xavier initialization, for example, the values assigned to the relation embeddings would be very close to zero. According to some experiments I ran, the model in this case tends to learn rotations with angles very close to zero, making triples like (head, relation, head) extremely plausible: the rotation would be almost null, so that
$$h \circ r \approx h.$$
This basically forces MRR and H@1 to collapse to zero, while leaving H@3, H@10, and MR good.

Instead, if we project the values of the relation embeddings into the range $[-\pi, \pi]$ (by using phase_relation = relation/(self.embedding_range.item()/pi)), the rotations are not all almost null; there is more variability, so we get better representations and hence better results.

In light of this, I believe the initialization of the relations in point 1 of the question above is just a convenient way of getting a uniform initialization (as with Xavier), but with more straightforward extremes.
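
To make this concrete, here is a minimal sketch comparing the two initializations; the values of gamma, epsilon, hidden_dim, and nrelation are made up for illustration and are not the repository's defaults:

import math
import torch
import torch.nn as nn

# Hypothetical hyperparameters, chosen only for illustration.
gamma, epsilon, hidden_dim, nrelation = 12.0, 2.0, 500, 11

embedding_range = (gamma + epsilon) / hidden_dim  # 0.028

# RotatE-style init: uniform in [-range, range], then rescaled to phases.
relation = torch.empty(nrelation, hidden_dim)
nn.init.uniform_(relation, a=-embedding_range, b=embedding_range)
phase = relation / (embedding_range / math.pi)
print(phase.abs().max())   # close to pi: phases cover essentially all of [-pi, pi]

# Xavier-style values used directly as phases would all sit near zero,
# i.e. near-identity rotations, so h ∘ r ≈ h.
xavier = torch.empty(nrelation, hidden_dim)
nn.init.xavier_uniform_(xavier)
print(xavier.abs().max())  # ~0.1, much smaller than pi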


fanglin1 commented Sep 3, 2024

For Question 3, after researching and comparing with the paper, I found an explanation that may help. The original formulation is based on complex numbers, specifically:

$e^{i\theta} = \cos \theta + i \sin \theta$

Thus, in the code, the entity and relation embeddings are split into real and imaginary parts, so an entity and a relation can be written as:

$h = a + bi$

$r = c + di$

The rotation operation on the entity can then be written as:

$h \times r = (a + bi) \times (c + di) = ac + adi + bci + bdi^2 = (ac - bd) + (ad + bc)i$

This gives exactly the tail-batch (else) branch of the code in Question 3. I hope this helps with understanding this part, and please feel free to correct me if there are any mistakes.
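
As for why the head-batch branch has different signs: since the relation embedding is a pure phase, $r = \cos\theta + i\sin\theta$ has modulus $|r| = 1$, so $\|h \times r - t\| = \|(h \times r - t) \times \bar{r}\| = \|t \times \bar{r} - h\|$, and the head-batch branch computes exactly $t \times \bar{r} - h$ (rotating the tail back by $-\theta$). Below is a minimal numerical check of this equivalence; it is a standalone sketch, not code from the repository:

import torch

torch.manual_seed(0)
dim = 8
re_head, im_head = torch.randn(dim), torch.randn(dim)
re_tail, im_tail = torch.randn(dim), torch.randn(dim)
theta = torch.randn(dim)
re_relation, im_relation = torch.cos(theta), torch.sin(theta)  # |r| = 1

# tail-batch branch: h * r - t
re_tb = re_head * re_relation - im_head * im_relation - re_tail
im_tb = re_head * im_relation + im_head * re_relation - im_tail

# head-batch branch: t * conj(r) - h, i.e. rotate the tail back by -theta
re_hb = re_relation * re_tail + im_relation * im_tail - re_head
im_hb = re_relation * im_tail - im_relation * re_tail - im_head

# The per-dimension moduli agree, so the distance-based score is identical.
print(torch.allclose(torch.sqrt(re_tb**2 + im_tb**2),
                     torch.sqrt(re_hb**2 + im_hb**2)))  # True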
