Specifying false triplets rather than corruption #72
Comments
Currently it is not supported, but this is a good suggestion. We discussed it and think it would be a good feature to add soon. This is what we propose and plan to implement:

For training: a `type` argument that takes values from the list ['default', 'external', 'mix'].

For evaluation: each triple in the test set would be ranked either against the corresponding corruptions present in 'x_neg', OR against all possible corruptions (i.e. the current strategy).

What is your suggestion? Would the above plan suffice for this feature?
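The three proposed modes can be sketched as a plain negative-sampling function. This is only an illustration of the proposal above, not AmpliGraph's actual implementation; the function name, the `eta` parameter, and the 50/50 split in 'mix' mode are assumptions.

```python
import random

def sample_negatives(triple, all_entities, external_negatives, mode="default", eta=2):
    """Illustrative sketch of the proposed 'type' modes (hypothetical API).

    - 'default':  generate LCWA-style corruptions (current behaviour)
    - 'external': draw only from user-supplied negatives (the proposed x_neg)
    - 'mix':      combine external negatives with generated corruptions
    """
    s, p, o = triple

    def corrupt():
        # LCWA-style corruption: replace either the subject or the object
        e = random.choice(all_entities)
        return (e, p, o) if random.random() < 0.5 else (s, p, e)

    if mode == "default":
        return [corrupt() for _ in range(eta)]
    if mode == "external":
        return random.sample(external_negatives, min(eta, len(external_negatives)))
    if mode == "mix":
        # assumption: roughly half external, half generated
        m = min(eta // 2, len(external_negatives))
        return random.sample(external_negatives, m) + [corrupt() for _ in range(eta - m)]
    raise ValueError(f"unknown mode: {mode}")
```

Here `eta` plays the role of the number of negatives generated per positive triple, as in the current corruption strategy.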
I don't understand why it is necessary to have at least the same number of negative examples as positive examples. Can't we just have fewer negative ones? As for evaluation, I don't get the point of giving external false statements as input at all: the embedding model is not modified at that point, right? Is it to classify them as true or false? Apart from these thoughts, it seems good to me; I am looking forward to this feature!
If I am correct, you have labelled triples (i.e. positive and negative triples) and you would like to train/evaluate this in a binary classification task (i.e. compute precision, recall, F1, accuracy, etc). We can convert the scores returned by the model into such labels.

Regarding the other point, AmpliGraph currently follows the negatives-generation protocol described in the literature (corruptions based on the local closed-world assumption, as described in Bordes et al. 2013): triples that are not present are not false, they are just unseen (they may be positive or negative). The same protocol requires generating at least as many negatives as positives.

The above discussion on negatives during training holds if we want to preserve the local closed-world assumption (LCWA) on which our training loop relies. For example, the pairwise loss function necessarily requires negatives that differ only in the subject or the object; otherwise the intuition behind it falls apart. You process triple after triple in the training set, and for each of those triples you use negative(s) that differ only in either the subject or the object. That is because you want to train a model to distinguish positives from negatives, so you need meaningful negatives at each step. Using negatives picked at random from an external list, without any similarity to the currently processed triple, would result in a poorly trained model, I believe. This is why, for each positive, we want to make sure there are enough external negatives that differ only in the subject or the object (i.e. that comply with the LCWA).

When generating negatives we rely on the LCWA precisely because we want meaningful negatives: the LCWA says that corrupting a triple "locally" (e.g. only on one side) yields a corruption that has better chances of being a true negative. (There is a paper that more or less proves that.)
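The binary-classification idea above can be sketched with plain NumPy: squash raw triple scores into probabilities, threshold them into labels, then compute precision/recall/F1 against the known 'True'/'False' labels. The logistic squash and the 0.5 threshold are assumptions for illustration, not AmpliGraph's actual conversion.

```python
import numpy as np

def classify_triples(scores, threshold=0.5):
    """Turn raw triple scores into binary labels.

    Assumption: a logistic squash maps unbounded scores into [0, 1],
    then a fixed threshold decides True (1) vs False (0).
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return (probs >= threshold).astype(int)

def precision_recall_f1(y_true, y_pred):
    """Standard binary-classification metrics, computed from counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With labelled test triples, this lets you report precision/recall/F1 alongside (or instead of) the usual ranking metrics.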
Yes, exactly. Thank you for the explanations!
Background and Context
Hi, it seems that all the models can generate false triplets by inverting the subjects and objects of existing ones. However, I am trying to generate embeddings from a graph where each triplet has a label 'True' or 'False'. So I would like to specify the false triplets explicitly for training rather than generate new ones. Is this possible in the current version?