-
-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save all received messages to DB even if duplicate ID is used #1899
Comments
The Short Stanza-ID Problem (Unreliable History in DB)AbstractReports of unreliable message history in the database led to a diagnostic commit (#1893), which helped users to provide additional data on the issue. The data proved that the problem is in fact in the short stanza-ids produced by some non-compliant clients and have a form of short letter-number combination, such as "ab20a", "a13baa". Most of the related things were discussed and described at #1893, so I won't provide repetitive details here. Here I provide a potential solution as well as a discussion on the topic. DB StructureTable: ChatLogsNote ID, stanza_id (stanza-id), replace_id (replace-id)
Visualization
The CaseWe have 2 messages from the same (or different) senders at different points in time. Both of them are replaced, but both of them have the same stanza-ID. It might be a tough edge case, but that's the point of a good design: to withstand harsh conditions.
the end result of the visualization should be
Wrong Approaches
Proposed solutionTo address unreliable message history caused by non-unique stanza-ids, I propose introducation of a new column: So, for short, This way unnecessary complexity of other potential (and sometimes even working) solutions might be avoided, as we can preserve unique identifier to which we can rely upon. The
As you can see here, we can handle variety of cases with multiple replacements, non-unique stanza-ids, but maintain all the functionality (LMC). ConclusionThe problem provides a challenge, but with a minor DB redesign, it's possible to solve it in a sustainable and well-suited manner. |
Detailed report :) It might be that one of us is misunderstanding something. I don't see the value of Even if that would work then you wouldn't need replaces_db_id to have an unique idientifier you could just use the DB ID field, right? The abstract of XEP-0308 sais: So what I believe is the correct behaviour is to replace the newest message with the specified ID as mentioned in #1893 (comment) a week ago :) And as mentioned above (and in the 1893 PR) it would be interesting to check what Dino, Conversations, Gajim are doing. My guess is that they will do exactly that. |
The server should find the last record with that particular stanza-id and user in the DB, in the absence of which replacement considered incorrect (no message to replace) -- that's the association part. "Because you will have several messages in the database with the same ID", that would be different, because if I to correct the second part again, it would correct the second part only and vice versa, as we choose only the last correction message.
1st replacement: "Hlo" -> "Hello" (impossible to do by relying purely on stanza_id)
It does not work in the aforementioned case since we can't track which change affects which message.
See naive-2. The problem with that would be that old messages are going to stay uncorrected and also that correction might apply to a new messages (though today I get an idea of workaround for preventing replacement of new messages by checking that replacement message's db id > db id of original, but it won't prevent replacing of older message and would require even worse SQL request than we have now). So to check that the replacement is historically inbound, we have to select next message with the same id and make sure that the replacement is applied to the Nth "original message", which is too complicated for one SQL query, hard to debug, bad for performance.
Ok, I can check them and add it do my report, though I guess that no one has a little sophisticated, but powerful system that was proposed. |
But that is how XEP-0308 works. Last message correction. And I still didn't see how your I hope somebody else can comment here since we seem to don't understand each other on this topic. |
Additional overview (with code examples) of the potential solutionsRequirements for the solution
|
ID | stanza_id | replace_id | message |
---|---|---|---|
0 | aa | NULL | Hlo |
1 | xx | aa | Hello |
2 | aa | NULL | Hw re you |
3 | yy | aa | How are you |
4 | zz | aa | How are you? |
In this example if rely only upon those 2 we can either replace both messages, replace the last message with the last edit (though it's going to be hard to even make this SQL query not to mention performance of it will violate the first condition).
If we additionally would use ID for this purpose, we can potentially track some bounds in which correction can be applied (but we should priorly find all the "neighbouring" stanzas (rows with the same stanza_id) and then set them as boundaries).
What I offer is to add an additional column replaces_db_id
which would track to which exact message's (by DB ID) the change should be applied.
ID | stanza_id | replace_id | replaces_db_id | message |
---|---|---|---|---|
0 | aa | NULL | NULL | Hlo |
1 | xx | aa | 0 | Hello |
2 | aa | NULL | NULL | Hw re you |
3 | yy | aa | 2 | How are you |
4 | zz | aa | 2 | How are you? |
On insert we already do the following:
if (message->replace_id) {
auto_sqlite char* replace_check_query = sqlite3_mprintf("SELECT `from_jid` FROM `ChatLogs` WHERE `stanza_id` = '%q'",
message->replace_id ? message->replace_id : "");
to ensure integrity of the message's sender.
But what we can do instead is this:
if (message->replace_id) {
auto_sqlite char* original_message_query = sqlite3_mprintf("SELECT `id` FROM `ChatLogs` WHERE `stanza_id` = '%q' AND `from_jid` = '%q' AND `replace_db_id` = NULL ORDER BY `id` DESC",
message->replace_id ? message->replace_id : "",
from_jid->barejid ? from_jid->barejid : "");
This way we can always find the original message and at the same time check integrity of the message's sender.
I offer to do a mental exercise and go one by one in the example table above so you can see how the ID of the exactly previous (at the point of insertion) "original" message is selected (don't forget the condition in C before the query). You can skip the from_jid
condition as it's out of scope of the problem and is needed only for integrity check. As you can see, it will be kept exactly at the appropriate message at the time.
Visualization for the selection would be about the same, but we would use instead of replace_id
, replace_db_id
. Dixi.
Gajim's way
Gajim offers us a different view and approach on the problem. It's much more performant than other solutions and rely on different mechanisms.
It replaces the message in the DB, but leaves original message in the additional_data
column stored in JSON, hence it's possible to restore the original message, though it would require us to lose correction metadata (time, stanza_id) or include it in the JSON, which can grow to the questionable sizes, especially in the case of big messages (though it's not often the case). Anyway, on the most important part (data visualization) it's the most performant solution as it does not require any JOINs. The solution for me looks the most appealing due to its performance.
Off topic: The third correction's replace-id will be the second correction's message-id, not the first message-id.
On topic: You shouldn't need replace_db_id. You can select the right message by querying for the most recent message-id and sender-jid match. Reasoning: Querying for sender-jid will solve the case where there are two messages with the same message-id sent from different jids. As for the case where there are two messages with the same message-id sent from the same jid, replace the most recent one. my two cents. |
The problem arises from the fact that some senders send non-unique stanza-ids, which was described at the PR and here as well. The sender's integrity check is already present. Replacing the most recent one represents a challenge as well, since we need to find which one is the most recent one and for older ones, we should show them "as is", hence the SQL query gets less performant, not to mention that it will be less than perfect solution, especially in case if correction was important (e.g. there was a typo in IBAN), as we can't (and don't have to) enforce users not to use corrections for important changes and in the end it would violate corrected message storage integrity. |
Ah! So:
We display the message and save it to DB.
We save the message to DB. We look for the message with the id
We save the message to DB. We look for message with id So that's why you want to save So that would mean that we need a new field in the db and in the Will need to think a bit more about it and and consider Gajim solution. Did you also check other clients btw? |
@Ppjet6 do you remember what approach you are taking? |
It's rather to keep the message history coherent. We don't touch the original message, but the original message is different at different point in time. It's hard to calculate it dynamically. E.g. today I write to you my IBAN (stanza-id=aa) "122345", I make a mistake there, so I replace it with "12345", tomorrow I write you "Hell" (stanza-id=aa), I fix it with "Hello". But when we are visualizing data, how do we know that we should replace 122345 with "12345" and not "Hello" and that we should replace "Hell" with "Hello" and not "12345" if the message ID is the same? Especially it's problematic when we have multiple edits over the stanza that we meet multiple times. And I've shown that all the originals are getting replace either by last replacement or first replacement, which is wrong. Each message should have its own replacement, but to do that in SQL we either have to calculate every time which replacement applying to which message (it would require additional selects and let's directly forget about it) or use something reliable and unique to know which exact message should be replaced with what. Also we will use replacement dynamically and internally, to initially assign the replacement, we check the latest "original" message (yesterday - IBAN, today - Hell, tomorrow - anything else or maybe still Hell). But let's focus on the Gajim approach as I like it the most (due to performance gain). In the next few hours I am going to explain their approach, why is it good and how can we implement it (a few ways). |
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Performance boost is noticeable. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Performance boost is noticeable. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Lol, we forgot to close it @jubalh |
This is to track a discussion that we had in the MUC and via PM.
Most of the things were already mentioned in
So I believe what needs to be done is:
It still would be valuable to find out what other clients do in this case and write it here on the issue.
Issue will most likely be worked on by @H3rnand3zzz (only can assign you after commented on this thread).
The text was updated successfully, but these errors were encountered: