entry_dedupe flip-flops between entries #340

Closed
lemon24 opened this issue Jun 16, 2024 · 3 comments · Fixed by #343

@lemon24
Owner

lemon24 commented Jun 16, 2024

entry_dedupe flip-flops between two entries on every update; this seems to happen because both duplicates are present in the feed itself.

2024-06-16T14:00

('https://qntm.org/rss.php', 'https://qntm.org/630') (title: 'HATETRIS') duplicates: ['https://qntm.org/597']
set_entry_read(('https://qntm.org/rss.php', 'https://qntm.org/630'), True, None)
set_tag(('https://qntm.org/rss.php', 'https://qntm.org/630'), '.readtime', {'seconds': 42})
set_entry_recent_sort(('https://qntm.org/rss.php', 'https://qntm.org/630'), datetime.datetime(2021, 6, 14, 13, 40, 37, tzinfo=datetime.timezone.utc))
delete_entries([('https://qntm.org/rss.php', 'https://qntm.org/597')])
...
0:00:26.096246  115/170 https://qntm.org/rss.php        2 new, 0 modified, 30 total

2024-06-16T14:01

('https://qntm.org/rss.php', 'https://qntm.org/597') (title: 'HATETRIS') duplicates: ['https://qntm.org/630']
set_entry_read(('https://qntm.org/rss.php', 'https://qntm.org/597'), True, None)
set_tag(('https://qntm.org/rss.php', 'https://qntm.org/597'), '.readtime', {'seconds': 42})
set_entry_recent_sort(('https://qntm.org/rss.php', 'https://qntm.org/597'), datetime.datetime(2021, 6, 14, 13, 40, 37, tzinfo=datetime.timezone.utc))
...
0:00:28.967629  115/170 https://qntm.org/rss.php        2 new, 0 modified, 30 total

2024-06-16T15:00

('https://qntm.org/rss.php', 'https://qntm.org/630') (title: 'HATETRIS') duplicates: ['https://qntm.org/597']
set_entry_read(('https://qntm.org/rss.php', 'https://qntm.org/630'), True, None)
set_tag(('https://qntm.org/rss.php', 'https://qntm.org/630'), '.readtime', {'seconds': 42})
set_entry_recent_sort(('https://qntm.org/rss.php', 'https://qntm.org/630'), datetime.datetime(2021, 6, 14, 13, 40, 37, tzinfo=datetime.timezone.utc))
delete_entries([('https://qntm.org/rss.php', 'https://qntm.org/597')])
...
0:00:24.421819  115/170 https://qntm.org/rss.php        2 new, 0 modified, 30 total

@lemon24
Owner Author

lemon24 commented Jun 22, 2024

Related: #292 / c7e516f.

@lemon24
Owner Author

lemon24 commented Jun 23, 2024

Hmm... it seems the recent sort was not set correctly (the new entry shows up near the top of the feed); I need to look into this more.

Update: also, it doesn't stop the flip-flopping.

Update #2: it did stop the flip-flopping, but each update still adds one entry (the one the plugin then deletes right away); this is expected / unavoidable without some kind of tombstone (#96 (comment)).
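
A tombstone here would be a record that an entry was deleted on purpose, so the next update skips it instead of adding it back. A minimal sketch, assuming a hypothetical in-memory `Storage` class with a `tombstones` set and a hypothetical `update_feed()` helper; none of these are part of reader:

```python
from dataclasses import dataclass, field

FEED_URL = "https://qntm.org/rss.php"


@dataclass
class Storage:
    """Hypothetical in-memory storage; not reader's storage API."""
    entries: dict = field(default_factory=dict)   # (feed_url, entry_id) -> title
    tombstones: set = field(default_factory=set)  # keys deleted on purpose

    def delete_entry(self, key, tombstone=False):
        del self.entries[key]
        if tombstone:
            self.tombstones.add(key)


def update_feed(storage, feed_url, parsed_entries):
    """Store entries from the parsed feed; return the number of new ones."""
    new = 0
    for entry_id, title in parsed_entries:
        key = (feed_url, entry_id)
        if key in storage.tombstones:
            continue  # deleted on purpose (e.g. by dedupe); don't add it back
        if key not in storage.entries:
            new += 1
        storage.entries[key] = title
    return new


parsed = [("https://qntm.org/630", "HATETRIS"), ("https://qntm.org/597", "HATETRIS")]
duplicate = (FEED_URL, "https://qntm.org/597")

storage = Storage()
print(update_feed(storage, FEED_URL, parsed))  # 2: both entries are new
storage.delete_entry(duplicate)
print(update_feed(storage, FEED_URL, parsed))  # 1: the deleted duplicate comes back
storage.delete_entry(duplicate, tombstone=True)
print(update_feed(storage, FEED_URL, parsed))  # 0: the tombstone keeps it out
```

Without the tombstone, every update counts the duplicate as new again, which is the "adds one entry every time" behavior above.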

@lemon24
Owner Author

lemon24 commented Jun 24, 2024

The issue fixed by fc80a49 (and, in part, this whole issue) happened because dedupe runs at different points in the "on-line" path (right after an entry is added/updated) and in backfill (on demand, for whole groups of entries that are duplicates of each other). This could arguably be fixed by making on-line dedupe work more like backfill; related: #246 (comment).
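
One way to read "making on-line more like backfill" is that both paths first build the full group of duplicates and then apply one shared survivor policy, so the only difference is what triggers the work. A minimal sketch under heavy assumptions: entries are plain dicts, duplicates are matched on normalized title alone (the plugin's actual matching may differ), the third entry is made up, and `pick_survivor`, `backfill` and `on_line` are hypothetical names, not reader's API:

```python
from collections import defaultdict


def _norm(title):
    # simplification: duplicates are matched on normalized title only
    return " ".join(title.lower().split())


def pick_survivor(group):
    """Shared policy for both paths; the key is a placeholder (highest id wins)."""
    return max(group, key=lambda e: e["id"])


def backfill(entries):
    """On demand: dedupe every group of same-title entries at once."""
    groups = defaultdict(list)
    for e in entries:
        groups[_norm(e["title"])].append(e)
    return {
        title: pick_survivor(group)["id"]
        for title, group in groups.items()
        if len(group) > 1
    }


def on_line(entry, entries):
    """After one entry is added/updated: build its whole group first,
    then apply the same policy as backfill."""
    group = [e for e in entries if _norm(e["title"]) == _norm(entry["title"])]
    if len(group) < 2:
        return None  # no duplicates, nothing to do
    return pick_survivor(group)["id"]


entries = [
    {"id": "https://qntm.org/630", "title": "HATETRIS"},
    {"id": "https://qntm.org/597", "title": "HATETRIS"},
    {"id": "made-up-entry", "title": "Something else"},
]

print(backfill(entries))
print(on_line(entries[0], entries), on_line(entries[1], entries))
```

Because the survivor depends only on the group, not on which entry triggered the on-line path, the two paths cannot disagree about which entry to keep.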

Also related, in case we unify the pipelines:

# unlike _get_entry_groups, we cannot rely on e.last_updated,
# because for duplicates in the feed, we'd end up flip-flopping
# (on the first update, entry 1 is deleted and entry 2 remains;
# on the second update, entry 1 remains because it's new,
# and entry 2 is deleted because it's not modified,
# has lower last_updated, and no update hook runs for it; repeat).
#
# it would be more correct to sort by (is in new feed, last_retrieved),
# but as of 3.14, we don't know about existing but not modified entries
# (the hook isn't called), and entries don't have last_retrieved.
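
A toy simulation of the flip-flop described in that comment, using the two HATETRIS entries from the log. Everything here is a simplification: storage is a dict, every stored entry is treated as a duplicate of every other, the entry id is a hypothetical tie-breaker, and `last_retrieved` is the attribute the comment says entries don't have yet:

```python
from dataclasses import dataclass

# The two "HATETRIS" entries from the log, in feed order.
FEED = ["https://qntm.org/630", "https://qntm.org/597"]


@dataclass
class Entry:
    id: str
    last_updated: int    # bumped only when the entry is added or modified
    last_retrieved: int  # hypothetical: bumped every time the entry is in the feed


def update(stored, feed_ids, now):
    """One update: missing entries are (re-)added as new; existing ones are
    not modified, so their last_updated stays the same."""
    for eid in feed_ids:
        if eid in stored:
            stored[eid].last_retrieved = now
        else:
            stored[eid] = Entry(eid, last_updated=now, last_retrieved=now)


def dedupe(stored, key):
    """Keep the entry with the highest key and delete the rest."""
    survivor = max(stored.values(), key=key)
    for eid in list(stored):
        if eid != survivor.id:
            del stored[eid]
    return survivor.id


def by_last_updated(feed_ids):
    # feed_ids is unused; kept so both key factories share a signature
    return lambda e: (e.last_updated, e.id)


def by_in_feed_and_last_retrieved(feed_ids):
    return lambda e: (e.id in feed_ids, e.last_retrieved, e.id)


def survivors(make_key, rounds=4):
    """Run several update + dedupe rounds; return the surviving id after each."""
    stored = {}
    result = []
    for now in range(1, rounds + 1):
        update(stored, FEED, now)
        result.append(dedupe(stored, make_key(FEED)))
    return result


print(survivors(by_last_updated))                # survivor alternates: 630, 597, 630, 597
print(survivors(by_in_feed_and_last_retrieved))  # survivor is always 630
```

With the second key, 597 is still re-added on every update and deleted right away, which is the behavior described in "Update #2" above.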

lemon24 added a commit that referenced this issue Jun 24, 2024
lemon24 closed this as completed Jun 24, 2024