Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weekly snapshots not being retained #1094

Open
daveTheOldCoder opened this issue May 27, 2020 · 26 comments · May be fixed by #1819
Open

Weekly snapshots not being retained #1094

daveTheOldCoder opened this issue May 27, 2020 · 26 comments · May be fixed by #1819
Assignees
Labels
Bug Discussion decision or consensus needed Medium

Comments

@daveTheOldCoder
Copy link

daveTheOldCoder commented May 27, 2020

I'm using the default values for Smart Remove. Hourly and daily snapshots are kept or deleted as expected, but no weekly snapshots are kept.

This seems to be the relevant code in /usr/share/backintime/common/snapshots.py:

        #keep one per week for the last keep_one_per_week weeks
        if keep_one_per_week > 0:
            d = now - datetime.timedelta( days = now.weekday() + 1 )
            for i in range( 0, keep_one_per_week ):
                keep_snapshots = self._smart_remove_keep_first_( snapshots, keep_snapshots, d, d + datetime.timedelta(days=8) )
                d = d - datetime.timedelta(days=7)

I'm trying to figure it out, but I thought I'd ask first if there's a known issue.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented May 31, 2020

I found a potential fix for this problem.

I added some "logger.debug" calls, and changed the cron table entry to create log files so that I could see what's happening.

The function "smart_remove_keep_first" keeps the newest snapshot in the specified date/time range, since the snapshots are sorted in descending order of date/time. I added a function "smart_remove_keep_last" that reverses the order of the snapshots and thus keeps the oldest snapshot.

In the code in the previous post, I replaced the call to "smart_remove_keep_first" with a call to "smart_remove_keep_last". So far it seems to be working. But it needs further testing and analysis.

If this is actually a bug, I don't understand why it hasn't been reported before. Can someone else review the code/algorithm and provide an opinion?

@buhtz

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Sep 25, 2022

This can be fixed with a one-line change in common/snapshots.py:

In def smartRemoveKeepFirst, change:
for sid in snapshots:
to:
for sid in reversed(snapshots):

But it would probably be "cleaner" to rename smartRemoveKeepFirst to smartRemoveKeepLast or simply smartRemove, and then change the calls to it.

@buhtz buhtz added this to the 1.3.5 / 1.4.0 milestone Mar 19, 2023
@buhtz

This comment was marked as off-topic.

@buhtz buhtz added Discussion decision or consensus needed Feedback needs user response, may be closed after timeout without a response labels Mar 20, 2024
@buhtz buhtz self-assigned this Mar 20, 2024
@daveTheOldCoder

This comment was marked as outdated.

@buhtz

This comment was marked as outdated.

@daveTheOldCoder
Copy link
Author

screenshot

@buhtz

This comment was marked as outdated.

@buhtz

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@emtiu

This comment was marked as outdated.

@emtiu emtiu self-assigned this Jul 17, 2024
@emtiu emtiu removed Feedback needs user response, may be closed after timeout without a response Close after cooling-off period labels Jul 28, 2024
@emtiu
Copy link
Member

emtiu commented Jul 28, 2024

I have created a testbed with the following shell code in a VM:
for N in {1..60}; do sudo date -s '+7 days'; echo date > backupfile; backintime backup; done

In extreme cases, it appears to work as expected: Keeping "one snapshot per week for the last 52 weeks" really does leave 1 years' worth of weekly snapshots. However, I'm not so sure about shorter timespans. Calculating dates is hard!

My current guess is that we may be dealing with an off-by-1 problem here, where a setting of "one snapshot per week for the last n weeks" only actually keeps one for the last n–1 weeks. This would be especially significant for the default configuration of:

  • "one snapshot per day for the last 7 days"
  • "one snapshot per week for the last 4 weeks"
  • "one snapshot per month for the last 24 months"

@buhtz
Copy link
Member

buhtz commented Jul 28, 2024

I once had a tool called faketime I used for things like this. Worked well.

@emtiu
Copy link
Member

emtiu commented Jul 28, 2024

I think I'm very close to cracking this case. In this function:

backintime/common/snapshots.py

Lines 1686 to 1696 in 22f468c

# keep one per week for the last keep_one_per_week weeks
if keep_one_per_week > 0:
d = now - datetime.timedelta(days=now.weekday() + 1)
for i in range(0, keep_one_per_week):
keep |= self.smartRemoveKeepFirst(
snapshots,
d,
d + datetime.timedelta(days=8),
keep_healthy=True)
d -= datetime.timedelta(days=7)

smartRemoveKeepFirst is called with a max_date of d + datetime.timedelta(days=8) — not days=7!

Therefore, in a routine that says "keep only the youngest snapshot of these 8 days to consider", it will also delete the youngest snapshot of the previous week — but only if it's caught by the last in a series of "keep one per week" calls.

In effect, of "one snapshot per week for the last n weeks", the n-th week has its youngest snapshot thrown away, even though it shouldn't.

I'll need to investigate if changing the call of smartRemoveKeepFirst to a max_date of d + datetime.timedelta(days=7) fixes the problem. There must have been some reason to set an 8 there.

P.S.: The official soundtrack for this bug is Only For The Week.

@buhtz
Copy link
Member

buhtz commented Jul 29, 2024

There are some unit tests in test_snapshot.py about that smart remove thing.
I touched some once long ago but then gave up. A full refactoring of that tests would be the best. But a first step would be to see if this use case is covered by that tests and add such a test if not.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Jul 29, 2024

It occurred to me that what makes the "smart remove" behavior confusing is that the snapshots are named using only the timestamp. The names don't contain "hourly", "daily", "weekly", etc.

I wonder if adding an index file that identifies daily, weekly, monthly and yearly snapshots would help.

Or maybe I'm wrong. :)

@emtiu
Copy link
Member

emtiu commented Jul 29, 2024

There are some unit tests in test_snapshot.py about that smart remove thing. I touched some once long ago but then gave up. A full refactoring of that tests would be the best. But a first step would be to see if this use case is covered by that tests and add such a test if not.

I'm confident that I can fix this bug, and to make sure I don't introduce any new ones, I'm doing some manual testing in a VM.

I'm totally unfamiliar with automatic testing, though. I'll take a look, but I'd rather not put this bugfix on the "waiting list" until I've cracked the automatic testing of Smart Remove, because that would probably take a very long time ;)

@emtiu
Copy link
Member

emtiu commented Jul 29, 2024

It occurred to me that what makes the "smart remove" behavior confusing is that the snapshots are named using only the timestamp. The names don't contain "hourly", "daily", "weekly", etc.

I wonder if adding an index file that identifies daily, weekly, monthly and yearly snapshots would help.

Or maybe I'm wrong. :)

I see what you mean, but BiT doesn't work like that. Snapshots are not created as "weekly" or "monthly" snapshots. In fact, they are only characterized by their date+time of saving (plus a random three-digit number, which is called the "tag"). Together, this makes a "Snapshot ID" (SID for short): 20241117-115358-562

The only time when a snapshot might be characterized as "weekly" or "monthly" is when the Smart Remove routine determines: "this snapshot should be kept, because it's the only one from that week/month, and it should be kept according to the removal rules".

Therefore, any single snapshot can start out as an "hourly" snapshot, then become a "weekly" one, and maybe even a "monthly" one later.

@daveTheOldCoder
Copy link
Author

I see what you mean, but BiT doesn't work like that.

I know that. To figure out the workaround I posted above, I added logging code in the smart-remove processing to log the "kept" and "removed" snapshots, so I could see what was happening.

@emtiu
Copy link
Member

emtiu commented Jul 29, 2024

I know that. To figure out the workaround I posted above, I added logging code in the smart-remove processing to log the "kept" and "removed" snapshots, so I could see what was happening.

I see, and I've also understood why your workaround is successful. It's hard to put into words ;)

What happens is that the code considers snapshots from 8 days (instead of 7) when determining which to keep for a specific week. But in a one-snapshot-per-week environment, 8 days will likely contain two snapshots. Since smartRemoveKeepFirst keeps only the newer one, the older one is lost.

This has no consequences as long as the previous week is also considered before deletion. But it fails for the last of the weeks to be considered (resulting in the n–1 situation I described above).

Your workaround reverses the list of snapshots, so in the "8-day-week" that is the last to be considered, the older of the two snapshots is kept, and your problem disappears.

However, it is my understanding that the problem disappears completely when days=8 is changed into days=7 :)

@daveTheOldCoder
Copy link
Author

After verifying that the bug was still present in version 1.4.3 (the latest version available for Ubuntu 24.04), I'm testing your fix (days=8 -> days=7) by making that one-line change in that version.

It will take a few more weeks. I want to verify that the monthly snapshots are retained too.

@buhtz
Copy link
Member

buhtz commented Oct 10, 2024

It will take a few more weeks. I want to verify that the monthly snapshots are retained too.

Cool! Real time testing is the best. 😄

@daveTheOldCoder
Copy link
Author

Ideally, I'd like to write a simulator for the code in snapshots.py that could do the testing in seconds.

But I'm too lazy and senile to accomplish that now. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Discussion decision or consensus needed Medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants