medusa verify failing because files do not exist #541

Closed
StevenLacerda opened this issue Nov 18, 2022 · 3 comments
Labels
done Issues in the state 'done' duplicate This issue or pull request already exists

Comments


StevenLacerda commented Nov 18, 2022


First, this is happening on several nodes. Some nodes are fine, but others keep failing, and we don't know why. It's not limited to one backup either; it affects every backup that gets created. To keep things simple, we'll use node .108 as the example.

We have a manifest that shows the data exists on .108:

        "columnfamily": "ts_week-8adc4610255d11eb9bdac391857ab3e2",
        "keyspace": "srm",
        "objects":
        [
            {
                "MD5": "e8e91c409b3f2a019064dc735912cc4d",
                "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Statistics.db",
                "size": 12496
            },
            {
                "MD5": "09d3a0f19828711f0da12f3ed3dafd33",
                "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Index.db",
                "size": 9993803
            },
            {
                "MD5": "14365df1e03f853c037fd69ccd845585",
                "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Summary.db",
                "size": 83511
            },
            {
                "MD5": "fdccebf89caa8dcaee8cbb7bd4349153",
                "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-CompressionInfo.db",
                "size": 46287
            },
            {
                "MD5": "23ea83afa40106382267d98661852f96-13",
                "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Data.db",
                "size": 106613208
            },

However, when we look at the bucket in S3, those files do not exist:

(Two screenshots of the S3 bucket, titled "first" and "second", were attached here.)

It's an older sstable file, so is there something restricting files based on time? We've verified that the sstable no longer exists on disk, but it should still exist in the backup.
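The comparison described above (manifest entries versus what is actually in the bucket) can be sketched roughly as follows. This is only an illustration, not Medusa's own verify logic: it assumes the manifest is a JSON list of entries shaped like the excerpt, and `missing_objects`, `existing_keys`, and the `nb-200` key are hypothetical names and values. In practice the key set would come from listing the bucket.

```python
import json


def missing_objects(manifest, existing_keys):
    """Return manifest object paths that are absent from the storage listing."""
    missing = []
    for table in manifest:
        for obj in table.get("objects", []):
            if obj["path"] not in existing_keys:
                missing.append(obj["path"])
    return missing


# Two objects copied from the manifest excerpt in this issue.
manifest = json.loads("""
[
  {
    "columnfamily": "ts_week-8adc4610255d11eb9bdac391857ab3e2",
    "keyspace": "srm",
    "objects": [
      {
        "MD5": "e8e91c409b3f2a019064dc735912cc4d",
        "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Statistics.db",
        "size": 12496
      },
      {
        "MD5": "09d3a0f19828711f0da12f3ed3dafd33",
        "path": "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Index.db",
        "size": 9993803
      }
    ]
  }
]
""")

# Hypothetical bucket listing: an unrelated key, none of the nb-175 files.
existing_keys = {
    "10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-200-big-Data.db",
}

for path in missing_objects(manifest, existing_keys):
    print("missing:", path)
```

Keeping the check as a pure function over a set of keys makes it easy to run against a saved bucket listing, without needing live access to the storage backend.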

When we do the verification, it's failing with:

  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Statistics.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Index.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Summary.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-CompressionInfo.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Data.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Digest.crc32] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Filter.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-TOC.txt] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-Statistics.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-Index.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-Summary.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-CompressionInfo.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-Data.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-Digest.crc32] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-Filter.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-TOC.txt] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-Statistics.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-Index.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-Summary.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-CompressionInfo.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-Data.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-Digest.crc32] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-Filter.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-173-big-TOC.txt] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-Statistics.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-Index.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-Summary.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-CompressionInfo.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-Data.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-Digest.crc32] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-Filter.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-171-big-TOC.txt] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-Statistics.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-Index.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-Summary.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-CompressionInfo.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-Data.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-Digest.crc32] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-Filter.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-72-big-TOC.txt] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-Statistics.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-Index.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-Summary.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-CompressionInfo.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-Data.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-Digest.crc32] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-Filter.db] Doesn't exists
  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-179-big-TOC.txt] Doesn't exists

There are more sstable files failing, but you get the gist.
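Since the list is long, it can help to group the failures by sstable. Below is a minimal sketch that parses lines in the format shown above and groups the missing components per sstable. The regular expressions and the `group_missing` helper are assumptions made for illustration; they are not part of Medusa.

```python
import re
from collections import defaultdict

# Matches lines such as:  - [<path>] Doesn't exists
LINE_RE = re.compile(r"^\s*-\s*\[(?P<path>[^\]]+)\]\s+Doesn't exists\s*$")
# Splits e.g. "nb-175-big-Data.db" into sstable "nb-175-big" and component "Data.db"
SSTABLE_RE = re.compile(r"^(?P<sstable>\w+-\d+-big)-(?P<component>.+)$")


def group_missing(lines):
    """Group missing component files by sstable (directory + generation)."""
    groups = defaultdict(list)
    for line in lines:
        line_match = LINE_RE.match(line)
        if not line_match:
            continue  # ignore anything that is not a "Doesn't exists" line
        directory, _, filename = line_match.group("path").rpartition("/")
        name_match = SSTABLE_RE.match(filename)
        if name_match:
            key = f"{directory}/{name_match.group('sstable')}"
            groups[key].append(name_match.group("component"))
        else:
            groups[f"{directory}/{filename}"].append(filename)
    return dict(groups)


# Three lines copied from the verify output in this issue.
sample = [
    "  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Statistics.db] Doesn't exists",
    "  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-175-big-Data.db] Doesn't exists",
    "  - [10.186.59.108/data/srm/ts_week-8adc4610255d11eb9bdac391857ab3e2/nb-177-big-TOC.txt] Doesn't exists",
]

for sstable, components in group_missing(sample).items():
    print(sstable, "->", ", ".join(components))
```

Grouping this way shows quickly whether whole sstables are gone (every component missing) or only individual component files.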

Medusa.ini attached as medusa.txt.
Manifest.json attached as manifest.txt.
Medusa.log attached.

medusa.txt
manifest.txt
medusa-verify-stg1-iblock001-001.log

@adejanovski adejanovski added new Issues requiring triage and removed new Issues requiring triage labels Nov 18, 2022
@adejanovski
Contributor

I really don't understand how these files could be missing, unless another process deleted them or the upload never really happened in the first place 🤯
To check whether it's the latter, we would need a verify run right after the backup, which would detect the issue, plus the backup logs so that we can see whether the file was reported as uploaded.
One of the problems with how differential uploads are made is that we rely on the manifests to decide whether or not we need to upload a file. But manifests aren't updated after their creation, so they may not reflect what's really present in the data folder. I need to think of a way to rely on the data folder content instead, without killing performance.
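To make the manifest problem concrete, here is a toy sketch of the two ways to decide what to upload. It is not Medusa's actual code: `files_to_upload` and the example sets are hypothetical, and it reads "data folder" as the backup's data folder in storage, which is an assumption.

```python
def files_to_upload(local_files, already_backed_up):
    """Return the local files not covered by the given view of the backup."""
    return sorted(set(local_files) - set(already_backed_up))


# Hypothetical state: two sstable files are on local disk.
local_files = {"nb-175-big-Data.db", "nb-179-big-Data.db"}

# A previous manifest claims nb-175 was uploaded...
previous_manifest = {"nb-175-big-Data.db"}
# ...but the object is no longer in storage (assumed for illustration).
storage_listing = set()

by_manifest = files_to_upload(local_files, previous_manifest)
by_listing = files_to_upload(local_files, storage_listing)

print("decided from manifest:", by_manifest)
print("decided from storage: ", by_listing)
```

When the decision comes from the stale manifest, nb-175 is skipped and stays missing in every later backup. When it comes from the storage listing, the file is picked up again, at the cost of listing the storage on each backup.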

@adejanovski adejanovski added the ready Issues in the state 'ready' label Dec 13, 2022
@adejanovski adejanovski added assess Issues in the state 'assess' and removed ready Issues in the state 'ready' labels Aug 16, 2023
@rzvoncek
Contributor

rzvoncek commented Apr 4, 2024

We've shipped #716, so @adejanovski's suggestion basically got implemented.

I'd propose closing this ticket because it reads like a duplicate of what we already did.

@rzvoncek rzvoncek added to-groom and removed assess Issues in the state 'assess' labels Apr 4, 2024
@adejanovski adejanovski added assess Issues in the state 'assess' and removed to-groom labels Apr 4, 2024
@rzvoncek rzvoncek added to-groom and removed assess Issues in the state 'assess' labels Apr 4, 2024
@StevenLacerda
Author

This is very old, so I'm okay with closing it if you believe it's a duplicate and fixed.

@rzvoncek rzvoncek added duplicate This issue or pull request already exists and removed to-groom labels Apr 4, 2024
@rzvoncek rzvoncek closed this as completed Apr 4, 2024
@adejanovski adejanovski added the done Issues in the state 'done' label Apr 4, 2024