Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Erase Repo Contents" has a bug, or at least is not working immediately #1031

Closed
jerrodcodecov opened this issue Jan 9, 2024 · 8 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@jerrodcodecov
Copy link

jerrodcodecov commented Jan 9, 2024

Describe the bug
Through two customer reports and my own reproduction, it is clear that our "Erase Repo Contents" is either not working, or not working immediately.

codecov/feedback#207

Internal investigation: https://github.com/codecov/internal-issues/issues/131

Environment (please complete the following information):

  • Chrome

To Reproduce
Steps to reproduce the behavior:

  1. Go to a repo settings page

  2. Scroll to danger zone
    image

  3. Click "erase repo coverage content"

  4. Navigate back to the repo page

Expected behavior
Expected: Repo data (flags, historical coverage, commit list) is deleted
Actual behavior: Data is still rendering in frontend

Additional context

Bug: If the deletion process is underway and needs time, at minimum we need to make it clear that "deletion request is processing, please allow X minutes/hours"

Potential feature: At maximum, rework the underlying deletion process

Relevant roadmap issue: codecov/roadmap#38

@jerrodcodecov jerrodcodecov added the bug Something isn't working label Jan 9, 2024
@drazisil-codecov drazisil-codecov transferred this issue from codecov/feedback Jan 10, 2024
@eliatcodecov
Copy link

We will tackle the roadmap issue as a separate problem, since that is repo deletion. This isn't repo deletion, it is content erasure.

We should update the text by the "erase content" button to read: "For larger repositories, this process may not be able to complete automatically. In that case, please reach out to support for help"

In the meantime we will solve the backend issue to scale content erasure to repos of any size.

@katia-sentry
Copy link
Contributor

Ticket to update button's adjacent text -> #1032

@trent-codecov
Copy link
Contributor

@giovanni-guidini Look for quick improvements and low hanging fruit improvements. Time box to 2-4 hours. If we don't identify a quick fix lets tackle the larger fix after investigating the solution and prioritizing.

@drazisil-codecov
Copy link

One of the self-hosted customer is seeing the following traceback, which would be different than the issue we having on prod, due to this one being related to S3, when we use GCS

Traceback (most recent call last):
  File \\"/usr/local/lib/python3.10/site-packages/celery/app/trace.py\\", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File \\"/usr/local/lib/python3.10/site-packages/celery/app/trace.py\\", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File \\"/worker/tasks/base.py\\", line 232, in run
    return asyncio.run(
  File \\"/usr/local/lib/python3.10/asyncio/runners.py\\", line 44, in run
    return loop.run_until_complete(main)
  File \\"/usr/local/lib/python3.10/asyncio/base_events.py\\", line 649, in run_until_complete
    return future.result()
  File \\"/worker/tasks/flush_repo.py\\", line 40, in run_async
    deleted_archives = archive_service.delete_repo_files()
  File \\"/worker/services/archive.py\\", line 272, in delete_repo_files
    results = self.storage.delete_files(self.root, [obj[\\"name\\"] for obj in objects])
  File \\"/usr/local/lib/python3.10/site-packages/shared/storage/minio.py\\", line 243, in delete_files
    for del_err in self.minio_client.remove_objects(
  File \\"/usr/local/lib/python3.10/site-packages/minio/api.py\\", line 2001, in remove_objects
    result = self._delete_objects(
  File \\"/usr/local/lib/python3.10/site-packages/minio/api.py\\", line 1934, in _delete_objects
    response = self._execute(
  File \\"/usr/local/lib/python3.10/site-packages/minio/api.py\\", line 403, in _execute
    return self._url_open(
  File \\"/usr/local/lib/python3.10/site-packages/minio/api.py\\", line 386, in _url_open
    raise response_error
minio.error.MinioException: S3 operation failed; code: NotImplemented, message: A header or query you provided requested a function that is not implemented., resource: None, request_id: None, host_id: None

@giovanni-guidini
Copy link

giovanni-guidini commented Jan 30, 2024

I did some investigation on the current state of this feature.

It seems that the plumbing is all in place. I did try from prod deleting contents of a repo, and the task that does this was called, but never finished (idk why, logging is terrible).
I suspect for repos with some dozen of commits it might take too long to delete certain rows, but I didn't see any error messages implying it timed out.
OOM errors or some other sort of silent failure can't be discarded.

I took the liberty of refactoring the task to improve logging, add traces so we can check performance, and breaking it up by the different pieces we have to delete. That's captured in codecov/worker#246
I will try to add more test cases with more data and verify that we are deleting all the things we're supposed to delete as well.

That being said, testing locally with a repo that had some 4 commits the process did work.
Still the flow is strange and certainly can be improved. Some details that I found strange:

  • The repo isn't marked as "deleted" in the DB. Idk if that's for the repo being deleted with the provider?
  • Should it revert back to the case in which the repo is not setup? Cause it doesn't
  • When you press the button to actually erase contents there's a green banner, but the page is still the same. I'd expect it to change to something else
  • Certainly big repos will take quite some time to delete.

Things that should be deleted by are not

  • Raw uploads are not deleted from storage, just the reports
  • Repo upload tokens
  • LabelAnalysisRequestError (actually errors out)
  • ComponentComparison

I'd suggest we leverage the transactional email capibilities and chenge the UI to display a message informing the user the admins of the repo will receive an email when content deletion is complete. Optionally we can add some stats in the email, but that's another effort to be considered.

I'd also suggest forcing the user to type the repo name to be able to delete the repo, as github does. As opposed to pressing 2 buttons.

@giovanni-guidini
Copy link

re @drazisil-codecov

One of the self-hosted customer is seeing the following traceback, which would be different than the issue we having on prod, due to this one being related to S3, when we use GCS

I believe this is because we are trying to use this function that it's not implemented. Maybe it changed over version? Got deprecated and removed or something like that...

I'd open a specific ticket for this as part of this epic (it should be an epic if it's not)

@drazisil-codecov
Copy link

I believe this is because we are trying to use this function that it's not implemented. Maybe it changed over version? Got deprecated and removed or something like that...

The error is actually in a minio function by the same name. Codecov isn't wrong to call it, it's in the minio SDK.

@giovanni-guidini
Copy link

Marking this as closed.

It has been determined that the deletion process is triggered but the task is not operating as expected. And that some data is missing to be deleted.

Issue #1127 as been created as an epic to capture future work that will be done to fix these issues.
Please use that epic to add tasks that haven't yet been created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

6 participants