refactor: refactor FlushRepoTask #246

Merged: 2 commits, Jan 31, 2024


giovanni-guidini
Contributor

These changes refactor the FlushRepoTask for 2 reasons:

  1. Progress reports through the task.
    Logging in the task was virtually nonexistent apart from the "task called" message,
    so there was no indication of the task's progress.
  2. Performance metrics through sentry_sdk.trace.
    This will let us understand where the bottlenecks of this task are and
    provide better insight for improving it.
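
A minimal sketch of the pattern described above, not the code from this PR: each step lives in its own helper, the helper is wrapped with `sentry_sdk.trace`, and the caller logs progress after every step. The class name, the helper bodies, and the returned counts are hypothetical, and a no-op fallback decorator is an assumption added so the sketch also runs without `sentry-sdk` installed.

```python
import functools
import logging

try:
    import sentry_sdk

    # sentry_sdk.trace wraps a function so it shows up as a span in traces.
    trace = sentry_sdk.trace
except (ImportError, AttributeError):
    # Assumption: fall back to a no-op decorator when sentry-sdk is unavailable.
    def trace(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        return wrapper


log = logging.getLogger(__name__)


class FlushRepoSketch:
    """Hypothetical stand-in for the task; the steps only return fake counts."""

    @trace
    def _delete_archive(self, repoid: int) -> int:
        return 3  # pretend three archive files were deleted

    @trace
    def _delete_reports(self, repoid: int) -> int:
        return 5  # pretend five report rows were deleted

    def run(self, repoid: int) -> dict:
        log.info("Deleting repo contents", extra=dict(repoid=repoid))
        deleted_archives = self._delete_archive(repoid)
        log.info(
            "Deleted archives",
            extra=dict(repoid=repoid, deleted_archives=deleted_archives),
        )
        deleted_reports = self._delete_reports(repoid)
        log.info(
            "Deleted reports",
            extra=dict(repoid=repoid, deleted_reports=deleted_reports),
        )
        return {
            "deleted_archives": deleted_archives,
            "deleted_reports": deleted_reports,
        }


result = FlushRepoSketch().run(repoid=1)
```

Splitting the work into one traced helper per step is what makes the timings comparable: each helper becomes its own span, so a slow step stands out instead of being hidden inside one long function.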

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


codecov-staging bot commented Jan 30, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.


@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   98.10%   98.09%   -0.02%     
==========================================
  Files         375      375              
  Lines       30776    30835      +59     
==========================================
+ Hits        30193    30247      +54     
- Misses        583      588       +5     
Flag                     Coverage Δ
integration              98.09% <96.20%> (-0.02%) ⬇️
latest-uploader-overall  98.09% <96.20%> (-0.02%) ⬇️
unit                     98.09% <96.20%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Components    Coverage Δ
NonTestCode   96.16% <95.94%> (-0.03%) ⬇️
OutsideTasks  97.92% <ø> (ø)

Files                                Coverage Δ
tasks/tests/unit/test_flush_repo.py  100.00% <100.00%> (ø)
tasks/flush_repo.py                  95.53% <95.94%> (-4.47%) ⬇️


codecov-qa bot commented Jan 30, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (2cc687b) 98.10% compared to head (09cf82a) 98.09%.


codecov-public-qa bot commented Jan 30, 2024

Codecov Report

Merging #246 (09cf82a) into main (2cc687b) will decrease coverage by 0.02%.
The diff coverage is 96.20%.


codecov bot commented Jan 30, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (2cc687b) 98.08% compared to head (09cf82a) 98.07%.


@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   98.08%   98.07%   -0.01%     
==========================================
  Files         406      406              
  Lines       31477    31536      +59     
==========================================
+ Hits        30873    30929      +56     
- Misses        604      607       +3     
Flag                     Coverage Δ
integration              98.09% <96.20%> (-0.02%) ⬇️
latest-uploader-overall  98.09% <96.20%> (-0.02%) ⬇️
unit                     98.09% <96.20%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Components    Coverage Δ
NonTestCode   96.09% <95.94%> (-0.01%) ⬇️
OutsideTasks  97.92% <ø> (ø)

Files                                Coverage Δ
tasks/tests/unit/test_flush_repo.py  100.00% <100.00%> (ø)
tasks/flush_repo.py                  97.32% <95.94%> (-2.68%) ⬇️

This change has been scanned for critical changes. Learn more

log.info("Deleting repo contents", extra=dict(repoid=repoid))
repo = db_session.query(Repository).filter_by(repoid=repoid).first()
@sentry_sdk.trace
def _delete_archive(self, repo: Repository) -> int:
    archive_service = ArchiveService(repo)
    deleted_archives = archive_service.delete_repo_files()
Contributor

Should this fn be async? I suppose we don't necessarily care to wait for the files to be deleted, but we don't know for certain if the call succeeded otherwise right?

Contributor Author

The implementation of the ArchiveService (and the underlying storage systems) is not async. Even if I wanted to make it async, refactoring that whole service is faaaar outside the scope of this ticket.

I actually think we would benefit from making it async. Same goes for the DB interactions (there are certain queries we could parallelize there), but we are constrained by the interfaces we have at hand.
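
A rough sketch of what that could look like without rewriting the synchronous services, under stated assumptions: `asyncio.to_thread` (Python 3.9+) runs a blocking call in a worker thread, and `asyncio.gather` waits for both calls and returns their results, so the caller still learns whether each one succeeded. The two blocking functions below are hypothetical stand-ins, not the real ArchiveService or DB code.

```python
import asyncio
import time


def delete_repo_files(repoid: int) -> int:
    """Hypothetical blocking call standing in for a synchronous storage client."""
    time.sleep(0.05)  # simulate network latency
    return 2  # pretend two files were deleted


def delete_db_rows(repoid: int) -> int:
    """Hypothetical blocking call standing in for a synchronous DB query."""
    time.sleep(0.05)
    return 7  # pretend seven rows were deleted


async def flush(repoid: int) -> tuple[int, int]:
    # Run both blocking calls in worker threads and wait for both results.
    # An exception raised in either call propagates out of gather().
    files, rows = await asyncio.gather(
        asyncio.to_thread(delete_repo_files, repoid),
        asyncio.to_thread(delete_db_rows, repoid),
    )
    return files, rows


result = asyncio.run(flush(1))
```

This only overlaps the waiting time; it does not make the underlying clients async. It also assumes the two calls are independent. A SQLAlchemy session, for example, should not be shared across threads, so each threaded query would need its own session.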

Contributor

Yea agree, not part of this scope. I just found it interesting that the archive service wasn't async in the first place. Makes sense.

CommitReport.commit_id.in_(commit_ids)
)
@sentry_sdk.trace
def _delete_reports(self, db_session: Session, report_ids, repoid: int):
Contributor

Ty for all these helper methods, a lot more organized + easy to read


self._delete_commit_details(db_session, commit_ids, repoid)

# TODO: Component comparison
Contributor

Would we still leave the commit and component comparisons in the DB? Since commit comparisons rely on the commit ids existing, I think the delete_commits fn below would fail, wouldn't it?

Contributor Author

Yes this is true. It's not the only piece of data that we should be removing but aren't (LabelAnalysisRequestProcessingError has the same issue).

But I wanted to be very intentional with these changes and limit them to a refactor, for a few reasons:

  1. Checking all the missing pieces of data, adding them in, and adding tests for all of that takes a lot more time.
  2. These changes are meant to give us more information so we can make better decisions on how to improve the feature overall. We might come to the conclusion that we need to tear the whole thing apart and start from scratch. In that case the extra effort I'd have to put in to add the missing details would be lost.
  3. Currently there's no "epic" or "plan" on how to tackle this flow. This needs to be discussed and prioritized. I personally think the knowledge that some data is not being deleted when it should be needs to factor into the "how big is this project gonna be?" question.

So while you are absolutely right and we should (and will) fix it I hope we can ignore the fact that this is partially broken for now. It is one of the reasons I left the comment there.
Also this comment captures some of the data that we know is missing, in case you're worried it'll be lost.
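
To illustrate the ordering problem discussed in this thread, here is a small self-contained sketch using SQLite. The table and column names are hypothetical, and the thread does not show the real schema or whether the production database enforces this constraint. It only shows the general rule: when a child row references a parent row through an enforced foreign key, deleting the parent first fails, and deleting the child first succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE commits (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE commit_comparisons ("
    " id INTEGER PRIMARY KEY,"
    " base_commit_id INTEGER NOT NULL REFERENCES commits(id))"
)
conn.execute("INSERT INTO commits (id) VALUES (1)")
conn.execute("INSERT INTO commit_comparisons (id, base_commit_id) VALUES (10, 1)")

# Deleting the parent row while a comparison still references it is rejected.
try:
    conn.execute("DELETE FROM commits WHERE id = 1")
    parent_delete_failed = False
except sqlite3.IntegrityError:
    parent_delete_failed = True

# Deleting the dependent rows first lets the parent delete go through.
conn.execute("DELETE FROM commit_comparisons WHERE base_commit_id = 1")
conn.execute("DELETE FROM commits WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
```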

Contributor

Yeah big time, didn't mean to imply we needed to add those as yeah this should be contained to a refactor, was mostly trying to see if what I said made sense, still learning the ropes here 😅. And ty writing the things we're missing somewhere too 👌

Contributor

@adrian-codecov adrian-codecov left a comment


Some comments to take a peek at, hmu when it's ready for a 2nd review!

@giovanni-guidini giovanni-guidini merged commit 1cb285c into main Jan 31, 2024
19 of 31 checks passed
@giovanni-guidini giovanni-guidini deleted the gio/refactor-flush-repo branch January 31, 2024 20:13