refactor: refactor FlushRepoTask #246

giovanni-guidini · 2024-01-30T17:07:38Z

These changes refactor the FlushRepoTask for 2 reasons:

1. Progress reports through the task. Logging in the task was virtually nonexistent apart from the "task called" message, so there was no indication of the task's progress.
2. Performance metrics through sentry_sdk.trace. This will allow us to understand where the bottlenecks of this task are and provide better insight for improving it.
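
A minimal sketch of the pattern described above, assuming one helper method per deletion step, each wrapped in `sentry_sdk.trace` and followed by a progress log. The class name `FlushRepoSketch`, the helper bodies, and the returned counts are hypothetical placeholders, not the code in tasks/flush_repo.py. The fallback decorator only exists so the sketch runs without sentry-sdk installed.

```python
import logging

try:
    import sentry_sdk

    trace = sentry_sdk.trace
except (ImportError, AttributeError):
    # Assumption: no-op fallback so the sketch runs without sentry-sdk.
    def trace(func):
        return func


log = logging.getLogger(__name__)


class FlushRepoSketch:
    """Hypothetical stand-in for the refactored task, for illustration only."""

    @trace
    def _delete_archive(self, repoid: int) -> int:
        # Placeholder: pretend three archive files were removed.
        return 3

    @trace
    def _delete_commits(self, repoid: int) -> int:
        # Placeholder: pretend five commit rows were removed.
        return 5

    def run(self, repoid: int) -> dict:
        log.info("Deleting repo contents", extra=dict(repoid=repoid))

        deleted_archives = self._delete_archive(repoid)
        log.info(
            "Deleted archives",
            extra=dict(repoid=repoid, deleted_archives=deleted_archives),
        )

        deleted_commits = self._delete_commits(repoid)
        log.info(
            "Deleted commits",
            extra=dict(repoid=repoid, deleted_commits=deleted_commits),
        )

        return {
            "deleted_archives": deleted_archives,
            "deleted_commits": deleted_commits,
        }
```

Each traced helper can show up as its own span when a transaction is active, so slow steps are visible individually, while the log line after each step records how far the task got.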

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


codecov-staging · 2024-01-30T17:12:15Z

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   98.10%   98.09%   -0.02%     
==========================================
  Files         375      375              
  Lines       30776    30835      +59     
==========================================
+ Hits        30193    30247      +54     
- Misses        583      588       +5

Flag	Coverage Δ
integration	`98.09% <96.20%> (-0.02%)`	⬇️
latest-uploader-overall	`98.09% <96.20%> (-0.02%)`	⬇️
unit	`98.09% <96.20%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
NonTestCode	`96.16% <95.94%> (-0.03%)`	⬇️
OutsideTasks	`97.92% <ø> (ø)`

Files	Coverage Δ
tasks/tests/unit/test_flush_repo.py	`100.00% <100.00%> (ø)`
tasks/flush_repo.py	`95.53% <95.94%> (-4.47%)`	⬇️

codecov-qa · 2024-01-30T17:12:29Z

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (2cc687b) 98.10% compared to head (09cf82a) 98.09%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   98.10%   98.09%   -0.02%     
==========================================
  Files         375      375              
  Lines       30776    30835      +59     
==========================================
+ Hits        30193    30247      +54     
- Misses        583      588       +5

Flag	Coverage Δ
integration	`98.09% <96.20%> (-0.02%)`	⬇️
latest-uploader-overall	`98.09% <96.20%> (-0.02%)`	⬇️
unit	`98.09% <96.20%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
NonTestCode	`96.16% <95.94%> (-0.03%)`	⬇️
OutsideTasks	`97.92% <ø> (ø)`

Files	Coverage Δ
tasks/tests/unit/test_flush_repo.py	`100.00% <100.00%> (ø)`
tasks/flush_repo.py	`95.53% <95.94%> (-4.47%)`	⬇️

codecov-public-qa · 2024-01-30T17:12:43Z

Codecov Report

Merging #246 (09cf82a) into main (2cc687b) will decrease coverage by 0.02%.
The diff coverage is 96.20%.

@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   98.10%   98.09%   -0.02%     
==========================================
  Files         375      375              
  Lines       30776    30835      +59     
==========================================
+ Hits        30193    30247      +54     
- Misses        583      588       +5

Flag	Coverage Δ
integration	`98.09% <96.20%> (-0.02%)`	⬇️
latest-uploader-overall	`98.09% <96.20%> (-0.02%)`	⬇️
unit	`98.09% <96.20%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
NonTestCode	`96.16% <95.94%> (-0.03%)`	⬇️
OutsideTasks	`97.92% <ø> (ø)`

Files	Coverage Δ
tasks/tests/unit/test_flush_repo.py	`100.00% <100.00%> (ø)`
tasks/flush_repo.py	`95.53% <95.94%> (-4.47%)`	⬇️

codecov · 2024-01-30T17:13:59Z

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (2cc687b) 98.08% compared to head (09cf82a) 98.07%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
- Coverage   98.08%   98.07%   -0.01%     
==========================================
  Files         406      406              
  Lines       31477    31536      +59     
==========================================
+ Hits        30873    30929      +56     
- Misses        604      607       +3

Flag	Coverage Δ
integration	`98.09% <96.20%> (-0.02%)`	⬇️
latest-uploader-overall	`98.09% <96.20%> (-0.02%)`	⬇️
unit	`98.09% <96.20%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
NonTestCode	`96.09% <95.94%> (-0.01%)`	⬇️
OutsideTasks	`97.92% <ø> (ø)`

Files	Coverage Δ
tasks/tests/unit/test_flush_repo.py	`100.00% <100.00%> (ø)`
tasks/flush_repo.py	`97.32% <95.94%> (-2.68%)`	⬇️

This change has been scanned for critical changes.

adrian-codecov · 2024-01-31T19:27:02Z

tasks/flush_repo.py

-        log.info("Deleting repo contents", extra=dict(repoid=repoid))
-        repo = db_session.query(Repository).filter_by(repoid=repoid).first()
+    @sentry_sdk.trace
+    def _delete_archive(self, repo: Repository) -> int:
        archive_service = ArchiveService(repo)
        deleted_archives = archive_service.delete_repo_files()


Should this fn be async? I suppose we don't necessarily care to wait for the files to be deleted, but we don't know for certain if the call succeeded otherwise right?

The implementation of the ArchiveService (and underlying storage systems) is not async. Even if I wanted to make it async, refactoring that whole service is faaaar outside the scope of this ticket.

I actually think we would benefit from making it async. Same goes for the DB interactions (there are certain queries we could parallelize there), but we are constrained by the interfaces we have at hand.

Yea agree, not part of this scope, just found it interesting that the archive service wasn't async in the first place, makes sense
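
For illustration only, here is one way blocking calls like these could be run concurrently without rewriting the underlying service: offload them to worker threads with `asyncio.to_thread` and await the results. This is not what the PR does. The functions `delete_repo_files` and `delete_db_rows` below are hypothetical stand-ins with made-up return values.

```python
import asyncio
import time


def delete_repo_files(repoid: int) -> int:
    """Hypothetical blocking call standing in for a synchronous storage delete."""
    time.sleep(0.05)
    return 2


def delete_db_rows(repoid: int) -> int:
    """Hypothetical blocking call standing in for a synchronous DB delete."""
    time.sleep(0.05)
    return 7


async def flush(repoid: int) -> list:
    # Run both blocking calls in worker threads and wait for both results.
    # Awaiting means a failure still surfaces as an exception here,
    # rather than being silently dropped.
    return await asyncio.gather(
        asyncio.to_thread(delete_repo_files, repoid),
        asyncio.to_thread(delete_db_rows, repoid),
    )


if __name__ == "__main__":
    print(asyncio.run(flush(1)))
```

Because the results are awaited, the caller still learns whether each call succeeded, which addresses the concern raised in the first comment of this thread.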

adrian-codecov · 2024-01-31T19:27:37Z

tasks/flush_repo.py

-            CommitReport.commit_id.in_(commit_ids)
-        )
+    @sentry_sdk.trace
+    def _delete_reports(self, db_session: Session, report_ids, repoid: int):


Ty for all these helper methods, a lot more organized + easy to read

adrian-codecov · 2024-01-31T19:34:01Z

tasks/flush_repo.py

+
+        self._delete_commit_details(db_session, commit_ids, repoid)
+
+        # TODO: Component comparison


Would we still leave the commit and component comparisons existing in the DB? I think since commit comparisons rely on commit ids existing, the delete_commits fn below would fail, wouldn't it?

Yes this is true. It's not the only piece of data that we should be removing but aren't (LabelAnalysisRequestProcessingError has the same issue).

But I wanted to be very intentional with these changes and limit them to a refactor.
This is for a couple of reasons, including:

Checking all the missing pieces of data, adding them in and adding tests for all of that consumes a lot more time.

These changes are meant to give more information so we can make better decisions on how to improve the feature overall. We might come to the conclusion that we need to tear the whole thing apart and start from scratch. In that case the extra effort I'd have to put in to add the missing details would be lost.

Currently there's no "epic" or "plan" on how to tackle this flow. This needs to be discussed and prioritized. I personally think the knowledge that some data is not being deleted when it should be needs to factor into the "how big is this project gonna be?" question.

So while you are absolutely right and we should (and will) fix it I hope we can ignore the fact that this is partially broken for now. It is one of the reasons I left the comment there.
Also this comment captures some of the data that we know is missing, in case you're worried it'll be lost.

Yeah big time, didn't mean to imply we needed to add those, as yeah this should be contained to a refactor. Was mostly trying to see if what I said made sense, still learning the ropes here 😅. And ty for writing the things we're missing somewhere too 👌
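
The ordering concern in this thread can be reproduced with a small sketch using the standard-library `sqlite3` module. The table and column names are hypothetical, and the sketch assumes a plain foreign key with no cascade. The thread does not say how the real schema is configured, so this shows the general failure mode rather than the actual behavior of the task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: a comparison row references a commit row.
conn.execute("CREATE TABLE commits (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE comparisons ("
    " id INTEGER PRIMARY KEY,"
    " base_commit_id INTEGER NOT NULL REFERENCES commits(id))"
)
conn.execute("INSERT INTO commits (id) VALUES (1)")
conn.execute("INSERT INTO comparisons (id, base_commit_id) VALUES (10, 1)")


def try_delete_commits(connection: sqlite3.Connection) -> bool:
    """Return True if all commits could be deleted, False on an FK violation."""
    try:
        connection.execute("DELETE FROM commits")
        return True
    except sqlite3.IntegrityError:
        return False


# Deleting the parent rows first fails while a comparison still points at them.
parent_first = try_delete_commits(conn)

# Deleting the dependent rows first lets the commit delete go through.
conn.execute("DELETE FROM comparisons")
children_first = try_delete_commits(conn)
```

Under these assumptions, dependent rows have to be removed before the commits they reference, which is why missing deletions such as the component comparison TODO matter for the order of steps.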

adrian-codecov

Some comments to take a peek at, hmu when it's ready for a 2nd review!

giovanni-guidini assigned matt-codecov Jan 30, 2024

giovanni-guidini mentioned this pull request Jan 30, 2024

"Erase Repo Contents" has a bug, or at least is not working immediately codecov/engineering-team#1031

Closed

adrian-codecov reviewed Jan 31, 2024


adrian-codecov approved these changes Jan 31, 2024


Merge branch 'main' into gio/refactor-flush-repo

09cf82a

giovanni-guidini merged commit 1cb285c into main Jan 31, 2024
19 of 31 checks passed

giovanni-guidini deleted the gio/refactor-flush-repo branch January 31, 2024 20:13
