-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: refactor FlushRepoTask #246
Conversation
These changes refactor the FlushRepoTask for 2 reasons: 1. Progress reports through the task. logging in the task was virtually inexistent short of the "task called" one but there was no indication on the progress of the task 2. Performance metrics through sentry_sdk.trace this will allow us to understand where the bottlenecks of this task are and provide better insight for improving it
Codecov ReportAttention: @@ Coverage Diff @@
## main #246 +/- ##
==========================================
- Coverage 98.10% 98.09% -0.02%
==========================================
Files 375 375
Lines 30776 30835 +59
==========================================
+ Hits 30193 30247 +54
- Misses 583 588 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #246 +/- ##
==========================================
- Coverage 98.10% 98.09% -0.02%
==========================================
Files 375 375
Lines 30776 30835 +59
==========================================
+ Hits 30193 30247 +54
- Misses 583 588 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov Report
@@ Coverage Diff @@
## main #246 +/- ##
==========================================
- Coverage 98.10% 98.09% -0.02%
==========================================
Files 375 375
Lines 30776 30835 +59
==========================================
+ Hits 30193 30247 +54
- Misses 583 588 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #246 +/- ##
==========================================
- Coverage 98.08% 98.07% -0.01%
==========================================
Files 406 406
Lines 31477 31536 +59
==========================================
+ Hits 30873 30929 +56
- Misses 604 607 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
This change has been scanned for critical changes. Learn more |
log.info("Deleting repo contents", extra=dict(repoid=repoid)) | ||
repo = db_session.query(Repository).filter_by(repoid=repoid).first() | ||
@sentry_sdk.trace | ||
def _delete_archive(self, repo: Repository) -> int: | ||
archive_service = ArchiveService(repo) | ||
deleted_archives = archive_service.delete_repo_files() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this fn be async? I suppose we don't necessarily care to wait for the files to be deleted, but we don't know for certain if the call succeeded otherwise right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of the ArchiveService
(and underlying storage systems) is not async. Even if I wanted to make it async refactoring all that service is faaaar from the scope of this ticket.
I actually think we would benefit from making it async. Same goes for the DB interactions (there are certain queries we could parallelize there), but we are constrained by the interfaces we have at hand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea agree, not part of this scope, just found it interesting that the archive service wasn't aysnc in the first place, makes sense
CommitReport.commit_id.in_(commit_ids) | ||
) | ||
@sentry_sdk.trace | ||
def _delete_reports(self, db_session: Session, report_ids, repoid: int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ty for all these helper methods, a lot more organized + easy to read
|
||
self._delete_commit_details(db_session, commit_ids, repoid) | ||
|
||
# TODO: Component comparison |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we still leave the commit and component comparisons existing in the DB? I think since commit comparisons rely on commit ids existing, that the delete_commits fn below would fail wouldnt it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is true. It's not the only piece of data that we should be removing but aren't (LabelAnalysisRequestProcessingError
has the same issue).
But I wanted to be very intentional with these changes and limit them to a refactor.
This for a couple of reasons that include:
- Checking all the missing pieces of data, adding them in and adding tests for all of that consumes a lot more time.
- These changes are meant to give more information so we can make better decisions on how to improve the feature overall. We might come to the conclusion that we need to tear the whole thing apart and start from scratch. In this case the extra effort I'd have to put to add the missing details would be lost.
- Currently there's no "epic" or "plan" on how to tackle this flow. This needs to be discussed and prioritized. I personally think that this knowledge that some data is not being deleted when it should needs to factor in on the "how big this project is gonna be?"
So while you are absolutely right and we should (and will) fix it I hope we can ignore the fact that this is partially broken for now. It is one of the reasons I left the comment there.
Also this comment captures some of the data that we know is missing, in case you're worried it'll be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah big time, didn't mean to imply we needed to add those as yeah this should be contained to a refactor, was mostly trying to see if what I said made sense, still learning the ropes here 😅. And ty writing the things we're missing somewhere too 👌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments to take a peek at, hmu when it's ready for a 2nd review!
These changes refactor the FlushRepoTask for 2 reasons:
logging in the task was virtually inexistent short of the "task called" one
but there was no indication on the progress of the task
this will allow us to understand where the bottlenecks of this task are and
provide better insight for improving it
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.