Ability to stop indexing a repo #2923

Open
GregSutcliffe opened this issue Oct 8, 2024 · 6 comments
Assignees
Labels
admin, devops, documentation, feature-request

Comments

@GregSutcliffe
Collaborator

Is your feature request related to a problem? If so, please describe the problem:
I have a specific but changing list of repos I need to track data on, and a limited number of API keys. Accordingly, I would like to be able to stop indexing repos in some manner.

I've had a look through the docs, and I can't see anything in the CLI or UI that allows me to do this, or to delete the data for a repo that may no longer be relevant to me.

Potential solutions:
The obvious solution would be to add a command to the augur db command group, since it already has commands for add-repos and for listing groups. A UI solution would be nice but is not essential, I think.

Additional context:
This is actually relevant even for a day-1 fresh install, as Augur starts up with a pre-seeded set of repos to index. While this makes sense to give it something to operate on, at some point in time the user is likely to want to stop indexing those example repos.

@sgoggins moved this to Backlog in Augur TSC on Oct 31, 2024
@sgoggins
Member

sgoggins commented Nov 4, 2024

Hi @GregSutcliffe: This is the logic for deleting repositories. It's a somewhat dangerous operation, so I have never automated it. A better strategy would be to add logic that removes a repository from collection circulation, but there are also reasons to delete one entirely. One case is when a repository has been added twice. That cannot occur anymore, but it could occur through a confluence of events in the past (the repo moved, but was added at its new location before the existing move logic ran). This loophole is closed (we think).

ALTER TABLE "augur_data"."pull_request_message_ref" 
  DROP CONSTRAINT "fk_pull_request_message_ref_message_1",
  ADD CONSTRAINT "fk_pull_request_message_ref_message_1" FOREIGN KEY ("msg_id") REFERENCES "augur_data"."message" ("msg_id") ON DELETE CASCADE ON UPDATE CASCADE DEFERRABLE INITIALLY DEFERRED;

select * from repo where repo_id in 
(
235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 


select * from augur_operations.collection_status where repo_id in 
(235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 


delete from issue_message_ref where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_request_review_message_ref where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_request_message_ref  where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

COMMIT; 

delete from repo_info where repo_id in (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from augur_operations.collection_status where repo_id in (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from augur_operations.user_repos where repo_id in 
(235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from issue_assignees where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from releases where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_request_reviews where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_request_files where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_request_commits where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_requests where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from repo_badging where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from issues  where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from repo_deps_libyear  where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from repo_deps_scorecard  where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from repo_dependencies where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from augur_operations.collection_status where repo_id in (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

commit; 
delete from commits where repo_id in  (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from repo_labor where repo_id in (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

select * from pull_request_message_ref  where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from pull_request_message_ref where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

COMMIT; 

ALTER TABLE "augur_data"."pull_request_review_message_ref" 
  DROP CONSTRAINT "fk_pull_request_review_message_ref_message_1",
  ADD CONSTRAINT "fk_pull_request_review_message_ref_message_1" FOREIGN KEY ("msg_id") REFERENCES "augur_data"."message" ("msg_id") ON DELETE CASCADE ON UPDATE CASCADE DEFERRABLE INITIALLY DEFERRED;

delete from message where repo_id in (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697) ; 

commit;

delete from pull_request_review_message_ref where repo_id in 
 (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

delete from repo where repo_id in (235219,196224,195980,196007,195987,196185,196178,196186,196099,196110,196111,196165,196101,196100,196102,196103,196104,196105,196107,196108,196109,196145,196147,194697); 

COMMIT; 
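
The script above repeats the same ID list in every statement and commits at several points, so a failure partway through could leave a repo partly deleted. Here is a minimal sketch of the same child-first deletion done in a single transaction with one parameterized ID list. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The `delete_repos` helper, the `demo` function, and the four-table toy schema are assumptions for illustration only; the table names are a small subset of those in the script, not the full Augur schema.

```python
import sqlite3

# Child tables first, parent ("repo") last. Illustrative subset only.
DELETE_ORDER = ["issue_message_ref", "issues", "message", "repo"]


def delete_repos(conn, repo_ids):
    """Delete every row for repo_ids in one transaction.

    Returns a dict mapping table name to the number of rows deleted.
    """
    repo_ids = list(repo_ids)
    placeholders = ",".join("?" for _ in repo_ids)
    deleted = {}
    # "with conn" commits if the block succeeds and rolls back if any
    # statement raises, so the deletion is all-or-nothing.
    with conn:
        for table in DELETE_ORDER:  # names come from the fixed list above
            cur = conn.execute(
                f"DELETE FROM {table} WHERE repo_id IN ({placeholders})",
                repo_ids,
            )
            deleted[table] = cur.rowcount
    return deleted


def demo():
    """Build a toy database with repos 1, 2, 3 and delete repos 1 and 3."""
    conn = sqlite3.connect(":memory:")
    for table in DELETE_ORDER:
        conn.execute(f"CREATE TABLE {table} (repo_id INTEGER)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    deleted = delete_repos(conn, [1, 3])
    remaining = [row[0] for row in conn.execute("SELECT repo_id FROM repo")]
    return deleted, remaining
```

Running the selects from the top of the script before calling something like `delete_repos` would still be a sensible check that the ID list matches the repos you expect.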

@sgoggins added the feature-request, documentation, admin, and devops labels on Nov 4, 2024
@GregSutcliffe
Collaborator Author

@sgoggins 100% agree with the danger, and I agree with the idea of stopping the indexer as an intermediate step. Do you have notes for that as well? Or is it just removing it from the repo table?

@cdolfi

cdolfi commented Nov 4, 2024

I am a fan of the idea of stopping the indexer (with some way to note that it is stopped). There might be a scenario where you need to pause one or more repos because you don't have enough keys to support collecting them all.

@GregSutcliffe
Collaborator Author

Good point @cdolfi. That's probably more important than deletion anyway - disk is cheap for storing old data, but keys are limited.

@sgoggins would you agree? Happy to work on this

@sgoggins
Member

sgoggins commented Nov 4, 2024

@GregSutcliffe: There are two dimensions to leaving the old repository in place and simply hiding it.

The first, which is what we are mostly discussing, is stopping collection for a repo we no longer have interest in. The second is whether this "deleted repo" should also be hidden from the APIs and other front-end indicators. I presume yes on both, but I want to check that we share that understanding.

For the display there are two considerations then as well:

  • "General Delete" would make the repo unavailable for everyone
  • "Delete from my list" would make it unavailable only for you, the logged-in user who deleted it.

Most of the cases on a shared instance would be, in all likelihood, "Delete from my list". This is certainly the easiest use case to implement because it actually does not change collection behavior at all.

Of course, we also know from our discussions that we want a user with elevated privileges of some kind to be able to remove the repository from collection entirely, in order to conserve API key usage. Is this then a third case, where we want to keep the repository in a "visible" state but stop collecting on it?

Any of these conditions can, I think, be handled with "state bits" on the repository record itself (augur_data.repo) and, in the case of user-scoped repos only, on augur_operations.user_repos.
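
To make the "state bits" idea concrete, here is a rough sketch of how the three cases could map onto two repo-level flags plus one per-user flag. Everything in it is hypothetical: the `RepoState` enum, the flag names, and the helper functions are illustrations, not existing Augur code or columns.

```python
from enum import IntFlag


class RepoState(IntFlag):
    """Hypothetical state bits stored on the repo record."""
    COLLECTING = 1  # the scheduler should keep collecting this repo
    VISIBLE = 2     # the repo appears in APIs and front ends


ACTIVE = RepoState.COLLECTING | RepoState.VISIBLE


def general_delete(state):
    """'General Delete': hidden for everyone and no longer collected."""
    return RepoState(0)


def stop_collecting(state):
    """Third case: keep the repo visible but stop spending API keys on it."""
    return state & ~RepoState.COLLECTING


def is_visible_to_user(state, hidden_by_user):
    """'Delete from my list' is a per-user bit (e.g. on user_repos).

    It hides the repo for that user only and never touches collection.
    """
    return bool(state & RepoState.VISIBLE) and not hidden_by_user
```

With this split, "Delete from my list" only flips the per-user bit, which matches the point that it does not change collection behavior at all.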

@GregSutcliffe
Collaborator Author

@sgoggins so I think...

  • "delete for me" is indeed trivial, as you say.
  • "delete entirely" is messy
  • "stop collection" is the important one, due to limited API key availability.

So, let's focus on the last one: how would we do this? I assume removing the repo from the repo table would mean we can't show the old data, so do we need a new "collect? BOOL" state bit on the repo table?
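
As a rough illustration of the "collect? BOOL" idea, the sketch below adds a flag column to a toy repo table, has the collection query skip repos where it is off, and shows that previously collected rows stay queryable. It uses Python's built-in sqlite3 rather than PostgreSQL, and the `collect` column, the two-table schema, and the function names are all assumptions made for the example, not Augur's actual schema or scheduler.

```python
import sqlite3


def make_db():
    """Create a toy database with two repos and one commit each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_name TEXT)")
    conn.execute("CREATE TABLE commits (repo_id INTEGER, commit_hash TEXT)")
    conn.executemany("INSERT INTO repo VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
    conn.executemany("INSERT INTO commits VALUES (?, ?)", [(1, "abc"), (2, "def")])
    # Hypothetical state bit: existing repos default to being collected.
    conn.execute("ALTER TABLE repo ADD COLUMN collect INTEGER NOT NULL DEFAULT 1")
    conn.commit()
    return conn


def set_collect(conn, repo_id, collect):
    """Turn collection on or off for one repo without deleting any data."""
    with conn:
        conn.execute(
            "UPDATE repo SET collect = ? WHERE repo_id = ?",
            (1 if collect else 0, repo_id),
        )


def repos_to_collect(conn):
    """What a scheduler would ask for: only repos still flagged for collection."""
    rows = conn.execute("SELECT repo_id FROM repo WHERE collect = 1 ORDER BY repo_id")
    return [row[0] for row in rows]


def commit_count(conn, repo_id):
    """Old data is untouched by the flag, so it can still be shown."""
    row = conn.execute(
        "SELECT COUNT(*) FROM commits WHERE repo_id = ?", (repo_id,)
    ).fetchone()
    return row[0]
```

A companion text or timestamp column could record why and when a repo was paused, which would cover the request for some way to note that collection is stopped.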

Projects
Status: Backlog
Development

No branches or pull requests

3 participants