Suggestions to ingest data automatically #69

jaimergp · 2020-04-20T17:16:37Z

@apayne97, @henriberger and I have been talking about ways to incorporate information from the Thorne Lab in a more automated way. We have come up with this "ideal" pipeline:

Tier 1) Create a script that diffs their PDB IDs against ours and reports the set difference, so a human can review which new ones are worth adding.

Tier 2) Create a GitHub Actions pipeline that does this automatically, either with an hourly cron job or, if technically possible, after every push to the Thorne Lab repo.

Tier 3) Add bot features to GHA to submit the PRs needed for each new candidate PDB ID. A human reviews each one, editing the information as needed, and merges or rejects it. The closed PRs serve as a history of what we have tried, so we don't submit the same candidate twice.
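As a rough illustration of Tier 1, here is a minimal Python sketch of the set difference. It assumes both lists are already available as plain strings and that the IDs follow the classic four-character PDB format (a digit followed by three alphanumerics). The function names `normalize_ids` and `new_candidates` are made up for this example; how the IDs are actually read from either repo is left out.

```python
import re

# Assumption: classic PDB IDs, i.e. one digit followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def normalize_ids(raw_ids):
    """Return a set of upper-cased PDB IDs, skipping blank or malformed entries."""
    ids = set()
    for raw in raw_ids:
        candidate = raw.strip().upper()
        if PDB_ID_PATTERN.match(candidate):
            ids.add(candidate)
    return ids


def new_candidates(upstream_ids, local_ids):
    """Return the sorted IDs present upstream but missing locally."""
    return sorted(normalize_ids(upstream_ids) - normalize_ids(local_ids))


if __name__ == "__main__":
    # Hypothetical inputs; in practice these would be parsed from each repo.
    upstream = ["6lu7", "6Y2E ", "not-an-id"]
    ours = ["6LU7"]
    for pdb_id in new_candidates(upstream, ours):
        print(pdb_id)
```

Normalizing case and whitespace before the comparison avoids reporting an ID as new just because the two repos format it differently.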

Let us know if you have feedback!

Lnaden · 2020-04-20T17:40:20Z

I like this idea. The first one would not be too hard to do. The second could be done relatively easily, but I would want to be careful about pinging everyone watching this repo every time it makes a PR. I have the same concern with the third, but I don't think I see the difference between 2 and 3. Could you elaborate?

jaimergp · 2020-04-21T07:02:12Z

Option (2) only notifies a selected pool of users, say by writing a comment on a specific issue. It does not open any PRs.

Option (3) would create the appropriate PRs (one per PDB ID?), each with an automatically generated file template filled in with the new information from upstream.
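To make option (3) a bit more concrete, here is a hedged Python sketch of the template step only. It drafts one file per candidate ID and skips IDs that were already tried, for example those found in closed PRs. The template fields, the `.yml` file naming, and the `draft_entries` function are all hypothetical; the real file format in this repo and the actual PR submission are not shown.

```python
from string import Template

# Hypothetical template; the fields do not reflect the repo's real file format.
ENTRY_TEMPLATE = Template(
    "pdb_id: $pdb_id\n"
    "source: $source\n"
    "description: TODO (fill in during human review)\n"
)


def draft_entries(candidates, already_tried, source="upstream"):
    """Map a hypothetical file name to drafted content for each untried candidate."""
    drafts = {}
    for pdb_id in candidates:
        if pdb_id in already_tried:
            # Already submitted once (merged or rejected), so do not resubmit.
            continue
        drafts[f"{pdb_id}.yml"] = ENTRY_TEMPLATE.substitute(
            pdb_id=pdb_id, source=source
        )
    return drafts


if __name__ == "__main__":
    drafts = draft_entries(["6LU7", "6Y2E"], already_tried={"6LU7"})
    for file_name, content in drafts.items():
        print(file_name)
        print(content)
```

Each drafted file would then become the starting content of one PR for a human to edit, merge, or reject.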

About the notification noise: I guess we could have a fork of this repo somewhere else where those branches are created, and then it's up to the human(s) to create the PR or not. I am not really sure I like that, though. I am inclined to say I don't.

I don't know whether the API offers a way to notify only some people, but if you are subscribed to this repo, you'll get everything anyway.
