Fix/revise update flow #191

Open
ajparsons opened this issue Oct 7, 2024 · 0 comments

Currently the data update system is a bit tangled.

I'd like to get it to a place where:

  • There is a clear source of truth for the most up-to-date data.
  • The technical complexity of party updates etc. is reduced.
  • There are more integrity checks on manual and automatic updates.

The source of truth matters because I'm basing other flows (politican-data repo and twfy-votes database) off the GitHub people.json. This could be switched to the one hosted by twfy, but in principle they should be the same.
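As a rough illustration of checking that the two copies really are the same, here is a minimal sketch that compares two JSON documents by a canonical hash. The function names are hypothetical and nothing here exists in Parlparse; it assumes both copies have already been read into strings.

```python
import hashlib
import json


def canonical_digest(raw_json: str) -> str:
    """Hash a JSON document so that key order and whitespace do not matter."""
    data = json.loads(raw_json)
    # sort_keys and fixed separators give one stable serialisation per document.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def copies_match(github_copy: str, twfy_copy: str) -> bool:
    """True if both copies hold the same data (hypothetical helper)."""
    return canonical_digest(github_copy) == canonical_digest(twfy_copy)
```

Note that the order of items inside lists still counts as a difference here, which is probably what we want for spotting drift between the copies.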

Here's where I think we are at the moment:

---
title: Politician data flow (current)
---
flowchart TD


github["GitHub Parlparse"]
mysoc["git.mysociety Parlparse"]
twfy["twfy-live Parlparse"]
automatic(("Auto updates"))


github -->|GitHub Action| mysoc
mysoc -.->|Broken mirror| github
twfy -.->|No auto push| mysoc
mysoc --> twfy

automatic --> twfy

linkStyle 1,2 stroke:red,color:red;

As a result, updates pool in the last stage and don't make their way back to GitHub.

This can be unclogged by pushing back to GitHub, as the mirroring action sorts out the rest. So we should try to do this regularly for the moment.

My instinct (just because I'm generally using GitHub Actions to handle dataset updates) would be to move the automatic updates to GitHub Actions, rather than needing a flow to and from the server, as they now rely on external APIs rather than the transcripts. This doesn't need to be consistent, but new update scripts might start there.

Just automating the pushing would fix the main current problem, but a GitHub-based flow would also mean we could add some automatic testing and prevent errors creeping back into the people.json.

So something like:

---
title: Politician data flow (GitHub centric)
---
flowchart TD


github["GitHub Parlparse"]
mysoc["git.mysociety Parlparse"]
twfy["twfy-live Parlparse"]
automatic(("Auto updates"))
manual(("Manual updates"))
validator{"Validation test action"}

manual --> PR --> validator --> github
automatic -->|"GitHub Action"| validator

github -->|GitHub Action| mysoc
mysoc -->|Auto pull at start of updates| twfy
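The validation test action in the diagram could start from a handful of integrity checks. This is only a sketch: `validate_people` is a hypothetical function, and it assumes people.json has top-level `persons` and `memberships` lists whose items carry `id`, `person_id`, `start_date` and `end_date` fields, with dates as full ISO strings. The real checks would need to match the actual file.

```python
from collections import Counter


def validate_people(data: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    errors: list[str] = []
    # Assumed layout: top-level "persons" and "memberships" lists.
    persons = data.get("persons", [])
    memberships = data.get("memberships", [])

    # Check 1: ids must be unique within each collection.
    for label, items in (("person", persons), ("membership", memberships)):
        counts = Counter(item.get("id") for item in items)
        for item_id, count in counts.items():
            if count > 1:
                errors.append(f"duplicate {label} id: {item_id}")

    # Check 2: every membership must point at a known person.
    person_ids = {person.get("id") for person in persons}
    for membership in memberships:
        person_id = membership.get("person_id")
        if person_id is not None and person_id not in person_ids:
            errors.append(
                f"membership {membership.get('id')} references unknown person {person_id}"
            )

    # Check 3: a membership cannot end before it starts.
    # Plain string comparison is enough for full ISO dates (YYYY-MM-DD).
    for membership in memberships:
        start = membership.get("start_date")
        end = membership.get("end_date")
        if start and end and end < start:
            errors.append(f"membership {membership.get('id')} ends before it starts")

    return errors
```

A script run by the action could load people.json, call this, print any problems and exit non-zero if the list is not empty, so that both PRs and automatic updates are blocked on the same checks.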