Replace .csv in usmart api #223

KarenJewell · 2023-02-18T15:31:40Z

Is your feature request related to a problem? Please describe.
Currently, the usmart api script (usmart.py) outputs to a .csv which is then ingested into merge_data.py. This should be converted to a .json output to keep consistent with the rest of the project.

Describe the solution you'd like

Change the output of usmart.py to be a .json file
Update merge_data.py to ingest the new .json file
format code with Black before PR
update docs (architecture.md and datapipeline.png)

Describe alternatives you've considered
None

Additional context
Original ticket triggering this change is #163

paul-bradbeer · 2023-10-15T16:43:30Z

@KarenJewell please assign

JackGilmore · 2023-10-15T17:59:46Z

@paul-bradbeer It's all yours! Don't forget to check out our docs at https://docs.opendata.scot and don't hesitate to give us a shout here or on Slack if you have any questions.

paul-bradbeer · 2023-10-27T07:02:25Z

@JackGilmore @KarenJewell Good docs - thanks. Also the walkthrough video crystallised things clearly. I've been preoccupied of late, but I will have more time in November. In the meantime, it strikes me that there are a number of these csv to json type conversions and so it seems likely the first one done will serve as a pattern for imitation to some degree - perhaps some common machinery will benefit. JSON will be slightly bulkier than CSV, but I don't think the data volume is sufficient for this to be a concern - let me know if you disagree. Standardising on JSON will also allow for easier representation of hierarchical structure if needed.

JackGilmore · 2023-10-29T13:52:04Z

@paul-bradbeer Good shout. I did create a write_json method for processor.py when I was writing the scraper for the Scottish Parliament portal (scottish_parliament.py) which is probably an OK start but it really just takes the list of dict objects I created and json.dumps them.

Ideally once I get time (or if someone gets there before me!) I'd like to declare some proper classes for the JSON schema we want to use with the aim of saving people from having to write their own all the time.

paul-bradbeer · 2023-11-04T08:56:43Z

@JackGilmore JSON schema good if they all look the same or are very similar; just checking that now.

JackGilmore · 2023-11-05T13:42:46Z

@paul-bradbeer Can't find your comment anymore but I've updated the Contributor Guide with better guidance for forking and creating a branch. When we wrote the docs we must've been viewing the pages as members of the OpenDataScotland GitHub org meaning we've got slightly more elevated permissions to write directly to the repo compared to contributors that will need to fork instead 🤦‍♂️
https://docs.opendata.scot/contribute/contribution-guide/

paul-bradbeer · 2023-11-05T18:38:21Z

@JackGilmore removed it once I figured out that what I was commenting on was out of date. Forking in the usual way is fine.

JackGilmore · 2023-11-11T21:35:13Z

@paul-bradbeer Kept going one some work with the common schema stuff and built some common classes for datasets and file resources in the branch I've got at the moment for building a Stagecoach scraper as part of #120

Can see more info here if you fancy a look:

Classes file in branch: https://github.com/OpenDataScotland/the_od_bods/tree/120_stagecoach_scraper/classes
Open PR for new scraper: Stagecoach scraper #258

paul-bradbeer · 2023-11-12T16:01:08Z

@JackGilmore Today I am submitting my pull request for the completed works, including a dataclass object as a sort of schema.

I can't see how this process works if issues are allocated to people and then others go off and work on the same thing. It seems to be a very inefficient way to do things resulting in duplication.

I will submit my pull request; do with it as you please but my spare time is nearly up.

The solution is a pattern for immitation with a Json and Csv decorator class - so folks can see which are json and which are csv
It allows a mix and match approach.

paul-bradbeer · 2023-11-12T16:26:07Z

@JackGilmore Draft pull request #260 created. See what you make of it.

paul-bradbeer · 2023-11-14T13:08:42Z

@JackGilmore the pull request doesn't seem to be linked. Are you able to link pull request 260 to this issue; I don't seem to be able to do that.

JackGilmore · 2023-11-25T16:17:54Z

@paul-bradbeer Thanks for your PR and apologies for the delay in getting back to this. The last couple of weeks have been a tad busy.

Please also accept my apologies if you thought I was duplicating your work. I had been prototyping a few things for the pipeline and hadn't realised that you were working on a common decorator class as part of your work. Nevertheless, your classes actually look a lot more organised and make better sense than what I was playing about with, so thank you! Your contribution is greatly appreciated and we're always looking for feedback on how we can improve the contribution process so I'll take this on board and see how we can update the way we coordinate issues and PRs.

I'm aiming to try to review this by the end of next week.

paul-bradbeer · 2023-11-26T19:49:10Z

@JackGilmore Thanks. I wasn't sure how much overlap there would be and I probably should have checked your PR to see but I was running out of spare time. Hopefully you can make some use of it.

KarenJewell added good first issue Good for newcomers back end labels Feb 18, 2023

KarenJewell mentioned this issue Feb 18, 2023

Replace csv storage formats #163

Closed

KarenJewell removed the back end label Sep 16, 2023

JackGilmore assigned paul-bradbeer Oct 15, 2023

JackGilmore linked a pull request Nov 25, 2023 that will close this issue

Issue223 #260

Open

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace .csv in usmart api #223

Replace .csv in usmart api #223

KarenJewell commented Feb 18, 2023

paul-bradbeer commented Oct 15, 2023

JackGilmore commented Oct 15, 2023

paul-bradbeer commented Oct 27, 2023

JackGilmore commented Oct 29, 2023

paul-bradbeer commented Nov 4, 2023

JackGilmore commented Nov 5, 2023

paul-bradbeer commented Nov 5, 2023

JackGilmore commented Nov 11, 2023

paul-bradbeer commented Nov 12, 2023

paul-bradbeer commented Nov 12, 2023

paul-bradbeer commented Nov 14, 2023

JackGilmore commented Nov 25, 2023

paul-bradbeer commented Nov 26, 2023

Replace .csv in usmart api #223

Replace .csv in usmart api #223

Comments

KarenJewell commented Feb 18, 2023

paul-bradbeer commented Oct 15, 2023

JackGilmore commented Oct 15, 2023

paul-bradbeer commented Oct 27, 2023

JackGilmore commented Oct 29, 2023

paul-bradbeer commented Nov 4, 2023

JackGilmore commented Nov 5, 2023

paul-bradbeer commented Nov 5, 2023

JackGilmore commented Nov 11, 2023

paul-bradbeer commented Nov 12, 2023

paul-bradbeer commented Nov 12, 2023

paul-bradbeer commented Nov 14, 2023

JackGilmore commented Nov 25, 2023

paul-bradbeer commented Nov 26, 2023