
Pointless Requests #1

Open
Multivalence opened this issue Sep 10, 2021 · 1 comment

@Multivalence

The nikel API currently gets its data from data dumps that are manually input by the maintainer occasionally. Because of this, it is faster to take the data dump from here directly rather than sending HTTP requests to the API. Could an auto-updating data system be implemented using web scraping and some of UofT's public APIs? That would keep the data always up to date and would give people a reason to use the API rather than just pulling the data from this repo.
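For illustration, here's a minimal sketch of the two access patterns being compared. The endpoint path, query parameter, and file name are placeholders for the sake of the example, not taken from nikel's docs:

```python
import json
import requests

# Option 1: query the live API over HTTP
# (endpoint and params are hypothetical placeholders)
resp = requests.get("https://nikel.ml/api/courses", params={"code": "CSC108"})
courses = resp.json()

# Option 2: load the committed data dump straight from this repo,
# skipping the network round trip entirely
with open("courses.json", encoding="utf-8") as f:
    courses = json.load(f)
```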

I understand that the UofT websites and public APIs keep changing, which would require frequent maintenance of the API. I'm happy to assist with the maintenance if that's the case.

Thank you!

@darenliang
Member

Thanks for raising these concerns.

I agree it's usually better and faster to use the data dump, but it's often a matter of preference: if web extensions are using the dataset, it might be preferable to request data via a web API. The nikel-core web server does a good job of fetching data from the datasets and employs a cache to speed up lookups when necessary. Cloudflare is also used to add a light edge cache for frequently requested data.
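As a rough sketch of the lookup-cache idea (this is not nikel-core's actual implementation, and the dataset layout is an assumption):

```python
import json
from functools import lru_cache

# Assumes a dump of course records, each with a "code" field
with open("courses.json", encoding="utf-8") as f:
    COURSES = json.load(f)

@lru_cache(maxsize=1024)
def lookup(code: str) -> tuple:
    # Linear scan over the dump; the LRU cache makes repeated
    # lookups for popular codes effectively free
    return tuple(i for i, c in enumerate(COURSES) if c.get("code") == code)
```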

Regarding the auto-updating data system, I would like that to be the case. The dataset parsing is really messy and riddled with edge cases. I've tried updating the datasets from time to time, but after a while it often requires manually changing the parsing logic due to changes on UofT's end.

Rewriting the parser might be the best option we have at the moment, but that will require a lot of work (discovering new data sources and making the parsing more accurate). The current parser uses a combination of JSON requests, HTML parsing, and Selenium. I'm hoping that we won't need to parse HTML pages or use Selenium, but that might not be possible.
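For what a rewrite might look like, here's a hedged sketch combining the first two techniques (the URLs and selectors would be placeholders, not real UofT endpoints):

```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()

def fetch_json(url: str) -> dict:
    # Preferred source: structured JSON endpoints need no fragile parsing
    resp = session.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def fetch_html_rows(url: str, selector: str) -> list:
    # Fallback: parse server-rendered HTML with CSS selectors
    soup = BeautifulSoup(session.get(url, timeout=30).text, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]

# Selenium would only be needed for pages that render data client-side;
# ideally the sources above cover everything and that layer goes away.
```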
