Add local caching for crawled pages to enhance development efficiency #861

Open
matecsaj opened this issue Jan 5, 2025 · 1 comment
Labels
enhancement New feature or request. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

Contributor

matecsaj commented Jan 5, 2025

While iterating on data extraction code, I have to re-crawl the website on every test run, which is slow and inefficient. It would be a huge time saver if the website only needed to be crawled once: a caching mechanism could store downloaded pages on the local disk, and later runs during development could load them from there instead of fetching them from the Internet.

This approach would not only speed up the iterative development process but also reduce load on the website's server. Such a feature would be particularly useful for debugging and refining data extraction scripts.
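
To make the request concrete, here is a minimal sketch of the kind of disk cache described above. It is not this project's API, only a standalone illustration using the Python standard library. The names `download`, `fetch_cached`, and the `cache_dir` and `fetcher` parameters are hypothetical and chosen for this example.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def download(url: str) -> bytes:
    """Fetch the raw body of a page over the network."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_cached(
    url: str,
    cache_dir: Path,
    fetcher: Callable[[str], bytes] = download,
) -> bytes:
    """Return the page body, reading from disk when a cached copy exists.

    `fetcher` is a hypothetical hook: any callable that maps a URL to bytes.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Hash the URL so that any URL maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.html"

    # Cache hit: skip the network entirely.
    if cache_file.exists():
        return cache_file.read_bytes()

    # Cache miss: download once, then store the body for later runs.
    body = fetcher(url)
    cache_file.write_bytes(body)
    return body
```

Calling `fetch_cached(url, Path(".page_cache"))` twice with the same URL would hit the network only on the first call. This sketch never expires entries and ignores request headers and HTTP status codes, which seems acceptable for a development-only cache but would need more thought for anything else.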

github-actions bot added the t-tooling label (Issues with this label are in the ownership of the tooling team.) Jan 5, 2025
Collaborator

vdusek commented Jan 6, 2025

Hi, this sounds reasonable, but it is not currently on our roadmap. We will revisit it in the future.

vdusek added the enhancement label (New feature or request.) Jan 6, 2025
vdusek changed the title from "Is there an option to cache downloaded pages locally?" to "Add local caching for crawled pages to enhance development efficiency" Jan 6, 2025