My scrapes are long-running and each one needs a long sequence of Playwright actions. I have handled most of the problems with the scrape and have long runs working. I am now facing lost connections to the Playwright process. We certainly can't do much about a process that has died, but please help me ensure that the affected request ends up in its errback so we can handle it properly and continue scraping.
The minimal spider setup is described in minimal_spider_setyp.txt.
The root cause of the error is:
/opt/scrapy_enviroment/lib/python3.11/site-packages/playwright/driver/playwright.sh: line 6: 2323044 Hangup "$PLAYWRIGHT_NODEJS_PATH" "$SCRIPT_PATH/package/lib/cli/cli.js" "$@"
/opt/scrapy_enviroment/lib/python3.11/site-packages/playwright/driver/playwright.sh: line 6: 2323042 Hangup "$PLAYWRIGHT_NODEJS_PATH" "$SCRIPT_PATH/package/lib/cli/cli.js" "$@"
To detect this, we used ScrapyPlaywrightMemoryUsageExtension, which surfaced the error shown in inital_error.txt.
We have extended ScrapyPlaywrightMemoryUsageExtension to wrap the failing call in try/except. In the except branch we tried raising an exception that Scrapy and scrapy-playwright already recognize, hoping it would be routed to the request's errback, which should handle the failed request and let the scrape proceed.
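To make the goal concrete, here is a rough sketch of what we would like the errback to do once a failed request actually reaches it. This is not code from our spider: `next_retry_meta`, `MAX_PLAYWRIGHT_RETRIES`, and the `playwright_retries` meta key are hypothetical names chosen for illustration, and the sketch does not solve the routing problem itself. It only shows the bounded re-scheduling we have in mind.

```python
from typing import Optional

# Hypothetical retry limit, chosen only for illustration.
MAX_PLAYWRIGHT_RETRIES = 3


def next_retry_meta(meta: dict, max_retries: int = MAX_PLAYWRIGHT_RETRIES) -> Optional[dict]:
    """Return a copy of `meta` with the retry counter bumped,
    or None once the request has used up its retries."""
    attempts = meta.get("playwright_retries", 0)  # hypothetical meta key
    if attempts >= max_retries:
        return None
    new_meta = dict(meta)  # copy, so the failed request's meta is untouched
    new_meta["playwright_retries"] = attempts + 1
    return new_meta


# Intended use inside the spider (needs Scrapy, so shown as comments only):
#
# def errback(self, failure):
#     meta = next_retry_meta(failure.request.meta)
#     if meta is not None:
#         # Re-schedule the same request; dont_filter bypasses the dupefilter.
#         yield failure.request.replace(meta=meta, dont_filter=True)
#     else:
#         self.logger.error("Giving up on %s", failure.request.url)
```

With something like this in place, a request lost to a dead Playwright process would be retried a few times and then logged, instead of silently stalling the run.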
Can you please evaluate our CustomScrapyPlaywrightMemoryUsageExtension, advise whether IgnoreRequest is a suitable exception to raise, and suggest how we should proceed? We are still debugging the current solution as I write this report.
minimal_spider_setyp.txt
inital_error.txt
custom_memusage_extension.txt