
Handling lost connection with Playwright process that makes the scrape hang in error #331

Open
milan-cp-dev opened this issue Dec 23, 2024 · 1 comment


@milan-cp-dev

I run long scrapes with long sequences of actions that need to be taken during the Playwright scrape. I have handled most of the problems with the scrape and established long runs. I am now facing issues with a lost connection to the Playwright process. Sure, we can't do much about a process that died, but please help me ensure that such a request ends up in the errback, so we can handle it properly and continue scraping.

The minimal spider setup is described in minimal_spider_setyp.txt.

Root cause of the error is:

```
/opt/scrapy_enviroment/lib/python3.11/site-packages/playwright/driver/playwright.sh: line 6: 2323044 Hangup "$PLAYWRIGHT_NODEJS_PATH" "$SCRIPT_PATH/package/lib/cli/cli.js" "$@"
/opt/scrapy_enviroment/lib/python3.11/site-packages/playwright/driver/playwright.sh: line 6: 2323042 Hangup "$PLAYWRIGHT_NODEJS_PATH" "$SCRIPT_PATH/package/lib/cli/cli.js" "$@"
```

To be able to raise awareness of this, we used ScrapyPlaywrightMemoryUsageExtension and caught the error as shown in inital_error.txt.

We have extended ScrapyPlaywrightMemoryUsageExtension to be able to try/except such exceptions. We have attempted to raise a known scrapy-playwright error so that it is routed back to the errback function, which should handle the rest and proceed with the scrape.

Can you please evaluate our CustomScrapyPlaywrightMemoryUsageExtension, advise whether IgnoreRequest is a suitable exception, and suggest what we can do moving forward? We are still debugging the current solution as I report this.

minimal_spider_setyp.txt
inital_error.txt
custom_memusage_extension.txt

@milan-cp-dev
Author

It seems that the above handles such problems and the scrape can continue.
