take a screenshot in process_spider_exception #309
Comments
Interesting question. I've encountered two limitations while trying to make this work:
Full example:

import asyncio
import logging

import scrapy
from playwright.async_api import Page
from scrapy import signals
from scrapy.crawler import Crawler


class HandleExceptionMiddleware:
    @classmethod
    def from_crawler(cls, crawler: Crawler):
        return cls(crawler)

    def __init__(self, crawler: Crawler) -> None:
        crawler.signals.connect(self.spider_idle, signal=signals.spider_idle)
        self.screenshot_taken = asyncio.Event()

    def spider_idle(self, spider):
        # Keep the spider open until the screenshot has been written.
        if not self.screenshot_taken.is_set():
            raise scrapy.exceptions.DontCloseSpider()

    def process_spider_exception(self, response, exception, spider):
        logging.info("Caught exception: %s", exception.__class__)
        # Only present because the request set "playwright_include_page": True.
        page: Page = response.meta["playwright_page"]
        # This hook is synchronous, so the coroutine is scheduled as a task.
        asyncio.create_task(self.take_screenshot(page=page))
        return []

    async def take_screenshot(self, page: Page):
        await page.screenshot(path="example_exception.png", full_page=True)
        self.screenshot_taken.set()
        await page.close()


class HandleExceptionSpider(scrapy.Spider):
    name = "exception"
    custom_settings = {
        "SPIDER_MIDDLEWARES": {HandleExceptionMiddleware: 100},
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
    }

    def start_requests(self):
        yield scrapy.Request(
            url="https://example.org",
            meta={"playwright": True, "playwright_include_page": True},
        )

    def parse(self, response, **kwargs):
        logging.info("Received response for %s", response.url)
        1 / 0  # deliberately raise ZeroDivisionError to trigger the middleware
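
For anyone who wants to try the scheduling part without Scrapy or Playwright installed, here is a minimal standard-library sketch of the same idea: a synchronous hook schedules a coroutine with asyncio.create_task, and an asyncio.Event gates shutdown until that coroutine has run. The names BackgroundCleanup, on_exception, may_close and demo are hypothetical and not part of Scrapy or scrapy-playwright, and asyncio.sleep(0) is only a placeholder for the real page.screenshot(...) call. It differs from the example above in two ways, both of which are assumptions on my part rather than something the example requires: it keeps a reference to the task so it cannot be garbage-collected early, and it sets the event in a finally block so a failed capture would not hold shutdown forever.

```python
import asyncio


class BackgroundCleanup:
    """Hypothetical stand-in for the middleware: sync hook plus async cleanup."""

    def __init__(self):
        self.done = asyncio.Event()
        self._tasks = set()
        self.log = []

    def on_exception(self, resource):
        # Synchronous callback: it cannot await, so schedule the coroutine instead.
        task = asyncio.create_task(self._cleanup(resource))
        # Keep a strong reference; the event loop itself only keeps weak ones.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return []

    def may_close(self):
        # Counterpart of the spider_idle check: allow shutdown once cleanup ran.
        return self.done.is_set()

    async def _cleanup(self, resource):
        try:
            await asyncio.sleep(0)  # placeholder for page.screenshot(...)
            self.log.append(f"captured {resource}")
        finally:
            # Set the flag even if the capture failed.
            self.done.set()


async def demo():
    handler = BackgroundCleanup()
    handler.on_exception("page-1")
    before = handler.may_close()  # the task is scheduled but has not run yet
    await handler.done.wait()
    return before, handler.may_close(), handler.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the middleware, the equivalent of may_close is the spider_idle handler raising DontCloseSpider while the event is unset.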
Is there a way to take a screenshot for a process_spider_exception error? I can't figure out how to access the page object in that middleware.