scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor) #145

jeroenvermunt · 2022-12-06T23:36:09Z

I am running my spider as follows:

def run_spider():
    '''see https://stackoverflow.com/questions/41495052/scrapy-reactor-not-restartable'''
    
    def f(q):
        try:
            runner = crawler.CrawlerRunner(get_project_settings())
            deferred = runner.crawl()
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)
            
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result

This way I can run multiple scrapers without running into the error mentioned in the Stack Overflow post. However, I now get the following error: scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

This is my settings file:

# playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
# TWISTED_REACTOR = "twisted.internet.epollreactor.EPollReactor"

I tried every combination of commenting and uncommenting the TWISTED_REACTOR lines, but they all yield the same result.


jeroenvermunt · 2022-12-06T23:42:18Z

I tried a solution similar to the one mentioned in #131, using:

    if sys.modules.get("twisted.internet.reactor", False):
        del sys.modules["twisted.internet.reactor"]
    
    scrapy.utils.reactor.install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')

This does not work either. It fails with:

RuntimeError: There is no current event loop in thread 'MainThread'.

Gallaecio · 2022-12-07T07:47:40Z

Instead of deleting "twisted.internet.reactor" from sys.modules, you could try not importing reactor at the module level in the first place, and instead import it where you are actually using it, after you call install_reactor.

elacuesta closed this as not planned on Jul 13, 2023

elacuesta added the Stale label Jul 13, 2023
