scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor) #145

Closed
jeroenvermunt opened this issue Dec 6, 2022 · 2 comments

@jeroenvermunt

I am running my spider as follows:

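```python
from multiprocessing import Process, Queue

from scrapy import crawler
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor  # module-level import installs Twisted's default reactor


def run_spider():
    '''see https://stackoverflow.com/questions/41495052/scrapy-reactor-not-restartable'''

    def f(q):
        try:
            runner = crawler.CrawlerRunner(get_project_settings())
            # CrawlerRunner.crawl() takes the spider class or name as its first argument (not shown in this report)
            deferred = runner.crawl()
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result
```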

This way I can run multiple scrapers without running into the error mentioned in the Stack Overflow post. However, I now get this error: `scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)`
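Some context on that message: importing `twisted.internet.reactor` installs Twisted's default reactor (EPollReactor on Linux), and once a reactor is installed it cannot be replaced by another one. The snippet above uses `reactor` without importing it inside `f`, which suggests a module-level import; with the default `fork` start method on Linux the child process would then inherit that already-installed EPollReactor, while scrapy-playwright needs the asyncio reactor. A minimal stdlib-only sketch for checking this (the helper names are hypothetical, not part of Scrapy or Twisted), relying on the fact that Twisted replaces the `sys.modules` entry with the installed reactor instance:

```python
import sys

REACTOR_MODULE = "twisted.internet.reactor"


def reactor_already_installed() -> bool:
    """True if an earlier import has already installed a Twisted reactor."""
    return REACTOR_MODULE in sys.modules


def describe_installed_reactor() -> str:
    """Dotted class path of the installed reactor, or a note that none is installed."""
    installed = sys.modules.get(REACTOR_MODULE)
    if installed is None:
        return "no reactor installed yet"
    # Twisted stores the reactor instance itself under this key, so the class
    # of the entry identifies the reactor that is actually in use.
    cls = type(installed)
    return f"{cls.__module__}.{cls.__qualname__}"
```

Calling `describe_installed_reactor()` inside `f`, right before creating the `CrawlerRunner`, would show which reactor is in place; in the situation reported here it would be expected to name the EPollReactor from the error message.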

This is my settings file:

```python
# playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
# TWISTED_REACTOR = "twisted.internet.epollreactor.EPollReactor"
```

I tried every combination of commenting and uncommenting the `TWISTED_REACTOR` lines, but all of them give the same error.
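For reference, the scrapy-playwright README asks for the asyncio reactor to be selected in the settings. One caveat, per Scrapy's `TWISTED_REACTOR` documentation: `CrawlerProcess` installs the configured reactor itself, but with `CrawlerRunner` the reactor has to be installed manually (for example with `scrapy.utils.reactor.install_reactor()`), and Scrapy only checks the setting against whatever reactor is already installed. That would be consistent with toggling the setting alone making no difference here. A sketch of the settings with the asyncio reactor enabled:

```python
# playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# Required by scrapy-playwright. With CrawlerRunner this is only verified,
# so the same reactor must also be installed before twisted.internet.reactor is imported.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```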

@jeroenvermunt
Author

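I tried a solution similar to the one mentioned in #131:

```python
import sys

import scrapy.utils.reactor

# Drop the already-imported reactor module so install_reactor() can run
if sys.modules.get("twisted.internet.reactor", False):
    del sys.modules["twisted.internet.reactor"]

scrapy.utils.reactor.install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
```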

This does not work either; it raises:

`RuntimeError: There is no current event loop in thread 'MainThread'.`

@Gallaecio
Contributor

Instead of deleting `"twisted.internet.reactor"` from `sys.modules`, you could try not importing `reactor` at the module level in the first place, and instead import it where you are actually using it, after you call `install_reactor`.

@elacuesta closed this as not planned on Jul 13, 2023