Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve forbidden domain handling #13

Closed
wants to merge 2 commits into from

Conversation

Gallaecio
Copy link
Contributor

@Gallaecio Gallaecio commented Nov 8, 2023

I tested it manually with a raw spider defined on a https://github.com/zytedata/zyte-spider-templates-project project with the 2 setting changes needed:

from zyte_spider_templates import BaseSpider, EcommerceSpider

from pydantic import BaseModel
from scrapy import Request, Spider
from scrapy_spider_metadata import Args


class Params(BaseModel):
    pass


class TestSpider(EcommerceSpider, Args[Params]):
    name = "test"

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(BaseSpider, cls).from_crawler(crawler, *args, **kwargs)
        return spider

    def start_requests(self):
        for url in (
            # Also tested adding an extra, non-forbidden URL.
            "https://forbidden.domain.example",
        ):
            yield Request(
                url=url,
                callback=self.parse_navigation,
            )

Should I also figure out a way to test it? It feels like writing the test will take longer than implementing the feature took, so I am not sure whether or not it is worth it. And if we do, should we aim to test both possible close paths (_maybe_close calls)? Would tests that create the 2 middlewares and “play” with them work, as opposed to running an actual spider on a mock local website?

@codecov-commenter
Copy link

codecov-commenter commented Nov 8, 2023

Codecov Report

Merging #13 (a6fc3db) into main (ca4dfb9) will decrease coverage by 4.64%.
The diff coverage is 35.89%.

❗ Current head a6fc3db differs from pull request most recent head e835bcc. Consider uploading reports for the commit e835bcc to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #13      +/-   ##
==========================================
- Coverage   98.56%   93.92%   -4.64%     
==========================================
  Files           9        9              
  Lines         488      527      +39     
==========================================
+ Hits          481      495      +14     
- Misses          7       32      +25     
Files Coverage Δ
zyte_spider_templates/middlewares.py 63.85% <35.89%> (-24.79%) ⬇️

self._send_signal(_start_requests_processed, count=count)


class ForbiddenDomainDownloaderMiddleware:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving this logic to scrapy-zyte-api?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants