
How to check that a proxy is really being used? #48

Open
ravillarreal opened this issue Aug 16, 2018 · 3 comments

Comments


ravillarreal commented Aug 16, 2018

In the process_request function, the proxy is attached to the request only if it has a proxy_user_pass. Otherwise the function only logs that the proxy is being used and how many proxies are left. Does that mean that a proxy without credentials, like https://176.37.14.252:8080, does not work?

This is the function:

def process_request(self, request, spider):
    # Don't overwrite with a random one (server-side state for IP)
    if 'proxy' in request.meta:
        if request.meta["exception"] is False:
            return
    request.meta["exception"] = False
    if len(self.proxies) == 0:
        raise ValueError('All proxies are unusable, cannot proceed')

    if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS:
        proxy_address = random.choice(list(self.proxies.keys()))
    else:
        proxy_address = self.chosen_proxy

    proxy_user_pass = self.proxies[proxy_address]

    if proxy_user_pass:
        request.meta['proxy'] = proxy_address
        basic_auth = 'Basic ' + base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = basic_auth
    else:
        # request.meta['proxy'] is never set on this path, so no proxy is attached
        log.debug('Proxy user pass not found')
    # Logged in both branches, even when no proxy was actually attached
    log.debug('Using proxy <%s>, %d proxies left' % (
        proxy_address, len(self.proxies)))
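For comparison, here is a minimal sketch of that branch rewritten so the chosen proxy is always attached and credentials only add the auth header. It assumes plain dicts in place of Scrapy's `request.meta` and `request.headers`, and `apply_proxy` is a hypothetical helper name; it is an illustration, not the project's actual code.

```python
import base64
from typing import Dict, Optional


def apply_proxy(meta: Dict[str, str], headers: Dict[str, str],
                proxy_address: str, proxy_user_pass: Optional[str]) -> None:
    """Hypothetical helper: attach a proxy to a request's meta and headers.

    `meta` and `headers` are plain dicts standing in for Scrapy's
    request.meta and request.headers.
    """
    # Always set the proxy, whether or not credentials were configured.
    meta["proxy"] = proxy_address
    # Only proxies with "user:pass" credentials need the Basic auth header.
    if proxy_user_pass:
        token = base64.b64encode(proxy_user_pass.encode()).decode()
        headers["Proxy-Authorization"] = "Basic " + token


meta, headers = {}, {}
apply_proxy(meta, headers, "https://176.37.14.252:8080", None)
print(meta)     # {'proxy': 'https://176.37.14.252:8080'}
print(headers)  # {}
```

Since Scrapy takes the proxy from `request.meta['proxy']`, a credential-less proxy would still be used with this logic.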
ravillarreal changed the title from "How to check that a proxy is being used?" to "How to check that a proxy is really being used?" on Aug 16, 2018

schiz0phr3ne commented Aug 25, 2018

I tested this middleware without a proxy_user_pass (I don't have one to test with), and the proxy is not used:

import scrapy

class MyipSpider(scrapy.Spider):
    name = 'myip'
    start_urls = ['http://www.mon-ip.com']

    def parse(self, response):
        for ip in response.xpath('//*[@id="PageG"]'):
            yield {
                'ip': ip.xpath('p[3]/span[2]//text()').extract_first(),
            }

which gives the following (the scraped IP is my own address, not the proxy's):

2018-08-28 15:17:10 [scrapy.proxies] DEBUG: Using proxy <https://pro.xy.add.ress:port>, x proxies left
[...]
2018-08-28 15:17:10 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.mon-ip.com>
{'ip': 'my.ip.add.ress'}
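The same IP comparison can also be done outside Scrapy with the standard library: fetch an IP-echo page once directly and once through the proxy, and check that the two addresses differ. This is a sketch under assumptions: `https://api.ipify.org` (which returns the caller's IP as plain text) is an assumed echo service, and `make_opener`, `public_ip` and `proxy_is_used` are hypothetical helper names.

```python
import urllib.request
from typing import Optional

# Assumed IP-echo service that returns the caller's public IP as plain text.
ECHO_URL = "https://api.ipify.org"


def make_opener(proxy: Optional[str]) -> urllib.request.OpenerDirector:
    """Return an opener that sends requests through `proxy`, or directly if None."""
    # Passing our own ProxyHandler replaces the default one; an empty mapping
    # means http_proxy/https_proxy environment variables are ignored.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def public_ip(opener: urllib.request.OpenerDirector, url: str = ECHO_URL) -> str:
    """Fetch the echo URL and return the IP address it reports."""
    with opener.open(url, timeout=10) as resp:
        return resp.read().decode().strip()


def proxy_is_used(proxy: str, url: str = ECHO_URL) -> bool:
    """True if the echo service sees a different IP via the proxy than directly."""
    return public_ip(make_opener(proxy), url) != public_ip(make_opener(None), url)
```

For example, `proxy_is_used("https://pro.xy.add.ress:port")` should return `True` only if traffic really leaves through the proxy; if it returns `False`, the proxy is not being applied.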

schiz0phr3ne commented

This change works: https://github.com/aivarsk/scrapy-proxies/pull/43/files


BriungRi commented Nov 9, 2019

Bump on schiz0phr3ne's PR. I was able to use that change and verify that my requests were indeed going out through the proxy's IP and not my own local IP.
