
Connection parameters not working #5

Open
drprabhakar opened this issue Oct 20, 2015 · 14 comments
@drprabhakar

I have given the following in my scrapy settings.py file:

```python
RABBITMQ_CONNECTION_PARAMETERS = {'host': 'amqp://username:password@rabbitmqserver', 'port': 5672}
```

But I am getting the following error:

```
raise exceptions.AMQPConnectionError(error)
pika.exceptions.AMQPConnectionError: [Errno 11003] getaddrinfo failed
```

How can I use the RabbitMQ server with my credentials?
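For what it's worth, `getaddrinfo failed` usually means the `'host'` value is not a bare hostname: here it contains a full AMQP URL, which DNS cannot resolve. A minimal stdlib-only sketch of splitting such a URL into the separate pieces a connection needs (the URL and its values are the placeholders from above):

```python
from urllib.parse import urlparse

# "amqp://username:password@rabbitmqserver" is a URL, not a hostname, so
# passing it as 'host' makes DNS resolution fail with getaddrinfo.
# Splitting it first yields the separate values a connection needs:
url = urlparse("amqp://username:password@rabbitmqserver:5672")
params = {
    "host": url.hostname,      # "rabbitmqserver" -- a bare hostname
    "port": url.port or 5672,
    "username": url.username,  # "username"
    "password": url.password,  # "password"
}
print(params["host"])  # rabbitmqserver
```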

@rdcprojects

I doubt these settings will work with this library. Try passing it a pika.credentials.Credentials object; that is what connection.py expects.
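For example, a settings.py along these lines might work, assuming the library forwards this dict to `pika.ConnectionParameters` (the host and credential values are placeholders; check connection.py in your installed version):

```python
# settings.py -- a sketch, not verified against this library's connection.py
import pika

RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'rabbitmqserver',  # bare hostname only, no amqp:// scheme
    'port': 5672,
    'credentials': pika.PlainCredentials('username', 'password'),
}
```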

@drprabhakar

I am not sure how to pass it through the settings.py file.
Can you please show how to provide a pika.credentials.Credentials object in settings.py?

@drprabhakar

I have connected to my RabbitMQ using a pika.credentials.Credentials object, but now I am receiving the following error:

```
pika.exceptions.ChannelClosed: (404, "NOT_FOUND - no queue 'multidomain:requests' in vhost '/'")
```

Any suggestion for this?

@rdcprojects

Can you create the queue manually and give it a try?
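If it helps, the queue can be declared with a one-off pika script like the sketch below. The host, credentials, and `durable=True` are assumptions; match whatever arguments queue.py uses when it declares the queue, since redeclaring a queue with different properties raises a channel error.

```python
import pika

# One-off: declare the queue the scheduler expects. The error message
# suggests the naming pattern "<spider name>:requests", hence this name.
conn = pika.BlockingConnection(pika.ConnectionParameters(
    host='rabbitmqserver',
    credentials=pika.PlainCredentials('username', 'password'),
))
channel = conn.channel()
channel.queue_declare(queue='multidomain:requests', durable=True)
conn.close()
```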

@drprabhakar

I have created a queue 'multidomain' manually in RabbitMQ and tried again, but I get the same error.

Do you mean creating the queue from the scrapy spider?

@rdcprojects

The queue is "multidomain:requests".

@drprabhakar

I tried with the queue name "multidomain:requests" and now get the error below from scrapy_rabbitmq\queue.py:

```
return response.message_count
exceptions.AttributeError: 'Method' object has no attribute 'message_count'
```

It seems the scheduler is not working as expected.

Is there a fix for this?
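A guess at the cause: in more recent pika versions, `queue_declare` returns the Queue.DeclareOk result wrapped in a Method frame, so the count lives at `response.method.message_count` rather than `response.message_count`. A stand-in sketch of the fixed attribute path (the fake response object below only mimics pika's frame shape for illustration; it is not pika):

```python
from types import SimpleNamespace

def queue_len(response):
    # Fixed attribute path: the Method frame wraps the declare-ok result,
    # so the message count is one level deeper than queue.py assumes.
    return response.method.message_count

# Fake object mimicking pika's frame layout, for illustration only:
fake_response = SimpleNamespace(method=SimpleNamespace(message_count=3))
print(queue_len(fake_response))  # 3
```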

@rdcprojects

Try my fork. I've fixed these issues.

@drprabhakar

I have worked with your fork (rdcprojects/scrapy-rabbitmq) and ran my scrapy spider. To test it, my script just crawls one field from a URL and prints it.

I am getting the following error:

```
cPickle.BadPickleGet: 116
```

Is there anything I have to change in my scrapy spider?

@rdcprojects

Can you provide the full traceback?

@drprabhakar

```
2015-10-21 15:21:50+0530 [multidomain] INFO: Spider opened
2015-10-21 15:21:50+0530 [multidomain] DEBUG: Resuming crawl (1 requests scheduled)
2015-10-21 15:21:50+0530 [multidomain] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-10-21 15:21:50+0530 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-10-21 15:21:50+0530 [-] Unhandled Error
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 93, in start
    self.start_reactor()
  File "C:\Python27\lib\site-packages\scrapy\crawler.py", line 130, in start_reactor
    reactor.run(installSignalHandlers=False)  # blocking call
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1192, in run
    self.mainLoop()
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  --- <exception caught here> ---
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "C:\Python27\lib\site-packages\scrapy\utils\reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 107, in _next_request
    if not self._next_request_from_scheduler(spider):
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 134, in _next_request_from_scheduler
    request = slot.scheduler.next_request()
  File "C:\Python27\lib\site-packages\scrapy_rabbitmq\scheduler.py", line 73, in next_request
    request = self.queue.pop()
  File "C:\Python27\lib\site-packages\scrapy_rabbitmq\queue.py", line 70, in pop
    return self._decode_request(body)
  File "C:\Python27\lib\site-packages\scrapy_rabbitmq\queue.py", line 29, in _decode_request
    return request_from_dict(pickle.loads(encoded_request), self.spider)
cPickle.BadPickleGet: 116
```

@rdcprojects

I think we'll have to dig deeper into the library to make it work. You can get in touch with the folks on the IRC channel if you want to continue working on the library. Hope this helps!

@drprabhakar

Thanks for the information.
Please confirm whether the URLs in the RabbitMQ queue should be in a specific format, i.e. should the message body be `"http://www.domain.com/query"`, `["http://www.domain.com/query"]`, or `http://www.domain.com/query` without quotes?

I just want to rule out any issues on the RabbitMQ side.

@rdcprojects

You can check the scrapy documentation for how URLs are stored in the requests queue. Some encoding / serialization is being used; I'm not completely sure about the details.
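To sketch why the format matters, assuming this library works like similar schedulers: each Request is converted to a dict (scrapy's `request_to_dict`) and pickled before publishing, then unpickled on pop. A hand-pushed plain-text URL is therefore not a valid pickle stream, and `pickle.loads` fails, which would explain the BadPickleGet traceback above. The dict below is a simplified stand-in for what scrapy actually produces:

```python
import pickle

# What the queue normally holds: a pickled dict describing the request.
encoded = pickle.dumps({'url': 'http://www.domain.com/query', 'method': 'GET'})
print(pickle.loads(encoded)['url'])  # http://www.domain.com/query

# What a hand-pushed plain URL looks like to the consumer: not a pickle
# stream at all, so decoding raises (cPickle.BadPickleGet on Python 2).
try:
    pickle.loads(b'http://www.domain.com/query')
except Exception as exc:
    print('decode failed:', type(exc).__name__)
```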
