Crawling binary files #21

joshuambg · 2018-12-04T08:29:28Z

supercrawler picks up ALL links on a page. If a page links to movie files, images, or other large files, those URLs are added to the queue. The URLs are then passed to request, which tries to download them.

brendonboshell · 2018-12-04T09:00:25Z

I want to keep the ability to download binary files, but I know that downloading large binary data could be problematic. What behaviour do you expect here? Maybe a max file size, or an event handler that inspects the headers and can cancel a request?
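
To make the two options above concrete, here is a minimal sketch of a header check that combines a maximum file size with a content-type allow-list. Everything in it is hypothetical: `shouldDownload`, `DownloadPolicy`, and `HeaderMap` are illustrative names and not part of supercrawler's API, and the sketch assumes header names arrive lower-cased, as Node's HTTP layer delivers them.

```typescript
// Hypothetical helper for illustration only; not part of supercrawler.
// Header names are assumed to be lower-cased.
type HeaderMap = Record<string, string | undefined>;

interface DownloadPolicy {
  maxBytes: number;        // largest Content-Length we are willing to fetch
  allowedTypes: string[];  // MIME type prefixes, e.g. "text/" or "application/json"
}

function shouldDownload(headers: HeaderMap, policy: DownloadPolicy): boolean {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const rawType = headers["content-type"] ?? "";
  const semi = rawType.indexOf(";");
  const mime = (semi === -1 ? rawType : rawType.slice(0, semi)).trim().toLowerCase();

  // A missing or unlisted content type is rejected.
  if (!policy.allowedTypes.some((prefix) => mime.startsWith(prefix))) {
    return false;
  }

  // Servers may omit Content-Length (e.g. chunked responses). This sketch
  // lets those through; a caller would still need to count bytes as they arrive.
  const rawLength = headers["content-length"];
  if (rawLength === undefined) {
    return true;
  }

  const length = Number(rawLength);
  return Number.isFinite(length) && length <= policy.maxBytes;
}
```

A handler could call this as soon as the response headers arrive and cancel the request when it returns `false`. With request, that would presumably mean listening for the `response` event and calling `abort()`, but how this would hook into supercrawler is an open question here.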

cbess · 2019-02-19T13:20:58Z

I have run into the same problem. I'm working on a fix for this issue.

cbess · 2019-12-08T02:22:56Z

I finally addressed this issue. I believe it is resolved with #45.
