JobFunnel: Failed to scrape jobs for IndeedScraperUSAEng #137
Comments
Also receiving this. Monster working fine but Indeed fails every time, even with different search keywords. Using DEBUG logging, I was able to get the URL it was trying to hit and it seemed fine. Environment:
|
Thanks for opening an issue. I think we have some long-outstanding issues with parsing of the search URL for certain queries; if you are open to sharing your search URLs from the logs, it would be very helpful in identifying what the issue is. We currently have CI for the US Indeed scraper, but it only performs a basic search. Additionally, can you confirm that you are able to obtain results (non-advertisement results) for the search you are performing on the Indeed website? |
Sure. My jobfunnel has also been failing the Monster scrape the past few days (using crontab to run once daily). I would also try to debug if I could, but I'm not very familiar with running Python projects and I couldn't figure out how to run from PyCharm with the source 😅 I also used the URL: https://www.indeed.com/jobs?q=Software&l=tulsa%2C+OK&radius=25&limit=50&filter=0 |
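For anyone who wants to check what a search URL like the one above actually encodes, here is a minimal sketch using only the Python standard library. The helper name `search_params` is hypothetical and is not part of JobFunnel; it simply splits the query string so each parameter can be inspected on its own.

```python
from urllib.parse import urlsplit, parse_qs

# The search URL quoted in the comment above.
url = "https://www.indeed.com/jobs?q=Software&l=tulsa%2C+OK&radius=25&limit=50&filter=0"


def search_params(search_url):
    """Return the query parameters of a search URL as a flat dict.

    Hypothetical helper for inspection only; not JobFunnel code.
    """
    query = urlsplit(search_url).query
    # parse_qs decodes percent-escapes and '+' signs, and returns a list
    # of values per key; each key appears once here, so take the first.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = search_params(url)
for name, value in params.items():
    print(f"{name} = {value!r}")
```

Seeing the decoded values side by side makes it easier to spot a keyword or location that was escaped differently from what the site expects.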
OK, yeah, looks like we need to improve the URL parsing! Can you instead try searching for two separate keywords, like this:
|
Oh, I see that you tried with a single keyword as well, OK. I think this might be some other issue. One thing to try is to use the current master of this repo. You can do that by installing it in place with, |
Ah OK, thanks for being so responsive; we’ll have to take a deeper look. If you are feeling confident, I invite you to break execution in the scraper where we collect the number of pages of results from the search URL. I suspect the issue is there, since it ends up scraping no jobs. |
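To illustrate why the page-count step is a plausible suspect, here is a rough sketch of how a page count could be derived from a result count. This is an assumption about the general shape of such logic, not JobFunnel's actual implementation; the function name and parameters are hypothetical. If the result count is read as zero (for example because the response could not be parsed), no pages are requested and no jobs are scraped.

```python
import math


def pages_to_scrape(total_results, results_per_page, max_pages=None):
    """Hypothetical helper: number of result pages to request.

    Not JobFunnel code; it only illustrates the arithmetic.
    """
    if results_per_page <= 0:
        raise ValueError("results_per_page must be positive")
    if total_results <= 0:
        # A count parsed as zero means the scraper has nothing to fetch.
        return 0
    pages = math.ceil(total_results / results_per_page)
    if max_pages is not None:
        pages = min(pages, max_pages)
    return pages


if __name__ == "__main__":
    # limit=50 matches the search URL quoted earlier in the thread.
    for total in (0, 50, 120):
        print(total, "results ->", pages_to_scrape(total, 50), "page(s)")
```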
Unfortunately PyCharm doesn't work for this project due to use of abstract base classes. The best way to debug is to add a then you have access to a complete python interpreter, i.e. |
You should be able to debug modules, such as jobfunnel, in PyCharm like this: |
Resolves an issue where indeed responses are not being decoded correctly. Might resolve issue PaulMcInnis#137
If anyone reading this has the time and knowledge, could I ask you to write a step-by-step example of how to debug this code? |
Re PyCharm: users have had issues using it with this repository in the past due to the ABC implementation: #90 (comment). I highly recommend just adding the line NOTE: to use pdb with |
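As a generic illustration of the pdb approach, here is a small sketch using the standard library `pdb` module. The function `keep_titled_jobs` and its `debug` flag are hypothetical and are not part of JobFunnel; the point is only to show where a breakpoint line goes and what happens when it is hit.

```python
import pdb


def keep_titled_jobs(raw_jobs, debug=False):
    """Hypothetical helper: keep only entries that have a non-empty title."""
    kept = []
    for job in raw_jobs:
        if debug:
            # Execution pauses here and an interactive (Pdb) prompt opens
            # in the terminal, with `job` and `kept` available to inspect.
            pdb.set_trace()
        if job.get("title"):
            kept.append(job)
    return kept


if __name__ == "__main__":
    sample = [{"title": "Software Developer"}, {"title": ""}, {}]
    # Pass debug=True to stop inside the loop on every iteration.
    print(keep_titled_jobs(sample))
```

At the `(Pdb)` prompt, `p job` prints a variable, `n` runs the next line, `c` continues to the next breakpoint, and `q` quits.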
Thanks for the quick reply. I apologize if my questions seem lazy (I have very little experience with Python), but how do I run the code with test parameters (location, keywords) from a locally cloned repository? |
Totally fine, happy to help! You should be able to run with test parameters by doing this:
|
Running What I'm trying to do is:
|
Right, I recommend doing this to have a test version of jobfunnel:
When done you can exit virtualenv with |
OK, so I think the best place to start is
|
Resolves an issue where indeed responses are not being decoded correctly. Might resolve issue #137
Didn't mean to close this abruptly, but I think the encoding was causing this. Please pull the latest changes and try again; this has resolved the issue on my end. |
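For readers wondering how a decoding problem can lead to zero scraped jobs, here is a minimal sketch of the general failure mode. It is not the actual fix that was merged; `decode_body` is a hypothetical helper, and the sample bytes are made up for illustration. Text decoded with the wrong codec comes out garbled, so any later matching against it can silently find nothing.

```python
def decode_body(raw, declared_encoding=None):
    """Hypothetical helper: decode response bytes, preferring UTF-8.

    Tries the declared encoding first (if any), then UTF-8, and finally
    falls back to UTF-8 with replacement characters instead of raising.
    """
    for encoding in (declared_encoding, "utf-8"):
        if encoding is None:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown codec: try the next candidate.
            continue
    return raw.decode("utf-8", errors="replace")


# Made-up sample: UTF-8 bytes for a job title with an accented character.
raw = "Développeur".encode("utf-8")

print(decode_body(raw))       # decoded as UTF-8
print(raw.decode("latin-1"))  # same bytes read with the wrong codec
```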
Description
Running funnel with load -s settings_USA.yml gives the following error:
Environment
Would like to debug further but not sure how to do it.