Germany_German Support #132

Closed
wants to merge 1 commit into from

Conversation

lucky7xz

This PR duplicates the changes made for previous locale support expansions to enable scraping Indeed and Monster on the ".de" domain.


There are some problems, though:

  1. IP ban?
    After a certain number of listings has been processed (somewhere between 25 and 75), the next query yields the output below.
    (Note that enabling a VPN allows further queries.)

[2021-01-28 13:55:51,824] [INFO] JobFunnel: Scraping local providers with: ['IndeedScraperGEGer', 'MonsterScraperGEGer']
[2021-01-28 13:55:53,348] [INFO] IndeedScraperGEGer: Found 2 pages of search results for query=python
[2021-01-28 13:55:53,731] [INFO] IndeedScraperGEGer: Scraped 0 job listings from search results pages
[2021-01-28 13:55:53,735] [ERROR] JobFunnel: Failed to scrape jobs for IndeedScraperGEGer
[2021-01-28 13:55:53,737] [INFO] MonsterScraperGEGer: No get() or set() will be done for Job attrs: ['REMOTENESS']
[2021-01-28 13:55:54,605] [ERROR] JobFunnel: Failed to scrape jobs for MonsterScraperGEGer
[2021-01-28 13:55:54,605] [INFO] JobFunnel: Completed all scraping, found 0 new jobs.
[2021-01-28 13:55:54,625] [WARNING] JobFunnel: No new jobs were added to CSV.

  2. After the Indeed scraper finishes, the error below appears. I've tried multiple province/city configurations.

C:\Users\Lucky\Scripts>funnel load -s my_settings.yaml
[2021-01-28 13:02:27,031] [INFO] JobFunnel: Scraping local providers with: ['IndeedScraperGEGer', 'MonsterScraperGEGer']
[2021-01-28 13:02:28,423] [INFO] IndeedScraperGEGer: Found 1 pages of search results for query=python
[2021-01-28 13:02:28,987] [INFO] IndeedScraperGEGer: Scraped 23 job listings from search results pages
100%|##################################################################################| 23/23 [00:30<00:00, 1.33s/it]
[2021-01-28 13:02:59,601] [INFO] MonsterScraperGEGer: No get() or set() will be done for Job attrs: ['REMOTENESS']
[2021-01-28 13:03:00,363] [ERROR] JobFunnel: Failed to scrape jobs for MonsterScraperGEGer
Traceback (most recent call last):
  File "C:\Users\Lucky\Scripts\funnel-script.py", line 11, in
    load_entry_point('JobFunnel==3.0.1', 'console_scripts', 'funnel')()
  File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel_main_.py", line 28, in main
    job_funnel.run()
  File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\backend\jobfunnel.py", line 114, in run
    scraped_jobs_dict = self.scrape()
  File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\backend\jobfunnel.py", line 244, in scrape
    self._check_for_inter_scraper_validity(
  File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\backend\jobfunnel.py", line 220, in _check_for_inter_scraper_validity
    raise ValueError(
ValueError: Inter-scraper key-id duplicate! 22e7f67c9200c7ce
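
For anyone trying to reproduce the second problem, here is a minimal sketch of the kind of check the traceback points at: finding key ids that show up under more than one scraper. It is not JobFunnel's actual `_check_for_inter_scraper_validity`; the function name, the dict layout and the ids below are assumptions made up for illustration.

```python
from typing import Dict, List


def find_inter_scraper_duplicates(
        jobs_by_scraper: Dict[str, Dict[str, dict]]) -> List[str]:
    """Return key ids that appear under more than one scraper.

    Hypothetical helper: assumes each scraper yields a dict of
    key_id -> job data, which may not match JobFunnel's real types.
    """
    first_seen = {}   # key_id -> name of the scraper that produced it first
    duplicates = []   # key ids seen again under a later scraper
    for scraper_name, jobs in jobs_by_scraper.items():
        for key_id in jobs:
            if key_id not in first_seen:
                first_seen[key_id] = scraper_name
            elif key_id not in duplicates:
                duplicates.append(key_id)
    return duplicates


# Made-up data: both scrapers report a job under the id 'id-1'.
example = {
    'IndeedScraperGEGer': {'id-1': {}, 'id-2': {}},
    'MonsterScraperGEGer': {'id-1': {}, 'id-3': {}},
}
print(find_inter_scraper_duplicates(example))
```

Listing the colliding ids like this, instead of raising on the first one, might make it easier to see whether the two providers really return the same listing or whether the ids are generated in a way that collides.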

Duplicated changes made for previous support expansions for indeed and monster.
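
Regarding the first problem above: if the empty results really are rate limiting (which is only a guess; it could also be a captcha page), one common mitigation is to retry with growing pauses. The sketch below shows exponential backoff with jitter using only the standard library. `fetch_with_backoff` and its parameters are hypothetical and not part of JobFunnel.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(fetch: Callable[[], Optional[str]],
                       max_attempts: int = 4,
                       base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Optional[str]:
    """Call fetch() until it returns a non-empty result or attempts run out.

    Between attempts, waits base_delay * 2**attempt seconds plus up to one
    second of random jitter. Returns None if every attempt came back empty.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result:
            return result
        # No pause after the final attempt; there is nothing left to retry.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return None
```

Here `fetch` would be any zero-argument callable that performs one request and returns the page text (or nothing on failure); passing a custom `sleep` makes the helper easy to exercise without actually waiting.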
@thebigG
Collaborator

thebigG commented Jan 28, 2021

Thanks so much for German support!

That issue looks like #123. We are currently working on fixing this, but I haven't gotten around to it yet. I'll try my best to fix it in the coming days so you can merge the changes, and hopefully then we'll have fully functional German support.

Really excited about this new addition 👍

@codecov-io

Codecov Report

Merging #132 (8f536ae) into master (e509ef4) will decrease coverage by 0.22%.
The diff coverage is 58.33%.

@@            Coverage Diff             @@
##           master     #132      +/-   ##
==========================================
- Coverage   36.17%   35.95%   -0.23%     
==========================================
  Files          22       22              
  Lines        1454     1488      +34     
==========================================
+ Hits          526      535       +9     
- Misses        928      953      +25     
| Impacted Files | Coverage Δ | |
|---|---|---|
| jobfunnel/backend/scrapers/base.py | 39.87% <ø> (+0.88%) | ⬆️ |
| jobfunnel/backend/scrapers/indeed.py | 25.40% <ø> (-1.59%) | ⬇️ |
| jobfunnel/backend/scrapers/registry.py | 100.00% <ø> (ø) | |
| jobfunnel/resources/defaults.py | 100.00% <ø> (ø) | |
| jobfunnel/backend/scrapers/monster.py | 27.10% <28.57%> (+0.06%) | ⬆️ |
| jobfunnel/backend/tools/tools.py | 29.87% <100.00%> (ø) | |
| jobfunnel/resources/enums.py | 100.00% <100.00%> (ø) | |

@lucky7xz
Author

> Thanks so much for German support!
>
> That issue looks like #123. We are currently working on fixing this, but I haven't gotten around to it yet. I'll try my best to fix it in the coming days so you can merge the changes, and hopefully then we'll have fully functional German support.
>
> Really excited about this new addition 👍

My pleasure, truly.
I want to say that I'm new to GitHub. This is actually my first commit, so I don't really know what I'm doing, tbh. Nonetheless, I'm interested in what exactly the problems are and how they can be (or are being) fixed. Luxembourg support would be really cool as well. The official languages are ENG, GER & FR over there, so if this works out, adding it should be no problem, I think. And maybe I can hack that one on my own :)

I would really appreciate it if you could let me know what you changed once you have it working. 🚀

Owner

@PaulMcInnis PaulMcInnis left a comment

Great first PR! I have submitted some comments for you to review. I think you may be able to have less code in these new classes, since in many cases they seem to be identical to the base classes they inherit from.

Once we have discussed them, you can commit any changes and I'll review again :)

'''Scrapes jobs from indeed.de
'''
def _get_search_url(self, method: Optional[str] = 'get') -> str:
Owner

I think you can remove _get_search_url and _get_num_search_result_pages here as these are defined by BaseIndeedScraper
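
To illustrate the suggestion, here is a minimal sketch of a locale subclass that overrides only what differs and inherits the rest. The classes, the `domain` attribute and the URL are simplified stand-ins invented for this example; they are not JobFunnel's real `BaseIndeedScraper` or the real Indeed URL format.

```python
class BaseScraper:
    """Hypothetical stand-in for a shared base scraper class."""

    domain = 'com'  # assumed class attribute that locale subclasses override

    def __init__(self, query: str) -> None:
        self.query = query

    def _get_search_url(self) -> str:
        # Defined once here; subclasses inherit it unchanged.
        return f'https://jobs.example.{self.domain}/search?q={self.query}'


class GermanScraper(BaseScraper):
    """Locale subclass: only the differing attribute is redefined."""

    domain = 'de'


scraper = GermanScraper('python')
print(scraper._get_search_url())
```

Because `_get_search_url` reads `self.domain`, the subclass gets the right URL without copying the method, which is the kind of reduction the comment above is asking for.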

f'page={page}&' if page > 1 else '',
self.query,
self.config.search_config.city.replace(' ', '-'),
self._convert_radius(self.config.search_config.radius)
Owner

Does the German Monster site allow searching by state? If so, I think that may be missing here.

@PaulMcInnis
Owner

Closing this since the feature is being provided by PR #136.
