Germany_German Support #132
Duplicated the changes made for previous support expansions for Indeed and Monster.
Thanks so much for the German support! That issue looks like #123, which we are currently working on; I haven't gotten around to fixing it yet. I'll try my best to fix it in the coming days so you can merge the changes, and then hopefully we'll have fully functional German support. Really excited about this new addition 👍
Codecov Report
@@            Coverage Diff             @@
##           master     #132      +/-   ##
==========================================
- Coverage   36.17%   35.95%   -0.23%
==========================================
  Files          22       22
  Lines        1454     1488      +34
==========================================
+ Hits          526      535       +9
- Misses        928      953      +25
My pleasure, truly. I would really appreciate it if you could let me know what you've done once you have it working. 🚀
Great first PR! I have submitted some comments for you to review. I think you may be able to have less code in these new classes, since in many cases they seem to be identical to the base class they inherit from.
Once we discuss, you can commit any changes and I'll review again :)
'''Scrapes jobs from indeed.de
'''
def _get_search_url(self, method: Optional[str] = 'get') -> str:
I think you can remove _get_search_url and _get_num_search_result_pages here, as these are already defined by BaseIndeedScraper.
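To illustrate the suggestion above, here is a minimal sketch. It assumes a simplified stand-in for the base class: the method body, the `domain` attribute, the URL shape, and the name `IndeedScraperGermany` are all hypothetical and are not JobFunnel's actual code. The point is only that a locale subclass can override what differs and inherit everything else.

```python
class BaseIndeedScraper:
    """Simplified stand-in for the real base class (assumption)."""

    domain = 'com'  # hypothetical attribute, for illustration only

    def __init__(self, query: str) -> None:
        self.query = query

    def _get_search_url(self) -> str:
        # Shared logic lives in the base class; the URL shape is illustrative.
        return f'https://www.indeed.{self.domain}/jobs?q={self.query}'


class IndeedScraperGermany(BaseIndeedScraper):
    """Hypothetical subclass: only the locale-specific value is overridden."""

    domain = 'de'


# The subclass reuses the inherited method without redefining it.
url = IndeedScraperGermany('python')._get_search_url()
print(url)
```

If a subclass method would be a verbatim copy of the parent's, dropping it changes no behaviour and keeps later fixes in one place.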
f'page={page}&' if page > 1 else '',
self.query,
self.config.search_config.city.replace(' ', '-'),
self._convert_radius(self.config.search_config.radius)
Does the German Monster site allow searching by state? If so, I think that is missing here.
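As a rough sketch of what including a state could look like, assuming the site accepted it as part of the location slug: the helper names `build_location` and `page_param` and the `-` separator are hypothetical, and whether Monster's German site supports a state at all is exactly the open question. Only the space-to-hyphen replacement and the page parameter come from the snippet under review.

```python
from typing import Optional


def build_location(city: str, state: Optional[str] = None) -> str:
    """Join the city and an optional state into a hyphenated slug.

    The '-' separator between city and state is an assumption.
    """
    parts = [city.replace(' ', '-')]
    if state:
        parts.append(state.replace(' ', '-'))
    return '-'.join(parts)


def page_param(page: int) -> str:
    """Emit the page parameter only for pages after the first."""
    return f'page={page}&' if page > 1 else ''


print(build_location('Frankfurt am Main'))
print(build_location('Frankfurt am Main', 'Hessen'))
print(repr(page_param(1)), repr(page_param(2)))
```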
Closing this since the feature is being provided by PR #136.
This PR duplicates the changes made for previous support expansions to enable scraping Indeed and Monster on the ".de" domain.
There are some problems, though:
After a certain number of listings has been processed (somewhere between 25 and 75), a further query yields the output below.
(Note that enabling a VPN allows further queries.)
[2021-01-28 13:55:51,824] [INFO] JobFunnel: Scraping local providers with: ['IndeedScraperGEGer', 'MonsterScraperGEGer']
[2021-01-28 13:55:53,348] [INFO] IndeedScraperGEGer: Found 2 pages of search results for query=python
[2021-01-28 13:55:53,731] [INFO] IndeedScraperGEGer: Scraped 0 job listings from search results pages
[2021-01-28 13:55:53,735] [ERROR] JobFunnel: Failed to scrape jobs for IndeedScraperGEGer
[2021-01-28 13:55:53,737] [INFO] MonsterScraperGEGer: No get() or set() will be done for Job attrs: ['REMOTENESS']
[2021-01-28 13:55:54,605] [ERROR] JobFunnel: Failed to scrape jobs for MonsterScraperGEGer
[2021-01-28 13:55:54,605] [INFO] JobFunnel: Completed all scraping, found 0 new jobs.
[2021-01-28 13:55:54,625] [WARNING] JobFunnel: No new jobs were added to CSV.
C:\Users\Lucky\Scripts>funnel load -s my_settings.yaml
[2021-01-28 13:02:27,031] [INFO] JobFunnel: Scraping local providers with: ['IndeedScraperGEGer', 'MonsterScraperGEGer']
[2021-01-28 13:02:28,423] [INFO] IndeedScraperGEGer: Found 1 pages of search results for query=python
[2021-01-28 13:02:28,987] [INFO] IndeedScraperGEGer: Scraped 23 job listings from search results pages
100%|##################################################################################| 23/23 [00:30<00:00, 1.33s/it]
[2021-01-28 13:02:59,601] [INFO] MonsterScraperGEGer: No get() or set() will be done for Job attrs: ['REMOTENESS']
[2021-01-28 13:03:00,363] [ERROR] JobFunnel: Failed to scrape jobs for MonsterScraperGEGer
Traceback (most recent call last):
File "C:\Users\Lucky\Scripts\funnel-script.py", line 11, in <module>
load_entry_point('JobFunnel==3.0.1', 'console_scripts', 'funnel')()
File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\__main__.py", line 28, in main
job_funnel.run()
File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\backend\jobfunnel.py", line 114, in run
scraped_jobs_dict = self.scrape()
File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\backend\jobfunnel.py", line 244, in scrape
self._check_for_inter_scraper_validity(
File "C:\Users\Lucky\AppData\Roaming\Python\Python38\site-packages\jobfunnel\backend\jobfunnel.py", line 220, in _check_for_inter_scraper_validity
raise ValueError(
ValueError: Inter-scraper key-id duplicate! 22e7f67c9200c7ce
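For readers unfamiliar with the error above, here is a minimal sketch of the kind of check that could raise it: detecting a key id that more than one scraper produced. This is not JobFunnel's implementation of _check_for_inter_scraper_validity; the function name `find_duplicate_key_ids` and the sample ids other than the one in the traceback are made up for illustration.

```python
from typing import Dict, Set


def find_duplicate_key_ids(jobs_by_scraper: Dict[str, Set[str]]) -> Set[str]:
    """Return the key ids that appear under more than one scraper."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for key_ids in jobs_by_scraper.values():
        duplicates |= seen & key_ids  # ids already seen under another scraper
        seen |= key_ids
    return duplicates


# Illustrative data: only the shared id is taken from the traceback.
scraped = {
    'IndeedScraperGEGer': {'22e7f67c9200c7ce', 'example-id-a'},
    'MonsterScraperGEGer': {'22e7f67c9200c7ce', 'example-id-b'},
}
dupes = find_duplicate_key_ids(scraped)
if dupes:
    print(f'Inter-scraper key-id duplicate! {sorted(dupes)}')
```

A collision like this suggests the two scrapers derive the same key id for different listings, so the id scheme rather than the check is the likely place to look.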