Add a scraper for German Indeed #136

Merged (9 commits) on Sep 21, 2021

marchbnr (Contributor)

Description

Implemented a scraper for the German Indeed website.
The functionality includes a subset of the changes suggested in #132 by @Luckyz7, and the required changes were noted in comments. A different locale name was used to align more closely with ISO codes.
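
As a rough illustration of what ISO-aligned locale naming can look like, here is a minimal Python sketch. The `Locale` enum, its member names, and the `country`/`language`/`tag` properties are hypothetical and not taken from this PR; each member simply pairs an ISO 3166-1 alpha-2 country code with an ISO 639-1 language code.

```python
from enum import Enum


class Locale(Enum):
    # Hypothetical members: (ISO 3166-1 alpha-2 country, ISO 639-1 language)
    CANADA_ENGLISH = ("CA", "en")
    CANADA_FRENCH = ("CA", "fr")
    GERMANY_GERMAN = ("DE", "de")

    @property
    def country(self) -> str:
        """Two-letter country code, e.g. 'DE'."""
        return self.value[0]

    @property
    def language(self) -> str:
        """Two-letter language code, e.g. 'de'."""
        return self.value[1]

    @property
    def tag(self) -> str:
        """Language tag in language-REGION form, e.g. 'de-DE'."""
        return f"{self.language}-{self.country}"


print(Locale.GERMANY_GERMAN.tag)  # de-DE
```

With this layout the member name, the country code and the language code are kept together in one place.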

Context of change

Please add options that are relevant and mark any boxes that apply.

  • Software (software that runs on the PC)
  • Library (library that runs on the PC)
  • Tool (tool that assists coding development)
  • Other

Type of change

Please mark any boxes that apply.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Tested manually using a configuration in the demo directory.

Checklist:

Please mark any boxes that have been completed.

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

thebigG (Collaborator) commented Mar 13, 2021

This is awesome, thanks so much for the contribution! But do you mind adding a test for your new demo file to https://github.com/PaulMcInnis/JobFunnel/blob/master/.github/workflows/ci.yml?

When you add it, GitHub Actions will automatically test your German scraper every time there is a push :).

marchbnr (Contributor, Author)

Thanks for the feedback. I have added a test run to the CI build, but the pipeline was already failing due to existing issues.

thebigG (Collaborator) commented Mar 20, 2021

Thanks for adding it to the CI. Yes, we have issues in the CI. In fact, @PaulMcInnis and I have been talking about this in #133. I thought I had fixed the issue with #134, but sadly it looks like it wasn't fixed completely :(. I'll try to look into it when I get time.

thebigG added 2 commits March 20, 2021 16:39
- incoming_jobs_dict held the same value as the jobs dictionary, which is why _check_for_inter_scraper_validity was failing: Monster was being compared against itself.
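
The commit message describes an aliasing bug: the "incoming" jobs were the same dictionary as the existing jobs, so the validity check compared one scraper's results with themselves. A minimal, hypothetical Python sketch of that failure mode follows; the helper `inter_scraper_duplicates` and the sample data are illustrative only, not JobFunnel's actual `_check_for_inter_scraper_validity`.

```python
from typing import Dict, List

Jobs = Dict[str, dict]


def inter_scraper_duplicates(jobs: Jobs, incoming: Jobs) -> List[str]:
    """Keys already present in `jobs` that `incoming` tries to add again."""
    return sorted(key for key in incoming if key in jobs)


# Results already collected, and a fresh batch from another scraper.
jobs: Jobs = {"a1": {"title": "Dev"}, "b2": {"title": "QA"}}
scraped: Jobs = {"c3": {"title": "Ops"}}

# Bug: "incoming" is just another name for `jobs`, so every key clashes.
incoming_jobs_dict = jobs
print(inter_scraper_duplicates(jobs, incoming_jobs_dict))  # ['a1', 'b2']

# Fix: compare against the newly scraped batch instead.
incoming_jobs_dict = scraped
print(inter_scraper_duplicates(jobs, incoming_jobs_dict))  # []
```
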
codecov-io

Codecov Report

Merging #136 (b818f58) into master (728849f) will increase coverage by 0.28%.
The diff coverage is 25.00%.

@@            Coverage Diff             @@
##           master     #136      +/-   ##
==========================================
+ Coverage   36.12%   36.41%   +0.28%     
==========================================
  Files          22       26       +4     
  Lines        1456     1494      +38     
==========================================
+ Hits          526      544      +18     
- Misses        930      950      +20     
Impacted Files Coverage Δ
jobfunnel/backend/jobfunnel.py 0.00% <0.00%> (ø)
jobfunnel/backend/scrapers/base.py 39.39% <0.00%> (+0.88%) ⬆️
jobfunnel/backend/scrapers/indeed.py 25.80% <ø> (-1.19%) ⬇️
jobfunnel/backend/scrapers/registry.py 100.00% <ø> (ø)
jobfunnel/resources/defaults.py 100.00% <ø> (ø)
jobfunnel/resources/enums.py 100.00% <100.00%> (ø)
jobfunnel/backend/tools/__init__.py 100.00% <0.00%> (ø)
jobfunnel/backend/__init__.py 100.00% <0.00%> (ø)
jobfunnel/__init__.py 100.00% <0.00%> (ø)
... and 2 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 446e9e0...b818f58.
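
The percentages in the report follow from the hit and line counts in the coverage diff. A quick Python check, assuming Codecov's default settings of two decimal places rounded down (the `round_down` and `coverage` helpers are ours, not part of any Codecov API):

```python
import math


def round_down(value: float, places: int = 2) -> float:
    """Round toward negative infinity at `places` decimals."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


def coverage(hits: int, lines: int) -> float:
    """Raw coverage percentage: hit lines over tracked lines."""
    return hits / lines * 100


base = coverage(526, 1456)  # master (728849f)
head = coverage(544, 1494)  # #136 (b818f58)

print(round_down(base))         # 36.12
print(round_down(head))         # 36.41
print(round_down(head - base))  # 0.28
```

Rounding to the nearest value instead would give 36.13 and +0.29, so the rounding mode explains why the report shows 36.12% and +0.28%.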

thebigG (Collaborator) commented Mar 20, 2021

I think at least now we won't have job_id issues anymore. What's left is deciding what to do about the error code when we don't find any jobs.

marchbnr (Contributor, Author)

Cool, thank you for your changes! I think the status code should not depend on the number of results. One alternative is to track errors along the way and set the status code accordingly after execution. However, this is not related to the current feature, so I would do it in a separate pull request, if you agree.
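
A minimal sketch of that alternative, assuming a hypothetical `run_scrapers` entry point that receives each scraper as a zero-argument callable (JobFunnel's real structure may differ): failures are collected while the remaining scrapers keep running, and the exit status is derived from them afterwards, so an empty result set on its own is not an error.

```python
import sys
from typing import Callable, Dict, List


def run_scrapers(scrapers: Dict[str, Callable[[], List[dict]]]) -> int:
    """Run every scraper, collect failures, and return a process exit status.

    Zero jobs is a valid outcome; only scraper errors produce a non-zero status.
    """
    jobs: List[dict] = []
    errors: List[str] = []
    for name, scrape in scrapers.items():
        try:
            jobs.extend(scrape())
        except Exception as exc:  # record the failure, keep running the others
            errors.append(f"{name}: {exc!r}")
    print(f"Found {len(jobs)} jobs")
    for message in errors:
        print(f"Scraper failed: {message}", file=sys.stderr)
    return 1 if errors else 0
```

A caller would then finish with `sys.exit(run_scrapers(...))`.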

PaulMcInnis (Owner) left a comment

LGTM! Just some minor comments, but I think the improved keying will help us avoid issues in the future.

We should also bump the major version, since changing the keying is a breaking change for users who wish to continue an existing search.
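
To see why changing the keying breaks continued searches, here is a hypothetical Python sketch. Both key functions and the `GERMANY_GERMAN` locale string are assumptions for illustration, not the scheme this PR actually introduces. If previously saved jobs were stored under the old key, none of them match under the new one, so jobs a user has already seen come back as new.

```python
from typing import Callable, Dict, List, Set

Job = Dict[str, str]


def old_key(job: Job) -> str:
    # Hypothetical old scheme: the provider's job id on its own.
    return job["id"]


def new_key(job: Job) -> str:
    # Hypothetical new scheme: the id qualified by the locale that scraped it.
    return f"{job['locale']}:{job['id']}"


def unseen_jobs(saved_keys: Set[str], scraped: List[Job],
                key: Callable[[Job], str]) -> List[Job]:
    """Jobs whose key does not appear among the keys of an earlier search."""
    return [job for job in scraped if key(job) not in saved_keys]


scraped = [{"id": "123", "locale": "GERMANY_GERMAN", "title": "Entwickler"}]
saved_keys = {old_key(job) for job in scraped}  # written by the old version

print(len(unseen_jobs(saved_keys, scraped, old_key)))  # 0: already known
print(len(unseen_jobs(saved_keys, scraped, new_key)))  # 1: looks new again
```
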

Two review threads on demo/settings_DE.yaml (outdated, resolved).

PaulMcInnis (Owner)

Also, once this goes in, we should cut a new release, as I think some of our recent issues are resolved by the current master.

PaulMcInnis (Owner)

OK, going to merge this and cut a release that removes Brotli encoding for all the other scrapers as well, since it seems to be causing issues all around. I was a bit pre-emptive with 3.0.2.
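
The thread does not show how Brotli was removed. One plausible shape, sketched in Python under our own assumptions: if request headers copied from a browser advertise `br` but the HTTP client has no Brotli decoder available, the server may reply with a Brotli-compressed body the scraper cannot read, and stripping `br` from `Accept-Encoding` avoids that. `without_brotli` and the header values below are hypothetical, not JobFunnel's code.

```python
from typing import Dict


def without_brotli(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` whose Accept-Encoding no longer offers Brotli."""
    cleaned = dict(headers)
    for name, value in headers.items():
        if name.lower() != "accept-encoding":
            continue
        codings = [c.strip() for c in value.split(",") if c.strip()]
        # Drop "br" and any weighted form such as "br;q=0.5".
        kept = [c for c in codings if c.split(";")[0].strip().lower() != "br"]
        # "identity" (no compression) is the fallback if nothing is left.
        cleaned[name] = ", ".join(kept) if kept else "identity"
    return cleaned


browser_like = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Encoding": "gzip, deflate, br",
}
print(without_brotli(browser_like)["Accept-Encoding"])  # gzip, deflate
```
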

PaulMcInnis merged commit ba39160 into PaulMcInnis:master on Sep 21, 2021