Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugs fixes, adjustments, and cleaning. #84

Merged
merged 6 commits into from
Jul 12, 2020

Conversation

bunsenmurder
Copy link
Collaborator

@bunsenmurder bunsenmurder commented Jul 10, 2020

Description

As the title implies, this pull request encompasses a few bug fixes, a few adjustments and some code cleaning. Major changes I've made will be described here; any minor details or specifics can be found within the commit messages.

Fixes

  • For all providers except Glassdoor Dynamic, I changed how keywords and cities were encoded into the queries being sent to the job providers. This fixes issue Improved search keyword encoding with support for exact phrase #80.
  • Previously, futures within the delay_threader function were being deleted whether a blurb was parsed or not.

Adjustments

  • Previously, we would assign headers to each get/post request. Now the headers are set as the default headers for each providers session object (reference). If different headers are needed for a specific request, they can be overridden by setting the headers when making that request.
  • For Glassdoor_static, we only need to make a POST request initially when querying the Glassdoor location api and making an initial search. I've replaced any unnecessary POST requests with GET requests.
  • Any errors in the delay_threader are captured by the logger.

Context of change

Please add options that are relevant and mark any boxes that apply.

  • Software (software that runs on the PC)
  • Library (library that runs on the PC)
  • Tool (tool that assists coding development)
  • Other

Type of change

Please mark any boxes that apply.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Ran the funnel using quoted searches and comparing to actual results on each providers website.

Checklist:

Please mark any boxes that have been completed.

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

- For Indeed and Monster, the query string was not properly encoded when a quoted phrase with spaces in-between words were provided. The fix was to encode all spaces with the proper character(+/-). This issue and fix also applied to city names.
- For GlassDoorStatic, the query string was encoded for a URL and returned improper results. Since this class searches using a JSON payload, the solution was to combine the keywords with a space instead.
-The old query construction function was moved from GlassDoorBase to GlassDoorDynamic to prevent the dynamic scraper class from breaking.

Fixes issues PaulMcInnis#80.
- Removed unused requests imports
- Changed URL strings that had http in them to https
- Set provider header dictionary as the default headers on the provider's session object. Setting headers on the actual post/get method call is only necessary for temporarily overriding the session headers on an individual request.
- Adjusted search_page_for_job_soups method for GlassDoorStatic class so that it uses GET instead of POST. Sending payload data when we already have the search page URL is unnecessary and can lead to bot detection measures activating more frequently.
- Updated test URL to test for https instead of http
- Previously futures would be deleted whether they finished parsing or not.
- Added code to delete the HTML page after it's  parsed.
- Added code to log any errors during blurb retrieval and parsing.
@codecov-commenter
Copy link

Codecov Report

Merging #84 into master will decrease coverage by 0.02%.
The diff coverage is 60.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
- Coverage   58.37%   58.34%   -0.03%     
==========================================
  Files          13       13              
  Lines        1146     1150       +4     
==========================================
+ Hits          669      671       +2     
- Misses        477      479       +2     
Impacted Files Coverage Δ
jobfunnel/glassdoor_dynamic.py 16.37% <0.00%> (-0.15%) ⬇️
jobfunnel/monster.py 28.14% <33.33%> (+0.53%) ⬆️
jobfunnel/jobfunnel.py 47.70% <50.00%> (+0.02%) ⬆️
jobfunnel/glassdoor_static.py 28.57% <60.00%> (+0.68%) ⬆️
jobfunnel/glassdoor_base.py 57.89% <66.66%> (-0.73%) ⬇️
jobfunnel/indeed.py 87.32% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a246cb...7f27900. Read the comment docs.

Copy link
Owner

@PaulMcInnis PaulMcInnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent! :shipit:

@bunsenmurder
Copy link
Collaborator Author

Thanks @PaulMcInnis ! Btw, the codecov bot doesn't seem to like some my changes. Is there anything specific I need to add when making pull requests?

@PaulMcInnis
Copy link
Owner

PaulMcInnis commented Jul 12, 2020 via email

@bunsenmurder bunsenmurder merged commit aac4008 into PaulMcInnis:master Jul 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants