Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameter mapping #151

Merged
merged 8 commits into from
Nov 30, 2023
Merged

Conversation

Gallaecio
Copy link
Contributor

@Gallaecio Gallaecio commented Nov 17, 2023

Resolves #143.

  • _params.py now contains a dictionary of top-level Zyte API request params with metadata about them, and it should be extended as new parameters are added upstream in the future. Related changes:
    • "jobPosting" is now recognized as an extraction type (affects whether or not httpResponseBody and httpResponseHeaders are enabled by default).
    • "GET" is now recognized as the default value of "httpRequestMethod", and stripped from request parameters if manually defined.
  • Fingerprinting changes:
    • *Options.extractFrom is now taken into account to determine whether or not to keep the URL fragment. The fragment is no longer kept just because automatic extraction is used, browserHtml must also be explicitly requested as extraction source.
    • Matching "httpRequestBody" and "httpRequestText" values do not cause different fingerprints.
    • The following additional keys are ignored for the sake of fingerprints: sessionContext, sessionContextParameters, cookieManagement, requestCookies, responseCookies.
    • ⚠️ device remains taken into account for fingerprinting (the default behavior for any unknown key). I am not completely sure, though. If it only affects requests (and not e.g. the viewport), maybe it would be best to ignore it during fingerprinting.
    • ⚠️ I wonder if we should also ignore httpResponseHeaders (True/False), since we already take into account other outputs it can be combined with, in line how responseCookies is now ignored.

To do:

@Gallaecio Gallaecio requested review from kmike and wRAR November 17, 2023 12:48
Copy link

codecov bot commented Nov 17, 2023

Codecov Report

Merging #151 (4943f47) into main (b892453) will increase coverage by 0.08%.
Report is 2 commits behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #151      +/-   ##
==========================================
+ Coverage   98.83%   98.92%   +0.08%     
==========================================
  Files          10       10              
  Lines         687      744      +57     
==========================================
+ Hits          679      736      +57     
  Misses          8        8              
Files Coverage Δ
scrapy_zyte_api/_params.py 100.00% <100.00%> (ø)
scrapy_zyte_api/_request_fingerprinter.py 97.56% <100.00%> (+0.19%) ⬆️
scrapy_zyte_api/responses.py 99.00% <ø> (+0.04%) ⬆️

... and 3 files with indirect coverage changes

scrapy_zyte_api/_params.py Outdated Show resolved Hide resolved
scrapy_zyte_api/_params.py Outdated Show resolved Hide resolved
@Gallaecio Gallaecio merged commit 702cc63 into scrapy-plugins:main Nov 30, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Review fingerprinting after the addition of new keys
3 participants