Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SwitchPage #103
base: master
Are you sure you want to change the base?
Add SwitchPage #103
Changes from 2 commits
f012a0b
62cd1ed
51114e4
31fc060
7817813
36ad056
b5d61ea
6778aa3
073c4ab
b51f056
308c4bf
51bf31f
828a84b
fc7867f
b32f92f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a test for use cases when a
MultiLayoutPage
subclass also uses multipleMultiLayoutPage
s underneath. I think this could be a use case when users import POs from different packages and repacking them.Might also be worth creating a doc about this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about the expected behavior of the implementing framework (e.g. scrapy-poet) when the Switch PO has chosen the PO to parse the page.
Specifically, would the frameworks need to check if the chosen PO is declared in any of the rules'
instead_of
parameter?Approach 1
One use case I can think of is:
In this example, the last resort is Automatic Extraction (e.g. by ML models). If the
instead_of
parameters are supported, then it would be of great convenience to the user to simply rely on the other overrides stored in the registry to return a more apt PO.Approach 2
Alternatively, users can simply do:
(Reference: #26)
In this case, it'd be up to the implementing framework to decide if the
instead_of
parameters are used, or perhaps some other means of declaring and using the fallback PO via some user-defined settings.Approach 3
The other approach is not to use the
instead_of
parameters at all since it should exactly follow what PO class the user has returned.Other approaches
Could perhaps be a combination of the ideas above to give a finer grain of control to the user; or perhaps completely something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no clear idea on what the best approach here is. I think your questions aligns a bit with one of my open question:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to support overrides for candidate page object classes, at least in the first version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky since dependencies are dynamic.
For example:
Providing all three of the dependencies should solve it but it could be expensive.
There should be a way for the Switch/Select/Choose PO to declare the dependency of the PO to use. This won't work in the current way of how dependencies are provided since the Switch/Select/Choose PO is already built as an instance and getting another dependency input is something we need to work on.
Alternatively, I'm thinking if we should glance at the perspective of solving this via overrides with some slight tweaks. For example:
The
web_poet.DynamicPO
is simply a sentinel class which serves as a stand-in while we still don't know the final PO to use.If this is present, then the behavior of
ApplyRule
changes a bit:instead_of
parameter and resolve any dependencies that it needs (e.g.httpResponseBody
).switch()
method (or other names we can think of for this)use
parameterApplyRule
as usualThe concept of overrides could fit in here since the Switch/Select/Choose PO is essentially overridden by the PO it has selected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed solution indeed works best for a scenario where the inputs are the same for all layout-specific page objects, and where the required layout is likely to be random (i.e. A/B testing scenario, as opposed to special URLs where the same URL is likely to get the same layout when you refresh).
In this random scenario, requesting httpResponseBody to determine the layout, determining that the layout is 2, and then fetching browserHtml to parse it with layout 2, would not work, because by the time you get layout 2 the response may be for layout 1.
I think an approach in line with the current proposal may still make sense, at least for some scenarios like the A/B test one.
I wonder if it would make sense to implement 2 different solutions for the 2 different scenarios. Specially because to allow for incremental request of inputs based on their underlying cost, we will probably need a significantly more complicated approach.