-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #249 - Add strip to get() and getall() #260
base: master
Are you sure you want to change the base?
Issue #249 - Add strip to get() and getall() #260
Conversation
Codecov Report
@@ Coverage Diff @@
## master #260 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 4 4
Lines 138 138
Branches 29 29
=========================================
Hits 138 138 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
parsel/selector.py
Outdated
@typing.overload | ||
def get(self, strip: bool) -> str: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? (I am not saying it is not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is, I just saw the overloads above this line and thought it was required
@typing.overload
def get(self, default: None = None) -> Optional[str]:
pass
@typing.overload
def get(self, default: str) -> str:
pass
I removed it on the last push. I don't understand the whole overload completely, but I feel those 2 are also unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is related to typing, when parameter types affect output type, but I do not think it applies here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removing it broke the pipeline for type checks.
No overload variant of "get" of "SelectorList" matches argument type "bool"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you added a new argument to the method you need to add them to all overloads too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And those two overloads are needed because the method either can return None if default wasn't passed or it returns the default instead, so this describes the typing info more clearly.
parsel/selector.py
Outdated
@@ -200,12 +200,15 @@ def re_first( | |||
return el | |||
return default | |||
|
|||
def getall(self) -> List[str]: | |||
def getall(self, strip: bool = False) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s make it keyword-only.
def getall(self, strip: bool = False) -> List[str]: | |
def getall(self, *, strip: bool = False) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same on get
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we apply this to typing.overload
of get
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding it to get
but then this would require some type check on default
entry to validate it is a str
, either way the behavior looks fine without it for get
parsel/selector.py
Outdated
@typing.overload | ||
def get(self, strip: bool) -> str: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you added a new argument to the method you need to add them to all overloads too.
parsel/selector.py
Outdated
@typing.overload | ||
def get(self, strip: bool) -> str: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And those two overloads are needed because the method either can return None if default wasn't passed or it returns the default instead, so this describes the typing info more clearly.
""" | ||
Call the ``.get()`` method for each element is this list and return | ||
their results flattened, as a list of strings. | ||
""" | ||
return [x.get() for x in self] | ||
data = [x.get() for x in self] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just pass strip
to get()
instead of essentially reimplementing the new code from get() here?
(granted, the code is almost trivial, but still)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x is a Selector, at that point neither default nor strip exist anymore
Only SelectorList recognizes those two fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About adding the strip
argument to the other overloads, wouldn't that remove the purpose of the overload on those?
Wouldn't having an overloaded default + strip
be just going back to the non-overloaded method signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x is a Selector, at that point neither default nor strip exist anymore
Aren’t you also adding strip
to Selector
? Can’t you use data = [x.get(strip=True) for x in self]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That ended up looking harder than it should when I tried.
Also as far as I could see, default also is not on the Selector, so for standardization, I believe adding it would also be required. And in that case I don't understand why it wasn't added up until now, it gives me the feeling it is not a trivial task given the current implementation of get on Selector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are changing Selector.get() to have strip
and it already has default
so I think there shouldn't be any problems with using it, and it indeed doesn't look too complicated to me :)
Also if you have issues with typing I can hep, I'm working on improving parsel typing right now.
this_doesnt_work = sel.xpath("//ul/li[position()>1]")[0].get(default="6") # Selector doesnt have default field
sel.xpath(...).get(default=...)
works in normal parsel without your changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?
No, the purpose of those overloads is to show that the type of the return value depends on whether default
is None or not.
Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?
I believe just adding strip
to the existing overloads is enough, no new overloads are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sel.xpath(...).get(default=...)
uses SelectorList
, not Selector
. The problem is that by indexing [0]
then you go to a Selector
, and it does not have default
.
but either way I still have to give it a look
Also, it seems a little unintuitive to be using the [0]
when doing a get operation, as you are already asking the SelectorList
to get the first instance it found. I used it on my tests because I saw some of the tests use this kind of indexing to prove Selector
behaves correctly, and in this case, I believe you should be able to do both default
and strip
directly on a Selector, but you can't for a Selector
at the moment so maybe this needed further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, you are right. Selector.get()
is much simpler than SelectorList.get()
. Well, I think it makes sense and is easy to add strip=
to Selector.get()
.
it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found
Yeah, it's probably unusual to write <SelectorList>[0].get()
. It's OK to have a Selector and call a get) on it though.
I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.
Not so sure about default=, but maybe there are use cases where it's useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep this as is, strip for Selector.get() is less useful and can be implemented separately.
@@ -200,30 +200,36 @@ def re_first( | |||
return el | |||
return default | |||
|
|||
def getall(self) -> List[str]: | |||
def getall(self, *, strip: bool = False) -> List[str]: | |||
""" | |||
Call the ``.get()`` method for each element is this list and return | |||
their results flattened, as a list of strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new option needs to be mentioned in the docstring.
def get(self, default: Optional[str] = None) -> Optional[str]: | ||
def get( | ||
self, default: Optional[str] = None, strip: bool = False | ||
) -> Optional[str]: | ||
""" | ||
Return the result of ``.get()`` for the first element in this list. | ||
If the list is empty, return the default value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new option needs to be mentioned in the docstring
Hey! Is there a use case for this feature if #127 is merged, do you have some examples? I'm asking because |
@kmike By the time I saw #127 I had most of the bulk of the strip logic coded. I do0n't think there is a common use case for this PR compared to 127 besides maybe being a more straight forward pre processing step of the text where only one thing happens (strip). |
Resolves #249