Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve type hints for str vs bytes #72

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

aronbierbaum
Copy link

No description provided.

Comment on lines 70 to 72
_StrOrBytes = Union[str, bytes]
_ValueType = Union[str, bytes, QName]
_InputDictAnyStr = Dict[_StrOrBytes, _StrOrBytes]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type aliases for parameters that allow str or bytes on both Python 2 and Python 3.

) -> _Element: ...
def write(
self,
file: _FileSource,
encoding: _AnyStr = ...,
method: _AnyStr = ...,
encoding: Optional[_StrOrBytes] = ...,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be marked as Optional since None is allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, even the unicode (or str) type is allowed as value. Although passing the string name "unicode" is probably a better idea, so your proposal seems sufficient. In fact, I'd rather see this deprecated and accept only strings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the global tostring function this only accepts unicode or bytes.

elm_tree = etree.fromstring("<test/>").getroottree()
print(type(elm_tree))

with open("test.xml", "wb") as out_file:
   elm_tree.write(out_file, encoding = str)
Traceback (most recent call last):
  File "C:\Source\p5\taccs\unicode_xml.py", line 82, in <module>
    elm_tree.write(out_file, encoding = str)
  File "src/lxml/etree.pyx", line 2055, in lxml.etree._ElementTree.write
TypeError: unbound method str.upper() needs an argument

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's start the deprecation by making sure users pass only strings.

pretty_print: bool = ...,
xml_declaration: Any = ...,
with_tail: Any = ...,
standalone: bool = ...,
doctype: _StrOrBytes = ...,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for new parameters.

@aronbierbaum
Copy link
Author

@scoder Who should I contact to get reviews of this pull request?

Copy link
Member

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good improvement.

I like the idea of separating input and output declarations.
https://en.wikipedia.org/wiki/Robustness_principle

lxml-stubs/etree.pyi Show resolved Hide resolved
) -> _Element: ...
def write(
self,
file: _FileSource,
encoding: _AnyStr = ...,
method: _AnyStr = ...,
encoding: Optional[_StrOrBytes] = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, even the unicode (or str) type is allowed as value. Although passing the string name "unicode" is probably a better idea, so your proposal seems sufficient. In fact, I'd rather see this deprecated and accept only strings.

encoding: _AnyStr = ...,
method: _AnyStr = ...,
encoding: Optional[_StrOrBytes] = ...,
method: str = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be unicode in Py2, essentially the same as encoding above (but not None).

Copy link
Author

@aronbierbaum aronbierbaum Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, changed to _StrOrBytes. Also updated the tostring overrides.

compression: int = ...,
exclusive: bool = ...,
inclusive_ns_prefixes: List[Union[str, bytes]] = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this different from the declaration below in write_c14n?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to be Iterable[_StrOrBytes] like write_c14n.

Comment on lines +315 to +316
def __setitem__(self, key: _TagName, value: _ValueType) -> None: ...
def __delitem__(self, key: _TagName) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good catch.

_ListAnyStr = Union[List[str], List[bytes]]
_DictAnyStr = Union[Dict[str, str], Dict[bytes, bytes]]
_StrOrBytes = Union[str, bytes]
_ValueType = Union[str, bytes, QName]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a very generic and unclear name. Isn't it just the same as _TagName?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct it has the same type as _TagName but it felt odd using _TagName for both parameters below. I have no problem changing it if you prefer. We could also change this to _ValueType = _TagName so that we maintain readability.

def set(self, key: _TagName, value: _ValueType) -> None: ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that both are different for the _Attrib dict (and .get()/.set()), but the first really is a _TagName.

def update(
self,
sequence_or_dict: Union[
_Attrib, Mapping[_AnyStr, _AnyStr], Sequence[Tuple[_AnyStr, _AnyStr]]
_Attrib, Mapping[_ValueType, _ValueType], Sequence[Tuple[_ValueType, _ValueType]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key and value are not quite the same here, I think. The key would probably accept a QName, i.e. should be _TagName.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use _TagName for the key and first tuple value.

@@ -409,7 +429,7 @@ class XMLParser(_FeedParser):
class HTMLParser(_FeedParser):
def __init__(
self,
encoding: Optional[_AnyStr] = ...,
encoding: Optional[_StrOrBytes] = ...,
Copy link
Member

@scoder scoder Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this appears quite a few times, it seems worth another alias, e.g. _Identifier.

Comment on lines 713 to 715
def end(self, tag: _StrOrBytes) -> None: ...
def pi(self, target: _StrOrBytes, data: Optional[_StrOrBytes] = ...) -> Any: ...
def start(self, tag: _StrOrBytes, attrib: Dict[_StrOrBytes, _StrOrBytes]) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth using _TagName for the tag argument.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated start and end to take tag: _TagName

lxml-stubs/etree.pyi Outdated Show resolved Hide resolved
@aronbierbaum
Copy link
Author

@scoder Could you take another look at this, I believe I have handled most issues but there are a few outstanding questions.

@scoder
Copy link
Member

scoder commented May 4, 2023

Triggering a fresh test run

@scoder scoder closed this May 4, 2023
@scoder scoder reopened this May 4, 2023
@scoder
Copy link
Member

scoder commented Jan 8, 2024

Triggering a fresh test run

@scoder scoder closed this Jan 8, 2024
@scoder scoder reopened this Jan 8, 2024
@scoder
Copy link
Member

scoder commented Jan 8, 2024

I've fixed the dependency induced test failures in the master branch, but this PR still has a couple of own failures. Could you please see if you can resolve them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants