Apply normalization consistently in VLenBytes #570

Open
wants to merge 4 commits into
base: main

@folded folded commented Aug 28, 2024

None and 0 are treated as a zero-length string when computing lengths, so the same normalization should be applied to the value passed to PyBytes_AS_STRING. If this is not done, an assertion is hit in the Python runtime (when compiled in debug mode).
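
The normalization described above can be illustrated with a minimal pure-Python sketch. It is not the Cython code from this PR: the names `normalize` and `encode_items` are hypothetical, only the per-item part of the encoding (a 4-byte little-endian length followed by the payload, as suggested by `store_le32` and `data += 4` in the diff) is modelled, and the `TypeError` for unsupported inputs is an assumption of the sketch.

```python
import struct


def normalize(value):
    """Map the missing-value markers None and 0 to an empty bytes object.

    Hypothetical helper: it mirrors the `b is None or b == 0` check in the diff,
    but tests the type first so bytes are never compared with an int.
    """
    if value is None:
        return b""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if value == 0:
        return b""
    # Assumption of this sketch: anything else is rejected outright.
    raise TypeError(f"expected bytes, None or 0, got {type(value).__name__}")


def encode_items(values):
    """Write each item as a 4-byte little-endian length followed by its payload."""
    out = bytearray()
    for v in values:
        b = normalize(v)  # the same normalized value feeds both the length and the copy
        out += struct.pack("<I", len(b))
        out += b
    return bytes(out)


encoded = encode_items([b"ab", None, 0, b""])
```

The point of the sketch is that the length and the payload are both derived from the same normalized object, so a non-bytes value never reaches the step that reads the raw buffer.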

TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

@@ -250,7 +250,10 @@ class VLenBytes(Codec):
             l = lengths[i]
             store_le32(<uint8_t*>data, l)
             data += 4
-            encv = PyBytes_AS_STRING(values[i])
+            b = values[i]
+            if b is None or b == 0:  # treat these as missing value, normalize
Member

Like on line 231? It seems that l = 0 in this case, so there is nothing to copy and these operations are wasted. I might be wrong...

Author

If you don't pass a bytes object, you will hit this assert when calling PyBytes_AS_STRING: https://github.com/python/cpython/blob/40fff90ae3d46843bb9d27c6a53ef61c861a3bb4/Include/cpython/bytesobject.h#L21

It would be equivalent to skip the PyBytes_AS_STRING and memcpy calls entirely when l = 0. Shall I update the PR to do that instead?
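
For illustration, the alternative of skipping the copy for zero-length items could look roughly like the following pure-Python sketch. The function name `encode_skip_empty` is hypothetical, the per-item layout (4-byte little-endian length, then payload) is inferred from the diff context, and this is not the code in the PR.

```python
import struct


def encode_skip_empty(values):
    """Write a length prefix for every item, but only touch the payload when length > 0."""
    out = bytearray()
    for v in values:
        # None and 0 count as missing; bytes are never compared with an int here.
        missing = v is None or (not isinstance(v, (bytes, bytearray)) and v == 0)
        length = 0 if missing else len(v)
        out += struct.pack("<I", length)
        if length > 0:
            # Only real, non-empty bytes objects reach the copy step.
            out += v
    return bytes(out)


encoded = encode_skip_empty([None, b"xyz", 0])
```

With this shape, a missing value is never handed to the step that reads the underlying buffer, so the debug-mode assertion cannot be triggered by None or 0, and no zero-length copy is issued.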

codecov bot commented Aug 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.91%. Comparing base (a8f6efb) to head (d3a8fc0).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #570   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files          59       59           
  Lines        2334     2334           
=======================================
  Hits         2332     2332           
  Misses          2        2           
