detect breaking changes #394
base: main
The CI currently reports 58 breaking changes between 0.5.1 and 976c0ea (I'm still trying to work out which PRs introduced them). When test_api.py was first written, it detected just one breaking change, related to the backward-incompatible Daft change. Example stack trace:
Which of these breaking change types should we include in our unit test? For now I've removed the ATTRIBUTE_CHANGED_VALUE type because it was creating a lot of noise in the output. Example:
Thanks for setting this up @syun64. This looks great. I think we can just give it a try after the 0.6.0 release and see how noisy it is.
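On the question of which breaking change types to include, here is a minimal sketch of filtering reported breakages by kind before the test asserts on them. It is not the code in test_api.py: the `Breakage` tuple and `drop_ignored_kinds` helper are hypothetical stand-ins, and the kinds are plain strings rather than griffe's own objects. Only the kind names come from this thread.

```python
from typing import Iterable, List, NamedTuple


class Breakage(NamedTuple):
    """Hypothetical stand-in for one reported breaking change."""

    obj_path: str  # dotted path of the affected object
    kind: str      # kind name, e.g. "OBJECT_REMOVED"


# Kinds left out of the check because they were too noisy (per the PR description).
IGNORED_KINDS = frozenset({"ATTRIBUTE_CHANGED_VALUE"})


def drop_ignored_kinds(breakages: Iterable[Breakage]) -> List[Breakage]:
    """Keep only breakages whose kind is not in IGNORED_KINDS, preserving order."""
    return [b for b in breakages if b.kind not in IGNORED_KINDS]
```

Keeping the ignored kinds in one set makes it cheap to re-enable a kind later and see how noisy it actually is.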
Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.2.14 to 0.2.15. - [Release notes](https://github.com/Eventual-Inc/Daft/releases) - [Commits](Eventual-Inc/Daft@v0.2.14...v0.2.15) --- updated-dependencies: - dependency-name: getdaft dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [moto](https://github.com/getmoto/moto) from 5.0.1 to 5.0.2. - [Release notes](https://github.com/getmoto/moto/releases) - [Changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md) - [Commits](getmoto/moto@5.0.1...5.0.2) --- updated-dependencies: - dependency-name: moto dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.9 to 9.5.10. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](squidfunk/mkdocs-material@9.5.9...9.5.10) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Make the snapshot creation part of the `Transaction` This is also how it is done in Java, and I really like it since it allows you to easily queue up updates in a transaction. For example, an update to the schema. * Extend the API
@syun64 I was on a merging spree, can you rebase once more? 😓
…credentials/remote signing (apache#436) * Send X-Iceberg-Access-Delegation header to signal support for vended credentials/remote signing Clients can optionally send this header to signal which delegated access pattern it can support. At this point the iceberg-python client can support `vended-credentials` and `remote-signing`, thus we can always send this header. Additional details about this header can be found in the REST OpenAPI spec: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L1459-L1483 * Update rest.py
* Refresh Auth token on expiry * Check call count * Add test to cover retry logic * Update poetry.lock with tenacity * Fix tests for Python <= 3.9
* Reuse commit-uuid as the write-uuid * Fix conflicts * Cleanup * cleanup
* update name-mapping * Update __init__.py Co-authored-by: Fokko Driesprong <[email protected]> * Update pyiceberg/table/name_mapping.py Co-authored-by: Fokko Driesprong <[email protected]> * validation mode after * type --------- Co-authored-by: Fokko Driesprong <[email protected]>
* Feat: Add fail_if_exists param to create_table * create create_table_if_not_exists method * fix reset test * fix mypy check
Looks like we broke something already 😸 Can we make a list to allow breaking changes? Similar to https://github.com/apache/parquet-mr/blob/d8396086b3e3fefc6829f8640917c3bbde0fa9c4/pom.xml#L581-L606
Sure! Thank you for the suggestion. I took a stab at implementing this with a YAML file - let me know what you think!
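A rough sketch of how such an allow-list could be applied, assuming each entry pairs an `obj_path` with a `kind` as in the YAML file. The `Breakage` tuple and `unexpected_breakages` function are hypothetical names for illustration, and the exclusions are written inline as Python dicts here rather than parsed from the YAML file.

```python
from typing import Dict, Iterable, List, NamedTuple


class Breakage(NamedTuple):
    """Hypothetical stand-in for one reported breaking change."""

    obj_path: str
    kind: str


# Two entries that appear in this PR's exclusion file; in the real test they
# would be read from YAML instead of being hard-coded.
EXCLUSIONS: List[Dict[str, str]] = [
    {"obj_path": "pyiceberg.avro.decoder_fast.CythonBinaryDecoder", "kind": "CLASS_REMOVED_BASE"},
    {"obj_path": "pyiceberg.avro.decoder_fast.BinaryDecoder", "kind": "OBJECT_REMOVED"},
]


def unexpected_breakages(
    found: Iterable[Breakage], exclusions: Iterable[Dict[str, str]]
) -> List[Breakage]:
    """Return breakages not covered by an exact (obj_path, kind) exclusion."""
    allowed = {(e["obj_path"], e["kind"]) for e in exclusions}
    return [b for b in found if (b.obj_path, b.kind) not in allowed]
```

A test built this way would fail only when the returned list is non-empty, so an intended breaking change has to be recorded in the exclusion list before the check passes.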
exclude:
  - obj_path: pyiceberg.avro.decoder_fast.CythonBinaryDecoder
    kind: CLASS_REMOVED_BASE
  - obj_path: pyiceberg.table.create_mapping_from_schema
This one seems correct, but not exactly what I expected: fd9dc88#diff-23e8153e0fd497a9212215bd2067068f3b56fa071770c7ef326db3d3d03cee9bL89
Yeah, it's interesting that it treats imports from other modules as public objects as well.
#

# The format of this file is documented at
# https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
:)
    kind: OBJECT_REMOVED
  - obj_path: pyiceberg.avro.decoder_fast.BinaryDecoder
    kind: OBJECT_REMOVED
  - obj_path: pyiceberg.avro.decoder_fast.CythonBinaryDecoder.tell
I'm pretty sure that the Cython stuff didn't change.
This is getting pretty interesting!
As you mentioned, from the diff it looks like CythonBinaryDecoder wasn't updated, but when I run help() on pyiceberg.avro.decoder_fast.CythonBinaryDecoder, I'm loading:
class CythonBinaryDecoder(builtins.object)
| Implement a BinaryDecoder that reads from an in-memory buffer.
|
| Methods defined here:
|
| __reduce__ = __reduce_cython__(...)
|
| __setstate__ = __setstate_cython__(...)
|
| read(self, n: 'int')
| Read n bytes.
|
| read_boolean(self) -> 'bool'
| Reads a value from the stream as a boolean.
|
| A boolean is written as a single byte
| whose value is either 0 (false) or 1 (true).
|
| read_bytes(self)
| Bytes are encoded as a long followed by that many bytes of data.
|
| read_double(self)
| Reads a value from the stream as a double.
|
| A double is written as 8 bytes.
| The double is converted into a 64-bit integer using a method equivalent to
| Java's doubleToLongBits and then encoded in little-endian format.
|
| read_float(self)
| Reads a value from the stream as a float.
|
| A float is written as 4 bytes.
| The float is converted into a 32-bit integer using a method equivalent to
| Java's floatToIntBits and then encoded in little-endian format.
...
Which might explain why methods like read_double and read_float are reported as having their types changed, whereas read_boolean is reported as having kept the same interface.
Maybe some change made Griffe start taking this definition into account when it previously did not.
Alright, so it looks like this false positive is being generated because we are using two separate approaches for loading the package in griffe. We use load_git to load pyiceberg's latest release directly from Git, and then we use load to load the local implementation of pyiceberg with the contributor's current change. Comparing what the two methods return for CythonBinaryDecoder shows different interpretations of the Cython class's interface:
>>> current_git = griffe.load_git("pyiceberg")
>>> current_git['avro']['decoder_fast']['CythonBinaryDecoder'].relative_filepath
PosixPath('/tmp/griffe-worktree-iceberg-python-HEAD-v1wuhe7z/griffe_HEAD/pyiceberg/avro/decoder_fast.pyi')
>>> current_git.all_members['avro']['decoder_fast']['CythonBinaryDecoder'].all_members
{'__init__': Function('__init__', 21, 22), 'tell': Function('tell', 24, 25), 'read': Function('read', 27, 28), 'read_boolean': Function('read_boolean', 30, 31), 'read_int': Function('read_int', 33, 34), 'read_ints': Function('read_ints', 36, 37), 'read_int_bytes_dict': Function('read_int_bytes_dict', 39, 40), 'read_bytes': Function('read_bytes', 42, 43), 'read_float': Function('read_float', 45, 46), 'read_double': Function('read_double', 48, 49), 'read_utf8': Function('read_utf8', 51, 52), 'skip': Function('skip', 54, 55), 'skip_int': Function('skip_int', 57, 58), 'skip_boolean': Function('skip_boolean', 60, 61), 'skip_float': Function('skip_float', 63, 64), 'skip_double': Function('skip_double', 66, 67), 'skip_bytes': Function('skip_bytes', 69, 70), 'skip_utf8': Function('skip_utf8', 72, 73)}
>>> current_local = griffe.load("pyiceberg")
>>> current_local['avro']['decoder_fast']['CythonBinaryDecoder'].relative_filepath
PosixPath('pyiceberg/avro/decoder_fast.cpython-38-x86_64-linux-gnu.so')
>>> current_local.all_members['avro']['decoder_fast']['CythonBinaryDecoder'].all_members
{'__doc__': Attribute('__doc__', None, None), '__new__': Function('__new__', None, None), '__pyx_vtable__': Attribute('__pyx_vtable__', None, None), '__reduce__': Function('__reduce__', None, None), '__setstate__': Function('__setstate__', None, None), 'read': Function('read', None, None), 'read_boolean': Function('read_boolean', None, None), 'read_bytes': Function('read_bytes', None, None), 'read_double': Function('read_double', None, None), 'read_float': Function('read_float', None, None), 'read_int': Function('read_int', None, None), 'read_int_bytes_dict': Function('read_int_bytes_dict', None, None), 'read_ints': Function('read_ints', None, None), 'read_utf8': Function('read_utf8', None, None), 'skip': Function('skip', None, None), 'skip_boolean': Function('skip_boolean', None, None), 'skip_bytes': Function('skip_bytes', None, None), 'skip_double': Function('skip_double', None, None), 'skip_float': Function('skip_float', None, None), 'skip_int': Function('skip_int', None, None), 'skip_utf8': Function('skip_utf8', None, None), 'tell': Function('tell', None, None)}
So it looks like only the local load is reading the .so, whereas load_git is reading the .pyi file.
Maybe we exclude cython APIs from the test while we figure out how to resolve this issue? @Fokko
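One possible way to do that exclusion, sketched under the assumption that we can get each object's file path as shown in the REPL output above: skip any pair where either side was loaded from a compiled extension. The helper names and the suffix set are illustrative and are not part of griffe.

```python
from pathlib import PurePosixPath
from typing import Union

PathLike = Union[str, PurePosixPath]

# File suffixes of compiled extension modules (assumed set, for illustration).
COMPILED_SUFFIXES = frozenset({".so", ".pyd"})


def is_compiled(filepath: PathLike) -> bool:
    """True if the object was loaded from a compiled extension file."""
    return PurePosixPath(filepath).suffix in COMPILED_SUFFIXES


def is_comparable(old_filepath: PathLike, new_filepath: PathLike) -> bool:
    """True only if both sides come from source or stub files.

    Comparing a .pyi stub against an introspected .so is what produced
    the false positives discussed above.
    """
    return not (is_compiled(old_filepath) or is_compiled(new_filepath))
```

With the two paths from the session above, the `.pyi` versus `.so` pair would be skipped, while two `.py` or `.pyi` files would still be compared.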
Hey @syun64, thanks for adding the YAML, that looks neat. What's your take on Griffe? It looks like there are already some false positives.
Thank you for all the feedback @Fokko! My general impression of griffe is that it's working pretty well, and after carefully reviewing and adding even more breaking changes we've introduced since the last release, I'm convinced that we should include some implementation of this tool. There are some edge cases that I thought were worth summarizing:
Outside of these two edge cases, I think all the exclusions listed are accurate. The test will require contributors to think twice about making a breaking change, or to document an intended breaking change in the YAML file, so that we can track down breaking changes much more easily. Current pyiceberg-0.6.0 exclusion list:
@jaychia's suggestion -> reorganize modules to the top level
Implement #334