-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pd.ExcelFile closes stream on destruction #32544
BUG: pd.ExcelFile closes stream on destruction #32544
Conversation
506fe6e
to
086c102
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's some disagreement over https://github.com/pandas-dev/pandas/pull/32544/files#r389428225. Can you resolve that?
Keep in mind that if we're backporting this to 1.0.2 we want the changes to be as minimal as necessary.
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -302,6 +302,7 @@ I/O | |||
timestamps with ``version="2.0"`` (:issue:`31652`). | |||
- Bug in :meth:`read_csv` was raising `TypeError` when `sep=None` was used in combination with `comment` keyword (:issue:`31396`) | |||
- Bug in :class:`HDFStore` that caused it to set to ``int64`` the dtype of a ``datetime64`` column when reading a DataFrame in Python 3 from fixed format written in Python 2 (:issue:`31750`) | |||
- Bug in :class:`ExcelFile` where the stream passed into the function was closed by the destructor. (:issue:`31467`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a regression? If so, it should be in 1.0.2.rst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to v1.0.2.rst
Regarding the disagreement: undoing the change makes the test fail on the file leak check.
The only way this test will not leak a file, is by undoing this fix. So either we keep this bug as is and leave the test unchanged, or we fix the bug and fix the broken test.
Please note that it is the test code itself that is leaking the file descriptor, not the ExcelFile
class
15eb236
to
6ae4029
Compare
str_path = os.path.join("test1" + read_ext) | ||
with open(str_path, "rb") as f: | ||
x = pd.read_excel(f, "Sheet1", index_col=0) | ||
del x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to construct this test without the del
call? This test might not be doing anything, since del
doesn't really mean __del__
gets called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__del__
will get called in practice when the last reference is del
eted. This is needed to trigger the erroneous behavior.
This is literally the code in the OP's issue. (except I replaced ExcelFile
with read_excel
, it seems)
@@ -114,7 +114,7 @@ def test_to_excel_with_openpyxl_engine(ext, tmpdir): | |||
df2 = DataFrame({"B": np.linspace(1, 20, 10)}) | |||
df = pd.concat([df1, df2], axis=1) | |||
styled = df.style.applymap( | |||
lambda val: "color: %s" % ("red" if val < 0 else "black") | |||
lambda val: "color: %s" % "red" if val < 0 else "black" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this related to the rest of the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the fix @roberthdevries
wb._archive.close() | ||
|
||
if hasattr(self.io, "close"): | ||
self.io.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally removed? Wondering if there is any engine that needs this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bug fix
You've got a conflict, if you can merge master and fix it please. |
…destruction Regression test added Bonus: refactor closing of openpyxl file into openpyxl class instead of anti-pattern check in ExcelFile.close() for openpyxl engine
This reverts commit 002f1e0.
6ae4029
to
7609675
Compare
Rebased to master |
…lFile-closes-stream-on-destruction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged master to fix CI and the merge conflict. Should be good to go.
Thanks @roberthdevries. |
…32657) Co-authored-by: Robert de Vries <[email protected]>
* FIX: pandas.ExcelFile should not close stream passed as parameter on destruction Regression test added
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff