BUG: pd.ExcelFile closes stream on destruction #32544

roberthdevries
Contributor

@roberthdevries roberthdevries changed the title Fix 31467 pd.excel file closes stream on destruction Fix #31467: pd.ExcelFile closes stream on destruction Mar 8, 2020
@roberthdevries roberthdevries force-pushed the fix-31467-pd.ExcelFile-closes-stream-on-destruction branch from 506fe6e to 086c102 Compare March 8, 2020 22:11
@WillAyd WillAyd added the IO Excel read_excel, to_excel label Mar 9, 2020
Contributor

@TomAugspurger TomAugspurger left a comment

It seems like there's some disagreement over https://github.com/pandas-dev/pandas/pull/32544/files#r389428225. Can you resolve that?

Keep in mind that if we're backporting this to 1.0.2 we want the changes to be as minimal as necessary.

@@ -302,6 +302,7 @@ I/O
   timestamps with ``version="2.0"`` (:issue:`31652`).
 - Bug in :meth:`read_csv` was raising `TypeError` when `sep=None` was used in combination with `comment` keyword (:issue:`31396`)
 - Bug in :class:`HDFStore` that caused it to set to ``int64`` the dtype of a ``datetime64`` column when reading a DataFrame in Python 3 from fixed format written in Python 2 (:issue:`31750`)
+- Bug in :class:`ExcelFile` where the stream passed into the function was closed by the destructor. (:issue:`31467`)
Contributor

Is this a regression? If so, it should be in 1.0.2.rst.

Contributor Author

Moved to v1.0.2.rst.
Regarding the disagreement: undoing the change makes the test fail the file leak check.
The only way to keep this test from leaking a file is to undo this fix. So either we keep the bug and leave the test unchanged, or we fix the bug and fix the broken test.
Please note that it is the test code itself that leaks the file descriptor, not the ExcelFile class.

@roberthdevries roberthdevries force-pushed the fix-31467-pd.ExcelFile-closes-stream-on-destruction branch 2 times, most recently from 15eb236 to 6ae4029 Compare March 10, 2020 19:26
str_path = os.path.join("test1" + read_ext)
with open(str_path, "rb") as f:
    x = pd.read_excel(f, "Sheet1", index_col=0)
    del x
Member

Is there a way to construct this test without the del call? The test might not be doing anything, since del does not guarantee that __del__ gets called.

Contributor Author

@roberthdevries roberthdevries Mar 11, 2020

__del__ will get called in practice when the last reference is deleted, which is needed to trigger the erroneous behavior.
This is literally the code from the OP's issue (except that I replaced ExcelFile with read_excel, it seems).
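
The exchange about del and __del__ can be illustrated without pandas. What follows is a minimal sketch, assuming CPython's reference counting; `LeakyReader` is a hypothetical stand-in that only mimics a destructor closing a stream it did not open, and it is not the ExcelFile implementation.

```python
import io


class LeakyReader:
    """Hypothetical stand-in for a reader whose destructor closes its input."""

    def __init__(self, stream):
        self.stream = stream  # borrowed from the caller, not opened here

    def __del__(self):
        # The problematic pattern: closing a stream the caller still owns.
        self.stream.close()


def stream_closed_after_del():
    """Return whether the caller's stream is closed once the reader is deleted."""
    stream = io.BytesIO(b"workbook bytes")
    reader = LeakyReader(stream)
    del reader  # last reference gone, so CPython runs __del__ immediately
    return stream.closed


def stream_closed_with_surviving_reference():
    """Return whether the stream is closed while another reference survives."""
    stream = io.BytesIO(b"workbook bytes")
    reader = LeakyReader(stream)
    alias = reader
    del reader  # only unbinds the name; the object is still reachable via alias
    closed = stream.closed
    del alias
    return closed
```

Under these assumptions the first helper reports a closed stream and the second does not, which is why the test needs the last reference to go away. On interpreters without reference counting, the timing of __del__ may differ.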

@@ -114,7 +114,7 @@ def test_to_excel_with_openpyxl_engine(ext, tmpdir):
     df2 = DataFrame({"B": np.linspace(1, 20, 10)})
     df = pd.concat([df1, df2], axis=1)
     styled = df.style.applymap(
-        lambda val: "color: %s" % ("red" if val < 0 else "black")
+        lambda val: "color: %s" % "red" if val < 0 else "black"
Member

is this related to the rest of the PR?

Contributor Author

Removed
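
For context on why the two lambda variants in the hunk above are not equivalent: in Python the % operator binds more tightly than a conditional expression, so dropping the parentheses changes the result for non-negative values. A small sketch, with hypothetical function names standing in for the two lambdas:

```python
def color_without_parens(val):
    # Parsed as ("color: %s" % "red") if val < 0 else "black"
    return "color: %s" % "red" if val < 0 else "black"


def color_with_parens(val):
    # The conditional picks the color first, then % formats it
    return "color: %s" % ("red" if val < 0 else "black")


print(color_without_parens(1))  # black
print(color_with_parens(1))     # color: black
```

Both variants agree for negative values; only the parenthesized form yields a complete "color: ..." string in the other case.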

@jreback jreback added this to the 1.0.2 milestone Mar 11, 2020
Member

@datapythonista datapythonista left a comment

thanks for the fix @roberthdevries

pandas/io/excel/_base.py
        wb._archive.close()

    if hasattr(self.io, "close"):
        self.io.close()
Member

Is this intentionally removed? Wondering if there is any engine that needs this...

Contributor Author

This is the bug fix
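
One general way to avoid closing a caller's stream is to track ownership and close only the handles an object opened itself. The sketch below illustrates that idea only; `OwningReader` and its `_owns_handle` attribute are hypothetical names, and this is not presented as the change made in this PR.

```python
import io
import os
import tempfile


class OwningReader:
    """Hypothetical reader that closes only the handles it opened itself."""

    def __init__(self, path_or_stream):
        if isinstance(path_or_stream, (str, os.PathLike)):
            # Opened here, so this object is responsible for closing it.
            self.handle = open(path_or_stream, "rb")
            self._owns_handle = True
        else:
            # Borrowed from the caller, who stays responsible for closing it.
            self.handle = path_or_stream
            self._owns_handle = False

    def close(self):
        if self._owns_handle:
            self.handle.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()


def demo():
    """Return (borrowed stream closed?, owned handle closed?) after use."""
    stream = io.BytesIO(b"data")
    with OwningReader(stream):
        pass

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bin")
        with open(path, "wb") as f:
            f.write(b"data")
        with OwningReader(path) as reader:
            owned = reader.handle
    return stream.closed, owned.closed
```

With this design the caller's stream stays usable after the reader is done, while a file opened from a path is still cleaned up.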

@datapythonista datapythonista changed the title Fix #31467: pd.ExcelFile closes stream on destruction BUG: pd.ExcelFile closes stream on destruction Mar 11, 2020
@datapythonista
Member

You've got a conflict; could you merge master and fix it, please?

@roberthdevries roberthdevries force-pushed the fix-31467-pd.ExcelFile-closes-stream-on-destruction branch from 6ae4029 to 7609675 Compare March 11, 2020 14:39
@roberthdevries
Contributor Author

Rebased to master

Contributor

@TomAugspurger TomAugspurger left a comment

I merged master to fix CI and the merge conflict. Should be good to go.

@TomAugspurger TomAugspurger merged commit 9c85af8 into pandas-dev:master Mar 12, 2020
@TomAugspurger
Contributor

Thanks @roberthdevries.

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 12, 2020
TomAugspurger pushed a commit that referenced this pull request Mar 12, 2020
@roberthdevries roberthdevries deleted the fix-31467-pd.ExcelFile-closes-stream-on-destruction branch March 12, 2020 21:35
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
* FIX: pandas.ExcelFile should not close stream passed as parameter on destruction

Regression test added
Labels
Bug IO Excel read_excel, to_excel
Development

Successfully merging this pull request may close these issues.

pd.ExcelFile closes stream on destruction in pandas 1.0.0
6 participants