-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1 streaming read test #652
Conversation
@christianknoepfle is there a reason not to use the v1 data source? It feels like the v2 data source needs a lot of work to fix the streaming support - the v2 code is just far too eager to shut all the underlying resources before they are even iterated over. |
v2 gives me folder and partitioning and works in the same way as csv/json etc. it would allow us to simplify our code base. |
|
||
class MaxNumRowsSuite extends AnyWordSpec with DataFrameSuiteBase with Matchers { | ||
|
||
"excel v2 and maxNumRows" can { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be excel v1 ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - fixed
@pjfanning the integration tests actually do test streaming: |
@nightscape so the related PRs (#651, #653) try to fix the v2 data source streaming and the existing tests didn't catch the issue. This PR uses the same xlsx that causes the v2 data source problem just for completeness. PRs #651, #653 need this new 6.7Mb file anyway. Do you think 6.7Mb is going to be an issue? |
It would be interesting to see if the integration tests would have caught the issue in v2. val reader = spark.read.excel(dataAddress = s"'$sheetName'!A1", header = header) to val reader = spark.read.format("excel").option("dataAddress", s"'$sheetName'!A1").option("header", header) here and see if it catches the issue as well? |
Generally I don't think the 6.7Mb themselves are going to hurt, but they go in a different direction than I had intended the project to go: I fear that if we add Excel files then this will become the standard to test bugs (as it already seems to happen), one has to take extra care of applying it to all combinations of options and it doesn't take the writing side into account. |
I'll close this because it is part of #653 anyway |
Test is based on @christianknoepfle work in in #651 - I just feel it useful to test that the v1 data source handles streaming ok.