The spark driver has stopped unexpectedly and is restarting AFTER using excel lib #689
-
In one part of my code I read several tabs of a spreadsheet, and doing so started to fill up the driver node's memory on my cluster. Why does this happen? When I read Excel files, is the data not distributed to the worker nodes? Why does setting maxRowsInMemory = 20 solve the problem? Does it help to read the data on demand?
-
Spark-excel uses the excel-streaming-reader library to read data without having all data in memory.
-
The `maxRowsInMemory` option uses a streaming reader. The v1 version (the one you're using if you do a `.format("com.crealytics.spark.excel")`) actually reads all rows into memory on the driver and only then calls `parallelize` to distribute the data to the workers. The v2 version (`.format("excel")`) reads directly on the workers.