-
Notifications
You must be signed in to change notification settings - Fork 146
Examples: Handle Empty Column Names
Quang Hoang Xuan edited this page Sep 16, 2021
·
3 revisions
Purpose: An simple example with excel file, that data does not started from the first row, and some column names are missing
Credit: to @datadu-de for sharing this example
You can follow the linked issue #366 or access the sample file from: df_empty_headers_not_working.xlsx
For this excel file, there are some columns without header and needed data is not from the first row, as we can see in this screenshot:
To load this excel file, just need to set the dataAddress from cell A8, Spark Excel will generate unique names for missing column headers
df = spark.read.format("excel") \
.option("header", True) \
.option("inferSchema", False) \
.option("dataAddress", "A8") \
.load(f"/<change to your local path>/df_empty_headers_not_working.xlsx")
Now, we can check our laded data frame and its schema
df.show(10)
df.printSchema()
df.count()