Skip to content

Examples: Handle Empty Column Names

Quang Hoang Xuan edited this page Sep 16, 2021 · 3 revisions

Purpose: An simple example with excel file, that data does not started from the first row, and some column names are missing

Credit: to @datadu-de for sharing this example

Excel file

You can follow the linked issue #366 or access the sample file from: df_empty_headers_not_working.xlsx

For this excel file, there are some columns without header and needed data is not from the first row, as we can see in this screenshot: Excel file screenshot, from @datadu-de

To load this excel file, just need to set the dataAddress from cell A8, Spark Excel will generate unique names for missing column headers

df = spark.read.format("excel") \
   .option("header", True) \
   .option("inferSchema", False) \
   .option("dataAddress", "A8") \
   .load(f"/<change to your local path>/df_empty_headers_not_working.xlsx")

Now, we can check our laded data frame and its schema

df.show(10)
df.printSchema()
df.count()

References

#366 empty column names blocks reading columns