-
I am trying to measure the speedup of SQL queries on a 48 GB CSV file.
Replies: 3 comments
-
Hi @SoumyaB57, can you provide more detailed information about your job, such as the SQL queries you ran?
-
Sure.
-
I found the issue.
In my Java code, I was reading the CSV file like this:
Dataset<Row> df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("countries.csv");
This caused problems: the behavior was unpredictable, and for some queries the CSV file was scanned on the CPU, which was much slower than scanning on the GPU.
I then changed the way the CSV file is read, and that fixed it:
Dataset<Row> df = spark.read()
    .format("csv")
    .option("header", "true")
    .load("countries.csv");
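For anyone who sees the same symptom, one way to check which parts of a query fall back to the CPU is to have the GPU plugin log them. The fragment below is only a sketch. It assumes the GPU acceleration in this thread comes from the RAPIDS Accelerator for Apache Spark, which the posts above do not name explicitly, and that its jar is already on the classpath.

```
# spark-defaults.conf fragment (assumption: RAPIDS Accelerator for Apache Spark is installed)
# Load the GPU SQL plugin
spark.plugins               com.nvidia.spark.SQLPlugin
# Log each operator that could not be placed on the GPU, along with the reason
spark.rapids.sql.explain    NOT_ON_GPU
```

With this setting, the driver log should list the operators that stayed on the CPU for each query, which makes it easier to tell whether the CSV scan is one of them. Note also that the second reader drops `inferSchema`. Without that option, Spark loads every column as a string and skips the extra pass over the file that schema inference requires. Whether that pass was what caused the slowdown here is not confirmed in the thread.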