HPCC-31860 Test suite for the Parquet plugin #18980
base: candidate-9.8.x
905c5b2 to d148f19
@ilhan2316 I took a quick first look and had some questions and comments.
There were a few repeated style issues that I only pointed out a single example of, but make sure to check over all the files for things like ending newlines, correct copyright headers, etc.
Also, don't forget to remove any changes not associated with this PR.
Back to you.
@ilhan2316 I think it is looking good. Don't be put off by the number of comments. Feel free to ask any questions that may come up.
Once you address these changes please take a look through the rest of the files to see if my comments apply in any other places. Once you are done I will look through the rest of the files.
@ilhan2316 A few more comments, but I think it is close.
@ilhan2316 I just took a look at your ECL files and have a few comments.
outputDataset := ParquetIO.Read(recordLayout, filePath);

compareDatasets := IF(COUNT(importedDataset) = COUNT(outputDataset),
Comment still stands.
############################################################################## */

//class=parquet
//Cover's data type's supported by ECL and arrow
Comment still stands.
@ilhan2316 This will need to be rebased to the latest version of 9.8.x to pull in the changes from this JIRA. It contains changes to data type sizes and could cause some of your key files to change.
Signed-off-by: Ilhan Gelle <[email protected]>
f50a03e to 92a925a
@ilhan2316 Don't be put off by the number of comments; it is getting close. Feel free to leave questions on GitHub and I can add examples and clarification.
Some files used in your test cases are missing from the PR, and some files that are included are never referenced in your ECL. I have listed both sets below. Also, make sure you check back on the PR to see the results of the smoketests; some of the test cases are failing.
Files I didn't see a reference to in the ECL:
diverse.parquet
edgecase1.parquet
intial.parquet
integertest.parquet
medium.parquet
small.parquet
time3.parquet
updated.parquet
Files missing that were referenced:
single.parquet
multi_1_of_3.parquet
multi_2_of_3.parquet
multi_3_of_3.parquet
Missing key files:
parquet_types.xml
parquet_schema.xml
############################################################################## */

//class=parquet
//fail
This is still marked fail. Try running it again as the fix should be in the latest version of 9.8.x.
ParquetIO.HivePartition.Write(
    smallData,
    rowSize, // Number of rows per file
    '/var/lib/HPCCSystems/mydropzone/hive_partitioned/',
Use Std.File.GetDefaultDropZone() + '/regress/parquet/hive_partitioned/' to avoid cluttering up the dropzone root directory and assuming the platform is installed in root.
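A minimal sketch of the suggested path construction. It assumes Std.File.GetDefaultDropZone() returns the drop zone directory without a trailing slash, which is what the leading '/' in the suggestion implies. The names dropzoneDirectory, hivePath, and dirPath are illustrative and are not taken from the PR.

```ecl
IMPORT Std;

// Assumption: the returned directory has no trailing slash, so the suffix starts with '/'.
dropzoneDirectory := Std.File.GetDefaultDropZone() + '/regress/parquet/';

// Hypothetical names for the two partitioned outputs under the shared test directory.
hivePath := dropzoneDirectory + 'hive_partitioned/';
dirPath := dropzoneDirectory + 'dir_partitioned/';

OUTPUT(hivePath, NAMED('hivePath'));
OUTPUT(dirPath, NAMED('dirPath'));
```

Building every test path from one base attribute keeps the output out of the drop zone root and avoids hard-coding the install location.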
For this file, I believe I have fixed the partitioning, but this is the error I am still trying to figure out: assert(!"Unknown copy source type") failed - file: hqlcpp.cpp, line 11772
That doesn't look like an issue with the plugin. Perhaps your ECL is still incorrect. I noted a few issues in my comments.
ParquetIO.DirectoryPartition.Write(
    smallData, // Data to write
    rowSize, // Number of rows per file
    '/var/lib/HPCCSystems/mydropzone/dir_partitioned/',
Same comment as line 43.
@ilhan2316 Before I look at the changes, could you please remove the files that shouldn't be there? You can refer to my earlier comment for the list of files that should be removed.
@ilhan2316 It looks like many of my comments are still unresolved. Make sure you are going through the "Files Changed" section and opening each file to view my comments. Also, it helps me if you respond to each comment with what you changed, whether you agree or disagree, or what your thoughts are, rather than just resolving the thread.
@ilhan2316 I know this is a lot of comments, but I think they are overall easier to fix than they have previously been. If you want me to provide some ECL for these tests let me know and I can help you. Don't hesitate to ask questions either!
basePath := Std.File.GetDefaultDropZone() + 'regress/parquet/';

// Define partition keys as string arrays
hivePartitionKey := ['city'];
This could easily be the cause of your error. The FUNCTIONMACROs for Write expect a string and you are passing an array. The way to pass in multiple keys is like: 'year;month;day'
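A rough sketch of passing the partition key as a single string. The record layout and rows are made up for illustration. The first three arguments follow the snippets in this thread; treating the fourth argument as an overwrite flag and the fifth as the key string is an assumption about the FUNCTIONMACRO's parameter order and should be checked against the plugin's ECL definitions.

```ecl
IMPORT Std;
IMPORT Parquet;

// Hypothetical layout and rows, for illustration only.
cityRecord := RECORD
    UNSIGNED4 id;
    STRING20 name;
    STRING20 city;
END;

smallData := DATASET([{1, 'Alice', 'Atlanta'},
                      {2, 'Bob', 'Boston'},
                      {3, 'Carol', 'Atlanta'}], cityRecord);

rowSize := 1024; // Number of rows per file
basePath := Std.File.GetDefaultDropZone() + '/regress/parquet/';

// Partition keys are one string, not a set. Multiple keys are separated
// by ';', for example 'year;month;day'.
hivePartitionKey := 'city';

// Assumption: the fourth argument is the overwrite flag and the fifth is the key string.
ParquetIO.HivePartition.Write(smallData, rowSize, basePath + 'hive_partitioned/', TRUE, hivePartitionKey);
```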
rowSize := 1024; // Increased buffer size

// Define base path
basePath := Std.File.GetDefaultDropZone() + 'regress/parquet/';
This needs to be '/regress/parquet/'.
SEQUENTIAL(
    OUTPUT(singleDataset, NAMED('singleDataset')), // Output for the single file
    OUTPUT(multiDataset, NAMED('multiDataset')) // Output for the combined multi-part files
);
Missing newline character at the end of the file.
setOfIntegerResult := IF(COUNT(setOfIntegerCompareResult(isEqual = FALSE)) = 0, 'Pass', 'Fail: SET OF INTEGER data mismatch');

// ======================== REAL8 ========================
Rather than create a new section for each type variation you should just add an additional column to the existing REAL dataset on line 92.
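One way the combined section could look, as a sketch only. The record layout, attribute names, file name, and values are hypothetical and are not the PR's existing REAL section. It also assumes the third argument to ParquetIO.Write is the overwrite flag.

```ecl
IMPORT Std;
IMPORT Parquet;

// Hypothetical layout: one record carries each REAL variation as its own field.
realRecord := RECORD
    UNSIGNED4 testid;
    REAL4 value4;
    REAL8 value8;
END;

// Values chosen to be exactly representable in both REAL4 and REAL8.
realDatasetOut := DATASET([{1, 1.5, 2.25},
                           {2, -0.5, 1024.125},
                           {3, 0.0, -3.75}], realRecord);

realPath := Std.File.GetDefaultDropZone() + '/regress/parquet/RealTest.parquet';

// Assumption: the third argument to Write is the overwrite flag.
writeReal := ParquetIO.Write(realDatasetOut, realPath, TRUE);
realDatasetIn := ParquetIO.Read(realRecord, realPath);

compareRecord := RECORD
    UNSIGNED4 testid;
    BOOLEAN isEqual;
END;

// A single join checks every REAL field, so no per-type section is needed.
realCompareResult := JOIN(realDatasetOut, realDatasetIn,
                          LEFT.testid = RIGHT.testid,
                          TRANSFORM(compareRecord,
                                    SELF.testid := LEFT.testid;
                                    SELF.isEqual := LEFT.value4 = RIGHT.value4 AND
                                                    LEFT.value8 = RIGHT.value8));

realResult := IF(COUNT(realCompareResult) = COUNT(realDatasetOut) AND
                 COUNT(realCompareResult(isEqual = FALSE)) = 0,
                 'Pass', 'Fail: REAL data mismatch');

SEQUENTIAL(
    writeReal,
    OUTPUT(realResult, NAMED('realResult'))
);
```

The same pattern would apply to the INTEGER variations: add one field per size to the existing record instead of repeating the write, read, and compare steps.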
setOfUnicodeResult := IF(COUNT(setOfUnicodeCompareResult(isEqual = FALSE)) = 0, 'Pass', 'Fail: SET OF UNICODE data mismatch');

// ======================== INTEGER8 ========================
Same comment as above. Add repeating types to the first section of that type as an additional field. See the DATA section as an example on line 173.
To avoid making too many comments I haven't explicitly marked each occurrence, but there were a lot of the same cases that could be commoned up.
I think this test can be deleted. The write functionality is thoroughly tested by the other tests.
I think this file can be deleted and the functionality tested in parquetTypes.ecl (See my comment there).
Yeah. I agree with you on this.