-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#228] Skeleton code to implement unified tabular processing #229
base: main
Are you sure you want to change the base?
Conversation
} | ||
|
||
@Override | ||
public Set<Row> getRows(TableSchema tableSchema,String sourceResourceUri, Mode outputMode){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not make sense to refactor to separate method of TabularReader !!!
Reason:
There is some nontrivial logic that references specification. The same logic will be then duplicated in:
ExcelReader.getRows()
HtmlReader.getRows()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reverted this change, now this is done in TabularModule, as it was before. It is done after getting Row Statements in order to be possible to get the number of rows without creating new list reader.
String[] header = listReader.getHeader(true); // skip the header (can't be used with CsvListReader) | ||
TabularReader tabularReader = new CSVReader(listReader); | ||
|
||
List<String> header = tabularReader.getHeader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if we have test support for skipHeader in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you know if you need to read header or not so why reading the header here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Header is readed here to check if the input resource is empty.
for (String columnTitle : header) { | ||
String columnName = normalize(columnTitle); | ||
boolean isDuplicate = !columnNames.add(columnName); | ||
outputColumns = tabularReader.getOutputColumns(header); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put it into separate method TabularModule.setOutputColumns(tableSchema, header) and then access outputColumns using something like tableSchema.getColumns()
...
the research question is whether it should not be rather tableSchema.setOuputColums(header)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going over code it seems to me that it makes sense to have tableSchema.setOuputColums(header)
but not sure about name ... maybe addOuptutColumns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatch out line tableSchema.adjustProperties(hasInputSchema, outputColumns, sourceResource.getUri());
... it was not set to the tableSchema before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is done in tableSchema, there are methods tableSchema.setOutputColumns(header,sourceResource.getUri(),dataPrefix,hasInputSchema);
and tableSchema.getOutputColumns();
@blcham Please have a look at the current implementation of the #228 .
while the expected output is:
This can be fixed with a control |
|
@blcham Now, in the current implementation in this PR, empty cells are ignored in the same way both in excel and html files, like here:
|
Yes, it makes sense like that, and for the first non-header row, we should generate:
|
rowNumber++; | ||
|
||
for (int i = 0; i < header.size(); i++) { | ||
// 4.6.8.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we are mixing logic with concrete reader, aren't we? we can discuss.
@@ -73,6 +75,54 @@ public void setValueUrl(String valueUrl) { | |||
tabularModuleUtils.setVariable(this.valueUrl, valueUrl, value -> this.valueUrl = value, "valueUrl"); | |||
} | |||
|
|||
private transient List<Column> outputColumns; | |||
|
|||
public void setOutputColumns(List<String> header,String sourceResourceUri,String dataPrefix,boolean hasInputSchema) throws UnsupportedEncodingException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please describe here what this method does, we will put it as documentation for the method for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we have to change completely the file, i cannot really see the differences between previous version and the new one.
You could just remove some values from the previous version and it would show the history in a nice way right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do we have an option to use concrete URL instead of blank nodes?
There is only way to continue on this. Make small PRs that can be merged immediately without breaking existing implementation. It should be easy to review (at most 15 mins for me). |
f7f7266
to
a308b3a
Compare
Not complete -- tests are failing here.
…m TabularReader because it doesn't require reading the actual file.
a308b3a
to
9730de4
Compare
Implements #228