[SVCS-530] Xlsx duplicate header error fix #288
base: develop
The tabular renderer will no longer overwrite the values for headers that have the same name. Instead it will rename all duplicated headers in the format `name (1)`
In addition to our discussion, check the style as well.
iteration = 0
while increased_name in fields:
    iteration += 1
    if iteration > 5000:
Set iteration cap as a default argument and use a lower number for testing.
assert sheet[1][0] == {'Name': 1.0, 'Dup (1)': 2.0, 'Dup (2)': 3.0,
                       'Dup (3)': 4.0, 'Dup (4)': 5.0, 'Not Dup': 6.0}

# After demo it was suggested the iteration cap be raised. The value ended up to be about 5,000
As suggested above, use a default argument for the iteration cap and set it lower. You can then pass a small value in this test instead of having to make a file that iterates 5000 times.
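A minimal sketch of that suggestion, for illustration only. The helper name `next_free_name`, its `taken` argument, and the UUID fallback format are hypothetical and not taken from the PR; only the idea of a cap passed as a default argument (called `max_iterations` here, defaulting to 5000) comes from this thread.

```python
import uuid


def next_free_name(name, taken, max_iterations=5000):
    """Return 'name (n)' for the first n not in `taken`.

    Hypothetical helper: once the cap is exceeded, a UUID replaces the count.
    """
    iteration = 1
    candidate = '{} ({})'.format(name, iteration)
    while candidate in taken:
        iteration += 1
        if iteration > max_iterations:
            # Cap reached: fall back to a UUID suffix (format is an assumption).
            return '{} ({})'.format(name, uuid.uuid4())
        candidate = '{} ({})'.format(name, iteration)
    return candidate


# A test can reach the fallback with three occupied names instead of 5000.
taken = {'Dup (1)', 'Dup (2)', 'Dup (3)'}
fallback = next_free_name('Dup', taken, max_iterations=3)
normal = next_free_name('Dup', taken)  # default cap, finds 'Dup (4)'
```

With the cap exposed as an argument, the fallback branch is covered without a purpose-built spreadsheet.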
…ular-file-renderer into feature/xlsx-duplicate-column-names-fix
@cslzchen, added max_iterations variable for testing. Re-enabled
Looks good and move to PCR 🎆 🎆
def xlsx_xlrd(fp):
    """Read and convert a xlsx file to JSON format using the xlrd library
def xlsx_xlrd(fp, max_iterations=5000):
👍
h/t @AddisonSchiller, PR looks good. I will take over and rebase it up-to-date.
Ticket
https://openscience.atlassian.net/browse/SVCS-530
Purpose
The xlsx renderer used by the tabular renderer was overwriting column values when the header row contained duplicate names.
Changes
The tabular renderer will no longer overwrite the values for headers that have the same name. Instead it will rename all duplicated headers in the format `name (1)`.
In the very unlikely case that no unused name is found after 5000 iterations, the renderer will use a UUID instead of a count.
Added some tests (one is commented out because of how hard it is to test).
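The renaming described under Changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `rename_duplicate_headers` is a hypothetical helper, the UUID fallback format is a guess, and the header row is chosen to mirror the test assertion from the review.

```python
import uuid
from collections import Counter


def rename_duplicate_headers(headers, max_iterations=5000):
    """Return a copy of `headers` in which every repeated name becomes unique.

    Hypothetical helper: each occurrence of a duplicated name is renamed to
    'name (n)'; names that appear only once are left untouched.
    """
    counts = Counter(headers)
    # Reserve the names that are already unique so generated names avoid them.
    taken = {name for name, n in counts.items() if n == 1}
    renamed = []
    for name in headers:
        if counts[name] == 1:
            renamed.append(name)
            continue
        candidate = None
        for i in range(1, max_iterations + 1):
            attempt = '{} ({})'.format(name, i)
            if attempt not in taken:
                candidate = attempt
                break
        if candidate is None:
            # Cap reached: use a UUID instead of a count (format is an assumption).
            candidate = '{} ({})'.format(name, uuid.uuid4())
        taken.add(candidate)
        renamed.append(candidate)
    return renamed


headers = ['Name', 'Dup', 'Dup', 'Dup', 'Dup', 'Not Dup']
row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Keying a dict on the raw headers silently keeps only the last 'Dup' value.
overwritten = dict(zip(headers, row))

# Renaming first preserves every column.
fixed = dict(zip(rename_duplicate_headers(headers), row))
```

Here `overwritten` ends up with three keys, while `fixed` keeps all six values under distinct names.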
Side effects
None that I know of
QA Notes
There is a zip file on the JIRA ticket with files to test with.