Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-530] Xlsx duplicate header error fix #288

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

AddisonSchiller
Copy link
Contributor

Ticket

https://openscience.atlassian.net/browse/SVCS-530

Purpose

The xlsx renderer tool for the tabular renderer was overwriting column values if there were duplicate names in the header names.

Changes

The tabular renderer will no longer overwrite the values
for headers that have the same name. Instead it will rename
all duplicated headers in the format name (1)
There is a very unlikely case where, if after searching for a name for 5000 iterations, it will use a UUID instead of a count.

Added some tests (One is commented out because of how hard it is to test)

Side effects

None that I know of

QA Notes

There is a zip file on the JIRA ticket with files to test with.

The tabular renderer will no longer overwrite the values
for headers that have the same name. Instead it will rename
all duplicated headers in the format `name (1)`
Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to our discussion, check the style as well.

iteration = 0
while increased_name in fields:
iteration += 1
if iteration > 5000:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set iteration cap as a default argument and use a lower number for testing.

assert sheet[1][0] == {'Name': 1.0, 'Dup (1)': 2.0, 'Dup (2)': 3.0,
'Dup (3)': 4.0, 'Dup (4)': 5.0, 'Not Dup': 6.0}

# After demo it was suggested the iteration cap be raised. The value ended up to be about 5,000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested above, use a default arg for iterations and set it lower. You can then use this for this test instead of having to make a file to iterate 5000 times.

@coveralls
Copy link

coveralls commented Nov 16, 2017

Coverage Status

Coverage increased (+0.3%) to 68.28% when pulling 6ae4062 on AddisonSchiller:feature/xlsx-duplicate-column-names-fix into 8bb2dd4 on CenterForOpenScience:develop.

@AddisonSchiller
Copy link
Contributor Author

@cslzchen , added max_iterations variable for testing. Re-enabled uuid test. Also a few minor style changes (renaming vars etc)

@coveralls
Copy link

coveralls commented Nov 16, 2017

Coverage Status

Coverage increased (+0.3%) to 68.336% when pulling e4fec47 on AddisonSchiller:feature/xlsx-duplicate-column-names-fix into 8bb2dd4 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 16, 2017

Coverage Status

Coverage increased (+0.3%) to 68.318% when pulling e4fec47 on AddisonSchiller:feature/xlsx-duplicate-column-names-fix into 8bb2dd4 on CenterForOpenScience:develop.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and move to PCR 🎆 🎆



def xlsx_xlrd(fp):
"""Read and convert a xlsx file to JSON format using the xlrd library
def xlsx_xlrd(fp, max_iterations=5000):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h/t @AddisonSchiller, PR looks good. I will take over and rebase it up-to-date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants